Neuropeptidergic Signaling in the American Lobster Homarus americanus: New Insights from High-Throughput Nucleotide Sequencing

Peptides are the largest and most diverse class of molecules used for neurochemical communication, playing key roles in the control of essentially all aspects of physiology and behavior. The American lobster, Homarus americanus, is a crustacean of commercial and biomedical importance; lobster growth and reproduction are under neuropeptidergic control, and portions of the lobster nervous system serve as models for understanding the general principles underlying rhythmic motor behavior (including peptidergic neuromodulation). While a number of neuropeptides have been identified from H. americanus, and the effects of some have been investigated at the cellular/systems levels, little is currently known about the molecular components of neuropeptidergic signaling in the lobster. Here, a H. americanus neural transcriptome was generated and mined for sequences encoding putative peptide precursors and receptors; 35 precursor- and 41 receptor-encoding transcripts were identified. We predicted 194 distinct neuropeptides from the deduced precursor proteins, including members of the adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin C, bursicon, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone (CHH), CHH precursor-related peptide, diuretic hormone 31, diuretic hormone 44, eclosion hormone, FLRFamide, GSEFLamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, proctolin, pyrokinin, SIFamide, sulfakinin and tachykinin-related peptide families. While some of the predicted peptides are known H. americanus isoforms, most are novel identifications, more than doubling the extant lobster neuropeptidome. The deduced receptor proteins are the first descriptions of H. americanus neuropeptide receptors, and include ones for most of the peptide groups mentioned earlier, as well as those for ecdysis-triggering hormone, red pigment concentrating hormone and short neuropeptide F. Multiple receptors were identified for most peptide families. These data represent the most complete description of the molecular underpinnings of peptidergic signaling in H. americanus, and will serve as a foundation for future gene-based studies of neuropeptidergic control in the lobster.


Introduction
Due to its culinary appeal, the American lobster, Homarus americanus, is arguably one of the world's most iconic crustaceans. In addition to being a mainstay for the economies of New England and Atlantic Canada, this decapod is a premier model species for studies directed at understanding the general principles governing the generation, maintenance and modulation of rhythmically active behaviors, which include walking, chewing and breathing in humans. Specifically, the numerically simple neural circuits that drive the movements of the foregut musculature (the stomatogastric neural circuit) and the neurogenic heart (the cardiac circuit) have been used since the 1960s to investigate rhythmic pattern generators [1][2][3][4][5][6][7][8]. One of the most significant findings to come from work on the stomatogastric and cardiac systems is that an almost infinite number of distinct motor patterns can be generated from a "simple," hard-wired neural circuit via the actions of neuromodulators, the largest single class of which is peptides [1][2][3][4][5][6][7][8][9][10].
While no peptide receptors have been characterized from H. americanus, a number of neuropeptides have been identified from this species [10]. Early peptide discovery in the lobster relied on the one-by-one biochemical isolation/purification and subsequent sequencing of these molecules [11,12]. Later, molecular cloning was employed for peptide identification in this species [13]. However, the vast majority of the known H. americanus neuropeptides have been identified via mass spectrometry using accurate mass matching and/or de novo tandem mass spectrometric sequencing [14][15][16][17][18][19][20][21][22][23][24][25][26][27][28]. The power of the mass spectral approach for peptide discovery in the lobster is exemplified by a study in which 84 peptides, 57 of which were novel, were identified from neural tissues collected from H. americanus [23].
Recently, a new methodology, in silico genome/transcriptome mining, has been used for the discovery and characterization of peptides and peptide receptors in crustaceans [29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46]. Using this approach, large numbers of peptides can rapidly be predicted for a species. For example, 176 peptides were recently deduced from a crayfish, Procambarus clarkii, transcriptome [36]. Similarly, peptide receptors can be identified and characterized using in silico genome/transcriptome mining, e.g., the 18 peptide receptors recently predicted from the transcriptome of the copepod, Calanus finmarchicus [40]. Although the peptidergic signaling systems of H. americanus have not previously been investigated using large-scale in silico transcriptome mining, the results obtained using this methodology on other crustaceans suggest that a wealth of information could be obtained if such a study were conducted. The resulting data would provide a strong foundation for gene-based studies of peptidergic neuromodulation at the molecular level, for understanding lobster physiology more broadly, and possibly for advancing aquaculture efforts for H. americanus, for example, by increasing our understanding of the control of growth and reproduction by peptide hormones.
In the study presented here, a transcriptome de novo assembled from sequences derived from H. americanus neural tissues was mined for putative neuropeptide precursor- and receptor-encoding transcripts using several well-vetted in silico workflows [29][30][31][32][33][34][35][36][37][38][40][41][42][43][47][48][49][50][51][52][53][54][55]. Thirty-five putative neuropeptide precursor-encoding transcripts were identified, enabling the prediction of 194 distinct mature peptide structures. Transcripts encoding putative receptors allowed for the prediction and characterization of 41 distinct neuropeptide receptor proteins. While some of the predicted peptides are known H. americanus isoforms, most are new discoveries, at least for the lobster. The receptors identified here are the first peptide receptors described from H. americanus and are among just a handful currently known from any crustacean. Taken collectively, these data represent the most complete description of the molecular components of the peptidergic signaling systems of H. americanus, and provide the first, and thus far only, large-scale resource for initiating gene-based studies of neuropeptidergic control in the American lobster.

Development of a Homarus americanus neural transcriptome
Despite the commercial and biomedical importance of the American lobster, few genomic/transcriptomic resources have been generated for this species. In fact, prior to the present study, these data were limited to a modest collection of expressed sequence tags [56,57]. To help fill the void in the extant H. americanus molecular resources, a transcriptome was generated from RNA obtained from multiple neural tissues, which included brain, ventral nerve cord, cardiac ganglion and stomatogastric nervous system. Sequencing of this transcriptome was done using the Illumina HiSeq platform, with 452,237,240 raw reads generated from the collective set of neural libraries. De novo assembly using CLC Genomics Server 5.0.1 (CLC Bio) produced 60,273 distinct contigs with an average length of 1,656 bp. This Transcriptome Shotgun Assembly (TSA) project has been deposited at DDBJ/EMBL/GenBank under the Accession No. GEBG00000000 (BioProject No. PRJNA300643; BioSample No. SAMN04230440). The version described in this paper is the first version, GEBG01000000; it is by far the largest single collection of nucleotide sequence data for H. americanus, and being derived solely from neural tissues, it provides the first significant resource for identifying nervous system transcripts of interest in this species, including those encoding neuropeptide precursors and receptors.

Strategy and rationale for the discovery of Homarus transcripts involved in peptidergic signaling
Many peptidergic systems are highly conserved within the Arthropoda, with isoforms of most peptide families having been identified from one or more species in three of the four subphyla that comprise this phylum (i.e., the Crustacea, Hexapoda, Chelicerata and Myriapoda); peptide sequence data for members of the Myriapoda are extremely limited, with only one study of myriapod peptides having been completed [51]. Even with limited data from the Myriapoda, current evidence suggests that members of the adipokinetic hormone (AKH)/red pigment concentrating hormone (RPCH), adipokinetic hormone-corazonin-like peptide (ACP), allatostatin A (AST-A), allatostatin C (AST-C), allatotropin, CCHamide, crustacean cardioactive peptide (CCAP), GSEFLamide, insulin-like peptide (ILP), intocin, proctolin, pyrokinin, short neuropeptide F (sNPF), SIFamide and sulfakinin families, or closely related peptides, are present in at least some members of all four arthropod subphyla [34][35][36]47,51,52,58].
Given the paucity of neuropeptide receptor sequences known for members of the Crustacea, a slightly different strategy was used for identifying transcripts encoding these types of proteins. For these BLAST searches, insect proteins [72][73][74][75][76][77], primarily those from the fruit fly Drosophila melanogaster [72], were used as the tblastn input queries; for one receptor, the allatostatin B (AST-B) receptor, a crustacean protein [65] was used as the tblastn query. The use of Drosophila proteins as query sequences allowed for confirmation of our identifications of the deduced H. americanus proteins (and hence transcripts) via reciprocal blastp (search of a protein database using a protein query) comparison to the annotated D. melanogaster proteins curated in FlyBase [78], one of the largest, most complete, and most thoroughly characterized single-species arthropod protein databases extant. The receptor identifications were further vetted by protein BLASTs against the non-redundant arthropod protein dataset curated in GenBank, which allowed for broad species comparisons, and by structural motif analysis using the online program InterPro [79][80][81]. Again, this general strategy is one that has proven highly successful for identifying transcripts encoding a variety of large proteins, including neuropeptide receptors [40], from other crustaceans [82][83][84][85].
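The reciprocal search logic described above can be sketched in a few lines. This is a minimal illustration only, assuming that BLAST results have already been parsed into lists of (subject identifier, bit score) pairs; the function names and the toy identifiers and scores are hypothetical and are not taken from the study. Note that it applies a strict criterion (the best reverse hit must be the original query), whereas the identifications reported here rest on the family annotation of the top hit.

```python
def top_hit(hits):
    """Return the subject ID with the highest bit score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def reciprocal_confirmed(query_id, forward_hits, reverse_hits_by_candidate):
    """Check whether the best forward hit points back to the original query.

    forward_hits: hits of the query protein against the transcriptome (tblastn-like).
    reverse_hits_by_candidate: for each candidate, hits of its deduced protein
    against the reference protein set (blastp-like).
    """
    candidate = top_hit(forward_hits)
    if candidate is None:
        return None, False
    back = top_hit(reverse_hits_by_candidate.get(candidate, []))
    return candidate, back == query_id


# Toy, made-up data (identifiers and scores are illustrative only).
forward = [("lobster_contig_A", 310.0), ("lobster_contig_B", 95.0)]
reverse = {"lobster_contig_A": [("fly_AstA_R1", 298.0), ("fly_other_GPCR", 120.0)]}
result = reciprocal_confirmed("fly_AstA_R1", forward, reverse)
print(result)
```

In practice a looser test, such as checking that the best reverse hit belongs to the same receptor family as the query, may be more appropriate when the reference species has several paralogs.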

Identification of peptide-precursor encoding transcripts and prediction of putative mature peptide structures
Nearly 40 distinct peptide families are theorized to be present in crustaceans (for a review of most crustacean peptide groups see Christie et al. [10]). Many of these families are broadly conserved across the various taxa that comprise this subphylum. For example, isoforms of AST-A, AST-C, DH31, eclosion hormone (EH), neuropeptide F (NPF), proctolin, SIFamide, sulfakinin and TRP have been identified from one or more members of the classes Remipedia [29], Branchiopoda [39,41,67], Maxillopoda [30,31,40] and Malacostraca [33,36]. However, for at least some peptide groups, there appears to be much more limited phylogenetic conservation, e.g., DENamides have thus far been identified only from the cladoceran Daphnia pulex [67], with members of the DXXRLamide and FXGGXamide families currently known only from copepods [30,32,35].
Using known lobster pre/preprohormone sequences, as well as precursor protein sequences from the crayfish P. clarkii, and in a few cases sequences from hexapods, as templates, 35 putative peptide-encoding transcripts were identified within the H. americanus transcriptome (Table 1 and S1 Fig). These transcripts included ones for ACP, AST-A, AST-C, bursicon α, bursicon β, CCHamide, corazonin, CCAP, CHH, DH31, diuretic hormone 44 (DH44), EH, FLRFamide, GSEFLamide, ILP, intocin, leucokinin, myosuppressin, neuroparsin, NPF, orcokinin, pigment dispersing hormone (PDH), proctolin, pyrokinin, SIFamide, sulfakinin and TRP.  [30,31,35,36,39,40], genes encoding at least allatotropin, ETH and RYamide are also predicted to exist in this species. Given their apparent limited conservation within the Crustacea [30,32,35,67], H. americanus may well not possess genes for DENamide, DXXRLamide or FXGGXamide. The identification of the transcripts just described allowed for the prediction of a new neuropeptidome for H. americanus. Here, each transcript was translated (Fig 1 and S2 Fig), and the deduced protein was subjected to a well-vetted peptide prediction workflow (Fig 1). First, each deduced pre/preprohormone was assessed for its completeness, i.e., was it a full-length protein, an amino (N)-terminal partial protein, a carboxyl (C)-terminal partial protein or an internal fragment of a protein. Full-length proteins were defined by having stop codons bracketing the open reading frame (ORF) in their transcript and possessing a functional signal peptide; putative full-length proteins did not have a stop codon located before the theorized "start" methionine, but a signal sequence was predicted starting with this residue. N-terminal partial proteins possessed no stop codon at the end of their ORFs, while C-terminal partial proteins lacked a stop codon before the ORF and did not display a "start" methionine (that produced a signal peptide). 
Internal protein fragments possessed no stop codon prior to their ORF and there was no evident start methionine; additionally, they lacked a stop codon at the end of their transcript's putative coding sequence. The completeness of all deduced proteins, as well as their lengths and the lengths of the transcripts that encode them, is provided in Table 1.
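The completeness categories defined above amount to a small decision table, which can be sketched as follows. This is an illustrative restatement only; the function name and the three boolean inputs are hypothetical conveniences, and the "unclassified" outcome covers a combination that the definitions in the text do not address.

```python
def classify_completeness(upstream_stop, start_met_with_signal, downstream_stop):
    """Classify a deduced precursor using the criteria described in the text.

    upstream_stop: a stop codon precedes the open reading frame (ORF).
    start_met_with_signal: a "start" methionine that begins a predicted signal peptide.
    downstream_stop: a stop codon terminates the ORF.
    """
    if start_met_with_signal:
        if not downstream_stop:
            return "N-terminal partial"
        # A stop codon on both sides of the ORF gives a full-length call;
        # without the upstream stop the call is only "putative".
        return "full-length" if upstream_stop else "putative full-length"
    if upstream_stop:
        # Combination not covered by the definitions in the text.
        return "unclassified"
    return "C-terminal partial" if downstream_stop else "internal fragment"


for flags in [(True, True, True), (False, True, True), (True, True, False),
              (False, False, True), (False, False, False)]:
    print(flags, classify_completeness(*flags))
```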
Regardless of completeness, prediction of the structures (amino acid sequence and predicted post-translational modifications) of the mature peptides likely liberated from each precursor protein was accomplished via a bioinformatics workflow that used both freeware programs and homology to known arthropod pre/preprohormone processing schemes. Specifically, this workflow involved signal peptide prediction, prohormone convertase cleavage site identification, and prediction of post-translational modifications (i.e., C-terminal amidation, N-terminal cyclization of glutamine or glutamic acid to pyroglutamic acid, sulfation of tyrosine residues, and disulfide bond formation between cysteines). Fig 1 shows the predicted processing scheme for two of the H. americanus preprohormones (prepro-ACP and prepro-FLRFamide). Using this workflow, the structures of 194 distinct H. americanus peptides were predicted; these neuropeptides include one ACP, 23 AST-As, two AST-Cs, one bursicon α, one bursicon β, two CCHamides, one corazonin, one CCAP, four CHHs (two full-length and two partial), three isoforms of CHH precursor-related peptides (CPRP), one DH31, one DH44, two EHs, nine FLRFamides, two GSEFLamides, two ILPs (one A- and one B-chain peptide), one intocin, 13 leucokinins (12 full-length and one partial), one myosuppressin, one neuroparsin, one NPF, three orcokinins, one PDH, one proctolin, seven pyrokinins, one SIFamide (a partial peptide), two sulfakinins and one TRP (Table 2), as well as a large number of linker/precursor-related peptides whose structures do not place them into a formally recognized peptide family (S1 Table).
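Part of the processing logic described above can be illustrated with a deliberately simplified sketch. It assumes the signal peptide has already been removed, treats only the dibasic sites KR and RR as cleavage sites, converts a C-terminal glycine to an amide (written "a"), and marks an N-terminal glutamine as pyroglutamic acid ("pQ"). These rules, the function name and the toy precursor are assumptions made for illustration; they are not the programs or the full rule set used in this study, and tyrosine sulfation, disulfide bonding and cyclization of glutamic acid are omitted.

```python
import re


def predict_peptides(prohormone):
    """Very simplified prohormone processing (illustrative assumptions only)."""
    # Cleave at dibasic sites; the basic residues themselves are removed.
    fragments = [frag for frag in re.split(r"KR|RR", prohormone) if frag]
    peptides = []
    for frag in fragments:
        # A C-terminal glycine is treated as the amide donor.
        if len(frag) > 1 and frag.endswith("G"):
            frag = frag[:-1] + "a"
        # An N-terminal glutamine is treated as cyclized to pyroglutamic acid.
        if frag.startswith("Q"):
            frag = "pQ" + frag[1:]
        peptides.append(frag)
    return peptides


# Hypothetical toy precursor built around two FLRFamide sequences named in the
# text; it is NOT the actual H. americanus prepro-FLRFamide sequence.
toy_precursor = "SGRNFLRFGKRDQNRNFLRFGKRQLNPE"
predicted = predict_peptides(toy_precursor)
print(predicted)
```

Real precursors require additional cleavage rules and homology-based judgment, which is why the workflow combines prediction programs with known arthropod processing schemes.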

Expansion of the neuropeptidome of Homarus americanus and identification of new peptide families in the American lobster
The 194 peptides identified here include both known H. americanus neuropeptides and ones described here for the first time (Table 2 and S1 Table). For example, four of the 23 predicted isoforms of AST-A (i.e., VGPYAFGLamide, AGPYAFGLamide, SGPYAFGLamide and SGPYSFGLamide) are peptides previously discovered using mass spectrometry [23]. Similarly, six of the nine FLRFamides predicted here (GYSDRNYLRFamide, SGRNFLRFamide, DQNRNFLRFamide, GAHKNYLRFamide, GNRNFLRFamide, GDRNFLRFamide) were identified in earlier mass spectral analyses [23]. Among the new discoveries for H. americanus are peptides previously described from other species, but unknown from the lobster, for example AMGSEFLamide and AVGSEFLamide, which were previously identified from several other decapods [30,36], and novel isoforms from known lobster peptide families [23], for example, the suite of pyrokinins described here. Of particular note were our identifications of isoforms of ACP, bursicon α, CCHamide, DH44, EH, GSEFLamide, ILP, intocin, leucokinin, neuroparsin and neuropeptide F, all peptides from families previously unknown in H. americanus, and in the case of DH44, from any decapod species.
Fig 1. (A) Predicted processing scheme for prepro-adipokinetic hormone-corazonin-like peptide (ACP). The structure of the mature ACP isoform is shown in red, with the structures of two mature linker/precursor-related peptides shown in blue. In this schematic, the presence of a pyroglutamic acid in the putative mature ACP isoform is indicated by "pQ". (B) Predicted processing scheme for prepro-FLRFamide. In this schematic, the structures of nine mature FLRFamide-like peptides are shown in red, with those of nine mature linker/precursor-related peptides shown in blue. Sulfated tyrosine residues in two of the linker/precursor-related sequences are indicated by "Y(SO3H)". The presence of a disulfide bond between the cysteine residues in another of the linker/precursor-related peptides is indicated by an inverted blue bracket.
In the peptide structures shown, "pQ/pE" represents an amino (N)-terminal pyroglutamic acid, "a" represents a carboxyl (C)-terminal amide group, "Y(SO3H)" represents a sulfated tyrosine residue, and "C" represents a cysteine residue involved in a disulfide bond. A "+" at the N- and/or C-terminus of a sequence indicates that it is a partial peptide. The sulfation state of tyrosine residues and disulfide bridging between cysteine residues was predicted only for putative full-length peptides. Disulfide bonding patterns in peptides with more than one disulfide bridge: the
While the peptidome predicted here for H. americanus is by far the largest collection of neuropeptides described for this species in any single study, some peptides previously identified from the lobster were not rediscovered in our investigation. For example, prior mass spectral analyses identified isoforms of AST-B, RPCH and sNPF in H. americanus [23], but no transcripts encoding members of these families were detected within the transcriptome mined here, even though it included RNA from tissues in which these peptides have been detected via mass spectrometry. Moreover, even for families for which transcripts were identified, they did not, in all cases, contain all of the known isoforms of the family in question, e.g., a number of the AST-As identified via mass spectrometry [23], including EPYAFGLamide, TPSYAFGLamide and SQYTFGLamide, were not present within the preprohormone we discovered. Why a subset of the known H. americanus peptides was not re-identified remains an open question. As stated earlier, the transcriptome we used clearly does not have 100% coverage, and thus it is likely that for at least AST-B, RPCH and sNPF, transcripts encoding members of these families exist, but are simply not included in the assembly mined here. Moreover, not all portions of the nervous system were included in the set used for mRNA collection (e.g., the eyestalk ganglia), and thus it is possible that some peptide groups were missed for this reason.
Similarly, it is possible that in the lobster some "neuropeptides" are produced primarily by non-neuronal tissues, and thus were not discovered here. For peptide families in which transcripts encoding full-length precursors were identified, but previously identified peptides were not found, two possibilities exist. First, it is possible that multiple transcripts encoding members of the peptide family are present in the lobster, but those containing the peptides in question are simply not in the transcriptome we mined. Alternatively, the missing isoforms may be individual- or population-specific variants [86], and lobsters possessing these alleles were not among those used for RNA isolation. Regardless, by combining the peptides identified in our study (both reidentified and novel) with those previously known, but not rediscovered here [11][12][13][14][16][17][18][19][20]22,23,[26][27][28]38,59,60,63,87,88], a peptidome of over 250 sequences can be produced for the American lobster (Table 2 and S1 Table). This peptidome consists of members of 32 distinct peptide families (Table 2), as well as linker/precursor-related and other peptides whose structures do not place them into any of the generally recognized groups (S1 Table). To the best of our knowledge, this composite H. americanus peptidome is the largest thus far generated for any crustacean, and is one of the largest, if not the largest, currently known for any member of the Arthropoda.

Identification of neuropeptide-receptor encoding transcripts
While a number of neuropeptides had been identified from H. americanus prior to our study [23], nothing was known about the identity of any lobster peptide receptors. In fact, with the exception of those from C. finmarchicus [40], a copepod, little information was available concerning the identity and diversity of neuropeptide receptors in any crustacean. Using known arthropod receptors, primarily those from the fruit fly D. melanogaster, as tblastn input queries, the H. americanus neural transcriptome was screened for transcripts encoding putative homologous proteins (Table 3). Via these searches, 41 putative receptor-encoding transcripts were identified (S3 Fig), including ones showing homology to known ACP, AST-A, AST-C, bursicon, CCHamide, corazonin, CCAP, DH31, DH44, ETH, FLRFamide, ILP, leucokinin, myosuppressin, NPF, PDH, proctolin, pyrokinin, RPCH, sNPF, SIFamide, sulfakinin and TRP receptors. It should be noted that searches for DENamide, DXXRLamide, EH, FXGGXamide and GSEFLamide receptors were not conducted (Table 3), as, to the best of our knowledge, no receptor proteins for these peptide families are known. Translation of the identified transcripts allowed for the prediction of one ACP, one AST-A, three AST-C, two bursicon, two CCHamide, one corazonin, one CCAP, three DH31, two DH44, three ETH, one FLRFamide, two ILP, one leucokinin, one myosuppressin, four NPF, two PDH, two proctolin, one pyrokinin, two RPCH, one sNPF, one SIFamide, one sulfakinin and three TRP receptors (Table 3, Fig 2 and S4 Fig). These predicted proteins included both full-length and partial proteins (Table 3 and S4 Fig). Ligand attributions are based on reciprocal protein BLAST searches against the annotated D. melanogaster proteins in FlyBase (when a homolog was known to be present in the dataset) and/or the non-redundant arthropod proteins curated in GenBank. For example, when the putative H. americanus AST-A receptor (Fig 2A) was used to search FlyBase for the most similar protein, allatostatin A receptor 1, isoform D (FlyBase No. FBpp0305932; Accession No. AAG22404 [72]) was returned as the top hit, and when this lobster protein was used to search the arthropod proteins curated in GenBank, an allatostatin receptor from the cockroach Periplaneta americana (Accession No. AAK52473 [89]) was identified as the most similar sequence. Similarly, the top FlyBase hit for H. americanus DH44 receptor I (Fig 2B)

Structural analyses of deduced receptor proteins
For all full-length putative receptor proteins for which no uncalled amino acids were present (i.e., one AST-A, two AST-C, two CCHamide, one DH31, one DH44, two ETH, one FLRFamide, one ILP, one leucokinin, one myosuppressin, four NPF, two PDH, one proctolin, one sNPF, one SIFamide, one sulfakinin, and one TRP receptor), amino acid sequence analysis and protein family classification were conducted using the online program InterPro [79][80][81]. Here, the assumption was that each protein would possess a structure typical of those of known peptide receptors, e.g., seven membrane-spanning regions, and, in some cases, hormone receptor and/or other functional domains. InterPro analysis of the AST-A (Fig 2A), AST-C I (Fig 3A), AST-C II (Fig 3A), CCHamide I, CCHamide II, ETH I, ETH II, FLRFamide, leucokinin, myosuppressin, NPF I, NPF II, NPF III, NPF IV, proctolin I, sNPF, SIFamide, sulfakinin and TRP I receptors placed each of these proteins into the rhodopsin-like G protein-coupled receptor (GPCR) superfamily (InterPro ID No. IPR000276), with each protein predicted to possess a single rhodopsin-like GPCR seven transmembrane domain (InterPro ID No. IPR017452) (highlighted in black in Figs 2A and 3A and in S4 Fig). Analyses of the DH31 I, DH44 I (Fig 2B), PDH I (Fig 3B) and PDH II (Fig 3B) (Fig 2C and S4 Fig). However, analysis using this program did predict a number of functional regions within this receptor's sequence, including two receptor L-domains (InterPro ID No. IPR000494) (highlighted in light blue and green in Fig 2C and S4 Fig), one furin-like cysteine-rich domain (InterPro ID No. IPR006211) (highlighted in green and yellow in Fig 2C and S4 Fig) Fig 2C and S4 Fig).
Interestingly, in the type-1 insulin-like growth-factor receptor, the first three domains of this protein's extracellular portion consist of two L-domains and a single cysteine rich region, which are hypothesized to form a binding pocket for insulin [90]; a similar situation may be at play in the lobster ILP I receptor identified here.

Comparisons of sequences in putative receptors for a common ligand
Multiple receptors appear to exist in H. americanus for at least 12 peptide families (Table 3); for five of these peptide groups, i.e., AST-C, CCHamide, ETH, NPF and PDH, multiple full-length receptor sequences were deduced. To determine the degree of conservation present among the full-length receptors for a given ligand, the proteins of each of the relevant families were aligned and identity/similarity scores calculated. Fig 3 shows the alignments of the two full-length AST-C receptors, i.e., AST-CR I and AST-CR II (Fig 3A), and the two full-length PDH receptors, i.e., PDHR I and PDHR II (Fig 3B), generated using the online program MAFFT [91]. As can be seen from panel A of Fig 3, the two AST-C receptors are quite similar in their amino acid sequences, particularly in their seven transmembrane domain regions. Specifically, the two proteins are 65.4% identical/83.6% similar overall, with 83.9% identity/95.3% similarity over their membrane spanning regions. Similarly, the two PDH receptors exhibit considerable sequence conservation (Fig 3B), though less than seen between the AST-CRs, being 52.4% identical/76.4% similar over their full lengths, and 51.1%/77.2% and 60.8%/85.4% identical/similar in their hormone receptor and membrane spanning domains, respectively. Alignments of the full-length members of the CCHamide, ETH and NPF families are shown in S5 Fig; the levels of sequence conservation seen in these pairings are similar to those reported for the AST-C and PDH receptors.
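Identity and similarity scores of the kind reported above can be computed from a pairwise alignment along the following lines. This is a sketch under stated assumptions: the residue groups used to define "similar" positions and the choice of denominator (all alignment columns in which at least one sequence has a residue) are illustrative choices, not the scoring scheme used in this study, and the toy aligned strings are made up.

```python
# Hypothetical grouping of residues counted as "similar" (an assumption).
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def identity_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    group_of = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}
    columns = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # column empty in both sequences: ignored
        columns += 1
        if a == "-" or b == "-":
            continue  # gap opposite a residue: counts against both scores
        if a == b:
            identical += 1
            similar += 1
        elif a in group_of and group_of[a] == group_of.get(b):
            similar += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * identical / columns, 100.0 * similar / columns


# Toy aligned fragments (made up): 2 identical and 2 similar columns out of 5.
scores = identity_similarity("ACDK-", "ACERW")
print(scores)
```

Different denominators (for example, the length of the shorter sequence) and different substitution groupings give different percentages, so scores are comparable only when computed the same way.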

Integration of molecular and physiological data
In our study, numerous H. americanus peptide precursor- and receptor-encoding transcripts were identified. The discovery of these sequences allowed the prediction of a large and diverse neuropeptidome, as well as a large collection of receptor proteins. With respect to the peptides discovered here, many possess structures that place them into well-known families, including some for which multiple isoforms are present, e.g., the 23 AST-As, 13 leucokinins, nine FLRFamides and seven pyrokinins that were predicted. Similarly, for most peptide families, multiple receptors were discovered, e.g., the four distinct proteins that putatively have NPF as their ligand. At present, the functional consequences of the diversity seen here in neurochemical signaling systems of H. americanus remain unknown. However, prior physiological investigations do allow for speculation and, in a few cases, at least partial support for existing hypotheses.
First, it is well known that the H. americanus stomatogastric and cardiac neuromuscular systems, while numerically simple (just nine neurons in the case of the lobster cardiac ganglion [2]), are able to produce a large number of distinct behavioral outputs. Much of this functional flexibility has been attributed to the actions of neuromodulators, including peptides, on the neurons and muscles that make up these central pattern generator-effector systems [1][2][3][4][5][6][7][8][9][10]. Given the enormous complexity that can be derived from the complements of neuropeptides and receptors discovered in our study, which is likely incomplete, it is not at all surprising that these systems are capable of producing diverse outputs.
The actions of neuromodulators on the lobster nervous system are well known to be highly state-dependent, with different individuals often responding differently to the same neuroactive compound. For example, the modulatory actions of the peptide pQIRYHQCYFNPISCF (disulfide bridging between the two cysteine residues), a member of the AST-C family (and a peptide rediscovered here), on the cardiac neuromuscular system vary considerably among individuals [92]. Specifically, perfusion of this peptide through the semi-intact heart consistently decreased the frequency of ongoing heart contractions, but showed varied effects on contraction amplitude, decreasing it in some lobsters and increasing it in others. This differential response was found to be due to actions of the peptide on the cardiac neural circuit itself; peptide applied to the cardiac ganglion of hearts that responded with increased contraction force showed marked increases in both motor neuron burst duration and the number of spikes per burst, whereas peptide application to cardiac ganglia from hearts that showed a decrease in contraction amplitude resulted in only marginal increases in these parameters, suggesting that decreased contraction amplitude of the heart was the result of the non-linear neuromuscular transform [93]. At the mechanistic level, it is possible that differences in the response to AST-C are due to differences in the complement of AST-C receptors present in the cardiac ganglion in preparations that respond to AST-C with increases vs. decreases in contraction amplitude; this would require at least two distinct AST-C receptors. Consistent with this hypothesis, our data support the existence of at least three distinct AST-C receptors in the lobster, as transcripts encoding two full-length and one partial protein with sequence homology to known AST-C receptors were discovered.
In H. americanus, immunohistochemical mapping suggests that pyrokinins are present in the two major neuroendocrine systems (the X-organ-sinus gland complex and the pericardial organ) and in the neuropil of both the cardiac and stomatogastric ganglia [94,95]; this distribution suggests that they could serve as both hormonal and locally-released neuromodulators in the cardiac and stomatogastric neuromuscular systems. Prior to our study, the sole member of the pyrokinin family known from the lobster was FSPRLamide [23], an atypical peptide in that all other members of the pyrokinin family are N-terminally extended relative to the FXPRLamide consensus motif (where X represents a variable residue). Moreover, in nearly all crustaceans thus far investigated, multiple pyrokinin isoforms have been detected [23,30,34,36,40,42,43,[96][97][98]; H. americanus and the crab Callinectes sapidus are the sole exceptions [23,96], and in the latter species, the known isoform is N-terminally extended [96]. Since FSPRLamide (and a number of other N-terminally extended native pyrokinins and synthetic analogs possessing-FSPRLamide C-termini) had no effect on the lobster cardiac neuromuscular system, while ADFAFNPRLamide (a pyrokinin from the shrimp, Litopenaeus vannamei [98]), and to a lesser extent the synthetic analog SDFAFNPRLamide, increased both the frequency and amplitude of heart contractions, it was hypothesized that: 1) FSPRLamide is a truncation of an N-terminally extended peptide or peptides present in the lobster; and 2) that multiple isoforms of pyrokinin exist in the lobster, with at least one ending in-FSPRLamide and another ending in-FNPRLamide. The peptides predicted from the two pyrokinin precursors deduced here support the first hypothesis as all of the full-length isoforms are N-terminally extended relative to the FXPRLamide consensus motif (or a close approximation thereof). 
Similarly, seven full-length pyrokinins were predicted, at least partially supporting the second hypothesis. Interestingly, none of the predicted lobster peptides possessed a -FNPRLamide C-terminus. Thus, the prediction of an N-terminally extended pyrokinin possessing this ending was not confirmed. This said, both of the pyrokinin precursors predicted here are partial proteins, and it is possible that the "missing" -FNPRLamide peptide (or peptides) is present in the portions of these preprohormones that were not identified here.
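The motif-based reasoning above can be illustrated with a short sketch. It is not part of the analysis pipeline used in this study; the function name classify_pyrokinin is hypothetical, and the C-terminal amide is assumed to be omitted from the one-letter sequences. The sketch checks whether a peptide ends in the FXPRL consensus, reports the variable residue X, and reports the length of any N-terminal extension, using the three peptides named in the text as input.

```python
import re

# FXPRLamide consensus: F, any one residue (X), then P-R-L at the C-terminus.
# The C-terminal amide is a post-translational modification and is assumed
# not to appear in the one-letter sequences passed in.
FXPRL = re.compile(r"F[A-Z]PRL$")


def classify_pyrokinin(seq):
    """Return (X residue, N-terminal extension length), or None if no motif.

    Hypothetical helper for illustration only.
    """
    match = FXPRL.search(seq)
    if match is None:
        return None
    x_residue = seq[match.start() + 1]
    extension_length = match.start()  # residues preceding the motif
    return x_residue, extension_length


# Peptides named in the text, written without the amide group.
peptides = ["FSPRL", "ADFAFNPRL", "SDFAFNPRL"]
results = {p: classify_pyrokinin(p) for p in peptides}

for peptide, result in results.items():
    print(peptide, result)
```

Under these assumptions, FSPRL is reported as having no N-terminal extension, whereas the two -FNPRL peptides each carry a four-residue extension. Peptides that only approximate the consensus would not be matched by this strict pattern.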
Comparison of the effects of pyrokinins on the H. americanus cardiac system (where only one native crustacean isoform was found to be bioactive [94]) with those seen in the lobster stomatogastric ganglion (where all of the native crustacean pyrokinins tested produced essentially identical physiological responses [95]) has resulted in the hypothesis that there are at least two pyrokinin receptors in this species, one highly isoform-specific and the other promiscuous in its pyrokinin specificity. Moreover, it is predicted that the promiscuous receptor is absent from the cardiac neuromuscular system. However, only a single pyrokinin receptor was discovered here. At this point, it is not clear whether this protein is a promiscuous or isoform-specific receptor. Given the identification of just one pyrokinin receptor, at least two possible explanations exist for the distinct physiological effects of pyrokinins seen in the stomatogastric and cardiac systems. First, there may be an additional pyrokinin receptor that was not identified here. Second, the same receptor may utilize different second messenger pathways when activated by different pyrokinin isoforms. Such a situation exists in the responses to tachykinins in the stable fly, Stomoxys calcitrans, where four different tachykinins were able to activate the same receptor (Stomoxys calcitrans tachykinin-related peptide receptor [STKR]), but elicited different effects via distinct second messenger systems [99]. Clearly, additional experimentation will be required to determine the pathways that underlie the differential responses to pyrokinins in these two lobster ganglia.

Transcriptome sequencing and assembly
Tissue collection and RNA preparation. Adult lobsters, H. americanus (N = 2), were obtained from The Fresh Lobster Company (Gloucester, Massachusetts, USA) and maintained in artificial seawater at 12°C until used. Lobsters were anesthetized by packing them in ice for 30 minutes before dissection. Following anesthetization, the brain (supraoesophageal ganglion), ventral nerve cord, cardiac ganglion and complete stomatogastric nervous system (which includes the paired commissural ganglia and the single oesophageal and stomatogastric ganglia) were dissected out of each individual and pinned out in a Sylgard (Dow Corning)-coated dish containing chilled (12-13°C) physiological saline. Any adherent connective tissue and muscle was removed to the extent possible, and the tissues were rinsed several times in physiological saline (composition in mM: 479.12 NaCl, 12.74 KCl, 13.67 CaCl2, 20.00 MgSO4, 3.91 Na2SO4, 11.45 Trizma base, and 4.82 maleic acid [pH = 7.45]) made with ultrapure, RNase-free water. After dissection, tissues were combined and homogenized in Trizol (Invitrogen). Insoluble tissues were pelleted by centrifugation, and the supernatant was removed and stored at -80°C until RNA extraction. Total RNA was isolated as per the protocol provided by the manufacturer (Invitrogen), and subsequently treated with DNase prior to library construction.
All animal work has been conducted according to relevant national and international guidelines. No IACUC review was needed, as this study used an invertebrate species.
Library production, sequencing and de novo transcriptome assembly. Library construction and RNA-sequencing were performed for a fee by GENEWIZ, Inc. (South Plainfield, New Jersey, USA). In brief, RNA samples were quantified using a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, California, USA) and the RNA integrity was checked with the RNA6000 Nano Assay using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, California, USA). Illumina TruSeq RNA library prep, clustering, and sequencing reagents were used throughout the process following the manufacturer's recommendations (Illumina, San Diego, California, USA). Specifically, mRNAs were purified using poly-T oligo-attached magnetic beads and then fragmented. The first and second strand cDNAs were synthesized and end repaired. Adaptors were ligated after adenylation at the 3' ends. The cDNA templates were then enriched by PCR. cDNA libraries were validated using a High Sensitivity Chip on the Agilent 2100 Bioanalyzer. The cDNA library was quantified using the Qubit 2.0 Fluorometer and by qPCR. The samples were clustered on a flow cell using the cBot. After clustering, the samples were loaded on the Illumina HiSeq 2000 instrument for sequencing with a 2 × 100 bp paired-end configuration.
Raw sequence data generated on the Illumina HiSeq 2000 were converted into fastq files and de-multiplexed using the Illumina CASAVA 1.8.2 program. Fastq files from the sample were imported into CLC Genomics Workbench Server 5.0.1. Sequence reads were trimmed to remove low-quality bases at their ends. De novo assembly was conducted with the trimmed reads using the CLC Genomics Server; 60,273 unique transcripts were obtained. The average length of the transcripts was 1,657 bp and the N50 was 2,357 bp. The total length of the assembled transcripts was 99,847,148 bp. The assembled sequences were searched against the NCBI nucleotide database using BLAST and annotated using the top BLAST hit. In addition, open reading frames were predicted for the assembled transcript sequences.
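For readers unfamiliar with the assembly statistics reported above, the following minimal sketch shows how a mean transcript length and an N50 are commonly derived from a list of transcript lengths. It assumes the widely used definition of N50 (the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases); the exact definition applied by the CLC software is not stated here, and the function name assembly_stats and the toy lengths are hypothetical.

```python
def assembly_stats(lengths):
    """Return count, total length, mean length and N50 for transcript lengths.

    Hypothetical helper; N50 follows the common definition described in the
    accompanying text, which may differ from that of a given assembler.
    """
    if not lengths:
        raise ValueError("at least one transcript length is required")
    total = sum(lengths)
    running = 0
    n50 = None
    # Walk from the longest transcript down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(lengths),
        "total": total,
        "mean": total / len(lengths),
        "n50": n50,
    }


# Toy lengths for illustration only (not data from this study).
toy = assembly_stats([100, 200, 300, 400, 500])
print(toy)

# Consistency check on the reported totals: total length / transcript count.
reported_mean = 99_847_148 / 60_273
print(round(reported_mean))
```

The final two lines simply confirm that the reported total length divided by the reported number of transcripts rounds to the reported average length of 1,657 bp.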
In silico transcriptome mining, peptide prediction and protein structural analyses
Transcriptome mining. Searches of the H. americanus transcriptome described above were conducted on a TimeLogic DeCypher server using a protocol modified from several recent publications [40,82,83,84]. Specifically, the lobster assembly was selected as the database to be searched using the DeCypher Tera-BLASTP algorithm, and a known neuropeptide precursor or receptor was input to the program as the protein query. The complete list of pre/preprohormones searched for, as well as the specific queries used, is provided in Table 1; the full list of receptors searched for (and the query proteins used) is provided in Table 3. All hits returned by a given Tera-BLASTP search were fully translated using the "Translate" tool of ExPASy (http://web.expasy.org/translate/) and then checked manually for homology to the query sequence. The BLAST-generated maximum score and E-value for each of the transcripts identified as encoding a putative neuropeptide precursor or receptor are also provided in Tables 1 and 3.
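The translation step can be illustrated with a minimal stand-in for what a six-frame translation tool does. This sketch is not the ExPASy "Translate" tool and was not used in this study; it assumes the standard genetic code, and the function names and the nine-nucleotide example sequence are hypothetical. Codons containing ambiguous bases are reported as "X" and stop codons as "*".

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations for frames +1, +2, +3, -1, -2 and -3."""
    seq = seq.upper().replace("U", "T")
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


# Hypothetical toy sequence, not a transcript from this study.
frames = six_frame_translation("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

In practice, each translated frame of a hit would then be inspected manually for homology to the query protein, as described above.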
Analysis of receptor conservation and structure. To determine the proteins most similar to the neuropeptide receptors identified in this study, each protein was used to query the annotated D. melanogaster protein dataset present in FlyBase (version FB2015_04 [78]), as well as the non-redundant arthropod protein dataset (taxid:6656) curated in GenBank, using the blastp algorithm [104].
To determine amino acid identity/similarity between proteins (and structural motifs [see below]), the sequences in question were aligned using MAFFT version 7 (http://align.bmr.kyushu-u.ac.jp/mafft/online/server/ [91]), and amino acid identity/similarity was subsequently determined using the alignment output. Specifically, percent identity was calculated as the number of identical amino acids (denoted by "*" in the MAFFT output) divided by the total number of residues in the longest sequence (×100). Amino acid similarity was calculated as the number of identical and similar amino acids (the latter denoted by the ":" and "." symbols in the protein alignment) divided by the total number of residues in the longest sequence (×100).
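The identity and similarity calculations described above can be expressed as a short sketch. It assumes that the alignment is available as gapped sequence strings plus the consensus line of "*", ":" and "." symbols, that gaps are written as "-", and that the "longest sequence" is measured in ungapped residues; the function name and the toy alignment (including its hand-written consensus line) are hypothetical and are not data from this study.

```python
def identity_similarity(consensus, aligned_seqs):
    """Return (percent identity, percent similarity) for an alignment.

    consensus    -- conservation line: '*' identical, ':' and '.' similar
    aligned_seqs -- gapped sequences from the same alignment
    Both values are normalized to the ungapped length of the longest sequence.
    """
    longest = max(len(seq.replace("-", "")) for seq in aligned_seqs)
    if longest == 0:
        raise ValueError("sequences contain no residues")
    identical = consensus.count("*")
    similar = identical + consensus.count(":") + consensus.count(".")
    return 100 * identical / longest, 100 * similar / longest


# Hypothetical toy alignment; the consensus line was written by hand.
seq_a = "MKTAYIAK"
seq_b = "MKSAY-AR"
consensus = "**:** *."

identity, similarity = identity_similarity(consensus, [seq_a, seq_b])
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```

For this toy case there are five identical and two similar positions over a longest sequence of eight residues, giving 62.5% identity and 87.5% similarity.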