Novel Antimicrobial Peptides EeCentrocins 1, 2 and EeStrongylocin 2 from the Edible Sea Urchin Echinus esculentus Have 6-Br-Trp Post-Translational Modifications

The global problem of microbial resistance to antibiotics has resulted in an urgent need to develop new antimicrobial agents. Natural antimicrobial peptides are considered promising candidates for drug development. Echinoderms, which rely on innate immunity factors in the defence against harmful microorganisms, are sources of novel antimicrobial peptides. This study aimed to isolate and characterise antimicrobial peptides from the Edible sea urchin Echinus esculentus. Using bioassay-guided purification and cDNA cloning, three antimicrobial peptides were characterised from the haemocytes of the sea urchin; two heterodimeric peptides and a cysteine-rich peptide. The peptides were named EeCentrocin 1 and 2 and EeStrongylocin 2, respectively, due to their apparent homology to the published centrocins and strongylocins isolated from the green sea urchin Strongylocentrotus droebachiensis. The two centrocin-like peptides EeCentrocin 1 and 2 are intramolecularly connected via a disulphide bond to form a heterodimeric structure, containing a cationic heavy chain of 30 and 32 amino acids and a light chain of 13 amino acids. Additionally, the light chain of EeCentrocin 2 seems to be N-terminally blocked by a pyroglutamic acid residue. The heavy chains of EeCentrocins 1 and 2 were synthesised and shown to be responsible for the antimicrobial activity of the natural peptides. EeStrongylocin 2 contains 6 cysteines engaged in 3 disulphide bonds. A fourth peptide (Ee4635) was also discovered but not fully characterised. Using mass spectrometric and NMR analyses, EeCentrocins 1 and 2, EeStrongylocin 2 and Ee4635 were all shown to contain post-translationally brominated Trp residues in the 6 position of the indole ring.


Introduction
Increasing numbers of pathogenic bacterial strains are becoming resistant to antibiotics. More people in US hospitals now die from methicillin-resistant Staphylococcus aureus (MRSA) infections than from HIV/AIDS and tuberculosis combined [1]. There is therefore a pressing need to find and develop antimicrobial agents as alternatives to classical antibiotics. Antimicrobial peptides (AMPs) are part of the immune system in both plants and animals and they are considered to constitute an evolutionarily ancient response to fight invading pathogenic microorganisms [2]. AMPs are evolutionarily conserved, gene-encoded peptides, usually short, cationic amino acid chains [3]. Most AMPs exhibit broad-spectrum activity towards both Gram-positive and Gram-negative bacteria. In contrast to commercial antibiotics where the development of resistance is a problem, bacterial resistance towards AMPs is much less pronounced [1,4]. Because of their propensity to be rapidly metabolised in the gastrointestinal tract, peptides have been considered poor drug candidates. This problem has diminished somewhat in recent years with the development of new synthetic strategies to improve bioavailability and reduce metabolism of peptides, bolstered by the development of alternative routes of administration [5,6]. A large number of peptide-based drugs, including AMPs, are now in clinical trials or being marketed [5,7]. Today, more than 2600 peptides have been registered in the Antimicrobial Peptide Database [8], mainly from terrestrial sources. Marine invertebrates, although less studied, have proven to be a promising source for discovering AMPs with novel scaffolds [9]. Echinoderms are exposed to relatively high bacterial levels because they are often found in the photic zone where conditions for microbial growth are optimal. The survival of these organisms relies on the production of efficient antimicrobial components to defend themselves against microbial infections and fouling.
Like invertebrates in general [10], echinoderms do not have an adaptive immune system like the one recognised in vertebrates, where specific memory towards pathogens is developed. Their innate defence system is mediated by the coelomocytes and compounds like complement factors, lectins, lysozymes and AMPs [11][12][13].
A number of AMPs have previously been found in echinoderms [13][14][15][16][17][18][19][20]. Examples include lysozymes that catalyse the hydrolysis of the peptidoglycans of the bacterial cell wall of Gram-positive bacteria and act as non-specific innate immunity molecules [17,18]. A 6 kDa AMP in the coelomic fluid of the orange-footed sea cucumber, Cucumaria frondosa, was discovered but no sequence was reported [20]. Several antibacterial peptides from the coelomic fluid of the starfish Asterias rubens with masses around 2 kDa have also been discovered [19,21]. Two of these peptides were identified as fragments of the histone H2A molecule, two peptides were identified as fragments of actin, and one was identified as a fragment of filamin A. A 5 kDa peptide having antistaphylococcal biofilm properties was discovered in the coelomocytes of the sea urchin Paracentrotus lividus [16]. The antibiofilm activity was suggested to be ascribed to beta-thymosin-like fragments [22]. From the coelomocytes of the sea urchin Strongylocentrotus droebachiensis, two novel AMP families were characterised: the strongylocins [14] and the centrocins [23]. The strongylocins are cysteine-rich peptides containing three disulphide bonds with MW in the 5.6-5.8 kDa range. Homologous genes have been discovered in S. purpuratus and their deduced peptide sequences named SpStrongylocins. Recombinantly produced SpStrongylocin analogues were also shown to be antibacterial [24]. The centrocins are a family of heterodimeric AMPs ranging between 4.4 and 4.5 kDa in mass. The peptides consist of two peptide chains: a 30 amino acid residue heavy chain (HC) and a 12 amino acid residue light chain (LC) connected by a single disulphide bond. Bioactivity studies have shown that the cationic HC is responsible for the antimicrobial activity of these peptides [23]. The HC of centrocin 1 displays potent activity against both bacteria and fungi and also displays anti-inflammatory properties [25].
The Edible sea urchin, Echinus esculentus (Fig 1), has been reported to contain antimicrobial compounds [26], of which the quinone echinochrome-A has been identified [15,27]. No AMPs have yet been discovered in this species. The aim of the present study was to search for, isolate and characterise AMPs from the coelomocytes of the sea urchin E. esculentus. In this paper, we present the discovery of new AMPs belonging to the centrocin and strongylocin families of AMPs.

Ethics statement
All experiments performed in the present study were conducted in accordance with national and international guidelines and the ethical guidelines of UiT The Arctic University of Norway. For the haemolytic assay the collection of blood from a healthy donor was approved by the regional committee for medical research (REK 2014/1653).
In Norway, collecting wild E. esculentus for research purposes does not require specific permits. Our study does not involve endangered or protected species and sea urchins are not subject to any ethical animal use restrictions.
Coelomic fluid was sampled from the animals 1-4 weeks after animal collection by penetrating the peristome with a scalpel and pouring the content into Ca2+/Mg2+-free anti-coagulating buffer [28] containing 70 mM EDTA and 50 mM imidazole in a 2:1 v/v ratio in 50 ml Falcon tubes (BD Biosciences, CA, USA) on ice. In total, approximately 5500 ml coelomic fluid was obtained. The mixture was subsequently centrifuged for 20 min at 4°C and 800 g. The pellet (coelomocytes) was collected by pouring off the cell-free supernatant and kept at -70°C until lyophilisation on a VirTis Genesis 35 EL freeze dryer (SP Industries, PA, USA) for 24 h. A total amount of 12.2 g (dry weight) of coelomocytes was sampled. All sample weighing was performed on a Sartorius Cubis MSA scale (Sartorius AG, Göttingen, Germany).

Extraction and purification
Lyophilised coelomocytes were extracted according to a previous protocol [29] with one modification: liquid-liquid extraction was carried out twice with 5% (w/v) lyophilised coelomocytes in 60% acetonitrile (ACN) containing 0.1% trifluoroacetic acid (TFA) (both from Sigma-Aldrich, MO, USA) for 24 h at 4°C. The combined extracts were partitioned into an aqueous phase (approximate total of 100 ml) at the bottom and an ACN-rich phase (approximate total of 150 ml) at the top by leaving them in a -20°C freezer for approximately 1 h. Both phases were dried in a ScanSpeed 40 vacuum centrifuge (Labogene ApS, Denmark) for 24 h. The ACN-rich phase was reconstituted in MQ-H2O (Millipore, MA, USA) to a concentration of 10 mg/ml and subjected to antibacterial activity testing.
The aqueous phase (5.49 g) was reconstituted to 10 mg/ml (549 ml) in 0.05% TFA/H2O (v/v) and further subjected to solid phase extraction (SPE) on a reversed-phase C18 35cc Sep-Pak cartridge (Waters, MA, USA) according to [29] as follows. The extract was loaded onto the cartridge, previously conditioned with ACN and equilibrated with 0.05% TFA/H2O (v/v). After washing of the loaded extract with 0.05% TFA/H2O (v/v), four stepwise elutions were performed with 10%, 40%, 80% and 100% ACN containing 0.05% TFA (v/v). The different fractions collected were dried under vacuum, reconstituted to 10 mg/ml and tested for antibacterial activity.
Due to its pronounced antibacterial activity (S1 Table), the 40% SPE eluate was further fractionated by reversed-phase high-performance liquid chromatography (RP-HPLC) using a preparative XBridge C18 (5 μm, 19 × 250 mm) column, a 717 autosampler, 600E pump system, 2996 photodiode array detector and an in-line degasser (Waters, MA, USA), all controlled by the Millennium 32, v4.00 software (Scientific Equipment Source, Ontario, Canada). The flow rate was set to 8 ml/min with an optimised HPLC protocol containing 0.05% TFA/H2O (v/v) and 0.05% TFA/ACN (v/v). The protocol started with 10 min of 0.05% TFA/H2O (v/v) followed by linear gradients increasing the 0.05% TFA/ACN (v/v) concentration from 0% to 18% over 8 min, 18% to 32% over 32 min and 32% to 50% over 4 min, and finally washing with 0.05% TFA/ACN (v/v) for 5 min and re-equilibration with 0.05% TFA/H2O (v/v) for 5 min. One-minute (8 ml) fractions were collected automatically with a Gilson FC 204 fraction collector (Gilson, WI, USA), dried in a ScanSpeed 40 vacuum centrifuge for 24 h, and reconstituted in 500 μl MQ-H2O before antibacterial activity testing. Active fractions were analysed for purity, and bioactive peptides were detected using liquid chromatography UV-Vis mass spectrometry (LC-PDA-MS, see section 2.6). Impure, active fractions were fractionated again using the same experimental conditions but with fractions collected manually until pure peptides (estimated purity >90%) were obtained.
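Because the gradient programme is piecewise linear, the nominal mobile-phase composition at a given collection time can be estimated by interpolation. The following Python sketch illustrates this. The breakpoint table is transcribed from the protocol above, whereas the function name `percent_acn` is introduced here purely for illustration. The wash and re-equilibration steps, the system dwell volume and any fraction collector delay are ignored, so the values are nominal rather than measured compositions at the column outlet.

```python
# Gradient breakpoints as (time in min, % of 0.05% TFA/ACN), transcribed from the
# protocol text. Wash and re-equilibration steps are deliberately left out.
GRADIENT = [(0, 0.0), (10, 0.0), (18, 18.0), (50, 32.0), (54, 50.0)]


def percent_acn(t_min):
    """Return the programmed % ACN at time t_min by linear interpolation.

    Illustrative helper: no correction for dwell volume or collector delay.
    """
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, p0), (t1, p1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)


# Example: nominal composition at a few arbitrary collection times (min)
for minute in (20, 30, 40):
    print(minute, round(percent_acn(minute), 2))
```

Such an estimate is only a rough guide for relating fraction numbers to eluent composition; the true composition at elution lags the programme by the system delay volume.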
To perform extended bioactivity screening and structural analyses of individual peptides, more material of each peptide was needed. Multiple HPLC runs (23 injections in total), using the same conditions as described above and with manual fraction collection, were therefore performed. Pure peptide fractions (>90% as estimated with LC-MS) were pooled, lyophilised and weighed. Test solutions for bioactivity were prepared with MQ-H2O.
Cultures stored at -80°C were smeared onto agar plates and cultured for 24 h at 35°C. One colony of each bacterial strain was transferred to 5 ml liquid Müller-Hinton (MH, Difco, Lawrence, KS, USA) medium in a glass tube and left shaking at room temperature overnight at 600 rpm. The cultures of actively growing bacteria (20 μl) were further inoculated in 5 ml MH medium and left shaking for 2 h at room temperature. The antibacterial assays were performed as previously described [31]. Briefly, the bacterial cultures were diluted with medium to final concentrations of 1.3-1.5 × 10^4 bacteria/ml and an aliquot of 50 μl was added to each well in 96-well Nunclon™ microtiter plates (Nalge Nunc Int., Denmark) preloaded with 50 μl test sample solutions, i.e. extracts, eluates or peptides.
The test plates were incubated for 24 h at 35°C with optical density (OD595) recorded every hour using an Envision 2103 multilabel reader, controlled by the Wallac Envision manager (PerkinElmer, CT, USA). Antibacterial activity was defined as a sample showing >90% inhibition (as measured by optical density) compared to the negative (growth) controls, consisting of bacteria and water. Oxytetracycline (20 μM) served as a positive (inhibition) control. The minimum inhibitory concentration (MIC) was defined as the lowest concentration of a sample displaying >90% inhibition.
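The MIC criterion above can be expressed as a short calculation. The Python sketch below is a minimal illustration, not the analysis pipeline used in this study: it assumes blank-corrected endpoint OD readings (the plate reader recorded hourly curves, which are not modelled here), and the function names and example readings are hypothetical.

```python
def percent_inhibition(od_sample, od_growth_control):
    """Growth inhibition (%) relative to the negative (growth) control."""
    if od_growth_control <= 0:
        raise ValueError("growth control OD must be positive")
    return (1.0 - od_sample / od_growth_control) * 100.0


def minimum_inhibitory_concentration(od_by_conc, od_growth_control, threshold=90.0):
    """Return the lowest tested concentration with > threshold % inhibition.

    od_by_conc maps concentration -> endpoint OD. Returns None if no tested
    concentration passes the threshold.
    """
    active = [conc for conc, od in od_by_conc.items()
              if percent_inhibition(od, od_growth_control) > threshold]
    return min(active) if active else None


# Hypothetical blank-corrected OD readings for a two-fold dilution series (μM -> OD)
readings = {50.0: 0.01, 25.0: 0.02, 12.5: 0.03, 6.25: 0.30, 3.13: 0.58}
print(minimum_inhibitory_concentration(readings, od_growth_control=0.60))
```

Note that taking the minimum over all active concentrations presupposes a monotonic dose response; a non-monotonic series would need to be inspected manually.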
The synthetic peptides (see section 2.11) were also screened for antifungal activity against Candida albicans (ATCC 10231), Saccharomyces cerevisiae, Rhodotorula sp., Aureobasidium pullulans and Cladosporium sp. The antifungal assay was performed as described previously [32]. Briefly, fungal spores were dissolved in potato dextrose broth (Difco, Lawrence, KS, USA) to a concentration of 4 × 10^5 spores/ml. The spores (50 μl) were inoculated on 96-well Nunclon™ microtiter plates containing the synthetic peptides (50 μl) dissolved in MQ-H2O. The fungal growth and MIC were determined visually after incubation for 24 h at room temperature. MIC was defined as the lowest concentration of peptide giving no visible fungal growth. The negative (growth) control contained medium and fungal solution.

Haemolytic activity assay
Synthesised peptide analogues of EeCentrocin 1 and 2 were screened for eukaryotic cell toxicity with a haemolytic activity assay using human red blood cells as described previously [32]. The assay was performed on 96-well U-shaped microtiter plates (Nalge Nunc) with 50 μl peptide sample, 40 μl phosphate-buffered saline (PBS) and 10 μl red blood cells. The final peptide concentrations ranged from 100 to 0.1 μM in two-fold serial dilutions. After one hour of incubation at 37°C in a shaker, the plate was centrifuged at 200 g for 5 min, the supernatants (60 μl) were carefully transferred to a new flat-bottomed polycarbonate microtiter plate (Nalge Nunc) and the absorbance at 550 nm was measured on a Synergy H1 multimode reader (BioTek, VT, USA). Cell suspension with 0.05% Triton X-100 (Sigma-Aldrich, MO, USA) in PBS served as the positive (100% haemolysis) control and cell suspension with PBS alone served as the negative (0% haemolysis) control. The percent haemolysis was calculated using the formula [(A_sample - A_baseline)/(A_triton - A_baseline)] × 100. The experiment was performed in duplicate.
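The haemolysis formula can be applied as in the following minimal Python sketch. It assumes that the baseline absorbance is that of the negative (PBS) control and that duplicate wells are averaged before the calculation; the absorbance values in the example are hypothetical and only serve to show the arithmetic.

```python
from statistics import mean


def percent_haemolysis(a_sample, a_baseline, a_triton):
    """Percent haemolysis from absorbance readings at 550 nm.

    a_baseline: absorbance of the 0% haemolysis control (assumed: cells in PBS).
    a_triton:   absorbance of the 100% haemolysis control (Triton X-100).
    """
    span = a_triton - a_baseline
    if span <= 0:
        raise ValueError("positive control must exceed the baseline")
    return (a_sample - a_baseline) / span * 100.0


# Hypothetical duplicate readings for one peptide concentration
sample_wells = (0.12, 0.14)
value = percent_haemolysis(mean(sample_wells), a_baseline=0.05, a_triton=1.05)
print(round(value, 1))  # roughly 8% haemolysis for these made-up readings
```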

Two-dimensional liquid chromatography-mass spectrometry analyses
Antibacterial HPLC fractions and SPE eluates were analysed by liquid chromatography UV-Vis mass spectrometry (LC-PDA-MS) to identify antibacterial compounds and to perform purity determinations. The LC-PDA-MS system consisted of a 2695 separation module, a Sunfire C18 (5 μm, 2.1 × 100 mm) column, a 2998 PDA detector reading from 190 to 500 nm in 1.2 nm increments, and a Micromass ZQ mass spectrometer controlled by Masslynx v4.1 software (all from Waters, MA, USA). Compounds were eluted by running a linear gradient of increasing ACN concentrations in water (both containing 0.05% TFA) from 5 to 50% over 16 min, using a flow rate of 0.2 ml/min. Samples (5-20 μl) were introduced to the MS and analysed in positive ESI mode. Ions were recorded in full scan mode in the 100-2000 m/z range (see S2 Table for typical MS settings). The mobile phases were the same as when running HPLC (0.05% TFA/H2O (v/v) and 0.05% TFA/ACN (v/v)).
High-resolution mass spectrometry (HR-MS) was performed on a Thermo LTQ Orbitrap XL with an electrospray ion source (ION-MAX) coupled to an Accela HPLC-system (Thermo Fisher Scientific, MO, USA). A Supelco Ascentis Express (2.7 μm, 50 × 2.1 mm) C18 reversed-phase column was used. The datasets were deposited to The Mass Spectrometry Interactive Virtual Environment repository (MassIVE, http://massive.ucsd.edu/ProteoSAFe/static/massive.jsp) with accession number MSV000079515.

Peptide sequencing
Enzyme digestion with endoproteinase Arg-C and Edman degradation were performed by Eurosequence BV (Groningen, The Netherlands). Further sequence confirmation and elucidation of modified amino acids were achieved through trypsin digestion or reduction/alkylation and successive HR-MS. The protocol of Promega (Promega, WI, USA, available from http://no.promega.com/resources/protocols/product-information-sheets/n/sequencing-grademodified-trypsin-frozen-protocol/) was followed regarding protease treatment. Briefly, peptides (700 μg) were dissolved in 6 M guanidine HCl, 50 mM Tris HCl (pH 8) and 4 mM dithiothreitol (DTT, Sigma-Aldrich, MO, USA) in a reaction volume of 100 μl. The reaction mixture was heated at 95°C for 20 min and cooled to room temperature, after which 550 μl 50 mM NH4HCO3 (pH 7.8) was added. Porcine trypsin (Promega, WI, USA) was added at a 33:1 ratio (peptide:trypsin, 700:21 μg, 15,000 u/mg) and incubated for 16 h at 37°C. High-resolution LC-MS of the digested peptide was performed as previously described in section 2.6.
Reduction and alkylation were performed by dissolving ~20 nmol peptide in 100 μl 0.5 M Tris HCl/1 mM EDTA/6 M guanidine HCl and adding 5 μl 2.2 M DTT (Sigma-Aldrich, MO, USA). The peptide solution was flushed with N2 to prevent oxidation and incubated for 16 h at 37°C. After incubation, 5 μl 4-vinylpyridine (Sigma-Aldrich, MO, USA) was added to the solution, which was then incubated for 20 min at 37°C. The reaction was stopped using RP-SPE with a C18 cartridge as previously described in section 2.3. The alkylated peptides were eluted with 80% ACN/H2O (v/v) containing 0.05% TFA.

Characterisation of full length cDNA
Total RNA was isolated from the pooled coelomocytes of three animals using the QIAZol TM reagent in accordance with the manufacturer's instructions (QIAGEN, MD, USA). Reverse transcription polymerase chain reaction (RT-PCR) was carried out using a rapid amplification of cDNA ends (RACE) kit (Clontech, CA, USA). Total RNA (1 μg) was used as a template to synthesise 5' Ready-to-Go cDNA or 3' Ready-to-Go cDNA according to the manufacturer's instructions.
In order to obtain partial cDNA sequences (3' region), degenerate oligonucleotide primed PCR (DOP-PCR) was performed as previously described [14]. Briefly, 0.5 μg of template (3' Ready-to-Go cDNA), 10 × Optimised DyNAzyme™ Buffer, 1 μM of the forward primer EeCen 1DF (for EeCentrocin 1), EeCen 2DF (for EeCentrocin 2) or EeStrong 2DR (for EeStrongylocin 2) and the reverse nested universal primer (NUP), 0.2 mM dNTP, 0.4 units DyNAzyme™ II DNA polymerase (Finnzymes, Finland) and water were mixed to bring the reaction volume up to 25 μl. DOP-PCR was performed according to the following programme: 94°C for 5 min, 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 2 min, followed by a final extension at 72°C for 10 min. The DOP-PCR products of the 3' region were cloned into the pGEM1-T vector and sequenced using the primers Sp6 and T7. The correct sequences were confirmed by comparison of deduced amino acid sequences with sequences obtained by Edman degradation. The 5' region of each gene was cloned by using the gene-specific primer EeCen 1R, EeCen 2R or EeStrong 2R from the 3' region and the primer NUP. The full-length nucleotide sequence was deduced based on the overlap of the obtained RACE product with the existing partial cDNA sequence. Furthermore, the full-length coding sequences were verified by PCR amplification using the primers (EeCen 1 AF/AR for EeCentrocin 1, EeCen 2 AF/AR for EeCentrocin 2 or EeStrong 2 AF/AR for EeStrongylocin 2) situated at the extreme ends of the open reading frame. An overview of all primers is presented in S3 Table. The sequences were submitted to GenBank with accession numbers KR494262, KR494263 and KR494264.
The acquired NMR spectra were referenced to the residual solvent signal, δ(1H) = 4.79 ppm, with δ(13C) referenced indirectly from γ(1H)/γ(13C) = 3.976813 (water-d2). Data processing and figures were made using the MestReNova v9.0.1 and NMRPipe v8.1 [33] software, and peptide assignment was made using CARA v1.8.4.2 [34].

Data analysis and interpretation
The potential presence of peptide homologues was examined using the BLAST search engine (http://blast.ncbi.nlm.nih.gov/Blast.cgi) [35] on the National Center for Biotechnology Information (NCBI) homepage. The blastp and blastn algorithms were used, searching for non-redundant protein sequences. Additionally, the built-in BLAST search functionality of the LAMP database (http://biotechlab.fudan.edu.cn/database/lamp/) [36] was used. Predicted signal sequences were determined using the SignalP 4.1 server (http://www.cbs.dtu.dk/services/SignalP/) [37] with the default setting for D-cutoff values and no TM regions selected. The cDNA sequences and deduced amino acid sequences of sea urchins were analysed using the BLAST program and the ExPASy Translate tool (http://web.expasy.org/translate/) with the genetic code set to standard. Alignments using ClustalW [38] and phylogenetic trees using the Neighbour-joining method [39], with evolutionary distances computed using the Poisson correction method [40], were constructed with the Mega 6.06 software (http://www.megasoftware.net/) [41]. Mass spectrum predictions were performed with the ChemCalc online prediction software (http://www.chemcalc.org/) [42] using the peptides tool with resolution set to 0.001. Graphs were made using Graphpad Prism v 6.00 for Windows (Graphpad Software, CA, USA). Row means, SD and linear regressions were calculated by the software.
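For readers unfamiliar with the Poisson correction, the distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of aligned sites at which the residues differ. The Python sketch below illustrates the formula only. It is not the Mega implementation: gap handling by pairwise deletion is an assumption made here, and the two aligned sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over aligned columns without gaps.

    Assumption: columns with a gap in either sequence are skipped
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when all sites differ")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments: 7 comparable sites, 1 difference
print(round(poisson_distance("GWKRC-LA", "GWRRCALA"), 4))
```

The correction grows faster than p itself as sequences diverge, which accounts for multiple substitutions at the same site.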

Peptide synthesis
The non-brominated heavy chains (HC) of EeCentrocins 1 and 2 and the light chain (LC) of EeCentrocin 1 were synthesised commercially (GenicBio Ltd., Shanghai, China), as was the brominated HC of EeCentrocin 1 (HC-diBr, Isca Biochemicals, Devon, UK). The synthetic peptides were subjected to antimicrobial activity screening and haemolytic testing as previously described in sections 2.4 and 2.5.
The fragment, GW Br W Br R, of EeCentrocin 1 was synthesised (Isca Biochemicals) to perform mass spectrometric comparisons with the trypsinated N-terminal fragment of EeCentrocin 1. The brominated peptides were produced with both Trp residues substituted in the 6 position (i.e. 6-D/L-Trp) of the indole ring.

Isolation of AMPs
Antimicrobial compounds have previously been detected in the coelomic fluid and coelomocytes of various echinoderms [14-16, 20, 23, 43]. In the present study, four different SPE eluates obtained from an aqueous extract and one organic extract of E. esculentus coelomocytes were tested for antibacterial activity. Of these, the 40% SPE eluate displayed the highest antibacterial activity (S1 Table). This eluate was active at concentrations ranging from 0.01 to 0.31 mg/ml, depending on the test organism, and was therefore selected for further examination. Out of the four bacterial strains tested (C. glutamicum, S. aureus, P. aeruginosa and E. coli), the Gram-positive C. glutamicum was the most sensitive overall. The 40% SPE eluate was fractionated by RP-HPLC and one-minute HPLC fractions were screened for activity against the same four bacterial strains. Growth-inhibiting properties were discovered in a series of fractions eluted with 20-30% ACN (Fig 2).
Three antibacterial fractions (eluting at 31, 32 and 38 min; Table 1) were further subjected to purification by RP-HPLC and manual fraction collection. Antibacterial activity was monitored on aliquots of the collected fractions during the purification process. Fraction 31 proved

Primary structure elucidation
Edman degradation. An aliquot of each peptide fraction was reduced, alkylated and sequenced by Edman degradation. The analysis revealed multiple signals of amino acids of similar intensity in each position of the peptide fractions Ee4835 and Ee4635/Ee5024, suggesting that these fractions contained peptides composed of more than one peptide chain. The reduced and alkylated peptides were therefore subjected to RP-HPLC purification and successive Edman degradation of each chain. The analysis revealed that the peptide Ee4835 seems to have a heterodimeric structure, containing a heavy chain (HC) composed of 30 amino acids (Seq 5, Table 1) and a light chain (LC) composed of 13 amino acids (Seq 6, Table 1), possibly connected via a single disulphide bond. Additionally, the analysis exposed uncommon or modified residues (X) in positions two and three of the HC. Analysis of Ee4635/Ee5024 returned two sequences with similarity to the HC of Ee4835, probably belonging to the two different peptides known to be in the sample. The main peptide (Ee4635 giving rise to the largest signal intensity both in MS analysis and during sequencing) consisted of 29 amino acids (Seq 1, Table 1). Minor but distinct signals were also recorded for a 32 amino acid peptide (Seq 2, Table 1). Both peptide sequences contained a single cysteine residue and had uncommon or otherwise modified amino acids in different positions: 1 and 6 for Ee4635 and 1 and 9 for Ee5024. Although the HPLC chromatogram of the alkylated peptide fraction displayed additional peaks, no other sequences were obtained. This indicates that additional peptide fragments were present, but N-terminally blocked. Edman degradation analysis of Ee5922 revealed a partial 17 amino acid N-terminal sequence (Seq 3, Table 1). Enzymatic treatment with endoproteinase Arg-C and subsequent purification and sequencing of cleavage products revealed an additional 9 amino acid sequence (Seq 4, Table 1). 
The obtained amino acid sequences formed the basis for cDNA library construction.
Characterisation of cDNA sequences. To elucidate the complete peptide sequences, degenerate primers were designed according to the partial primary peptide sequences. The constructed 3' RACE-Ready cDNA library was employed as template to amplify the 3' end of the transcripts. Three partial cDNA clones of ~320 bp, ~400 bp and ~370 bp were cloned and sequenced. These encoded the C-terminal end of the purified peptides Ee4835, Ee5024 and Ee5922, respectively. No cDNA sequence matching Ee4635 was found. Using the 5' RACE-PCR approach, 5' end cDNA sequences of Ee4835, Ee5024 and Ee5922 were cloned. The cDNA of Ee4835 was 660 bp in length with an open reading frame of 360 bp encoding a polypeptide of 119 amino acids (Fig 3). The theoretical pI and MW of the Ee4835 precursor were calculated to be 5.18 and 13040.0 Da, respectively. The cDNA of Ee5024 was 676 bp in length with an open reading frame of 366 bp encoding a polypeptide of 121 amino acids (Fig 3). The theoretical pI and MW of the Ee5024 precursor were calculated to be 5.69 and 13151.1 Da, respectively. The cDNA of Ee5922 was 675 bp in length with an open reading frame of 267 bp encoding a polypeptide of 89 amino acids (Fig 4). The theoretical pI and MW of the Ee5922 precursor were calculated to be 8.58 and 10297.96 Da, respectively. Based on the amino acid sequences deduced from the cDNA analysis, the residues that could not be identified during Edman degradation of the two peptides Ee4835 and Ee5024 were found to be Trp residues. The N-terminal amino acid in the peptide Ee5922 was also shown to be Trp. Since Trp is normally detected during Edman degradation, these Trp residues are likely modified. The precursor molecules of the peptides Ee4835 and Ee5024 have preprosequences. Analysis with SignalP 4.1 using the neural network model with the SignalP-noTM setting showed that the highest calculated cutoff value was located between positions 20 and 21 in the N-terminal sequence of both proteins.
Therefore, the predicted signal peptides consist of the 20 N-terminal amino acids followed by a prosequence of 30 amino acids (Fig 3). The native peptides with a proposed dimeric structure start at Gly-51 in Ee4835 and Trp-51 in Ee5024. A 24 amino acid interchain sequence separates the HCs from the LCs and is not present in the mature peptides. The two interchains (belonging to Ee4835 and Ee5024) are very similar, differing in only one amino acid residue. The LCs of both peptides seem to consist of 15 amino acids and both peptides contain a C-terminal dipeptide (Gly-Arg) which, based on Edman degradation and MW data, is cleaved off. The theoretical pI of the deduced mature peptides was 10.04 for both, indicating a cationic character. In silico analysis of the peptide Ee5922 suggested that the first 22 amino acids represent a signal peptide followed by a prosequence containing 16 amino acids (Fig 4). The native, cysteine-rich peptide starts at Trp-39.
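The segment lengths reported above can be cross-checked against the precursor lengths with simple arithmetic. The Python sketch below does this under one explicit assumption, namely that the 15-residue light chain region includes the C-terminal Gly-Arg dipeptide; only under that reading do the segments add up to the 119 and 121 residues of the two precursors. The dictionary layout and function name are illustrative.

```python
# Segment lengths (amino acids) as stated in the text. Assumption: the 15-residue
# light chain region includes the C-terminal Gly-Arg dipeptide that is cleaved off.
PRECURSORS = {
    "Ee4835": {"segments": [("signal", 20), ("pro", 30), ("heavy", 30),
                            ("interchain", 24), ("light_region", 15)],
               "total": 119},
    "Ee5024": {"segments": [("signal", 20), ("pro", 30), ("heavy", 32),
                            ("interchain", 24), ("light_region", 15)],
               "total": 121},
}


def segment_starts(segments):
    """Return {segment name: 1-based start position} and the summed length."""
    starts, position = {}, 1
    for name, length in segments:
        starts[name] = position
        position += length
    return starts, position - 1


for peptide, info in PRECURSORS.items():
    starts, summed = segment_starts(info["segments"])
    print(peptide, "heavy chain starts at", starts["heavy"],
          "| segments sum to", summed, "of", info["total"])
```

For both precursors the heavy chain start evaluates to position 51, in agreement with the Gly-51 and Trp-51 start sites given in the text.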
All peptides discovered in the present study have an abundance of positively charged amino acids (23-29%) and hydrophobic amino acids (29-42%). This indicates that these sequences are cationic and have the possibility to form an amphipathic structure, a feature which is common for most AMPs and considered important for their antimicrobial activity [2,44-48]. Their positive charge will aid in the electrostatic attraction between the peptide and the anionic microbial membranes [49][50][51][52]. Distributing positively charged residues on one side and hydrophobic residues on the other side of the structure (i.e. amphipathic) allows the peptide to incorporate itself into and act on the bacterial membranes, as suggested by several authors [2,45,52]. All three peptides contain Trp residues, which are considered especially important for peptide hydrophobicity and the interactions in the membrane-water interface [48].
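Percentages of this kind depend on which residues are counted in each class. The following Python sketch shows one way to compute them; the residue classes (Lys and Arg as positively charged; Ala, Val, Leu, Ile, Met, Phe and Trp as hydrophobic) are assumptions made for illustration and may differ from the classification used in this study, and the example sequence is hypothetical rather than one of the isolated peptides.

```python
# Residue classes are assumptions for illustration; other schemes (e.g. counting
# His as positive or Cys as hydrophobic) would give different percentages.
POSITIVE = set("KR")
HYDROPHOBIC = set("AVLIMFW")


def composition(sequence):
    """Return (% positively charged, % hydrophobic) for a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(res in POSITIVE for res in seq)
    hydrophobic = sum(res in HYDROPHOBIC for res in seq)
    return 100.0 * positive / len(seq), 100.0 * hydrophobic / len(seq)


# Hypothetical 20-residue cationic peptide (not a sequence from this study)
example = "GKWLRKAVKCIFRGWKHLAR"
pos_pct, hyd_pct = composition(example)
print(f"positive: {pos_pct:.0f}%, hydrophobic: {hyd_pct:.0f}%")
```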

Homology searches and bioinformatics
BLAST searches performed on the deduced full-length amino acid sequences of Ee4835, Ee5024 (Fig 3) and Ee5922 (Fig 4) revealed homology to the centrocins and strongylocins isolated from S. droebachiensis [14,23]. Several putative proteins in S. purpuratus were also found to be homologues of the Ee-peptides. Due to their apparent similarity with centrocins and strongylocins (strongylocin 2 primarily), the names EeCentrocin 1 (Ee4835) and 2 (Ee5024) and EeStrongylocin 2 (Ee5922) are proposed for the peptides characterised in the present study. For clarity, the originally characterised centrocins and strongylocins (from S. droebachiensis) will be referred to as SdCentrocins and SdStrongylocins for the remainder of this paper.
Alignment of EeCentrocins, SdCentrocins and predicted homologues in S. purpuratus displays high similarity in the preprosequence region (58-67%) and in the interchain regions (58-79%), but not in the HC or LC regions (Fig 3) which display a much greater individual diversity. This suggests that the HCs and LCs are subject to a much higher mutation rate and that the signal sequences and pro sequences are more conserved regions. The partial peptide sequence of Ee4635 did not return any homologous sequences using NCBI BLAST search analysis. However, analysis performed by the built-in BLAST search engine of the LAMP database revealed low (E-value of 0.59) but significant homology to SdCentrocin 1. Based on the fact that the sequence also contains one cysteine residue, unidentified or modified residues and an abundance of positively charged and hydrophobic amino acids, it could be hypothesised that this peptide too belongs to the diverse heavy chains of the centrocins.
In silico analysis of the complete EeCentrocin sequences indicates that the first 20 amino acids in the precursors function as signal peptides. Furthermore, the data show that the precursor molecules have a first prosequence region of 30 amino acids, followed by a heavy chain sequence (30 and 32 amino acids for EeCentrocin 1 and 2, respectively), a second prosequence region of 24 amino acids (interchain) and a light chain region of 15 amino acids, of which the final two residues (Gly-Arg) appear to form a C-terminal prosequence. The function of these prosequences is unknown, but they might aid in proper folding of the active peptides and/or function as a target for site-specific proteases [23,53].
Alignment of EeStrongylocin 2, SdStrongylocins and predicted homologues in S. purpuratus (Fig 4) displays similarities in the presequence region (50-59%), the prosequence region (12-50%) and in the mature peptide (40-84%). EeStrongylocin 2 displays greatest identity with SdStrongylocin 2a and 2b (68-69%). In silico analysis defined the first 22 N-terminal amino acids as the signal peptide, leaving a 16 amino acid prosequence before the mature peptide. As centrocin-like and strongylocin-like peptide sequences have been discovered in three species of sea urchins, they may be a trait of this class of echinoderms, which increases the likelihood of discovering homologous bioactive peptides in other species of sea urchins. Fig 5 shows the phylogenetic trees composed of all homologues found in S. droebachiensis and S. purpuratus by BLAST searches. It appears that the EeCentrocins (Fig 5A) are phylogenetically separated from the other peptides, sharing one common ancestor with all of them. The SdCentrocins are more closely related to the predicted centrocin-like proteins in S. purpuratus. The SdStrongylocin 1 and 2 AMPs seem to be separated phylogenetically (Fig 5B) but share a common ancestor. EeStrongylocin 2 aligns more closely with SdStrongylocin 2. The S. purpuratus genome has been completely sequenced [54], whereas only a few proteins from S. droebachiensis and E. esculentus have been sequenced. It is therefore unknown whether E. esculentus and S. droebachiensis contain additional strongylocin-like and centrocin-like peptides; this also explains the excess of sequences from S. purpuratus in the figures.
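The percent identities quoted for the alignments, and the Poisson-corrected distances used to build the trees in Fig 5, can be illustrated with a minimal Python sketch. The two aligned fragments below are invented toy strings, not sequences from this study, and the identity definition (matches divided by ungapped aligned columns) is an assumption; alignment programs differ in the denominator they use.

```python
import math


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def poisson_distance(aligned_a: str, aligned_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), p = fraction of differing sites."""
    p = 1.0 - percent_identity(aligned_a, aligned_b) / 100.0
    if p >= 1.0:
        raise ValueError("distance undefined when no sites are identical")
    return -math.log(1.0 - p)


# Hypothetical aligned fragments, for illustration only.
toy_a = "MKLT-AVLLA"
toy_b = "MKITGAVLMA"

print(f"identity: {percent_identity(toy_a, toy_b):.1f}%")
print(f"Poisson distance: {poisson_distance(toy_a, toy_b):.3f}")
```

The correction matters mainly for divergent regions such as the HC and LC, where the raw fraction of differing sites underestimates the number of substitutions per site.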

Characterisation of post-translational modifications
The theoretical monoisotopic mass of EeCentrocin 1 (deduced from cDNA) containing an intramolecular disulphide bond was calculated to be 4675.37 Da, leaving a gap of 154.83 Da to the native isolated peptide (4830.20 Da). In order to identify the modifications of the second and third amino acids (both being Trp residues according to cDNA), the peptide was degraded by trypsin and analysed by HR-MS. The mass value of the major fragment and its corresponding isotope distribution of [M+H]+ ions (Fig 6A) agreed well with the ion distribution of the synthetically produced fragment GW(Br)W(Br)R (Fig 6B) and a theoretical ion distribution of [GW(Br)W(Br)R+H]+ (Fig 6C), indicating that the peptide contains two brominated Trp residues. Seven ion peaks are clearly visible in all three figures with similar relative abundances and minute differences in m/z values (see S7 Table for calculated and measured m/z values). The presence of two Br-Trp in EeCentrocin 1 leads to a theoretical monoisotopic mass of 4831.19 Da. This exceeds the measured mass by approximately one Da, suggesting that one of the peptide chains is C-terminally amidated (4830.20 Da).
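The characteristic isotope envelope of a dibrominated ion can be sketched in Python as follows. This is a simplified illustration, not the calculation used for Fig 6C: it considers only the two stable bromine isotopes (natural abundances of about 50.69% for 79Br and 49.31% for 81Br) and ignores the 13C, 15N and other contributions that add the intermediate peaks seen in the measured spectra.

```python
from math import comb

# Natural abundances of the two stable bromine isotopes.
BR79_ABUNDANCE = 0.5069
BR81_ABUNDANCE = 0.4931


def bromine_pattern(n_br: int) -> list[float]:
    """Relative intensities of the M, M+2, ..., M+2n peaks from bromine alone.

    Peak k corresponds to k atoms of 81Br among n_br bromines (binomial
    distribution); intensities are scaled so that the tallest peak is 1.0.
    """
    if n_br < 0:
        raise ValueError("number of bromine atoms must be non-negative")
    probs = [
        comb(n_br, k) * BR79_ABUNDANCE ** (n_br - k) * BR81_ABUNDANCE ** k
        for k in range(n_br + 1)
    ]
    tallest = max(probs)
    return [p / tallest for p in probs]


# Two bromines give a triplet close to 1:2:1, spaced about 2 m/z apart
# for a singly charged ion.
for k, intensity in enumerate(bromine_pattern(2)):
    print(f"M+{2 * k}: {intensity:.2f}")
```

A monobrominated ion would instead give a doublet of nearly equal peaks, so the triplet-like envelope is a quick check for the number of bromines before comparing exact m/z values.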
The deduced sequence of the EeCentrocin 1 LC holds a dipeptide "Gly-Arg" at the C-terminal end, which resembles previously published amidation signal sequences. As shown by HR-MS (S3 Fig), the LC of the isolated peptide is in fact amidated at the C-terminus. The tachyplesin precursor from the horseshoe crab (Tachypleus tridentatus) [56], aureins (excluding aurein 5.3) from the frog (Litoria aurea) [57] and astacidin 2 from the fresh water crayfish (Pacifastacus leniusculus) [58] all contain amidation signals such as "Gly-Lys" and "Gly-Lys-Arg", which lead to C-terminal amidation. However, in the centrocins [23] of S. droebachiensis, no amidation is observed despite the presence of the cleaved-off "Gly-Arg" C-terminally.
Fig 5 legend (partial): The evolutionary history was inferred using the Neighbour-joining method [39] and the optimal trees are shown. The percentage of replicate trees in which the proteins clustered together during the bootstrap test (500 replicates) is given next to the nodes [55]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method [40]. Accession numbers are given in parentheses.
No MS data fit the theoretical monoisotopic mass (4881.56 Da) calculated from the deduced amino acid sequence of EeCentrocin 2. Based on the findings from EeCentrocin 1 and the bromination occurring in centrocins [23], we hypothesise that EeCentrocin 2 also has two post-translationally brominated Trp residues (positions 1 and 9) in the HC. This is also based on the inability of Edman degradation to identify any residue in these positions. Adding two bromines to the elemental composition leads to a theoretical mass of 5037.38 Da. This deviates from the isolated peptide by +18.00 Da, indicating the presence of additional modifications on the isolated AMP. The deduced LC sequence of EeCentrocin 2 also contains a "Gly-Arg" amidation signal, which indicates that the peptide contains an amidated His residue C-terminally. Additionally, the LC contains an N-terminal Gln residue. The formation of pyroglutamic acid (Glp) is known to occur both enzymatically and spontaneously [59,60] when Gln or Glu is located N-terminally in a peptide sequence. HR-MS and MS/MS data of the alkylated LC of EeCentrocin 2 (S4 Fig) support formation of both N-terminal Glp and C-terminal amidation, leading to a theoretical monoisotopic mass of 5019.37 Da, which corresponds to the measured mass. Glp formation also explains why no sequence for the LC was obtained during Edman degradation, as the method requires a free α-amino group [61].
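The mass bookkeeping for EeCentrocins 1 and 2 can be reproduced with a short Python sketch. The unmodified masses are the theoretical values quoted in the text; the mass shifts are computed from standard monoisotopic atomic masses, and the function and dictionary names are illustrative only.

```python
# Monoisotopic atomic masses (Da) of the atoms involved in the modifications.
H = 1.007825
N = 14.003074
O = 15.994915
BR79 = 78.918338

# Net monoisotopic mass change for each post-translational modification.
MASS_SHIFTS = {
    "bromination": BR79 - H,            # one indole H replaced by Br
    "c_term_amidation": N + H - O,      # C-terminal -OH replaced by -NH2
    "pyroglu_from_gln": -(N + 3 * H),   # N-terminal Gln cyclises, losing NH3
}


def modified_mass(unmodified_mass: float, modifications: dict[str, int]) -> float:
    """Add the mass shift of each modification, times its count, to a base mass."""
    return unmodified_mass + sum(
        MASS_SHIFTS[name] * count for name, count in modifications.items()
    )


# Theoretical masses deduced from cDNA, as given in the text.
ee_centrocin_1 = modified_mass(
    4675.37, {"bromination": 2, "c_term_amidation": 1}
)
ee_centrocin_2 = modified_mass(
    4881.56, {"bromination": 2, "c_term_amidation": 1, "pyroglu_from_gln": 1}
)

print(f"EeCentrocin 1: {ee_centrocin_1:.2f} Da")
print(f"EeCentrocin 2: {ee_centrocin_2:.2f} Da")
```

With these shifts, two brominations plus one amidation bring EeCentrocin 1 to within about 0.01 Da of the measured 4830.20 Da, and adding Glp formation brings EeCentrocin 2 to the 5019.37 Da stated above.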
In the Antimicrobial Peptide Database, there are 19 entries with N-terminal Glp of which one is of marine origin [62]. The measured monoisotopic mass of EeStrongylocin 2 corresponds to the amino acid sequence deduced from cDNA. The theoretical MW of the peptide with 3 disulphide bonds is 5839.86 Da, and by replacing an indole hydrogen with a bromine, the theoretical monoisotopic mass (5917.80 Da) matches the experimental one (5917.77 Da). Additionally, a brominated ion at m/z 769.19 (Fig 7 and S7 Fig). This led to the confirmation of not only 6-Br-Trp in EeCentrocin 1 and EeStrongylocin 2, but also two 6-Br-Trp residues in EeCentrocin 2, where only cDNA data (displaying Trp) existed for positions 1 and 9. One other 6-Br-Trp residue was identified, likely to belong to the 6th amino acid in Ee4635, as the chemical shifts of a 6-Br-Trp in the first position of Ee4635 would perfectly overlap with EeCentrocin 2 due to identical N-terminals.
Marine species are well known to incorporate bromine in their secondary metabolites and peptides [63,64]. Bromination of Trp in the 6 position as a post-translational modification was first reported in 1997 in toxins isolated from cone snails Conus imperialis and C. radiatus [63]. Styelin D from the marine tunicate (Styela clava) also has a confirmed 6-bromination of Trp [65]. Several other marine organisms have brominated Trp residues but without confirmed positioning: cathelicidins from the Atlantic hagfish (Myxine glutinosa) [49], hedistin from the marine annelid (Nereis diversicolor) [66], strongylocin 2 and the centrocins from the green sea urchin (S. droebachiensis) [14,23]. The biological function of bromine-substitution of Trp is not known, but has been suggested to aid in the proteolytic protection of the peptides [49].
No ordered structure of the centrocin-peptides in MQ-H2O and with high salt concentrations has been detected during the NMR experiments. However, several peptides only adopt an ordered structure in the presence of membranes or membrane-mimics [52]. The secondary structure of EeStrongylocin 2 as dictated by its disulphide bonds is yet to be elucidated. The proposed structure of the two dimeric peptides EeCentrocins 1 and 2, and the primary structure of EeStrongylocin 2 can be viewed in

Bioactivities
The antimicrobial activity of the native EeCentrocin 1 was measured against four selected bacterial strains (Table 2, raw data in S9 Table). The peptide displayed potent antibacterial activity against the Gram-positive bacteria C. glutamicum and S. aureus (MIC = 0.78 μM against both) and the Gram-negative bacteria E. coli and P. aeruginosa (MIC = 0.1 and 0.78 μM, respectively). The activity was in the same range as for the SdCentrocins [23]. Because of a challenging purification, we were not able to investigate the antibacterial potency of the native EeCentrocin 2 and Ee4635 individually. However, the mix of both these peptides was antibacterial. Antibacterial potency varies widely among amphipathic AMPs [67-69], but EeCentrocin 1 and the HCs of EeCentrocins 1 and 2 are antimicrobial in ranges similar to those of the SdCentrocins [23]. According to previously published work [51], this is a typical activity range for AMPs. It has also been stated that the MIC of an AMP rarely falls below 0.5-1.0 μM [50], which is in the same MIC region as the most potent AMPs presented in this paper. The HC of the EeCentrocins was shown to be the antimicrobial portion of the peptides, displaying potency similar to that of the native peptide against some bacterial strains (Table 2). Interestingly, the activity of the HC also appears to be independent of the bromination of the Trp residues. The MICs of EeCentrocin 1 HC-diBr towards bacteria ranged from 0.78 to 6.25 μM, which is almost identical to the range displayed by EeCentrocin 1 HC (0.39-6.25 μM). The HC of EeCentrocin 2 displayed similar antibacterial activities (MIC ranging from 0.78 to 6.25 μM). The antifungal activities of the two EeCentrocin 1 HCs were also quite similar, differing only by one dilution step, whereas EeCentrocin 2 HC seems to be a slightly more potent antifungal agent.
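The reported MIC values (0.1, 0.39, 0.78 and 6.25 μM) are consistent with a two-fold serial dilution, which is why differences are described in dilution steps. The Python sketch below maps a concentration to its step in such a series. The top concentration of 100 μM is an assumption made for illustration (it is not stated in this section), and the function names are hypothetical.

```python
import math


def dilution_step(conc_um: float, top_um: float = 100.0) -> int:
    """Index of the nearest two-fold dilution step below an assumed top concentration.

    Step 0 is top_um, step 1 is top_um / 2, step 2 is top_um / 4, and so on.
    Rounding absorbs the two-decimal rounding of reported MIC values.
    """
    if conc_um <= 0 or top_um <= 0:
        raise ValueError("concentrations must be positive")
    return round(math.log2(top_um / conc_um))


def steps_apart(conc_a_um: float, conc_b_um: float) -> int:
    """Number of two-fold dilution steps separating two reported MICs."""
    return abs(dilution_step(conc_a_um) - dilution_step(conc_b_um))


# Lower MIC bounds of EeCentrocin 1 HC-diBr (0.78 uM) and HC (0.39 uM).
print(steps_apart(0.78, 0.39))
```

Because a broth dilution assay only resolves MICs to the nearest step, a difference of a single step, as between the brominated and non-brominated HC, is generally read as equivalent potency.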
The LC (synthesised with a C-terminal carboxyl group) of EeCentrocin 1 is of no observable importance to antimicrobial activity, when comparing the native peptide with the dibrominated or nonbrominated HCs, and it is not antimicrobial when tested alone. This supports our previous studies on the LC of the SdCentrocins [23] and suggests other tasks for the LC. Whether a C-terminally amidated peptide (increasing the charge by +1) would have displayed antimicrobial activity is uncertain.
The MIC of native EeStrongylocin 2 was found to range from 0.78 to 3.13 μM against the tested strains. The SdStrongylocins displayed similar antibacterial activity with MICs ranging from 2.5 to 5 μM [14]. The two recombinantly produced peptides, SpStrongylocins 1 and 2 (originating from S. purpuratus) displayed MICs towards the same strains at 15 and 7.5 μM respectively [24].
The synthesised EeCentrocin 1 (HC, HC-diBr and LC) peptide analogues displayed no or minor haemolytic activity at a concentration of 100 μM (Fig 9, raw data in S10 Table), a concentration which is 16 times higher than the MIC against the least sensitive bacteria (Staphylococci) tested in this study. No or minor haemolytic activity is a prerequisite if the peptides are ever to be exploited clinically or as food additives [70]. EeCentrocin 2 HC was more haemolytic, displaying 11.7, 18.9 and 56.3% haemolysis at concentrations 25.0, 50.0 and 100 μM, respectively. The reason for this higher haemolytic activity of EeCentrocin 2 compared to EeCentrocin 1 is unclear. However, increasing physicochemical parameters like hydrophobicity and hydrophobic face are known to enhance the haemolytic activity of α-helical AMPs [71].
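The relationship between haemolytic and antibacterial concentrations can be expressed as a fold-over-MIC ratio, sketched below in Python. The concentrations and haemolysis percentages are those quoted in the text; using 6.25 μM (the highest MIC in the reported range) as the reference is the same choice the text makes for its 16-fold figure, and the function name is illustrative.

```python
def fold_over_mic(test_conc_um: float, mic_um: float) -> float:
    """How many times higher a test concentration is than a given MIC."""
    if mic_um <= 0:
        raise ValueError("MIC must be positive")
    return test_conc_um / mic_um


# Highest MIC in the reported range for the synthetic HCs (uM).
HIGHEST_MIC_UM = 6.25

# Haemolysis (%) of EeCentrocin 2 HC at each tested concentration (uM).
ee_centrocin_2_hc_haemolysis = {25.0: 11.7, 50.0: 18.9, 100.0: 56.3}

for conc_um, haemolysis_pct in ee_centrocin_2_hc_haemolysis.items():
    ratio = fold_over_mic(conc_um, HIGHEST_MIC_UM)
    print(f"{conc_um:5.1f} uM = {ratio:4.1f} x MIC: {haemolysis_pct}% haemolysis")
```

Expressed this way, EeCentrocin 2 HC is already measurably haemolytic at four times the highest MIC, whereas the EeCentrocin 1 analogues show little or no haemolysis even at 16 times that value.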

Conclusion
The widespread use of antibiotics and the associated development of microbial resistance to these drugs have emerged as a major global problem. The search for new antibacterial agents has therefore become an important area in natural product drug discovery. Cationic AMPs have previously been isolated from various species and phyla, and represent a novel class of antibiotics. In this study, potent AMPs were for the first time characterised from coelomocyte extracts of the Edible sea urchin, E. esculentus, collected from sub-Arctic waters. Three 5-6 kDa AMPs were shown to be novel members of the centrocin and strongylocin families of AMPs. The EeCentrocins have a heterodimeric structure composed of a heavy chain and a light chain connected by a single disulphide bond. Neither the presence of brominated amino acids in the heavy chain nor a light chain with C-terminal amidation and N-terminal pyroglutamic acid seems to be necessary for maintaining antibacterial activity, but these additional structures might aid in the protection against proteolytic degradation of the peptides. The secondary structure and the three-dimensional conformation of EeStrongylocin 2 as dictated by its three disulphide bonds remain unknown, but should be explored. Future research should also include mode of action studies, where cellular targets are identified, and structure-activity relationship studies where truncated analogues of the EeCentrocin HCs are constructed in order to pinpoint the pharmacophore. Additionally, as it appears that sea urchins in general are producers of strongylocins and centrocins, a genomic approach to discovering homologues in other sea urchins (or echinoderms) could be a productive route to novel AMPs.
This study has demonstrated that marine invertebrates are a valuable resource for discovering unique bioactive peptides, providing promising leads for development of novel antimicrobial drugs.
Table. MS experiment and parameters. The experiments used for mass spectrometry are listed for low resolution above and high resolution below. (XLSX) S3