New Structural Variants of Aeruginosin Produced by the Toxic Bloom Forming Cyanobacterium Nodularia spumigena

Nodularia spumigena is a filamentous diazotrophic cyanobacterium that forms blooms in brackish water bodies. This cyanobacterium produces linear and cyclic peptide protease inhibitors which are thought to be part of a chemical defense against grazers. Here we show that N. spumigena produces structurally novel members of the aeruginosin family of serine protease inhibitors. Extensive chemical analyses including NMR demonstrated that the aeruginosins are composed of an N-terminal short fatty acid chain, L-Tyr, L-Choi and L-argininal and, in some cases, a pentose sugar. The genome of N. spumigena CCY9414 contains a compact 18-kb aeruginosin gene cluster encoding a peptide synthetase with a reductive release mechanism which offloads the aeruginosins as reactive peptide aldehydes. Analysis of the aeruginosin and spumigin gene clusters revealed two different strategies for the incorporation of N-terminal protecting carboxylic acids. These results demonstrate that strains of N. spumigena produce aeruginosins and spumigins, two families of structurally similar linear peptide aldehydes, using separate peptide synthetases. The aeruginosins were chemically diverse and we found 11 structural variants in 16 strains from the Baltic Sea and Australia. Our findings broaden the known structural diversity of the aeruginosin peptide family to include peptides with rare N-terminal short chain (C2–C10) fatty acid moieties.


Introduction
N. spumigena is a filamentous diazotrophic cyanobacterium that forms extensive summer blooms in brackish water bodies. The ability to fix atmospheric nitrogen confers a competitive advantage on N. spumigena in nitrogen-poor and iron-limited brackish water ecosystems [1], [2], [3]. N. spumigena is responsible for a large part of the new nitrogen input in the Baltic Sea and is a source of environmental concern [1]. The consumption of water containing N. spumigena is associated with the death of wild and domestic animals [4], [5], [6]. These blooms are toxic through the production of nodularin, a cyclic pentapeptide toxin [4], [5]. Nodularin is the end-product of a complex hybrid non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) biosynthetic pathway [5]. N. spumigena produces other nonribosomal peptides in addition to nodularin, including spumigin and nodulapeptin [7], [8], [9].
Spumigins are linear peptides which contain an N-terminal hydroxyphenyl lactic acid, almost exclusively D-homotyrosine, proline or 4-methylproline (mPro), and a C-terminal lysine or arginine derivative [7], [8], [9]. Spumigins are assembled on an NRPS enzyme complex which offloads the peptides as reactive aldehydes [8]. The enzymatic steps necessary for the synthesis of the unusual mPro are encoded together with the peptide synthetases in the 21-kb spumigin gene cluster [8]. Spumigins are potent inhibitors of serine proteases, active in the micromolar to nanomolar range [7], [8], [10].
N. spumigena encodes a number of cryptic NRPS clusters for which end-products have not been characterized [3], [17]. It was proposed based on the presence of Choi biosynthetic genes that N. spumigena may produce aeruginosins [3]. A recent study reported an incomplete peptide structure which contains Choi from N. spumigena [18]. Here we show that N. spumigena produces new members of the aeruginosin family of protease inhibitors using extensive chemical analyses including NMR studies (Figure 1) and demonstrate that N. spumigena strains produce two classes of similar non-ribosomal peptides, aeruginosins and spumigins, simultaneously using separate peptide synthetases. These results broaden the structural diversity of the aeruginosin family of peptides to include peptides with fatty acid side chains.

Discovery of Aeruginosins
Two abundant peptides with masses of m/z 587 and m/z 589 were identified from cell extracts of Nodularia spumigena AV1 by LC-MS analysis. They were initially suspected to be new variants of spumigin based on their mass and chromatographic behavior (Figure S1 in File S1). However, fragmentation of the protonated ions m/z 587 and m/z 589 did not produce sufficient information for overall substructure elucidation and only the presence of argininal and argininol could be postulated (Figures S1-S2 in File S1). Surprisingly, MDA and DNPH derivatization of the compounds and subsequent MS2 data suggested that the two peptides contained Choi but differed by the presence of alcohol and aldehyde versions of arginine (Figure S3 in File S1). The Choi moiety is a unique component of the aeruginosin family of linear peptides and we hypothesized that N. spumigena makes members of the aeruginosin family of peptides.

Aeruginosin Gene Cluster
A 17.6 kb aeruginosin (aer) gene cluster was subsequently identified on a single contig in the genome of N. spumigena CCY9414 (GenBank accession number CM001793) through tBLASTn searches using AerD, AerE and AerF protein sequences involved in the synthesis of the Choi moiety (Figure 2). The aer gene cluster was located 133 kb from the spumigin gene cluster, which in turn was located just 11 kb from the nodulapeptin gene cluster (Figure 2).
The aer gene cluster encodes 8 proteins organized in a single operon (Figure 3; Table 1). The predicted substrate specificities of the AerM, AerB and AerG peptide synthetases were L-Arg, L-Tyr and Choi, respectively, through comparison with other aeruginosin biosynthetic pathways (Table 2). Aeruginosin biosynthesis was predicted to start by loading short-chain fatty acids using the AerB condensation domain, as in a number of other non-ribosomal biosynthetic pathways (Figure 3). In order to test this we performed phylogenetic analyses to assign the condensation domains of AerB, AerM and AerG to different condensation domain subtypes (Figure 4). Phylogenetic analysis showed that the AerB condensation domain is closely related to the loading condensation domains of the nostopeptolide and cyanopeptolin biosynthetic pathways (Figure 4). Genes encoding the AerD, AerE and AerF enzymes were also located in the aer gene cluster (Figure 3), as was a gene encoding a putative glycosyltransferase, AerI (Table 1).
Preliminary MS2 fragmentation was not sufficient to resolve the first two sub-structural elements and suggested that the second amino acid could be Leu or Tyr, while bioinformatic analyses suggested that the substrate could be L-Tyr (Table 2). In order to gain more information on the substructure of the aeruginosins, an ATP-PPi exchange assay was performed which demonstrated that L-Tyr was activated by the AerB adenylation domain in vitro (Figure 5). The presence of an epimerase domain suggested that the substrate of AerB was L-Tyr, which would be epimerized to D-Tyr as found in other aeruginosins.

Aeruginosin Chemical Structure
The main aeruginosin variant (m/z 587) was hydrolyzed in order to confirm these biochemical and bioinformatic predictions and to obtain further information on the structure of the new peptides. However, despite the presence of a full-length and apparently functional epimerase domain in AerB, the second amino acid was unambiguously determined to be L-Tyr according to chromatographic amino acid analysis of aeruginosins subjected to acid hydrolysis. The amino acids found at this position in other aeruginosins are almost exclusively D-amino acids. Reanalysis of the main aeruginosin (m/z 589) product ion spectrum also showed the presence of Tyr in position 2 (Figure S4 in File S1). The third amino acid, Choi, was assigned the L configuration based on the configuration of Choi in aeruginosin 298-A. The N-terminal moiety could not be detected by amino acid analysis. However, GC-MS analysis of the hydrolyzed aeruginosin allowed unequivocal identification of the N-terminal moiety as hexanoic acid based on retention time and spectra (Figure 6).
The complete structure of the main aeruginosin (m/z 587) was confirmed by NMR analysis. Separation and purification of this peptide from the other aeruginosins and spumigins produced by the N. spumigena AV1 strain was hindered by the reactive aldehydic nature of the compound (Figure S1 in File S1). NaBH4 reduction of the peptides in the methanol extract converted the aldehyde to an alcohol, which made it possible to purify reduced aeruginosin by HPLC. 1H and 13C NMR signals yielded four partial structures confirming the aeruginosin structure (Figure 1A). The NMR spectral data are presented in the supplementary material (Table S1 in File S1; Figures S5-S7 in File S1). Accurate mass measurement of the protonated molecules together with the 15N-labeling of the aeruginosins NAL2 and NOL3 were in full agreement with the other results.
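The proposed structure can be cross-checked against the observed ions with simple mass arithmetic. The following is a minimal sketch, assuming standard monoisotopic element masses and the usual elemental compositions of the free building blocks (C9H15NO3 for Choi, C6H14N4O for argininal); the helper names are illustrative and are not part of the original analysis. It sums the building blocks, removes one water per amide bond, and adds a proton.

```python
from collections import Counter

# Monoisotopic masses of the elements (u) and of the proton.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647

# Elemental compositions of the free building blocks (assumed, see lead-in).
HEXANOIC_ACID = Counter(C=6, H=12, O=2)
TYR = Counter(C=9, H=11, N=1, O=3)
CHOI = Counter(C=9, H=15, N=1, O=3)        # 2-carboxy-6-hydroxyoctahydroindole
ARGININAL = Counter(C=6, H=14, N=4, O=1)   # arginine with CHO in place of COOH
ARGININOL = Counter(C=6, H=16, N=4, O=1)   # reduced form, CH2OH in place of CHO
WATER = Counter(H=2, O=1)


def linear_peptide(*blocks):
    """Sum the building blocks and remove one water per amide bond formed."""
    total = Counter()
    for block in blocks:
        total.update(block)
    for _ in range(len(blocks) - 1):
        total.subtract(WATER)
    return total


def mz_protonated(formula):
    """m/z of the singly protonated molecule [M+H]+."""
    return sum(MONO[element] * n for element, n in formula.items()) + PROTON


aldehyde = linear_peptide(HEXANOIC_ACID, TYR, CHOI, ARGININAL)
alcohol = linear_peptide(HEXANOIC_ACID, TYR, CHOI, ARGININOL)

print(dict(aldehyde))                    # elemental composition C30H46N6O6
print(f"{mz_protonated(aldehyde):.2f}")  # nominal m/z 587
print(f"{mz_protonated(alcohol):.2f}")   # nominal m/z 589
```

Under these assumptions the aldehyde and alcohol forms differ by two hydrogens, which matches the 2 Da spacing between the m/z 587 and m/z 589 ions. Swapping `HEXANOIC_ACID` for another acid composition gives the expected shift for other N-terminal chain lengths.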

Chemical Variation
Detailed inspection of N. spumigena AV1 and CH307 strains allowed the identification of 11 structural variants of aeruginosins (Table 3; Table S2 in File S1). These could be divided into aeruginosins containing either an aldehyde (NAL1-NAL4) or alcohol (NOL1-NOL7) functionality. A range of short straight-chain carboxylic acids were found at the N-terminus (Table 1). Approximately 83% of the variants contained hexanoic acid in AV1 while 93% of the variants contained octanoic acid in CH307, but both strains produced small amounts of aeruginosins with shorter and longer chained carboxylic acids at this position. An O-linked pentose was identified in 4 of the 11 aeruginosins and located on the Choi moiety (Figure 1B, Figure S8 in File S1). Inspection of 16 strains of N. spumigena revealed just a single strain which lacked detectable levels of aeruginosins (Figure 7 and Table S3 in File S1). Glycosylated aeruginosins could be detected in just 3 of the 16 strains, CH307, CH311 and P38 (Table S3 in File S1). Of these, CH307 produced variants containing octanoic acid as the main fatty acid.
The presence of aeruginosins as well as the distribution of genes encoding AerM, AerB, AerG, and AerI was determined in 16 strains of N. spumigena, 4 strains of N. harveyana, and 8 strains of N. sphaerocarpa (Table S3 in File S1). All 16 strains of N. spumigena contained aeruginosin biosynthetic genes and 15 of these strains produced aeruginosins (Figure 7). Aeruginosins were not found in N. spumigena AV45, which seems to encode only a partial aer gene cluster. All strains of N. spumigena from the Baltic Sea contained aer clusters encoding the AerI putative glycosyltransferase. The gene encoding this enzyme was not detected in any of the strains isolated from Australia. Neither aeruginosins nor aeruginosin biosynthetic genes could be found in any strains of the benthic N. harveyana or N. sphaerocarpa tested (Table S3 in File S1). The majority of the 16 strains of N. spumigena produced both spumigin and aeruginosin (Figure 7). Some strains produced more aeruginosins and spumigins than others (Figure 7). About half of the 16 strains produced mPro-containing spumigins while the remainder produced spumigins containing Pro. Quantification by UV absorbance (280 nm) in the AV1 strain indicated that spumigins comprise 7% and aeruginosins 3-4% of the dry weight. All 16 strains of N. spumigena produced spumigins and the nodularin toxin. Almost all of the 16 strains of N. spumigena produced nodulapeptins. Suomilide was identified only in N. sphaerocarpa strains (Table S3 in File S1).

Discussion
Aeruginosins are a chemically diverse family of peptides known to date from just the bloom-forming cyanobacterial genera Microcystis and Planktothrix [11], [12], [13], [14]. The relationship between aeruginosins and spumigins has been subject to speculation for some time [12], [13], [15], [16]. Phylogenetic analyses suggest that the spumigin gene cluster of N. spumigena and the aeruginosin gene clusters of Microcystis and Planktothrix are unrelated [8]. Recent studies demonstrated that N. spumigena encodes a number of cryptic NRPS clusters for which the products were unknown [3], [17]. It was suggested based on the presence of Choi biosynthetic genes that it may produce an aeruginosin [3]. Interestingly, a recent study has shown that N. spumigena isolated from a variety of geographic locations produces a diversity of peptides including a partial peptide which contains Choi and may be assigned to the aeruginosin family [18]. That study suggests that N. spumigena produces bona fide aeruginosins in addition to spumigin [18]. However, the aeruginosin structures presented by Mazur-Marzec and coworkers were based on MS2 data and are incomplete [18]. The identities of the N-terminal moiety and amino acid at position 2 were not resolved in their analyses [18]. Here we show that the complete structure of the main aeruginosin variant is composed of an N-terminal short fatty acid chain, L-Tyr, L-Choi and L-argininal.
We found 11 structural variants of the aeruginosins in 16 strains from the Baltic Sea and Australia. The aeruginosins produced by N. spumigena contain fatty acids at the N-terminus including acetic acid, butyric acid, hexanoic acid, octanoic acid and decanoic acid. The N-termini of previously reported aeruginosins consist of either hydroxyphenyl lactic acid in Microcystis or phenyl lactic acid in Planktothrix [11]. Aeruginosins may also be modified to contain chlorine, sulfate or sugars [12], [13]. Glycosylation of aeruginosides, members of the aeruginosin family reported from P. agardhii, is catalyzed by the AerI glycosyltransferase [12]. However, the majority of aeruginosins detected here lacked the O-linked pentose despite the presence of the AerI glycosyltransferase in the genome of the producing strain.
It has been anticipated that aeruginosins and spumigins might be related compounds based on their structural similarities [12], [13], [15], [16]. Our results show that these two peptides are assembled on separate peptide synthetases (Figure 2). The organization of catalytic domains in AerB, AerG and AerM suggests an orthodox model for aeruginosin assembly (Figure 2). N. spumigena CCY9414 lacks the reductive loading mechanism of the Microcystis and Planktothrix aeruginosin biosynthetic pathways entirely [12], [13]. The organization of catalytic domains in the aer gene cluster (Figure 3) and phylogenetic analyses (Figure 4) suggest that the condensation domain of AerB is responsible for lipidation of the aeruginosins. N-terminal condensation domains have been proposed to prime the synthetase with short-chain carboxylic acids in lichenysin [19], daptomycin [20], nostopeptolide [21] and cyanopeptolin [22], [23] biosynthesis. Our results suggest substrate specificities ranging from C2 to C10 fatty acid moieties. However, the exact lipidation mechanism remains unknown. The reductase domain of AerM releases the C-terminal arginine as a reactive aldehyde.
Members of the aeruginosin family of natural products commonly have strong inhibitory activity against serine proteases [11], [15]. Serine proteases are involved in a number of important physiological processes, and their importance in the complex blood coagulation cascade is well established [11]. Planktonic bloom-forming cyanobacteria produce a range of protease inhibitors [15]. The function of these peptides is unclear but they are widely believed to be part of a chemical defense system, acting as a grazing deterrent [24], [25]. Our results show that N. spumigena strains produce a complex cocktail of protease inhibitors comprising up to 1% of the dry weight of the organism, which may explain in part its ecological success.

Strain Growth
Sixteen strains of N. spumigena, 4 strains of N. harveyana and 8 strains of N. sphaerocarpa (Table S3 in File S1) were grown at a photon irradiance of 15 µmol m−2 s−1 in saline Z8 medium lacking a source of combined nitrogen for 21 days [8]. 15N-labeling of N. spumigena AV1 was performed in a similar way, except that the medium was buffered with 10 mM HEPES (pH 8.0). 15N-urea (98+% 15N, ISOTEC, USA) was used as the nitrogen source and nitrogen-free argon (with 20.9% O2 and 0.45% CO2; quality 5.7; AGA Gas Ab, Sweden) was bubbled into the medium to prevent nitrogen fixation from air.

Gene Cluster Annotation
The aer gene cluster was identified on a single 5,462,271 bp scaffold in the genome of N. spumigena CCY9414 (GenBank accession number CM001793) through BLASTp searches using AerD, AerE and AerF proteins. The genes in the aer gene cluster were predicted with Artemis using Glimmer. The start sites were refined manually. The amino acid sequences of the genes were used to query the non-redundant database at NCBI in order to predict a function for the genes (Table 2). The substrate specificity of the adenylation domains in the NRPS modules was predicted by using the 10 amino acid binding pocket signature [26].

Frequency of aer gene Clusters in N. spumigena Strains
Genomic DNA was extracted from the cultivated strains as previously described [8]. We amplified four genes from the aer gene cluster, aerM, aerB, aerG and aerI, by PCR using oligonucleotide primers designed from the N. spumigena CCY9414 genome sequence (Table S4 in File S1). The PCR reactions were performed in a 20 µl final volume containing 1 µl of DNA, 1× DyNAzyme II PCR buffer, 100 µM of each deoxynucleotide, 0.4 µM of each oligonucleotide primer, and 0.4 units of DyNAzyme II DNA polymerase (Finnzymes, Espoo, Finland). The following protocol was used: 94°C, 3 min; 25 cycles of 94°C, 30 s; 63°C, 30 s; 72°C, 1 min; and 72°C, 10 min. PCR to confirm the deletion of the aerI gene was performed as before but with an annealing temperature of 58°C. PCR products were visualized on 1.5% agarose gels containing 0.5× TAE run at 120 V for 20-25 min and scored for the presence or absence of PCR products of the expected length. The 16S rRNA gene was amplified and sequenced from N. spumigena CH307, P38 and AV45 as previously described [27] and the sequence data were deposited in GenBank (KF360086-KF360088). An alignment of 16 strains of N. spumigena, 8 strains of N. sphaerocarpa and 4 strains of N. harveyana was made using Bioedit. Gaps and ambiguous regions were excluded and a total of 1340 bp of sequence was considered for phylogenetic analysis. A neighbor-joining tree was constructed using DNADIST and NEIGHBOR as implemented in the PHYLIP package. The tree was midpoint rooted using RETREE. 1000 bootstrap replicates were constructed using SEQBOOT, DNADIST, NEIGHBOR and CONSENSE in the PHYLIP package. The production of aeruginosin, spumigin, nodularin, nodulapeptin and suomilide was mapped to this tree.
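The bootstrap step resamples alignment columns with replacement and rebuilds the distances and tree for each pseudo-replicate. The sketch below illustrates only the resampling idea in plain Python. It is not the PHYLIP implementation: the toy sequences are hypothetical, and a simple proportion of differing sites stands in for the distance model that DNADIST would apply.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate: alignment columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy alignment with hypothetical sequences (not data from this study).
toy = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGAAC",
    "strain_C": "ACGAACGAAT",
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_columns(toy, rng) for _ in range(1000)]

# In what fraction of replicates is A closer to B than to C?
support = sum(
    p_distance(r["strain_A"], r["strain_B"]) < p_distance(r["strain_A"], r["strain_C"])
    for r in replicates
) / len(replicates)
print(f"A-B grouping recovered in {support:.0%} of replicates")
```

In the full workflow each replicate would be turned into a distance matrix and a neighbor-joining tree, and the consensus step would report how often each clade recurs across the 1000 trees.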

Derivatization
Aeruginosins were derivatized with malondialdehyde (MDA) using 100 µl of methanol extract from N. spumigena AV1 evaporated to dryness in a vacuum centrifuge. The resultant residue was dissolved in 100 µl of 12 M H3PO4 and 2.4 µl of 1,1,3,3-tetraethoxypropane (Sigma) was added. The sample was evaporated in a vacuum centrifuge after 1 h at room temperature and dissolved in 100 µl of methanol. DNPH derivatives were prepared as previously described [8].

NMR Analysis
Aldehydes were converted to alcohols in the methanol extract using NaBH4 reduction, which made it possible to purify aeruginosin NOL1 by HPLC. One gram of freeze-dried AV1 cells was extracted with 70 ml of methanol using a tip homogenizer (SilentCrusher M, Heidolph, Germany) at ambient temperature at a speed of 16000 rpm for 3 × 30 s. The suspension was centrifuged (10000 g, 5 min) and dichloromethane and water were added to the supernatant in a volume ratio of 1:1:1. Phases were separated by centrifugation (5000 g, 5 min). The upper water/methanol phase was collected and vacuum evaporated to dryness. The residue was dissolved in 4 ml of methanol, 50 mg of NaBH4 was added and after a 5 min reaction time the solution was vacuum evaporated to dryness. The residue was dissolved in 1 ml of 15% acetonitrile. 100 µl portions of the solution were injected 10 times into a Luna C8(2) column (10 × 150 mm, 5 µm, 100 Å, Phenomenex) which was eluted isocratically with 0.1% TFA in 15% acetonitrile. Pooled fractions containing aeruginosin NOL1 were evaporated in a vacuum and dissolved in CD3OD for NMR. 1H and 13C NMR spectra were obtained with a Varian Unity Inova 600 MHz NMR spectrometer equipped with a cryogenically cooled triple-resonance 1H, 13C, 15N probe head and an actively shielded z-gradient system. DQF-COSY and TOCSY (120 ms mixing time) experiments were collected using 2048 and 512 data points in F1 and F2 dimensions, corresponding to acquisition times of 0.34 and 0.085 s, respectively. The corresponding acquisition times in 13C HSQC and 13C HMBC experiments were 0.02 s (13C dimension) and 0.17 s (1H dimension), and 0.014 s (13C dimension) and 0.34 s (1H dimension), respectively. The average one- and three-bond 1H-13C couplings were estimated to be 140 Hz and 8 Hz, and 1H-13C transfer delays for HSQC and HMBC were set to 3.57 and 62.5 ms, respectively. All spectra were collected at 25°C. Spectra were processed and analyzed using VNMRJ 2.1 version B and ACD/SpecManager version 11.03 software packages.
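The reported transfer delays are consistent with the common choice of 1/(2J) for the estimated coupling constants. The short check below assumes that relation was the one used; the function name is illustrative.

```python
def transfer_delay_ms(j_hz):
    """Delay 1/(2J) in milliseconds for a coupling constant J given in hertz."""
    return 1000.0 / (2.0 * j_hz)


hsqc_delay = transfer_delay_ms(140.0)  # average one-bond 1H-13C coupling
hmbc_delay = transfer_delay_ms(8.0)    # average three-bond 1H-13C coupling
print(f"HSQC: {hsqc_delay:.2f} ms, HMBC: {hmbc_delay:.1f} ms")
# prints "HSQC: 3.57 ms, HMBC: 62.5 ms"
```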

ATP-pyrophosphate Exchange Assay
The region of the aerB gene encoding the adenylation domain was amplified by PCR from N. spumigena CCY9414 using oligonucleotide primers designed to anneal to the substrate-conferring portion of each adenylation domain. Primer design and PCR reactions were performed as described previously [8]. PCR products were digested with NcoI and PmeI restriction enzymes, gel excised and ligated to pFN18A (HaloTag® 7) T7 Flexi® vector (Promega, WI, USA) opened with the same enzymes. The ligation mix was transformed into Escherichia coli (KRX) competent cells following the manufacturer's instructions. Colonies were grown in a shaker (160 rpm) at 37°C overnight in 3 ml of LB medium supplemented with 100 µg ml−1 ampicillin. On the following day, 400 µl was used to inoculate 20 ml of TB medium containing 50 µg ml−1 carbenicillin. The culture was incubated with shaking at 37°C (160 rpm) for 1.5 h, induced by the addition of 0.1% rhamnose and grown overnight (16-18 h) in a shaker (100 rpm) at 24°C. E. coli cells were collected and sonicated as described previously [8]. The expression of soluble protein was observed in a 10% SDS-PAGE gel. The soluble adenylation domains were purified using the HaloTag® Protein Purification System (Promega). Protein concentration of the preparations was measured with the BCA protein assay kit (Pierce). The ATP-pyrophosphate exchange assay was performed as described previously [9].

Supporting Information
File S1. The combined Supporting Information file contains detailed data on the discovery and identification of aeruginosins by LC-MS (Figures S1-S4) and NMR (Figures S5-S7, Table S1), chemical variation of aeruginosins (Figure S8 and Table S2), the results of the screening of individual Nodularia strains for various peptides and their biosynthetic genes (Table S3) and the PCR primers used (Table S4). (DOCX)