Metagenomic Analysis of the Sponge Discodermia Reveals the Production of the Cyanobacterial Natural Product Kasumigamide by ‘Entotheonella’

Sponge metagenomes are a useful platform to mine cryptic biosynthetic gene clusters responsible for the production of natural products involved in the sponge-microbe association. Since numerous sponge-derived bioactive metabolites are biosynthesized by the symbiotic bacteria, this strategy may concurrently reveal sponge-symbiont produced compounds. Accordingly, a metagenomic analysis of the Japanese marine sponge Discodermia calyx has resulted in the identification of a hybrid type I polyketide synthase-nonribosomal peptide synthetase gene (kas). Bioinformatic analysis of the gene product suggested its involvement in the biosynthesis of kasumigamide, a tetrapeptide originally isolated from the freshwater free-living cyanobacterium Microcystis aeruginosa NIES-87. Subsequent investigation of the sponge metabolic profile revealed the presence of kasumigamide in the sponge extract. The kasumigamide-producing bacterium was identified as an ‘Entotheonella’ sp. Moreover, an in silico analysis of kas gene homologs uncovered the presence of kas family genes in two additional bacteria from different phyla. The production of kasumigamide by multiple distantly related bacterial strains implicates horizontal gene transfer and raises the potential for a wider distribution across other bacterial groups.


Introduction
Metagenome mining strategies [1] are applicable for the discovery of unknown compounds, particularly polyketides and nonribosomal peptides, from uncultured bacteria. Based on the colinearity rule of domain and module organization of modular polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS) [2][3], it is possible to predict the chemical structures of products derived from orphan gene clusters comprising PKS and/or NRPS genes. As many polyketides and nonribosomal peptides have been isolated from marine sponges, and there is strong evidence that those natural products are actually produced by their microbial symbionts, sponge metagenomes have become a useful source for identifying the real producers of those natural products. Among the many different sponge symbionts, e.g., Proteobacteria, Actinobacteria, Acidobacteria, Cyanobacteria [4], the 'Entotheonella' group belonging to the candidate phylum 'Tectomicrobia' has recently been described as a highly prolific bacterial symbiont phylotype capable of producing structurally complex bioactive molecules [5][6]. This as-yet-uncultured group of bacteria has been frequently detected in several marine sponges. In the marine sponges Theonella swinhoei Y and T. swinhoei WA, 'Entotheonella' was identified as the true producer of a number of natural products [7][8]. The Japanese marine sponge Discodermia calyx (Fig 1A) [9] is also known as a rich source of bioactive compounds, such as calyculins and calyxamides [10][11][12]. Through extensive screening of the metagenomic library of D. calyx, we recently identified the calyculin biosynthetic gene cluster and linked this cluster to the symbiotic bacterium, 'Entotheonella' sp., by employing single cell analysis [13].
Furthermore, we employed degenerate PCR to target genes within the D. calyx metagenome that encode ketosynthase (KS) domains [14][15][16][17][18] and found a range of distinct KS domain sequences. Of these, several were associated with trans-acyl transferase (AT) type KS domains and characterized as components of the calyculin biosynthetic gene cluster. Other KS domains, however, could not be attributed to a previously characterized biosynthetic pathway. Therefore, to search for new biosynthetic gene clusters and their products, extensive characterization of the diverse KS domains present in the D. calyx metagenomic library is warranted. Here, we present such a study, targeting KS domains affiliated with the cis-AT KS group and subsequently characterizing the production of the cyanobacterial natural product kasumigamide by a sponge symbiont.

Results
Annotation and predicted product of kas gene cluster
Screening of the D. calyx fosmid library using specific primers for the cis-AT KS domain resulted in the identification of two clones, pDCYN1 and pDCYN2. Sequencing of the fosmid clones revealed the existence of a hybrid PKS-NRPS biosynthetic gene cluster (kasA-I, ~37 kb, LC160290) (Table 1, Fig 2), composed of 9 open reading frames (ORFs), with kasA-C forming a PKS-NRPS core. KasA comprises an adenylation (A) domain (KasA-A1), a ketoreductase (KR) domain (KasA-KR), and a peptidyl carrier protein (PCP) domain. KasB consists of three modules (modules 2-4): the first two are NRPS modules, and the third is a PKS module. The two following NRPS modules (modules 5-6) are encoded in kasC. The substrates for the five A domains (KasA-A1, KasB-A1, A2, KasC-A1, and A2) were predicted from the NRPS codes [19], with the exception of KasA-A1, which did not match known data but was annotated as an A domain recruiting phenyl pyruvic acid (PP), based on its close resemblance to AerA of the aeruginosin biosynthetic gene cluster (S2 Table) [20]. Collectively, the chemical structure of the PKS and NRPS hybrid compound was predicted to be kasumigamide, which had previously been isolated from the freshwater cyanobacterium Microcystis aeruginosa NIES-87 (Fig 1B) [21]. However, there have been no reports of the isolation of this compound from marine sponges.
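The module layout described above can be written down as a small data structure, which makes the colinear reading order explicit. The following is a minimal Python sketch, not the annotation pipeline used in the study: the `Module` class and the helper names are hypothetical, and labelling module 1 (A-KR-PCP) as NRPS-type is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    number: int   # position in the assembly line (1-6)
    protein: str  # protein that carries the module
    kind: str     # "NRPS" or "PKS"


# Layout as described in the text. Treating module 1 (A-KR-PCP) as an
# NRPS-type loading module is an assumption of this sketch.
KAS_MODULES = [
    Module(1, "KasA", "NRPS"),
    Module(2, "KasB", "NRPS"),
    Module(3, "KasB", "NRPS"),
    Module(4, "KasB", "PKS"),
    Module(5, "KasC", "NRPS"),
    Module(6, "KasC", "NRPS"),
]


def assembly_order(modules):
    """Return module kinds in the order the colinearity rule reads them."""
    return [m.kind for m in sorted(modules, key=lambda m: m.number)]


def modules_per_protein(modules):
    """Count how many modules each protein contributes."""
    return dict(Counter(m.protein for m in modules))


order = assembly_order(KAS_MODULES)
pks_position = order.index("PKS") + 1  # 1-based module number of the PKS step
print(order, pks_position, modules_per_protein(KAS_MODULES))
```

Keeping the module number separate from the list position means a cluster whose gene order differs from its biosynthetic order can be represented without changing the helpers.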

Isolation of kasumigamide from marine sponge D. calyx
In order to search for kasumigamide and related compounds, the MeOH extract of the frozen sponge D. calyx was fractionated by column chromatography, based on LC-MS monitoring, to yield the putative kas gene product 1.
The molecular formula of 1 was established by ESI-TOFMS to be C40H50N8O9 [m/z 809.3801 (M+Na)+ for C40H50N8NaO9], which is consistent with that of kasumigamide (S1 Fig). Furthermore, 1D and 2D NMR data of 1, including 1H-1H COSY, HMQC, HMBC, and NOESY spectra in DMSO-d6 + TFA, suggested the presence of β-Ala and Arg (S3 Table) [21]. The four aromatic protons (δH = 6.91, 7.00, 7.28, and 7.47) and an exchangeable proton (δH = 10.74) coupled to another aromatic proton (δH = 7.00) were consistent with an indole ring, supported by the UV absorption maximum at 280 nm. In addition, the other ten aromatic protons at δH = 7.10-7.23 overlapped, which suggested the presence of two mono-substituted benzene rings. The HMBC correlation between aromatic protons (δH = 7.23) and an oxymethine carbon (δC 75.9) indicated the presence of 3-phenylserine. In contrast, other aromatic protons (δH = 7.16) showed an HMBC correlation with a methylene carbon (δC 41.2), which further correlated with an oxymethine proton at δH = 4.00, supporting the presence of phenyl lactic acid. The amino acid sequence of 1 was confirmed by NOESY and HMBC correlations. The ODS HPLC analysis of L-FDAA derivatives of the hydrolysate of 1 revealed the presence of D-Arg and D-erythro-3-phenylserine (D-erythro-PS). Therefore, the gross structure was concluded to be 1, which coincides with the predicted product of the PKS and NRPS hybrid gene cluster, kasA-I.
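As a quick plausibility check on a molecular formula of this kind, the degree of unsaturation (double-bond equivalents, DBE) can be computed with the standard relation DBE = C - H/2 + N/2 + 1. The Python sketch below is an illustration only and not part of the reported structure elucidation. The function names are hypothetical, and the formula ignores halogens, which is adequate for a C/H/N/O compound.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C40H50N8O9' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def double_bond_equivalents(formula):
    """DBE = C - H/2 + N/2 + 1 (oxygen does not contribute; halogens ignored)."""
    c = parse_formula(formula)
    return c.get("C", 0) - c.get("H", 0) / 2 + c.get("N", 0) / 2 + 1


dbe = double_bond_equivalents("C40H50N8O9")
print(dbe)
```

For the formula in the text this gives 20 degrees of unsaturation. An indole ring (6) and two benzene rings (4 each) account for 14 of them, leaving the remainder for carbonyl and guanidine-type double bonds.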

Kasumigamide producer in the marine sponge
To identify the true producer of kasumigamide in D. calyx, we used laser microdissection (LMD) to isolate symbiont cells for PCR analysis. As candidates, two types of cells, designated as the "F" and "S" morphologies (Fig 3A), were isolated from the sponge material. Of these two, only cells with the "F" morphology (Fig 3A) gave a positive result when amplified with the kas-specific primer pair (Fig 3B). This filamentous bacterium was previously reported to be 'Entotheonella' sp., the producer of calyculins in D. calyx (Fig 3A) [13].
Fig 3 (caption fragment): [...] Table), using dissected cells ("F" or "S") as templates. doi:10.1371/journal.pone.0164468.g003

Kasumigamide biosynthetic gene cluster in cyanobacterium
Kasumigamide was first identified in the free-living cyanobacterium M. aeruginosa NIES-87 [21]; however, its biosynthetic pathway in this organism was unknown. To compare the corresponding gene clusters between the two different bacterial species, we set out to identify a kasumigamide biosynthetic gene cluster in M. aeruginosa NIES-87. First, the ability of M. aeruginosa NIES-87 to produce kasumigamide was confirmed by LC-MS analysis (S7 Fig).
One of the M. aeruginosa metabolites showed the same retention time and molecular mass as kasumigamide isolated from D. calyx. These results corroborated the previous report of kasumigamide production in M. aeruginosa NIES-87 [21]. The genome was then sequenced and assembled (see Materials and Methods). A homology-based search revealed the existence of a PKS-NRPS biosynthetic gene cluster, named makasA-D (S5 Table). The makasA-C genes encode five NRPS modules and one PKS module (Fig 4D).

Substrate specificity of the adenylation domains
In order to confirm the predicted substrates for the A domains in MakasA-C, an in vitro analysis was performed using the Biomol Green assay strategy [22][23]. [...] (Fig 5). In addition, MakasC-A2 showed higher selectivity for DL-threo-3-phenylserine (DL-threo-PS) than for L-Phe, in accordance with the substrate binding prediction (S2 Table). However, we have so far [...]

Kasumigamide biosynthetic gene cluster in phylogenetically diverse bacteria
The above studies revealed that kasumigamide is produced by two phylogenetically distant bacterial species, 'Entotheonella' sp. and M. aeruginosa NIES-87. This observation suggests that horizontal transfer of the kasumigamide biosynthetic gene cluster can occur between different bacterial species. Therefore, we searched for additional kas biosynthetic gene clusters using BLASTP, with KasA-D as queries. As a result, two candidate clusters were found in the β-proteobacteria Delftia acidovorans CCUG 274B (WP_016445283-WP_016445295) and Herbaspirillum sp. CF444 (EJL94052-EJL94061), and the putative kasumigamide biosynthetic gene clusters were named dakasA-J and hkasA-H, respectively [...] [24]. The GC contents of kasA-I, makasA-D, dakasA-J, and hkasA-H are 57.5%, 52.9%, 70.6%, and 64.7%, respectively. The domain organizations of the PKS-NRPS modules of dakasA-J and hkasA-H were identical to those of kasA-C (Fig 4A-4C). The A domain binding sites exhibit 77-100% identity to the corresponding domains of KasA-C and/or MakasA-C (S2 Table). Finally, phylogenetic analysis of the KS domains (S10 Fig) shows their close relationships to each other, even though they originate from phylogenetically distinct bacteria (S11 Fig).
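GC content figures such as those quoted above are simple base-composition ratios. The Python sketch below shows one way such a value could be computed from a nucleotide sequence. It is not the tool used by the authors: the function name and the toy sequence are hypothetical, and skipping ambiguous bases such as N is an assumption of this sketch. Only the four percentages in the dictionary come from the text.

```python
def gc_content(sequence):
    """Percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# GC contents reported in the text (percent), kept here for comparison.
REPORTED_GC = {"kasA-I": 57.5, "makasA-D": 52.9, "dakasA-J": 70.6, "hkasA-H": 64.7}

# Toy sequence for illustration only; it is not taken from any kas cluster.
toy_gc = gc_content("ATGCGCGGCC")
highest = max(REPORTED_GC, key=REPORTED_GC.get)
spread = max(REPORTED_GC.values()) - min(REPORTED_GC.values())
print(toy_gc, highest, round(spread, 1))
```

The reported values span almost 18 percentage points between makasA-D and dakasA-J.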

Discussion
The strategy for isolating kasumigamide from D. calyx exemplifies the effectiveness of the metagenome mining approach in identifying cryptic biosynthetic gene clusters as well as their products. The involvement of the kas cluster in kasumigamide biosynthesis was predicted based on its domain organization (Fig 2). Whereas the NRPS code of KasA-A1 did not provide any information about its natural substrate (S2 Table), KasA-KR resembles the KR domain of AerA (~41% identity), which reportedly reduces PP to generate D-phenyl lactic acid [20].
KasD was annotated as a β-hydroxylase, according to its similarity to CmlA [25] (~39% identity). CmlA catalyzes β-hydroxylation of L-p-aminophenylalanine (L-PAPA) to L-p-aminophenylserine (L-PAPS), and the metal ion binding sites of CmlA are also well conserved in KasD (S12 Fig). Since L-Phe is structurally similar to L-PAPA, KasD was proposed to catalyze the β-hydroxylation of L-Phe, to generate L-threo-3-phenylserine (L-threo-PS). The epimerase (E) domain, encoded in module 6, was predicted to epimerize L-threo-PS to D-erythro-PS. The kasumigamide producer in D. calyx was identified as an 'Entotheonella' sp. by a single cell analysis in conjunction with PCR. This bacterial phylotype has been reported as the symbiotic producer of not only secondary metabolites in T. swinhoei [7], but also calyculin A in D. calyx [13], highlighting the fact that 'Entotheonella' is responsible for the production of multiple natural products in Theonellidae sponges. In addition, four kasumigamide gene clusters were detected in very different bacterial species, namely 'Entotheonella' sp. (a marine sponge symbiont), the free-living cyanobacterium M. aeruginosa NIES-87, the human oral bacterium D. acidovorans CCUG 274B, and Herbaspirillum sp. CF444 from the endosphere of the tree Populus deltoides (Fig 4). Although we did not confirm the production of kasumigamide in the latter two bacterial species, the phylogenetic tree analysis of KS domains illuminates close relationships among the kas-related gene clusters (S10 Fig).
On the other hand, some peculiar features of the domain or module organization can be found in the kas family gene clusters. One remarkable point is the shifted position of the PKS module in the M. aeruginosa NIES-87 kas cluster (makasA-D), whereas the PKS module in the other kas gene clusters is located at module 4, in agreement with the order of the biosynthetic reactions. To rule out the possibility that this inconsistency is due to misassembly, the contiguous domain organization between modules 5 and 6 in makasC was confirmed by cloning of the corresponding region. To obtain further evidence for the biosynthetic mechanism of makasA-D, we conducted substrate specificity assays with all five A domains, whose substrate binding sites are closely related to those of their counterpart domains in KasA-C, with 77-89% identity (S2 Table). Four of them accepted the substrates expected from the amino acid sequences of their substrate binding sites. Although the putative substrate of MakasC-A2 was L-Phe, this A domain exhibited specificity for DL-threo-PS. Considering the fact that D-erythro-PS is the C-terminal residue of kasumigamide, the following biosynthetic mechanism was proposed for this step. First, L-Phe is hydroxylated by MakasD to generate L-threo-PS. Subsequently, MakasC-A2 loads the L-threo-PS onto the PCP domain. Finally, L-threo-PS is epimerized to D-erythro-PS by MakasC-E1, which is encoded between modules 4 and 5 (S13 Fig) [26][27].
Other unusual features of the kas family genes are the absence of a thioesterase (TE) domain and the likely termination of the chain by a condensation (C) domain. Since some C domains have been reported to function as a TE domain [28], we expect that the C-terminal C domain encoded in kasC, dakasC, and hkasC can serve as a thioesterase. However, since the C-terminal C domain is missing in makasA-D, the release mechanism of the makas pathway remains unclear.
The presence of putative kasumigamide biosynthetic gene clusters among different kinds of bacteria living in various environments implies horizontal gene transfer between different bacterial species. The pair of long terminal repeats flanking the kas gene was annotated as putative transposases (Table 1, S14 Fig), suggesting a role for transposons in the interspecies transfer of kas gene clusters. Sponge-associated bacteria reportedly contain high numbers of transposable insertion elements, which are expected to take part in the evolution of symbiont bacterial genomes [29]. Examples of hypothetical horizontal transmission have been suggested for gene clusters encoding the synthesis of actin-binding macrolides, such as luminolide, tolytoxin, and misakinolide [8]. Although these macrolide compounds are produced by different bacterial species, including 'Entotheonella serta' (which is associated with the marine sponge T. swinhoei WA), the gene clusters, which encode some transposases, exhibit close relationships between the corresponding PKS domains [8]. The makas gene is also flanked by two putative transposases, ORFM1 and ORFM2, which are widely conserved in several M. aeruginosa strains. Notably, makasA-D sequences were found only in the M. aeruginosa NIES-87 strain (S8C Fig) among sixteen different strains whose genomes are available in the National Center for Biotechnology Information (NCBI) data bank. As in the case of the M. aeruginosa strains, only one of four sequenced D. acidovorans strains contains the kas family genes, though we could not observe a putative transposase region in the kas-related genes encoded in the other two kinds of bacterial species, the β-proteobacteria D. acidovorans CCUG 274B and Herbaspirillum sp. CF444.
It is known that the same or similar secondary metabolites have been identified from different kinds of bacteria, even across phyla. Lyngbyatoxins, which are potent skin irritants, were originally isolated from the marine cyanobacterium Moorea producens (formerly Lyngbya majuscula) [30][31]. On the other hand, structurally and pharmacologically related compounds, teleocidin and olivoretin, were isolated from marine Streptomyces spp. [32][33][34][35]. Saxitoxin, which is produced by marine dinoflagellates, is also made by some freshwater cyanobacteria [36]. The gene clusters were assembled independently in the distantly related bacteria [37]. Thus, the study presented here suggests that other bacteria may also have the ability to produce kasumigamide. Although kasumigamide has been reported to show antialgal activity against Chlamydomonas neglecta NIES-439 [21], further investigations are required to decipher its advantageous role in the survival competition among taxonomically distant bacterial species.

Specimen collection
The marine sponge D. calyx was collected by hand, at a depth of 10 m, during scuba diving at Shikine-jima Island, Tokyo, on May 18, 2011. The specimens were kept frozen at −30°C and used for the construction of the clone library and the isolation of kasumigamide. The single cell isolation was performed with a specimen collected at a depth of 10 m in the ocean near Nakagi, Shizuoka, Japan, on December 4, 2013. Samples were transported to the laboratory (4 h) in a cooling box and immediately processed for single cell analysis. Permits and approval for the collections at Shikine-jima and Nakagi were obtained from the local governments. Both specimens contained calyculins and calyxamides as well as kasumigamide, as confirmed by LC-MS analysis.

Fosmid clone library screening
The D. calyx metagenomic DNA fosmid library is composed of 250,000 colonies, each containing approximately 40 kb of insert DNA, as previously reported [38]. Among 19 different partial KS domain sequences obtained in our previous PCR analysis of D. calyx metagenomic DNA with degenerate KS primers [10], we focused on DCKS10, which shares homology with cis-AT-type KS domains (59% identity to the KS region of JamM [39]). The primer pair DCKS10F/DCKS10R (S1 Table) was used to screen for positive fosmids. The library was screened by the Piel pooling strategy [40]. The screening yielded two positive fosmids, pDCYN1 and pDCYN2.

Genome sequencing and assembly
The sequencing of pDCYN1 and pDCYN2 was performed on an Ion PGM sequencer (Life Technologies), with a total of 48,198 sequence reads (~300 bp). These sequences were assembled using the de novo assembler MIRA (v3.4.2.0) [41], and 17 contigs were obtained. Further assembly to produce larger contigs was achieved with the Geneious assembler (Biomatters), with the default medium sensitivity. This assembly provided a contig of 40 kb. Putative protein-coding sequences (CDSs) were determined by a combination of FramePlot [42] and Glimmer 3.02 [43]. The domain organizations were assessed by BLASTP and PKS/NRPS analysis [44].
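The read count and read length given above allow a rough, back-of-the-envelope estimate of sequencing depth over the final 40 kb contig. The Python sketch below is only an illustration with a hypothetical function name. It assumes that every read is 300 bp long and contributes to the contig, so it ignores reads from the fosmid vector backbone, overlap between the two clones, and the real read-length distribution. The result is a nominal figure rather than a measured coverage.

```python
def nominal_depth(read_count, mean_read_length_bp, target_length_bp):
    """Total sequenced bases divided by target length (nominal, not mapped, depth)."""
    if target_length_bp <= 0:
        raise ValueError("target length must be positive")
    return read_count * mean_read_length_bp / target_length_bp


# Values from the text: 48,198 reads of ~300 bp and a 40 kb assembled contig.
fosmid_depth = nominal_depth(48_198, 300, 40_000)
print(f"nominal depth: {fosmid_depth:.1f}x")
```

Under these assumptions the nominal depth is on the order of a few hundred fold.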

Isolation and structure elucidation of kasumigamide
To isolate the putative kasA-I product from the frozen sponge, D. calyx (2.0 kg, wet weight), the methanol extract was partitioned between hexane and H2O. The aqueous layer was then partitioned between EtOAc and H2O. The aqueous layer was further partitioned between n-BuOH and H2O. The n-BuOH-soluble material was fractionated by gel-filtration column chromatography (Sephadex LH-20; 2.5 × 75 cm) with MeOH. The fraction was further separated by HPLC (Cosmosil MS-II C18; 10 × 250 mm, Nacalai Tesque, flow rate 4.0 mL/min; 0−100% CH3CN/H2O over 30 min; UV detection at 280 nm) to obtain pure kasumigamide (2.3 mg). The LC-MS data for monitoring were obtained from an Agilent 1100 series HPLC-micro TOF mass spectrometer (Bruker Daltonics), using electrospray ionization with a Cosmosil 5C18 MS-II column (2.0 × 75 mm), 5-100% MeOH/H2O in 0.1% AcOH over 20 min, 0.2 mL/min, positive ESI mode. To monitor the (M+H)+ ion peak of kasumigamide at m/z 787.38, the mass range between m/z 787.3 and 787.5 was selected for the extracted ion chromatogram.
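The monitored m/z value can be checked against the molecular formula C40H50N8O9 by summing monoisotopic masses and adding a proton. The Python sketch below is an illustration rather than the instrument software's calculation. The function names are hypothetical, and the atomic masses are standard monoisotopic values entered by hand.

```python
# Standard monoisotopic masses in daltons.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}
PROTON = 1.007276466621  # mass of a proton in daltons


def neutral_mass(composition):
    """Monoisotopic mass of a neutral molecule given element counts."""
    return sum(MONOISOTOPIC[element] * n for element, n in composition.items())


def mz_protonated(composition):
    """m/z of the singly charged [M+H]+ ion."""
    return neutral_mass(composition) + PROTON


KASUMIGAMIDE = {"C": 40, "H": 50, "N": 8, "O": 9}
XIC_WINDOW = (787.3, 787.5)  # extraction window quoted in the text

mz = mz_protonated(KASUMIGAMIDE)
in_window = XIC_WINDOW[0] <= mz <= XIC_WINDOW[1]
print(f"[M+H]+ = {mz:.4f}, inside window: {in_window}")
```

The computed value rounds to m/z 787.38 and falls inside the 787.3-787.5 extraction window.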

PCR analysis of dissected filamentous bacterial cells
Aliquots of the calcium/magnesium-free artificial seawater suspension of minced sponge tissues were spread onto membrane slides (PEN-Membrane 2.0 μm, Leica), dehydrated by sequential incubations in 50%, 70%, and 90% aqueous ethanol for 3 min at each step, and air dried. Two or ten portions of autofluorescent cells and a single filament or five filaments of filamentous bacteria ('Entotheonella' sp.) were directly isolated into a PCR tube by laser microdissection (Leica LMD7000). For the "other bacteria" sample in Fig 3B, the membrane area containing other bacterial cells, excluding filamentous bacterial cells, was concurrently dissected. The template DNA for the dissection PCR was adjusted according to a previously published procedure [13]. The PCR was performed in a volume of 10 μl, containing 1.75 mM MgCl2, 0.4 μM of each primer, 0.3 mM dNTPs, and 0.25 U of KAPATaq Extra DNA polymerase (Nippon Genetics). The specific primers DCKS10F/DCKS10R were used for detection of the kasumigamide biosynthetic gene cluster.

Metabolic and genomic analysis of M. aeruginosa NIES-87
M. aeruginosa NIES-87 was obtained from the National Institute for Environmental Studies collection, and cultured in MA medium [45]. The methanol extract (9.6 mg) of freeze-dried bacteria was analyzed by LC-MS, as described above. The M. aeruginosa NIES-87 genomic DNA was isolated according to the previously published method [46]. Sequencing of the genomic DNA was performed on an Ion PGM sequencer, with a total of 414,749 sequence reads (~300 bp). These sequences were assembled de novo into contigs by using the Geneious assembler, with the default medium sensitivity. Putative CDSs in one large contig (26 kb) were determined by combining the prediction results from the FramePlot [42] and Glimmer 3.02 [43] programs. The domain organizations were assessed by BLASTP and PKS/NRPS analysis [44]. In order to confirm the DNA sequence of makasC, the corresponding region was cloned with the primer pair MA5F/MA6R (S1 Table). The PCR products amplified from the M. aeruginosa NIES-87 genomic DNA were introduced into the pT7Blue T-vector (Novagen). The constructed plasmid pMAYN1 was subjected to sequence analysis by Eurofins Genomics K. K.

Functional analysis of adenylation domains
Each purified protein (1 μM) was incubated with 1 mM substrate in 50 μl of buffer, containing 32 mM hydroxylamine, 1 mM dithiothreitol, 0.4 U/ml pyrophosphatase (Sigma), 0.5 mM ATP, 10 mM MgCl2, and 50 mM Tris-HCl buffer (pH 7.5). The reaction was incubated for 10 min at room temperature, and then quenched by adding 50 μl of the working reagent from the malachite green phosphate assay kit (Enzo). After a 10 min incubation at room temperature, the absorption at 620 nm (A620) was measured. The control A620 value was subtracted from the A620 value of the reaction mixture, and then the relative adenylation activity was calculated.
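The final calculation step can be sketched in a few lines of Python. The text states only that the control A620 is subtracted and a relative activity is then calculated. Expressing each substrate as a percentage of the best substrate is therefore an assumption of this sketch, and the function name and all absorbance readings below are hypothetical.

```python
def relative_adenylation_activity(a620_by_substrate, a620_control):
    """Background-subtract A620 readings and scale them to the top substrate (= 100%).

    Scaling to the highest corrected value is an assumption of this sketch;
    the source does not state the reference used for normalization.
    """
    corrected = {s: a - a620_control for s, a in a620_by_substrate.items()}
    top = max(corrected.values())
    if top <= 0:
        raise ValueError("no substrate gave a signal above the control")
    return {s: 100.0 * value / top for s, value in corrected.items()}


# Hypothetical readings for one A domain; these are not measured data.
readings = {"L-Phe": 0.30, "DL-threo-PS": 0.70, "L-Ala": 0.12}
rel = relative_adenylation_activity(readings, a620_control=0.10)
best = max(rel, key=rel.get)
print(best, {s: round(v, 1) for s, v in rel.items()})
```

Subtracting the control before scaling keeps background phosphate signal from inflating the apparent activity of weak substrates.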