Complete genome sequence of the deep South China Sea-derived Streptomyces niveus SCSIO 3406, the producer of cytotoxic and antibacterial marfuraquinocins

Streptomyces niveus SCSIO 3406 was isolated from a sediment sample collected from the South China Sea at a depth of 3536 m. Four new sesquiterpenoid naphthoquinones, marfuraquinocins A-D, and two new geranylated phenazines, i.e. phenaziterpenes A and B, were isolated from the fermentation broth of the strain. Here, we present its genome sequence, which comprises 7,990,492 bp with a G+C content of 70.46% and harbors 7088 protein-encoding genes. Analysis of the genome sequence revealed a 28,787 bp gene cluster of 24 open reading frames, including genes encoding a 1,3,6,8-tetrahydroxynaphthalene synthase and monooxygenase, seven phenazine biosynthesis proteins, two prenyltransferases and a squalene-hopene cyclase. These genes are known to be necessary for the biosynthesis of both marfuraquinocins and phenaziterpenes. Outside the gene cluster, scattered around the genome, there are seven genes belonging to the methylerythritol phosphate pathway for the biosynthesis of the essential primary metabolite isopentenyl diphosphate, as well as six geranyl diphosphate/farnesyl diphosphate synthase genes. The genome of S. niveus SCSIO 3406 also contains type I PKS, type III PKS and nonribosomal peptide synthetase gene clusters. The sequence provides the genetic basis for a better understanding of the biosynthetic mechanisms of the six compounds mentioned above and for the construction of improved strains for the industrial production of antimicrobial agents.


Introduction
Deep-sea Streptomyces are widely recognized as an emerging source of novel and bioactive secondary metabolites [1]. They have been phylogenetically classified into 13 groups (MAR1-MAR13) [2]. The MAR4 group is a rich source of polyketide-terpenoid secondary metabolites, such as marinone, azamerone, and napyradiomycins [3][4][5]. As a member of the MAR4 group, Streptomyces sp. CNQ-509 produces two polyketide-terpenoids (naphterpin and debromomarinone), five new farnesyl-α-nitropyrroles (nitropyrrolins A-E) and the O-prenylated phenazines marinophenazines A and B [6]. S. niveus SCSIO 3406 was isolated from a South China Sea sediment sample collected at a depth of 3536 m. Four new sesquiterpenoid naphthoquinones, marfuraquinocins A-D (1-4) (Fig 1), which exhibited antibacterial activities against Staphylococcus aureus ATCC 29213 or methicillin-resistant Staphylococcus epidermidis (MRSE), were previously isolated from strain SCSIO 3406 [7]. Additionally, two new geranylated phenazines, phenaziterpenes A and B (5-6) (Fig 1), were isolated from this strain, albeit in low yield [7]. To gain insight into the genetic basis of these six compounds and to support the discovery of further new natural products, the genome of S. niveus SCSIO 3406 was sequenced.

Materials and methods
S. niveus SCSIO 3406 was cultivated in trypticase soy broth for 2 days at 28 °C with shaking at 200 rpm. High-molecular-weight DNA was prepared using a standard genomic DNA isolation method [8].
Genomic DNA was then used to generate two libraries: a paired-end (PE) library with insert sizes of 300-500 bp and a SMRTbell™ template library with insert sizes of about 8-10 kb. The complete genome of S. niveus SCSIO 3406 was subsequently sequenced using a combination of PacBio RSII (Pacific Biosciences) and Illumina HiSeq 2500 technologies at Biozeron Biotech Co., Ltd. (Shanghai, China). De novo genome assembly was performed with SOAPdenovo v2.04 [9] and Celera Assembler 8.0 [10]. Putative protein-coding sequences were predicted by Glimmer 3.02 [11]. Gene functional annotation was performed using BLASTP against the Nr, String, COG and KEGG databases. rRNAs and tRNAs were predicted using RNAmmer v1.2 and the NCBI Prokaryotic Genome Annotation Pipeline, respectively. Protein-coding genes were analyzed for COG functional annotation using the WebMGA server [12]. CRISPRFinder, freely accessible at http://crispr.i2bc.paris-saclay.fr, was used to find clustered regularly interspaced short palindromic repeats (CRISPRs) in the S. niveus SCSIO 3406 genome. Genes involved in secondary metabolic pathways were predicted using antiSMASH 2.0 (http://antismash.secondarymetabolites.org/) [13].
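As a rough illustration of how summary figures such as total genome length and G+C content can be derived from an assembled sequence, the following Python sketch parses FASTA-formatted text and computes both values. It is not the pipeline used in this study; the function names (`parse_fasta`, `genome_stats`) and the toy sequences are hypothetical and serve only to show the calculation.

```python
def parse_fasta(text):
    """Return a dict mapping record IDs to upper-case sequences."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def genome_stats(records):
    """Return (total length in bp, G+C percentage) over all records."""
    total = sum(len(seq) for seq in records.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in records.values())
    percent = 100.0 * gc / total if total else 0.0
    return total, percent


# Toy input for illustration only; these are not sequences from the strain.
toy = ">contig1 example\nGGCC\nATGC\n>contig2\nGCGCGCTA\n"
length, gc_percent = genome_stats(parse_fasta(toy))
print(length, round(gc_percent, 2))
```

Applied to a complete assembly, the same calculation would be expected to yield figures of the kind reported for this genome (length in bp and G+C percentage), although ambiguous bases such as N would need a deliberate policy that this sketch does not implement.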

PLOS ONE
We identified the gene orf01170, encoding a type III polyketide synthase, and orf01166, encoding a 1,3,6,8-tetrahydroxynaphthalene (THN) monooxygenase, which are responsible for the formation and the subsequent modification of THN, respectively [17], to generate the THN moiety of marfuraquinocins A-D. In many cases, the genes responsible for antibiotic biosynthesis are clustered in a continuous genomic DNA region, usually in association with one or more genes that regulate their transcription and with resistance genes [18]. Therefore, the other biosynthetic genes for marfuraquinocins A-D were expected to lie in the vicinity of orf01170 and orf01166. Based on bioinformatic analysis of the genes located upstream and downstream of orf01170 and orf01166, a continuous DNA segment of ~28.7 kb containing 24 open reading frames (ORFs) was predicted to harbor the putative gene cluster of marfuraquinocins A-D. The function of the individual ORFs was deduced by BLAST analysis. The results are summarized in Table 3 and Fig 3. The identified DNA segment also includes a squalene-hopene cyclase gene (orf01153), which is predicted to catalyze the cyclization of the sesquiterpene moiety to generate one of the parent skeletons of marfuraquinocins A-D. In addition, as shown in Fig 3 and Table 3, this continuous DNA segment contains orthologues of most genes required for phenazine biosynthesis, arranged in two discontinuous loci (orf01155-orf01158 and orf01161-orf01163). Although an orthologue of phzF, which is necessary for phenazine biosynthesis, is absent from this DNA segment, two phzF orthologues (orf05123 and orf05140) were identified outside the gene cluster.
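The two phenazine loci above are given as ORF ranges. A minimal Python sketch for expanding such ranges into individual identifiers is shown below, assuming that ORF identifiers are numbered consecutively along the genome (an assumption for illustration; the helper name `expand_orf_range` is hypothetical).

```python
import re


def expand_orf_range(spec):
    """Expand 'orf01155-orf01158' (or 'orf01155-01158') into a list of ORF IDs.

    Assumes consecutive numbering of ORFs, which is an illustrative assumption.
    A single identifier such as 'orf01153' is returned as a one-element list.
    """
    match = re.fullmatch(r"orf(\d+)(?:-(?:orf)?(\d+))?", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized ORF specification: {spec!r}")
    start_text, end_text = match.group(1), match.group(2)
    start = int(start_text)
    end = int(end_text) if end_text is not None else start
    if end < start:
        raise ValueError(f"range end precedes range start: {spec!r}")
    width = len(start_text)  # keep the zero padding of the original identifier
    return [f"orf{n:0{width}d}" for n in range(start, end + 1)]


# The two discontinuous phenazine loci named in the text.
phenazine_loci = ["orf01155-01158", "orf01161-orf01163"]
phenazine_orfs = [orf for locus in phenazine_loci for orf in expand_orf_range(locus)]
print(len(phenazine_orfs), phenazine_orfs)
```

Under the consecutive-numbering assumption, the two loci expand to four and three ORFs, which is consistent with the seven phenazine biosynthesis proteins mentioned in the abstract.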
Two prenyltransferase genes (orf01154 and orf01164) were identified in this DNA segment. Prenyltransferases are a class of enzymes that transfer allylic prenyl groups to acceptor molecules [19]. Therefore, orf01154 and orf01164 were predicted to be responsible for the condensation of the sesquiterpene moiety with the THN moiety to form the backbone of marfuraquinocins A-D and/or the condensation of the monoterpene moiety with the phenazine moiety to form the backbone of phenaziterpenes A-B. Taken together, we speculate that the continuous DNA segment of ~28.7 kb is involved in the biosynthetic pathways of both marfuraquinocins A-D and phenaziterpenes A-B.
Outside the gene cluster, we also identified the gene orf00932, encoding isopentenyl pyrophosphate isomerase (IPP isomerase). In particular, six geranyl diphosphate (GDP) synthase and farnesyl diphosphate (FDP) synthase genes (orf00192, orf00915, orf01258, orf02165, orf04121 and orf06133) are located outside the ~28.7 kb gene cluster. GDP synthase and FDP synthase catalyze the addition of one and two molecules of IPP to DMAPP, yielding GDP and FDP, respectively [20]. Therefore, these six GDP/FDP synthases are proposed to be responsible for the formation of the terpene core moieties of both marfuraquinocins A-D and phenaziterpenes A-B. Evidently, the genes for the biosynthesis of marfuraquinocins A-D and phenaziterpenes A-B are not clustered at a single locus of the genome.
The major interest of Streptomyces lies in its potential to produce diverse secondary metabolites with biological activities. Here, analysis using antiSMASH revealed 27 other gene clusters in the genome of S. niveus SCSIO 3406 (Table 4). Some of these gene clusters show very low similarity to known clusters, revealing the potential of S. niveus SCSIO 3406 to produce novel natural products. In the putative T1pks-oligosaccharide gene cluster (Table 4, cluster_21), about 32% of the gene products showed similarity to homologues in the known biosynthetic gene cluster of the angucycline antibiotic grincamycin in Streptomyces lusitanus SCSIO LR32 [21]. We also identified an ectoine gene cluster (Table 4, cluster_23) in S. niveus SCSIO 3406, which encodes a hydroxylase (Orf05563), L-ectoine synthase (Orf05564), diaminobutyrate-2-oxoglutarate transaminase (Orf05565) and L-2,4-diaminobutyric acid acetyltransferase (Orf05566); these gene products show 75% similarity to the homologues of the known ectoine biosynthetic cluster (BGC0000853_c1). As a compatible solute, ectoine can protect enzymes, membranes and whole cells against stresses [22].
The acronym CRISPR (clustered regularly interspaced short palindromic repeats) was proposed by Jansen et al. [23]. CRISPRs were first observed in 1987 in Escherichia coli [24] and were subsequently reported in a wide range of prokaryotic genomes. CRISPR-associated (Cas) proteins use the CRISPR spacers to recognize and cut foreign genetic elements [25]. Therefore, the CRISPR/Cas system is a prokaryotic immune system [26]. Here, 22 CRISPR candidates distributed across the S. niveus SCSIO 3406 genome, including 6 confirmed CRISPRs and 16 questionable CRISPRs [27], were identified via CRISPRFinder, well above the average for Streptomyces strains whose genome sequences have been published (Table 5). The CRISPR loci contain 168 spacer sequences ranging in size from 19 to 99 bp. The number of spacers in each locus varies from 1 to 78. Only two of the spacer sequences (5'-gcgcgacggacgcgccgccggtgagcacgcgcaggg-3' and 5'-gtcctcggtccgttcgtcctgcgcgatctccag-3') match sequences (protospacers) in the public sequence databases. The rest of the spacers remain CRISPR "dark matter". We also identified ten genes (orf03151, orf05316, orf06507-orf06514) coding for CRISPR-associated proteins. The CRISPR loci together with the cas (CRISPR-associated) genes form a powerful immune system for S. niveus SCSIO 3406. Interestingly, several antibiotic resistance genes were identified in the S. niveus SCSIO 3406 genome. The gene orf04161 encodes a penicillin amidase catalyzing the hydrolysis of benzylpenicillin [28], which may account for the observation that S. niveus SCSIO 3406 can survive on plates containing 100 μg/ml penicillin; the gene orf00505 encodes an erythromycin esterase hydrolyzing the lactone ring of the 14-membered macrolides erythromycin and oleandomycin [29]; and the gene orf02495 encodes a virginiamycin B lyase that inactivates the type B streptogramin antibiotics by linearizing the lactone ring at the ester linkage [30].
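Returning to the CRISPR spacers reported above: checking whether a spacer has a matching protospacer amounts to searching a target sequence on both strands. The Python sketch below illustrates the idea with exact string matching only; it is not the database search used in this study (real searches tolerate mismatches and scan large databases), and the function names and the toy target sequence are hypothetical. Whitespace inside the printed spacer sequences is treated here as a typesetting artifact and removed.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(spacer, target):
    """Return (strand, 0-based start) for each exact match of spacer in target.

    '+' hits match the spacer as written; '-' hits match its reverse
    complement, with the start given in forward-strand coordinates.
    A spacer equal to its own reverse complement is reported on both strands.
    """
    spacer = spacer.lower()
    target = target.lower()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = target.find(query)
        while start != -1:
            hits.append((strand, start))
            start = target.find(query, start + 1)
    return hits


# The two spacers quoted in the text, with internal whitespace removed.
spacers = [
    "".join("g cgcgacggacgcgccgccggtgagcacgcgcaggg".split()),
    "".join("gtcctcggtccgttc gtcctgcgcgatctccag".split()),
]

# Hypothetical target built for illustration: the first spacer on the
# forward strand, the second embedded as its reverse complement.
toy_target = "ttt" + spacers[0] + "aaa" + reverse_complement(spacers[1]) + "ccc"
for spacer in spacers:
    print(len(spacer), find_protospacers(spacer, toy_target))
```

Exact matching is the simplest possible criterion; it is used here only to make the strand handling explicit.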
In summary, we have completely sequenced the genome of the deep South China Sea-derived S. niveus SCSIO 3406. By bioinformatic analysis, we have identified a biosynthetic gene cluster for both marfuraquinocins A-D and phenaziterpenes A-B. The identified gene cluster provides an important genetic basis for a better understanding of the biosynthetic mechanisms of marfuraquinocins A-D and phenaziterpenes A-B, as well as for the construction of improved strains for industrial production. Notably, 27 other gene clusters were also predicted in the genome of S. niveus SCSIO 3406. More importantly, some of these clusters show very low similarity to known clusters, suggesting the potential of S. niveus SCSIO 3406 to produce a diversity of novel natural products. This sequence information paves the way for genome mining of S. niveus SCSIO 3406 for novel natural product discovery.