Strain-Level Diversity of Secondary Metabolism in Streptomyces albus

Streptomyces spp. are robust producers of medicinally, industrially and agriculturally important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive, high-quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced. Consequently, this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are harboured by more than one strain and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty.


Introduction
More than two-thirds of all therapeutic small molecules used in medicine are derived from or inspired by complex natural products produced by filamentous actinobacteria, most notably Streptomyces spp. [1]. Streptomyces spp. are predominantly known as filamentous soil bacteria that have a differentiating mycelial life-cycle, which begins with spore germination and outgrowth of a vegetative mycelium and ends with production of reproductive aerial hyphae and the formation of unigenomic spores [2]. Aerial hyphae production and sporulation are often accompanied by the production of secondary metabolites. These secondary metabolites are most likely used to outcompete neighbouring organisms [3]. Biotechnology has exploited many of these natural products as anticancer, antiviral, insecticidal, herbicidal, antibacterial, antifungal and immunosuppressive compounds [4].
Growing global concerns about resistance to antibacterial agents have led to a renaissance in bioprospecting and natural product discovery. The resurgence of interest in natural products is greatly aided by the relatively low cost of sequencing the genomes of strains that produce promising bioactive small molecules. One hundred and forty-two streptomycete genomes are available in DDBJ/EMBL/Genbank. This dataset has made it abundantly clear that Streptomyces spp. express only a fraction of their biosynthetic genes under standard laboratory growth conditions. Activation of silent biosynthetic gene clusters and characterisation of their products represents a major potential source of new lead compounds for industry and is an area in which synthetic biology holds huge promise [5].
In order to capitalise on available genomic resources, systematic analyses of secondary metabolism are required. Doroghazi and Metcalf provided the first comparative analysis of secondary metabolism in organisms with closed genomes from the phylum Actinobacteria, which included eight Streptomyces species and revealed why this taxon has, for good reason, been the focus of rigorous genomic and biochemical analyses over the years [6]. Recently, Ziemert et al. performed a focused analysis of secondary metabolism in 75 sequenced Salinispora strains, identified a total of 124 biosynthetic pathways encoded by the genus and provided insight into population-level genetic exchange of biosynthetic pathways in marine environments [7]. Doroghazi et al. recently developed a method for classification of gene clusters into families and used this approach to analyse the biosynthetic potential of 830 sequenced Actinobacteria, which they found to contain a total of 11,422 gene clusters comprising 4,122 gene cluster families [8]. More analyses of this type will be required in order to drive the fields of natural product discovery and synthetic biology forward and maximise the promise held by genome mining of actinomycetes.
Streptomyces albus is one of the most widely geographically distributed streptomycetes and has been isolated from diverse environments including sponges, sea sediments and insects [9][10][11][12][13][14]. The archetype member of this species is S. albus J1074, which is a derivative of S. albus G in which the salI restriction system was deleted to better enable transformation [15]. S. albus J1074 has therefore been used as a host for heterologous expression of several natural product gene clusters, including cyclooctatin [16], fredericamycin [17], iso-migrastatin [18], moenomycin [19], napyradiomycin [20], steffimycin [21] and thiocoraline [22], and there has recently been renewed interest in further developing this expression platform because of its fast growth and naturally minimised genome [23]. The clear ability of S. albus J1074 to heterologously biosynthesise diverse and important natural products suggests that strains of S. albus may encode important natural product gene clusters of their own, a question which genomics and genome mining are only now beginning to address. As more researchers sequence closely related strains, an understanding of strain-level diversity in secondary metabolism becomes necessary. With this view in mind, here I report a strain-level analysis of secondary metabolism for six sequenced S. albus strains. A total of 48 biosynthetic gene clusters were identified: approximately 18 specify the core secondary metabolome of S. albus, 14 are auxiliary gene clusters and 16 are strain-specific, indicating there is still appreciable chemical diversity to be discovered at the strain level.

Results and Discussion
A multilocus phylogeny of Streptomyces spp. reveals significant redundancy in sequenced organisms
Many of the 142 genome sequences available for Streptomyces spp. originate from so-called environmental isolates and their taxonomic classification remains enigmatic. A multilocus phylogeny was reconstructed in order to infer a taxonomic relationship among sequenced Streptomyces spp. and assess redundancy in the genomic database. Multiple loci were used to infer phylogenetic relationships because of well-recognised problems with the use of the 16S rRNA gene alone as a phylogenetic marker, as it only provides an accurate and reliable classification to the genus level of streptomycetes [24], likely due to extensive recombination in the evolutionary past [25]. The loci selected for this study were those employed by previous multilocus phylogenies of streptomycetes: 16S rDNA, atpD (ATP synthase), gyrA (DNA gyrase subunit A), recA (recombination protein), rpoB (RNA polymerase subunit) and trpB (tryptophan biosynthesis) [26,27]. 16S rDNA sequences could not be identified in some draft genome sequences. This is presumably a result of the inability of DNA assembly software to process the multiple copies (five to seven) of the ribosomal RNA locus that streptomycetes are known to harbour. The partial 16S rDNA sequences (variable region IV) that were retrieved had a maximum pairwise divergence of ~5% over 292 nt (determined by blast analysis). With the motivation to include as many genome sequences in this analysis as possible, the decision was therefore made to exclude the 16S rRNA gene as a phylogenetic marker for this study. Partial DNA sequences for atpD, gyrA, recA, rpoB and trpB, corresponding to regions targeted by well-established oligonucleotide primer sequences employed in phylogenetic analyses [26,27], were retrieved from Genbank (see methods). Due to the poor quality of some of the genome sequences and/or the absence of some of these genes entirely, ~14% (20 genomes) were excluded from this analysis. Redundant genomes for type-strains were also excluded, namely S. bottropensis ATCC 25435 ([Genbank:AOCF00000000]), S. clavuligerus ATCC 27064 ([Genbank:ADGD00000000]) and S. albus J1074 ([Genbank:ABYC00000000]).
An approximately maximum-likelihood phylogenetic tree based on concatenated atpD-gyrB-recA-rpoB-trpB gene fragments (2566 nt in total) was constructed (Fig. 1). Overall, there was good separation and statistical support for most of the branches in the tree. Interestingly, the tree suggested that many Streptomyces species have been sequenced more than once. To analyse this further, the concatenated atpD-gyrB-recA-rpoB-trpB gene fragments were binned into operational taxonomic units (OTUs) with a shared identity threshold of 97%, which is a widely used threshold for species-level classification [28]. Approximately 70% (82 out of 120) of the sequenced streptomycetes analysed here correspond to a unique species of Streptomyces (S1 Table). The most (over-)represented species for which a genome sequence is available is Streptomyces albus (seven sequences in total). The availability of multiple genome sequences for a single species enables valuable analyses of the diversity and distribution of secondary metabolism, which have only now become possible and will help inform and direct bioprospecting efforts in Streptomyces spp.
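The OTU binning step can be illustrated with a minimal sketch. This is not the UCLUST algorithm used in this study; it is a simplified greedy centroid clustering that assumes the concatenated sequences are already aligned to the same length, and the strain names and toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences.

    Columns in which either sequence carries a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    seqs: dict mapping strain name -> aligned sequence (insertion order is used).
    Returns a list of OTUs, each a list of strain names.
    """
    centroids = []  # list of (centroid sequence, member names)
    for name, seq in seqs.items():
        for centroid_seq, members in centroids:
            if identity(seq, centroid_seq) >= threshold:
                members.append(name)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append((seq, [name]))
    return [members for _, members in centroids]


def mutate(seq, positions):
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# Hypothetical 100-nt toy alignment: strain_B differs at 2 sites, strain_C at 10.
reference = "ACGT" * 25
toy = {
    "strain_A": reference,
    "strain_B": mutate(reference, [0, 50]),
    "strain_C": mutate(reference, range(0, 100, 10)),
}
otus = greedy_otus(toy)
```

In this toy case strain_A and strain_B (98% identical) fall into one OTU, while strain_C (90% identical to strain_A) founds its own. Greedy clustering is order-dependent, so real analyses should rely on an established tool.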

Secondary metabolism in S. albus
The archetype member of the S. albus clade is S. albus J1074 [15], which is commonly used as a heterologous expression host [16][17][18][19][20][21][22]. The six additionally sequenced strains of S. albus were identified more recently, and their isolation was motivated, at least in part, by bioprospecting in unexploited microbial niches. They are: S. sp. PVA-94-07, S. sp. GBA 94-10, S. sp. SM8, S. sp. PP-C42, S. sp. LaPpAH-202 and S. sp. S4. Details of S. albus strains are summarised in Table 1. The poor quality of the genome sequence available for S. sp. PP-C42 (>7,000 contigs) prevented its inclusion in this analysis, so a total of six S. albus genomes were analysed here.
Gene clusters encoding putative secondary metabolites were identified using antiSMASH 2.0 [29] and, crucially, were edited to best reflect published experimental data. Three independent analyses of secondary metabolism in S. albus J1074 have been conducted this year [9,23,30]. These analyses disagree with regard to the total number of putative biosynthetic gene clusters encoded by S. albus J1074. Briefly, these analyses were hindered by use of the draft version of the S. albus J1074 genome sequence [30], use of an earlier version of antiSMASH [23] and failure to take experimental data into consideration [9,23].
S. albus strains encode between 25 and 30 biosynthetic gene clusters, with S. albus J1074 encoding the fewest (25) and S. sp. PVA-94-07 encoding the most (30) (Table 2). A pairwise comparison of gene clusters revealed significant redundancy in the putative secondary metabolites produced by S. albus strains. Importantly, the pairwise comparison also revealed that between 3 and 21% of the gene clusters harboured by an individual strain are in fact strain-specific (Table 2), which suggests that gene cluster diversity may not be exhausted by deep-sequencing multiple strains of a single species, a prediction that was recently validated for the marine actinomycete genus Salinispora [7].
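The classification underlying this comparison can be sketched from a presence/absence matrix. The sketch below assumes that a core gene cluster is one found in every strain analysed, a strain-specific cluster is found in exactly one strain, and everything else is auxiliary; the cluster and strain names are hypothetical and do not correspond to entries in Table 2.

```python
def classify_clusters(presence):
    """Split gene clusters into core, auxiliary and strain-specific sets.

    presence: dict mapping cluster id -> set of strain names harbouring it.
    Assumed definitions: core = present in all strains; strain-specific =
    present in exactly one strain; auxiliary = everything in between.
    """
    strains = set().union(*presence.values())
    classes = {"core": [], "auxiliary": [], "strain-specific": []}
    for cluster, holders in sorted(presence.items()):
        if holders == strains:
            classes["core"].append(cluster)
        elif len(holders) == 1:
            classes["strain-specific"].append(cluster)
        else:
            classes["auxiliary"].append(cluster)
    return classes


def strain_specific_fraction(presence, strain):
    """Fraction of the clusters harboured by `strain` that no other strain shares."""
    owned = [c for c, holders in presence.items() if strain in holders]
    if not owned:
        return 0.0
    unique = [c for c in owned if presence[c] == {strain}]
    return len(unique) / len(owned)


# Hypothetical toy data: three strains, four gene clusters.
toy_presence = {
    "cluster_1": {"s1", "s2", "s3"},
    "cluster_2": {"s1", "s2"},
    "cluster_3": {"s3"},
    "cluster_4": {"s1", "s2", "s3"},
}
toy_classes = classify_clusters(toy_presence)
toy_fraction = strain_specific_fraction(toy_presence, "s3")
```

Here strain s3 harbours three gene clusters, one of which is unique to it, so one third of its gene clusters are strain-specific.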

Auxiliary biosynthetic capabilities of Streptomyces albus
Beyond the core metabolome, S. albus harbours 14 'auxiliary' biosynthetic gene clusters. Auxiliary biosynthetic gene clusters are conserved to varying extents by S. albus strains, the details of which are summarized in Table 4. NRPS gene clusters were the most abundant class of biosynthetic system (7 out of 14 gene clusters), followed by hybrid NRPS/PKS systems (2 out of 14). As expected, the overwhelming majority of auxiliary gene clusters encode the production of unknown products (Table 4). Thus far, only one product of an auxiliary gene cluster has been elucidated: indigoidine. Indigoidine is a blue NRPS-derived pigment produced by S. albus J1074 and S. sp. LaPpAH-202. Interestingly, biosynthesis of indigoidine, at least in S. albus J1074, is repressed under normal laboratory growth conditions, and indigoidine production was only achieved by knocking in the ermE* promoter upstream of the core biosynthetic genes [30]. Although production of only one auxiliary metabolite has been analysed, bioinformatics analyses suggest that both S. sp. PVA 94-07 and S. sp. GBA 94-10 possess gene clusters coding for the biosynthesis of enterocin and a compound related to kijanamycin, which are both antibacterial agents [9].

Strain-specific metabolites produced by Streptomyces albus
In addition to core and auxiliary metabolites, S. albus strains harbour a total of 16 strain-specific gene clusters whose putative products comprise all of the major classes of secondary metabolites (Table 5). Each S. albus strain specifies at least one strain-specific gene cluster, which is consistent with Salinispora arenicola, S. pacifica and S. tropica strains each encoding the production of ~1.0 strain-specific polyketide or non-ribosomal peptide [7]. S. sp. PVA 94-07 and S. sp. GBA 94-10 harbour a single strain-specific gene cluster apiece, which is the fewest of any strain (Tables 2 and 5). However, eight gene clusters with unknown products are shared between S. sp. PVA 94-07 and S. sp. GBA 94-10 and are not harboured by other S. albus strains, suggesting that, despite their small number of strain-specific gene clusters, these two strains produce a significant amount of novel chemistry. S. sp. S4 harbours six strain-specific gene clusters whose products represent 21% of its secondary metabolome, which is the most of any S. albus strain (Table 2) and may reflect its possible role as a defensive symbiont of fungus-growing ants [12]. Paulomycin, the product of a hybrid NRPS/PKS gene cluster encoded by S. albus J1074, is the only product of a strain-specific gene cluster that has been analysed thus far [30,43]. Although chemical analysis is required for confirmation, there is strong bioinformatics support to suggest that the products of two of the strain-specific gene clusters encoded by S. sp. S4 are the hybrid type I/type III polyketide kendomycin and the type II polyketide fredericamycin [38]. The products of the remaining 13 strain-specific gene clusters harboured by S. albus strains are unknown. The antiSMASH 2.0 implementation of MultiGeneBlast [44] was used to identify the closest relative for each strain-specific gene cluster. Organisms harbouring putative orthologous gene clusters and the associated MultiGeneBlast scores are reported in Table 5. A possible orthologue was identified for all but one strain-specific gene cluster, which specifies a bacteriocin and is harboured by S. sp. S4 (Table 5).

Conclusions and perspectives
The genomes of S. albus isolates have been sequenced more than those of any other species of Streptomyces. The putative biosynthetic capabilities of six S. albus strains were analysed here, which identified a core secondary metabolome specified by 18 biosynthetic gene clusters as well as 14 auxiliary gene clusters and 16 strain-specific gene clusters. The products of 29 of the 48 gene clusters identified in this analysis are unknown, representing an attractive reservoir of compounds that may have useful medicinal or industrial applications or may otherwise comprise a chemically interesting scaffold. The flurry of recent analyses investigating secondary metabolism of S. albus strains has collectively resulted in assigning products to 15 of the 25 gene clusters encoded by S. albus J1074, rivaling what is known about S. coelicolor, which has been rigorously studied for over half a century [45]. Robust and thorough bioinformatics approaches that prioritise taxonomic uniqueness of producing organisms and novel gene clusters will drive the discovery of new compounds. However, many of the gene clusters encoded by streptomycetes are not expressed under normal laboratory growth conditions. In order to maximally exploit the biosynthetic potential of these organisms, the regulation of biosynthetic systems must be refactored in the native host, or the gene clusters must be cloned and heterologously expressed as variants whose expression has been engineered. These efforts are aided by recent advances in the selective cloning of large genomic DNA inserts [46,47] and will be further aided by the decreasing price of custom DNA synthesis and the ability to assemble these fragments in yeast [48].

Phylogenetic analyses
The Genomic Blast service hosted by NCBI was used to query all complete and draft genomic sequences from bacteria taxonomically classified as Streptomyces spp. (taxid = 1883) with partial DNA sequences for atpD, gyrA, recA, rpoB and trpB, which corresponded to the sequences targeted by oligonucleotide primers used by [26,27] to infer a multilocus phylogeny. FASTA sequence files for relevant accession numbers were downloaded from Genbank using Batch Entrez. Sequences were aligned with Muscle [51], trimmed to the same length (including gaps) and subsequently concatenated in the order: atpD-gyrB-recA-rpoB-trpB. Phylogenetic relationships were inferred from the concatenated sequences by approximate maximum likelihood analysis using FastTree 2.1.7 [52]. Mycobacterium tuberculosis H37Rv was used as an outgroup and MEGA 5.2.2 was used to visualise and edit the tree. Concatenated atpD-gyrB-recA-rpoB-trpB sequences were grouped into operational taxonomic units (OTUs) using the MacQiime v1.80 implementation of UCLUST [28,53] with a shared identity threshold of 97%.
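The concatenation step can be sketched as follows. This is a minimal illustration rather than the scripts used in this study: it assumes each locus has already been aligned and trimmed, drops any strain lacking one of the loci (mirroring the exclusion of genomes with missing genes), and uses hypothetical strain names and toy sequences.

```python
LOCUS_ORDER = ["atpD", "gyrB", "recA", "rpoB", "trpB"]


def concatenate_loci(alignments, order=LOCUS_ORDER):
    """Concatenate per-locus alignments into one sequence per strain.

    alignments: dict mapping locus name -> dict of strain name -> aligned sequence.
    Only strains present in every locus are kept. Within a locus, all kept
    sequences must share one length (gaps included), otherwise ValueError.
    """
    shared = set.intersection(*(set(alignments[locus]) for locus in order))
    for locus in order:
        lengths = {len(alignments[locus][strain]) for strain in shared}
        if len(lengths) > 1:
            raise ValueError(f"{locus} sequences are not trimmed to one length")
    return {
        strain: "".join(alignments[locus][strain] for locus in order)
        for strain in sorted(shared)
    }


# Hypothetical toy alignments; strain s3 lacks recA and is therefore excluded.
toy_alignments = {
    "atpD": {"s1": "ACG-", "s2": "ACGT", "s3": "ACGA"},
    "gyrB": {"s1": "TT", "s2": "TA", "s3": "TC"},
    "recA": {"s1": "G", "s2": "G"},
    "rpoB": {"s1": "CC", "s2": "C-", "s3": "CA"},
    "trpB": {"s1": "AAA", "s2": "AAT", "s3": "AAC"},
}
toy_concatenated = concatenate_loci(toy_alignments)
```

Checking lengths per locus before joining guards against a mis-trimmed alignment silently shifting every downstream column.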

Analysis of secondary metabolite gene clusters
Genome sequences analysed here were downloaded from Genbank or EMBL (see Table 1 for accessions). Putative biosynthetic gene clusters for secondary metabolites were identified using the default settings in the web implementation of antiSMASH 2.0 [29], and the nucleotide sequence for each gene cluster was extracted from the outputted Genbank files using the EMBOSS utility seqret [54]. The large number of contigs in some draft genomes caused antiSMASH 2.0 to identify numerous broken or incomplete gene clusters. This was a particular problem with polyketide synthase gene clusters. In order to minimise the impact of broken gene clusters on this analysis, the gene clusters identified from the fully sequenced genome of S. albus J1074 were used as a reference for NUCmer [55] alignments of gene clusters from draft genome sequences. A diagrammatic workflow of this approach is displayed in Fig. 2. Gene clusters from draft genomes that aligned to the same S. albus J1074 gene cluster were subsequently concatenated into a single FASTA file and considered a single gene cluster. For gene clusters for which S. albus J1074 did not harbour a homologous cluster, NaPDoS [56] was used to identify and extract ketosynthase domains from gene clusters identified by antiSMASH 2.0. The resulting amino acid sequences were aligned by the Geneious 7.1.5 implementation of Muscle (eight iterations) and a neighbour-joining phylogenetic tree was inferred from the alignment using the Geneious 7.1.5 tree builder with a Jukes-Cantor distance model (not shown). A customised blast database was generated using Blast 2.2.29+ [57], and a combination of blast analysis and whole gene cluster alignments using Mauve 2.3.1 [58] was used both to further refine broken gene clusters in draft genome sequences and to ascertain the conservation of secondary metabolite gene clusters across the S. albus clade. Self vs. self blastn analyses were used to identify and remove duplicate gene clusters.
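The merging of broken gene clusters can be sketched as a grouping problem. The sketch below does not parse real NUCmer output: it assumes the alignments have already been reduced to hypothetical (fragment, reference cluster, reference start) tuples, and it orders fragments by their position on the reference, which is an assumption rather than a documented step of this study.

```python
from collections import defaultdict


def merge_fragments(hits, sequences):
    """Group draft-genome cluster fragments by the reference cluster they align to.

    hits: iterable of (fragment_id, reference_cluster, ref_start) tuples.
    sequences: dict mapping fragment_id -> nucleotide sequence.
    Returns a dict mapping reference_cluster -> multi-FASTA text in which the
    fragments are ordered by their left-most alignment start on the reference.
    """
    grouped = defaultdict(dict)  # reference -> {fragment_id: smallest ref_start}
    for fragment_id, reference, ref_start in hits:
        previous = grouped[reference].get(fragment_id)
        if previous is None or ref_start < previous:
            grouped[reference][fragment_id] = ref_start

    merged = {}
    for reference, members in grouped.items():
        ordered = sorted(members, key=lambda fid: (members[fid], fid))
        records = [f">{fid}\n{sequences[fid]}" for fid in ordered]
        merged[reference] = "\n".join(records) + "\n"
    return merged


# Hypothetical toy input: two fragments map to one reference cluster.
toy_hits = [
    ("contig7_c1", "J1074_cluster_3", 5200),
    ("contig2_c4", "J1074_cluster_3", 100),
    ("contig2_c4", "J1074_cluster_3", 900),  # second alignment of the same fragment
    ("contig9_c2", "J1074_cluster_8", 1),
]
toy_sequences = {"contig7_c1": "GGCC", "contig2_c4": "ACGT", "contig9_c2": "TTAA"}
toy_merged = merge_fragments(toy_hits, toy_sequences)
```

Each value in the result can be written to its own FASTA file and treated as a single gene cluster in downstream comparisons.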

Figure 1. An approximately maximum likelihood phylogenetic tree of sequenced Streptomyces species. A phylogeny was inferred for Mycobacterium tuberculosis and 120 sequenced streptomycetes

Figure 2. Diagrammatic workflow of the NUCmer approach used to piece together a biosynthetic gene cluster spread over more than one contig. NUCmer is part of the MUMmer package [55] and can be downloaded from http://sourceforge.net/projects/mummer/. NUCmer will align contigs from draft genomes to an intact gene cluster with high shared nucleotide identity. Commands used to perform an analysis of this type are given. Black arrows represent a biosynthetic gene cluster; black and red lines represent contigs in a draft genome sequence. doi:10.1371/journal.pone.0116457.g002

Table 1. Accessions and genomic features of Streptomyces albus strains.

Table 2. Pairwise comparison of gene clusters encoding putative secondary metabolites from Streptomyces albus strains. The percentage in braces reflects the total number of gene clusters conserved in the pairwise comparison with respect to the strains listed vertically.

Table 3. The core secondary metabolome of Streptomyces albus.

Table 4. Auxiliary secondary metabolites produced by Streptomyces albus.

Table 5. Strain-specific gene clusters encoded by Streptomyces albus.
BedTools 2.19.0 [49] was used to extract nucleotide sequence ranges reported in the blast search into a multifasta file. The BioPerl [50] script shortenID.pl (http://nebc.nox.ac.uk/scripts/parse/shortenID.pl), written by Bela Tiwari, NERC Environmental Bioinformatics Centre, was used to shorten headers for FASTA entries, and the BioPerl script split_multifasta.pl (http://iubio.bio.indiana.edu/gmod/genogrid/scripts/split_multifasta.pl), written by the Genome Informatics Lab at Indiana University, was used to generate individual FASTA files from the resulting multifasta output from BedTools. DNA sequences were aligned using eight iterations of the MEGA 5.2.2 implementation of Muscle.