The marine-derived Scopulariopsis brevicaulis strain LF580 produces scopularides A and B, which have anticancerous properties. We carried out genome sequencing using three next-generation DNA sequencing methods. De novo hybrid assembly yielded 621 scaffolds with a total size of 32.2 Mb and 16,298 putative gene models. We identified a large non-ribosomal peptide synthetase gene (nrps1) and a supporting pks2 gene in the same biosynthetic gene cluster. RNA-Seq confirmed that this cluster and the genes within it are actively expressed. Characterization of carbohydrate-active enzymes and major facilitator superfamily (MFS)-type transporters leads us to postulate that S. brevicaulis originated from a soil fungus that came into contact with the marine sponge Tethya aurantium. This sponge seems to provide shelter to the fungus and a micro-environment suitable for its survival in the ocean. This study also provides a platform for further investigations of the life-style and secondary metabolites of S. brevicaulis.
Citation: Kumar A, Henrissat B, Arvas M, Syed MF, Thieme N, Benz JP, et al. (2015) De Novo Assembly and Genome Analyses of the Marine-Derived Scopulariopsis brevicaulis Strain LF580 Unravels Life-Style Traits and Anticancerous Scopularide Biosynthetic Gene Cluster. PLoS ONE10(10): e0140398. https://doi.org/10.1371/journal.pone.0140398
Editor: Monika Schmoll, AIT Austrian Institute of Technology GmbH, AUSTRIA
Received: July 3, 2015; Accepted: September 24, 2015; Published: October 27, 2015
Copyright: © 2015 Kumar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: The whole genome sequence and RNA-Seq data for S. brevicaulis is publicly available under BioSample accession ID: SAMN03764504 and corresponding BioProject accession ID: PRJNA288424.
Funding: Funded by European Union Seventh Framework Program FP7/2007–2013 under grant agreement number 265926.
Competing interests: The authors have declared that no competing interests exist. Concerning the role of VTT Technical Research Centre of Finland Ltd we emphasize that VTT as a whole is a not-for-profit organisation (http://www.vttresearch.com/about-us/co-operation-withvtt/mission-and-mode-of-operations for further information). Biocomputing Platforms Ltd is the current employer of MFS and had no involvement in the study. During the time of the study MFS was employed by VTT Technical Research Centre of Finland Ltd.
The current estimate for the number of marine fungal species is over 10,000, and this number could be several-fold higher if fungi from the oceans were sampled properly. The enormous biodiversity of marine fungal isolates is mirrored by the molecular diversity of their secondary metabolites [3–7]. The use of marine fungi as a source of potential antibiotics was first studied by Giuseppe Brotzu in 1945, who isolated and cultivated marine-derived Cephalosporium acremonium from seawater samples taken near a sewage outlet in Sardinia. A decade later, Newton and Abraham found that cephalosporin C was the responsible antibiotic. By the end of the last decade, over 1000 bioactive compounds had been derived from marine fungi [5–7], the major classes being polyketides (40%), alkaloids (20%), peptides (15%), terpenes (15%), prenylated polyketides (7%), shikimates (2%) and lipids (1%). Although marine fungi are a potent group of bioactive compound producers, they are not well characterized and remain underexplored in terms of biotechnological applications. Hence, there is an urgent need to focus on marine fungi, on their biosynthetic gene clusters and the bioactive compounds these produce, and on the pharmaceutical potential of these compounds. The fungus Scopulariopsis brevicaulis LF580, which produces the cyclodepsipeptides scopularides A and B, was isolated from the inner tissue of the marine sponge Tethya aurantium collected in the Mediterranean Sea. Scopularides A and B are capable of inhibiting the growth of the pancreatic tumor cell lines Colo357 and Panc89 and of the colon tumor cell line HT29 [9,10]. These data suggest that S. brevicaulis strain LF580 can produce potentially anti-cancerous compounds, which makes it important to characterize the genes involved in the production of these bioactive compounds.
During the last decade, rapid advances in next-generation DNA sequencing (NGS) methods have made it possible to discover and characterize many aspects of biology in a cost-effective manner. The major NGS platforms are the Roche GS-FLX 454 pyrosequencer; the MiSeq, HiSeq and Genome Analyzer II platforms (Illumina); the SOLiD system (Life Technologies/Applied Biosystems); the Ion Torrent and Ion Proton (Life Technologies); and the PacBio RS II (Pacific Biosciences) [11,12]. NGS methods have become standard approaches for detecting genes of interest and for providing genome-wide information. These methods have recently been applied to several fungi, including Sordaria macrospora and Pyronema confluens, as well as to eukaryotic transcriptomic analyses [15,16].
S. brevicaulis is an opportunistic fungus that is capable of growing on several materials and is often found in soil. Our strain LF580 of S. brevicaulis was isolated from the inner tissue of the marine sponge Tethya aurantium. We set out to apply genomic and transcriptomic approaches to the marine-derived S. brevicaulis strain LF580 in order to characterize its biosynthetic gene clusters and associated genes, with a special focus on the gene cluster that produces scopularides. Under normal laboratory conditions, biosynthetic gene clusters are typically silent and are activated during stress. Hence, we used a UV-derived mutant to examine the expression patterns of the biosynthetic genes of this fungus.
Herein, we present our results from whole genome as well as transcriptome sequencing of the marine sponge-derived S. brevicaulis strain LF580, which represents the draft genome of this marine-derived species. By comparative genomic analyses, we demonstrate that NRPS1 is responsible for scopularides production. RNA-Seq analyses of the UV-mutant M26 revealed detectable transcripts of about 90% of the genes in the genome, including genes of the NRPS-PKS hybrid cluster (with nrps1 and pks2 genes), which is believed to be responsible for scopularides biosynthesis. Finally, we characterized several other features of the S. brevicaulis genome, such as repeat contents, mating type loci, carbohydrate-active enzymes, MFS-type transporters and performed a protein domain analysis.
Results & Discussion
General Genome Features
We sequenced the genome of the marine-derived S. brevicaulis using three different sequencing methods, namely Roche 454, Illumina HiSeq 2000 and Ion Torrent. Using Roche 454 pyrosequencing, we obtained a 32.2 Mb genome assembly of 935 contigs with an N50 of 88 kb, which were further joined into 699 scaffolds with an N50 of 116.7 kb. Using the short reads of Illumina and Ion Torrent alone, we obtained smaller N50 values and larger numbers of scaffolds: 67.5 kb (2605 scaffolds) and 26.3 kb (12,119 scaffolds), respectively. A hybrid assembly using all three types of reads yielded an N50 of 131.8 kb with 623 scaffolds. This indicates that Roche 454 alone performs well for fungal genome assembly and that combining more than one method is the better choice. This is also reflected by data from a recent genome assembly of the white-rot fungus Pycnoporus cinnabarinus. We identified 16,298 putative genes in the assembled genome (Table 1). The number of identified genes is rather high when compared to other ascomycetes, which may contain about 10,000 to 12,000 genes (Fig 1). The average intron length for this genome is 129.4 bp, which is well within the range of known fungal intron sizes.
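The N50 values quoted above follow the usual assembly convention: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. As a minimal sketch of that calculation (the function name `n50` and the toy lengths are illustrative assumptions, not the assembler output of this study):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in kb, for illustration only
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it rewards longer scaffolds regardless of their correctness, which is why it is reported here together with the scaffold count.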
Repeat Elements in S. brevicaulis Genome
Repeat elements constitute 419,240 bp, or 1.33%, of the assembled genome of S. brevicaulis. Tandem repeat sequences account for 0.75% of the total genome size, transposable elements (TEs) for 0.35% and low-complexity regions for 0.20% (Table 2 and S1 Table). Low-complexity regions are regions of biased composition and regions enriched in imperfect direct and inverted repeats [20,21]. Retroelements make up about 0.18% of the S. brevicaulis genome. Among these, retrotransposons with long terminal repeats (LTRs) are in the majority (0.16%). Class II DNA transposons comprise 0.17% of the genome and the majority of them belong to the Tc1-IS630-Pogo family. Fungal genomes are generally known to possess a low content of only 1–4% of transposable elements. Only a few fungal groups have higher numbers of repeats, such as several species of Dothideomycetes and Tuber melanosporum, a Pezizomycetes species. However, these fungi typically have large expansions of the genome size, like Tuber melanosporum, which has a genome size of 125 Mb. For further details see a recent review.
Genome Annotation and Phylogenetic Analysis
Functional annotation is critical for understanding the genomic data of new species and is supported by Gene Ontology (GO) . GO helps in characterization of genes, transcripts and proteins of many organisms in terms of biological processes (BP), cellular components (CC), and molecular functions (MF) . We have used this method for the functional annotation of S. brevicaulis proteins using the Blast2GO suite . The derived S. brevicaulis proteins were assigned to three functional groups based on GO terminology: BP, CC and MF (S2 Table). We traced 5,159 proteins to BP terms (Fig 2) with the following five top categories: 761 related to oxidation-reduction processes, 485 related to trans-membrane transport, 423 related to regulation of transcription, 318 related to mycelium development, and 187 related to methylation. Under GO annotation of biological processes (BP), we found that S. brevicaulis is equipped with genes and proteins required for pathogenesis (Fig 2). This can be explained by the fact that this opportunistic fungus serves as a pathogen for immune compromised humans and other animals .
Similarly, 1,566 proteins were assigned to CC terms (Fig 2) with the five top components being: 570 related to integral membrane proteins, 285 related to different protein complexes, 109 related to ribosome proteins, 107 related to extracellular region proteins, and 92 related to proteins of the nuclear lumen. Finally, 4,129 proteins were linked to MF terms (Fig 2) with the five top categories as follows: 675 related to zinc ion binding, 541 related to ATP binding, 340 related to sequence-specific DNA binding transcription factors, 190 related to hydrolase activity (hydrolyzing O-glycosyl compounds), and 189 related to methyltransferase activity. All GO terms in these three categories are listed in S2 Table.
During the Blast2GO-based annotation process, we were able to annotate 9,340 genes (57.31%), while 6,958 genes (42.69%) remained non-annotated in this fungal genome, as summarized in S2 Table. Homology-based annotation suggests that S. brevicaulis belongs to the class Sordariomycetes and is most closely related to Nectria and Fusarium species (Fig 3A); however, it does not group within that clade and seems to be somewhat distinct. To determine the phylogenetic position of this fungus more precisely, we performed a genome-wide phylogenetic analysis using CVTree. We found that S. brevicaulis diverged early from other representative Sordariomycetes such as Verticillium, Glomerella, Colletotrichum, Nectria, Fusarium, Metarhizium, Trichoderma, Magnaporthe and Neurospora (Fig 3B).
Protein Domains of S. brevicaulis
Protein domains are independently folding structural units that are evolutionarily conserved and contain at least one protein motif. This implies that proteins carrying common domains may have similar functions. Hence, domain analysis is an important means of scanning new genomes for putative proteins with similar functions. Two state-of-the-art databases, Pfam and InterPro, are used for protein domain analysis. This analysis is helpful for better annotation of new genomes. We found a total of 10,458 deduced protein sequences of S. brevicaulis associated with eukaryotic protein domains (S3 Table); the top 20 Pfam domains are summarized in Fig 4. Among these are two transporter domains, with 221 proteins harboring a major facilitator superfamily/MFS_1 domain (PF07690.11) and 107 proteins containing a sugar (and other) transporter/Sugar_tr domain (PF00083.19). These transporters are generally single-polypeptide secondary carriers that transport sugars and other small solutes in response to chemiosmotic ion gradients [30,31].
Full details of all Pfam domains are provided in S3 Table.
Two transcription factor domains are also in this list, with 205 fungal-specific transcription factor domains/Fungal_trans (PF04082.13) and 112 fungal Zn(2)-Cys(6) binuclear cluster domains/Zn_clus (PF00172.13). These proteins serve as transcriptional regulators. A comparison of all transcription factor domains suggested that these two fungal transcription factor domains are highly expanded in selected ascomycetes (S4 Table), as shown previously. We detected 112 G-beta repeat/WD40 (PF00400.27) domains, which may be involved in signal transduction. WD40 domains also regulate fungal cell differentiation processes. We carried out comparative protein domain analyses using selected fungal genomes (S5 Table). All protein domains of the S. brevicaulis genome are in accordance with known fungal genomes from ascomycetes.
Plant Biomass Associated Metabolism Evident from Comparative Analyses of Carbohydrate Active Enzymes
The type of association between sponges and fungi and the corresponding ecological function remain unclear and little evidence is available on fungal adaptation to sponges (if any). Studying the carbohydrate-active enzymes (CAZy) profile could provide interesting information on the main families represented in S. brevicaulis genome and perhaps reveal its substrate preference and its nutritional relationship with the sponge.
Using specialized homology detection and annotation from the CAZy database (www.cazy.org), we identified 478 CAZy genes in the S. brevicaulis genome (S6 and S7 Tables). These enzymes are divided into six classes: 71 auxiliary activities (AAs), 34 carbohydrate esterases (CEs), 50 carbohydrate binding modules (CBMs), 227 glycoside hydrolases (GHs), 81 glycosyl transferases (GTs) and 15 polysaccharide lyases (PLs). S. brevicaulis contains many candidate enzymes involved in cellulose breakdown, as apparent from the families GH1, GH3, GH5, GH6, GH7, GH12 and GH45, with a global diversification similar to that of other ascomycete fungi able to modify or deconstruct plant biomass, including the model fungus N. crassa. Trichoderma reesei contains only eleven representatives of families GH5, GH6 and GH7, against 21 for S. brevicaulis. If we consider hemicellulases, particularly xylanases (families GH10, GH11, GH30 and GH51), galactanases (GH53) or mannanases (GH26), the same picture emerges: S. brevicaulis and N. crassa have a similar number of representatives, whereas the entomopathogenic fungus M. anisopliae has none. S. brevicaulis and other plant-degrading fungi contain members of the pectinolytic families PL1 (pectin/pectate lyase), PL3 (pectate lyase), PL9 (pectate lyase), PL11 (rhamnogalacturonan lyase), CE8 (pectin methylesterase), CE12 (rhamnogalacturonan acetyl esterase) and GH53 (endo-β-1,4-galactanase). All these data related to glycoside hydrolases (GHs) indicate that S. brevicaulis has developed a metabolism focused on the breakdown of terrestrial plant materials rather than algal or animal biomass.
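The class counts quoted above can be tallied to confirm that they add up to the reported total and to see the relative weight of each class. The sketch below only restates the numbers from the text; the variable names are illustrative assumptions.

```python
# CAZy class counts for S. brevicaulis as quoted in the text
cazy_classes = {
    "AA": 71,    # auxiliary activities
    "CE": 34,    # carbohydrate esterases
    "CBM": 50,   # carbohydrate binding modules
    "GH": 227,   # glycoside hydrolases
    "GT": 81,    # glycosyl transferases
    "PL": 15,    # polysaccharide lyases
}

total = sum(cazy_classes.values())

# Percentage of the CAZy repertoire contributed by each class
shares = {name: 100 * count / total for name, count in cazy_classes.items()}

print(f"total CAZy genes: {total}")
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:>3}: {pct:.1f}%")
```

The tally shows that glycoside hydrolases make up close to half of the repertoire, in line with the emphasis the text places on GH families.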
The carbohydrate portion of land plants is intimately linked to lignin, and auxiliary activities (AAs) are needed to give GHs access to their substrates, so that plant-modifying or plant-degrading fungi can penetrate the cell wall and reach the carbohydrate energy source. Considering the AA families acting on lignin, S. brevicaulis has only a poor set of laccase-like oxidases (AA1) and peroxidases (AA2), but a substantial number of enzymes of the glucose-methanol-choline (GMC) superfamily, i.e. 23 AA3 members comprising one cellobiose dehydrogenase (CDH, AA3_1), 19 putative aryl alcohol oxidases and glucose oxidases (AA3_2), and three alcohol oxidases (AA3_3). In addition, the low number of glyoxal oxidases (AA5), which provide H2O2 as do members of the GMC superfamily, suggests that the fungus does not possess a strong ligninolytic capacity. In contrast, other oxidative enzymes targeting the carbohydrate portion are well represented. For instance, there are four gluco-oligosaccharide oxidases (AA7) and 27 potential members of the lytic polysaccharide mono-oxygenases (LPMOs), which oxidatively cleave the glycosidic chains on the crystalline surface of cellulose, chitin or starch (AA9, AA11 and AA13, respectively). LPMOs create entry points for hydrolytic cellulases, chitinases or amylases. Their recent discovery opened a new route to accelerate biomass degradation in biotechnological applications. Phillips et al. and Bey et al. demonstrated that AA9s and CDH (AA3_1) of N. crassa and of Pycnoporus cinnabarinus, respectively, act in concert to cleave cellulose oxidatively. LPMOs of families AA11 and AA13, recently identified in N. crassa, Aspergillus nidulans and Aspergillus oryzae [37–39], are also represented in the S. brevicaulis genome, suggesting that the fungus could cope with a large variety of plant substrates.
Analyses of MFS-Type and Sugar Transporters Also Support Plant Biomass Associated Metabolism
Taking into account the entire CAZyme repertoire, it is clear that S. brevicaulis has a metabolism capable of breaking down plant biomass. The same picture emerges when S. brevicaulis proteins predicted to harbor either an MFS domain (PF07690.11) or a sugar (and other) transporter domain (PF00083.19) are compared to the corresponding transporter complement of N. crassa, a representative plant biomass saprophyte. N. crassa has only about half as many transporters encoded in its genome as S. brevicaulis (159 vs. 328 with the same Pfam annotations). Yet an overall similar distribution of the transporters across the categories defined by the Transport Classification Database (TCDB) can be observed (Fig 5). The major categories in both cases are the Sugar Porter family (2.A.1.1), the Anion:Cation Symporter family (2.A.1.14), the Drug:H+ Antiporter families 1 and 2 (2.A.1.2 and 2.A.1.3), as well as the Monocarboxylate Porter family (2.A.1.13) (S8 Table). Transporters in fungi are notoriously under-characterized, and thus clear annotations are difficult, but the comparison indicates that the two transporter families linked to sugar uptake (2.A.1.1 with 102 vs. 37 members in S. brevicaulis and N. crassa, respectively) and to the uptake/transport of small charged solutes and metabolites (2.A.1.14 with 94 vs. 26 members) are overrepresented in S. brevicaulis as compared to N. crassa, suggesting that this fungus is able to utilize a broadened substrate spectrum. This feature could have been helpful in the transition from a soil habitat to a marine sponge habitat. As Tethya aurantium (http://www.marlin.ac.uk/index.php, species ID 4450) grows on rocks and stones in the shallow sub-littoral, it is likely that the sponge has taken up fungal spores that drifted from nearby shores. The sponge may have acted as a spore trap or a shelter. Since S. brevicaulis is able to act as a pathogen of humans associated with onychomycosis, it may also be able to dwell in a sponge. Therefore, the sponge may have created a suitable micro-environment for a terrestrial fungus that could adapt to the sea salt environment and find nutritional resources. It is known that other fungi from sponges are rather closely related to fungi from terrestrial sources and are generally able to cope with media containing the salt concentrations found in marine environments. Alternatively, it may be that marine sponge-associated fungi are able to survive without any knowledge of their hosts. It is beyond the scope of this manuscript to explore fungal-sponge relationships in further detail. Nevertheless, our work paves the way for genomic investigations of such marine fungal strains.
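The transporter comparison made earlier in this section can be checked against genome-wide totals, since S. brevicaulis has about twice as many transporters overall. The sketch below uses only the counts quoted in the text and asks whether the two families remain overrepresented as a proportion of each species' transporter complement. It assumes that the family counts are subsets of the per-species totals (328 and 159); the function names are illustrative.

```python
# Counts quoted in the text; treating family counts as subsets of the
# per-species totals is an assumption of this sketch.
TOTAL = {"S. brevicaulis": 328, "N. crassa": 159}
FAMILY = {
    "2.A.1.1 (Sugar Porter)": {"S. brevicaulis": 102, "N. crassa": 37},
    "2.A.1.14 (Anion:Cation Symporter)": {"S. brevicaulis": 94, "N. crassa": 26},
}


def share(family, species):
    """Fraction of a species' transporters that fall into one TCDB family."""
    return FAMILY[family][species] / TOTAL[species]


def relative_enrichment(family):
    """Ratio of the family's share in S. brevicaulis to its share in N. crassa."""
    return share(family, "S. brevicaulis") / share(family, "N. crassa")


for family in FAMILY:
    print(
        f"{family}: "
        f"{share(family, 'S. brevicaulis'):.1%} vs "
        f"{share(family, 'N. crassa'):.1%}, "
        f"enrichment x{relative_enrichment(family):.2f}"
    )
```

Under that assumption, both families keep an enrichment ratio above 1 after normalization, so the overrepresentation is not only a consequence of the larger transporter complement.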
A. Comparison of MFS type and sugar transporters from S. brevicaulis and N. crassa. This classification is based on TCDB categories. B. Distribution of TCDB categories according to the number of classified MFS type and sugar transporters of S. brevicaulis. C. Distribution of TCDB categories according to the combined RPKMs of the assigned MFS type and sugar transporters. The size of each category is presented as a percentage of the total number of RPKMs. All categories with less than two percent were grouped together (“other”). The respective values for these categories are presented enlarged in the bar to the right. SP Family (TCDB category: 2.A.1.1): The Sugar Porter Family; OFA Family (2.A.1.11): The Oxalate:Formate Antiporter Family; SHS Family (2.A.1.12): The Sialate:H+ Symporter Family; MCP Family (2.A.1.13): The Monocarboxylate Porter Family; ACS Family (2.A.1.14): The Anion:Cation Symporter Family; SIT Family (2.A.1.16): The Siderophore-Iron Transporter Family; OCT Family (2.A.1.19): The Organic Cation Transporter Family; DHA1 Family (2.A.1.2): The Drug:H+ Antiporter-1 (12 Spanner) Family; FLVCR Family (2.A.1.28): The Feline Leukemia Virus Subgroup C Receptor Family; DHA2 Family (2.A.1.3): The Drug:H+ Antiporter-2 (14 Spanner) Family; YnfM Family (2.A.1.36): The Acriflavin-sensitivity Family; LAT3 Family (2.A.1.44): The L-Amino Acid Transporter-3 Family; V-BAAT Family (2.A.1.48): The Vacuolar Basic Amino Acid Transporter Family; NAG-T Family (2.A.1.58): The N-Acetylglucosamine Transporter Family; UMF12 Family (2.A.1.63): The Unidentified Major Facilitator-12 Family; FHS Family (2.A.1.7): The Fucose:H+ Symporter Family; UMF23 Family (2.A.1.75): The Unidentified Major Facilitator-23 Family; NNP Family (2.A.1.8): The Nitrate/Nitrite Porter Family; PHS Family (2.A.1.9): The Phosphate:H+ Symporter Family; TDT Family (2.A.16): The Tellurite-resistance/Dicarboxylate Transporter Family; POT/PTR Family (2.A.17): The Proton-dependent Oligopeptide Transporter Family; GPH:Cation Symporter Family (2.A.2): The Glycoside-Pentoside-Hexuronide:Cation Symporter Family; CPA1 Family (2.A.36): The Monovalent Cation:Proton Antiporter-1 Family.
Characterization of Gene Content and Expression Using RNA-Seq
The S. brevicaulis LF580 genome contains over 16,000 genes, which is on the higher side for known ascomycetes (Fig 1). It is interesting to see how many of these are expressed in a single condition. To evaluate this status, we extracted RNA of S. brevicaulis strain M26 growing in WSP30 medium (see Materials and Methods section), which also supports production of Scopularide A and B. We performed RNA sequencing using Illumina HiSeq 2000. Resulting reads were mapped to the putative genes of the assembled S. brevicaulis genome. A total of 14,724 genes were found to be expressed in this analysis, which represents 90% of the entire gene complement. These expressed genes were classified into 10 tiers based on their reads per kilobase of transcript per million mapped reads (RPKM) values (Table 3 and S8 Table). Tier #1 has 120 genes with RPKM values >1000, which accounts for 0.8% of all expressed genes, while 26% (3832 genes) were detected with very low transcript quantities with RPKM values ranging from higher than 0 to lower than 1.0 (marked by red font or blue shade in S9 Table, respectively) and these were all placed into tier #10 (non-expressing genes are marked by yellow shade in S9 Table). To further evaluate highly expressing genes in the mutant M26, we examined selected genes and their expression patterns tier-wise according to their RPKM values. In the following, we provide some vignettes of top expressing genes in the UV-mutant M26.
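RPKM normalizes a gene's read count by both transcript length and sequencing depth, which is what makes the tier boundaries comparable across genes. The sketch below shows the standard formula and only the two tier boundaries stated in the text (tier #1 above 1000 RPKM, tier #10 above 0 and below 1.0). The boundaries of tiers #2 to #9 are given in Table 3 and are not reproduced here, so they are collapsed into one label; the function names and example numbers are illustrative assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def coarse_tier(value):
    """Assign the tiers whose boundaries are stated in the text.

    Tiers 2-9 are lumped together because their cut-offs are only
    listed in Table 3 (not reproduced in this sketch).
    """
    if value > 1000:
        return "tier 1"
    if value == 0:
        return "not expressed"
    if value < 1.0:
        return "tier 10"
    return "tiers 2-9"


# Hypothetical gene: 5000 reads on a 2 kb transcript, 2 million mapped reads
example = rpkm(5000, 2000, 2_000_000)
print(example, coarse_tier(example))

# Share of the gene complement with detectable transcripts, from the text
print(f"{14724 / 16298:.1%} of 16,298 genes expressed")
```

With these definitions, the expressed fraction computed from the quoted counts rounds to the 90% reported in the text.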
Regarding MFS-type transporters, the accumulated expression per category broadly follows their TCDB classification distribution (compare Fig 5B and 5C) with one notable exception. Class 2.A.1.12 (the Sialate:H+ Symporter family; dark green), is greatly overrepresented in terms of transcript abundance (only 2 genes, but with 5.9% of total transporter-specific transcript) due to g12790.t1 being the second most highly expressed transporter in the genome (Table 4). Homology search by BLAST  suggests that g12790.t1 encodes for a carboxylic acid transporter, such as for lactate or pyruvate uptake, which should have been abundant in the rich medium S. brevicaulis was grown in. An analysis of the remaining genes in the list of top 10 transcribed transporters suggests that these collectively help to satisfy some of the major nutritional requirements of the fungus, such as for carbon and nitrogen as well as vitamins. Sources for these are carbohydrates (hexoses such as glucose, pentoses and other polyols; g14394.t1, g3025.t1, g3159.t1, and g6510.t1), small organic, nitrogenous compounds such as allantoate (g10354.t1), and important nutrients such as the B-vitamins niacin (g116.t1 and g12121.t1) and (potentially) biotin (g10354.t1).
Expressed genes are classified into 10 tiers based on their RPKM value.
Hydrophobins are morphogenetic, small (≤20 kDa), secreted hydrophobic fungal proteins. S. brevicaulis has three hydrophobin genes, namely SbreHPB1 (g5510.t1), SbreHPB2 (g7216.t1), and SbreHPB3 (g15602.t1). Two hydrophobins (SbreHPB1 and SbreHPB2) were expressed in the top expression tiers #3 and #1, with RPKM values of 101.6 and 1035, respectively. In contrast, SbreHPB3 had a lower expression value (60 RPKM) and was classified into tier #4. Other recent studies also indicated that fungal hydrophobins have different expression patterns under abiotic and biotic stresses, in which adherence mechanisms are influenced [44,45]. Furthermore, fungi generally have 2–10 hydrophobin genes, with no apparent increase in copy number in fungi of marine origin.
In summary, transcripts for about 90% of the genes in this genome could be detected. This rather high value indicates that S. brevicaulis has a higher number of expressed genes than most other Ascomycetes.
Overview of Bioactive Compounds Encoding Genes
The S. brevicaulis genome has 16 genes encoding non-ribosomal peptide synthetases (NRPSs) (Fig 6), of which three (NRPS1-3) encode enzymes with a multi-modular organization with more than one condensation domain. This modular architecture is known to be specific for fungal NRPSs. The domain organization of putative NRPS and PKS proteins is shown in Fig 6. Additionally, we identified six full-length polyketide synthase (PKS) genes, one fatty acid synthase (FAS) gene and three putative terpene biosynthesis genes in the genome (Table 5). Additional single-domain enzymes such as reductases and cytochrome P450 monooxygenases were also identified but not considered further. All these genes are localized in 18 different clusters (Fig 7 and S2 Fig), which include four NRPS clusters, six PKS clusters, and five other clusters that contain the NRPS6-NRPS16 genes. Since the NRPSs encoded by these genes are not modular in nature, the antiSMASH tool places them separately from the other clusters. A single cluster was identified on scaffold477 that carries the NRPS1 and PKS2 genes in the N-terminal 78 kb region (Fig 8), which is composed of contig264 and contig358. Corresponding clusters of supporting genes and their expression values are shown in Fig 8. Our data indicate that this gene cluster is involved in scopularide production and that it is indeed actively expressed under conditions supporting production of scopularides A and B [49,50]. The nrps1 gene (g12932) is the best candidate gene for production of the cyclic lipopeptide scopularide, which consists of five amino acids (glycine, L-valine, D-leucine, L-alanine and L-phenylalanine) and a reduced carbon chain. Its production scheme is shown in Fig 8B. The reduced carbon chain (3-hydroxy-methyldecanoyl) may be derived from the product of the pks2 gene.
This is further supported by the fact that the two genes (nrps1 and pks2) are localized in a single cluster on scaffold477. This cluster has a high degree of similarity with clusters in the genomes of Cordyceps militaris (JH126399.1), Aspergillus nidulans FGSC A4 (BN001307.1), Streptomyces bingchenggensis (NC_016582.1), A. niger ATCC 1015 (ACJE01000001.1), and Streptomyces achromogenes subsp. rubradiris (AJ871581.1) (S2 Fig). S2 Fig depicts details of these clusters with information about homologous clusters in either fungi or bacteria. Most of these 18 clusters have homologs in closely related ascomycetes, as shown in Fig 3 on a phylogenetic scale. However, we could not identify homologs of clusters 5 and 16 in other fungal genomes (S2 Fig). Some clusters exhibit similarities to bacterial counterparts; this is especially true for cluster 5 (S2E Fig), whose only homolog is in Streptomyces violaceusniger (NC_015957.1). This may suggest horizontal gene transfer from bacteria to fungi. Indeed, horizontal gene transfer is considered to be a major source of metabolite diversity in fungi. The nrps2 gene (g8056) encodes a homologue of the synthetase responsible for production of the iron-chelating siderophore ferricrocin (SidC), which is found in numerous fungi. The third multi-modular NRPS, encoded by nrps3 (g5523), contains four adenylation domains, but its product is currently unknown. The gene was not expressed under the examined conditions and BLASTP analyses did not identify orthologs with known products.
A. Summary of NRPS genes reveals three multimodular NRPS and thirteen non-modular NRPS genes. Both types of NRPS genes are expressed in the RNA-Seq data. B. List of full-length polyketide synthase (PKS) genes and corresponding protein domain organization. C. Proteins encoded by the nrps1 and pks2 genes are capable of producing scopularide, which has two forms, A and B.
Gene clusters are numbered C1-C18, followed by the scaffold and the positions on the scaffold; different colors illustrate different types of clusters. Biosynthetic genes are the key genes (such as nrps or pks) and the main supporting genes, such as a cytochrome P450 gene, as per antiSMASH guidelines. Other genes are any genes in the cluster that are not key genes, regulatory genes (such as transcription factors or suppressors) or transporter genes (such as ABC transporters); these are marked in grey shade.
A. Summary of the scopularide-producing NRPS1/PKS2 cluster of S. brevicaulis. This cluster is localized on scaffold477 in a region of 78 kb at the 3’ end of this scaffold (scaffold size 280 kb). The cluster is active, as several flanking genes (marked in blue) are expressed in the RNA-Seq experiment on the UV mutant. The top homologous clusters are found in both fungi and bacteria, which hints that this cluster might have originated via horizontal gene transfer from bacteria. P-Cu amine oxidase: Peroxisomal copper amine oxidase; Tri101: Trichothecene 3-O-acetyltransferase; EutQ: Ethanolamine utilization protein like EutQ. B. Scheme of scopularide biosynthesis by NRPS1 and PKS2 of S. brevicaulis. Modified from Lukassen et al.
Five of the six PKS proteins (PKS1-5) contain the reducing domains dehydratase, enoylreductase and ketoreductase. The only actively expressed PKS gene was pks2 (g14542), which has possible orthologs in Aspergillus nidulans (AN2547), F. graminearum (PKS6; FGSG_08208) and F. pseudograminearum (PKS40; FPSE_09183). The encoded PKSs are involved in production of the lipopeptides emericellamide, fusaristatin and W493, respectively [56,57], each consisting of a reduced carbon chain provided by the PKS, which the NRPS joins to three to seven amino acids. The resulting product is then released by the NRPS by cyclization. The pks6 gene (g13622), on the other hand, encodes a non-reducing PKS; BLASTP analysis against GenBank showed that it shares similarity with mycelium pigment synthases and 71% identity with a PKS (VDAG_00190) that has been proposed to be involved in the biosynthesis of melanin in Verticillium dahliae. Hence, PKS6 could be involved in pigment biosynthesis in S. brevicaulis. These pks genes are type I PKS genes and are localized in 5 different clusters (Fig 7 and S2 Fig). Cluster 9 is the only cluster (Fig 7 and S2 Fig) that contains a type III PKS, which might act as a chalcone and stilbene synthase, as the key enzyme shows 70% identity with a homologous protein in Colletotrichum higginsianum (GenBank ID CCF34076.1). We also identified three genes encoding aristolochene synthase (g9860.t1), geranylgeranyl diphosphate synthase (g13546.t1) and squalene synthase (g5738.t1), which form two clusters (Fig 7 and S2 Fig) on scaffold440 and scaffold446.
At present, the secondary metabolites produced by many of these proteins are unknown, as is typical for many of the fungi studied so far. As genome sequencing of fungi has become affordable, we expect more and more fungal genomes to become available in public databases, which will give a better picture of homologous gene clusters and their final products. This opens opportunities for other researchers to use the genome-wide information of this fungus to explore the potential of these genes and their clusters. In addition to this analysis, a separate study characterized scopularide-producing proteins using iTRAQ-based proteomics analysis.
S. brevicaulis LF580 Is a Mating Type MAT1-1 Strain
The mating type (MAT) locus governs sexual reproduction in the fungal kingdom by harbouring key transcriptional regulators that determine cell identity and fate. We detected three MAT1-1-specific ORFs, MAT1-1-1 (g7314.t1), MAT1-1-2 (g7313.t1) and MAT1-1-3 (g7312.t1), on contig 95 of the S. brevicaulis genome (Fig 9) using the BLAST suite with several parameter settings for homology detection. The MAT1-1-specific genes are flanked by the SLA2 and APN2 genes on contig 95 of S. brevicaulis (Fig 9A), and these two genes are frequently found close to the MAT loci in filamentous ascomycetes [61–64]. In addition, two putative ORFs (gi7311.t1 and gi7315.t1) are predicted in reverse orientation to APN2 and SLA2, respectively. However, a BLASTP search with these two ORFs gave no significant hit in the databases.
A. The mating-type locus MAT1-1 of S. brevicaulis is localized on contig 95. The position and transcriptional direction of the mating-type genes (blue) are indicated by arrows. Flanking genes APN2 and SLA2 are shown in yellow and green, respectively. Two predicted ORFs are indicated in grey. B. Expression profiling of genes conserved in the mating-type locus MAT1-1 of S. brevicaulis.
Inspection of the RNA-Seq expression profiles showed that all of the mating type genes of S. brevicaulis are expressed. MAT1-1-1 has the highest expression of the three mating type genes, followed by MAT1-1-3 and MAT1-1-2 (Fig 9B). The flanking gene SLA2 has a particularly high expression, 10-fold higher than that of the APN2 gene.
Overall, the presence of three MAT1-1 genes and the absence of a MAT1-2 gene in the S. brevicaulis genome corroborate that S. brevicaulis LF580 is a MAT1-1 strain. Additionally, all three MAT1-1 genes, like the conserved flanking genes of the locus, appear to be functional, because they are expressed in the RNA-Seq datasets.
This article presents the draft genome assembly of S. brevicaulis strain LF580, which was isolated from a marine environment. Using three different sequencing methods, we assembled a genome of 32.2 Mb harboring 16,298 putative genes. We identified 18 gene clusters for secondary metabolite production, which appear to express secondary metabolite enzymes. These include a cluster with the NRPS1 and PKS2 genes, which together synthesize scopularides with anticancerous properties. In summary, by combining genomic and transcriptomic data, we have compiled new genetic and expression information for a marine-derived strain of S. brevicaulis. Moreover, we analysed the genome data for clues explaining the life-style changes necessary for this fungus.
Collection of Fungal Strain, Cultivation, and DNA Isolation
S. brevicaulis strain LF580 was cultivated as previously described. The strain was obtained from the fungal collection of the Kiel Center for Marine Natural Products as cryo-conserved material. Originally, this strain was isolated from the inner tissue of the marine sponge Tethya aurantium. The fungus was cultivated on solid WSP30 medium, a variant of Wickerham medium (composition: 1% glucose, 0.5% soy peptone, 0.3% malt extract, 0.3% yeast extract, 3% NaCl). S. brevicaulis M26 was provided by Linda Paun (Kiel). Genomic DNA from S. brevicaulis was prepared following a modification of previously published methods [66,67]. Mycelium was frozen in liquid nitrogen, pulverized, and incubated in an equal volume of lysis buffer (10 mM Tris-HCl, 1 mM EDTA, 100 mM NaCl, 2% SDS, pH 8.0). After centrifugation, the supernatant was treated with RNase and afterwards with an equal volume of phenol/chloroform (1:1).
Genome Sequencing Using Three Different Methods
Roche 454 sequencing was performed with 20 μg genomic DNA at Macrogen (Korea). This provided 631 Mb of 454 reads with an average read length of 432.5 bp (S3 Fig). Illumina sequencing was performed using Illumina HiSeq™ 2000 with 20 μg genomic DNA at Macrogen (Korea). This yielded 2.5 × 10^8 Illumina reads with an average length of 101 bp (S10 Table). Ion Torrent sequencing was carried out with 20 μg genomic DNA at Genotypic Technology (Bangalore, India), and 630 Mb of Ion Torrent reads were generated with an average length of 119 bp (S11 Table). The whole genome sequencing and RNA-Seq data for S. brevicaulis are publicly available under BioSample accession ID SAMN03764504 and the corresponding BioProject accession ID PRJNA288424.
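As a rough illustration of what these read volumes mean, nominal sequencing depth can be estimated as total sequenced bases divided by genome size. The sketch below is not part of the original analysis. It assumes that the 32.2 Mb assembly size approximates the true genome size and that the reported volumes are raw, unfiltered bases, so the numbers are upper bounds. The function name `fold_coverage` is hypothetical.

```python
# Nominal depth = total sequenced bases / genome size.
# Assumption: the 32.2 Mb assembly size stands in for the true genome size.
GENOME_SIZE_BP = 32.2e6


def fold_coverage(total_bases: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Return the nominal fold coverage for a given number of sequenced bases."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


# Read volumes as reported in the text (raw reads, before any filtering).
platform_bases = {
    "Roche 454": 631e6,                  # 631 Mb of reads
    "Illumina HiSeq 2000": 2.5e8 * 101,  # 2.5 x 10^8 reads x 101 bp average
    "Ion Torrent": 630e6,                # 630 Mb of reads
}

for platform, bases in platform_bases.items():
    print(f"{platform}: ~{fold_coverage(bases):.0f}x nominal coverage")
```

Under these assumptions the Illumina data dominate the depth, while the longer 454 reads contribute contiguity rather than coverage.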
Genome Assembly, Repeat Detection, Gene Prediction and Annotation Analyses
Roche 454 reads were assembled into contigs using the Newbler assembler. Several genome assemblies were performed with the de Bruijn graph-based de novo assembler in the CLC Bio Genomics Workbench, using the Illumina reads, the Ion Torrent reads, and all reads combined for the hybrid assembly. Scaffolding of the contigs generated by the respective assemblies was carried out using the genome finishing module of the CLC Bio Genomics Workbench. Repeat elements were predicted using the RepeatMasker and RepeatProteinMasker programs (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-4.0.0 1996–2013 http://www.repeatmasker.org) with the fungal transposon species library (database version 20120418) as input. Gene prediction was performed using the Augustus gene prediction tool with Aspergillus niger as the training dataset. This prediction was compared with other gene prediction tools. Predicted genes were annotated using BLAST homology searches with an E-value cutoff of 1e−3, supported by the BLAST2GO tool. Predicted coding regions were annotated using BLAST by comparison against the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, TrEMBL, Gene Ontology (GO), and non-redundant (NR) databases.
Genome-Wide Phylogenetic Relationship
In order to confirm the phylogenetic position of the species under study, we reconstructed a phylogenetic tree using CVTree. CVTree is an alignment-free method based on composition vectors and hence does not require the selection of specific genes for phylogeny reconstruction. The only parameter required by the method is k, which was set to 7. We used the fully predicted proteomes of 67 fungi, with the choanoflagellate Monosiga brevicollis as an outgroup. Bootstrap scores for the phylogeny were calculated as previously described, by randomly sampling the proteome of each species, with replacement, to create a novel perturbed proteome for each of the 100 bootstrap runs. A representative subset of the 68 species was plotted using the APE package within the R computing environment.
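The two ideas in this paragraph, k-string composition vectors and bootstrap resampling of proteomes, can be sketched as follows. This is a simplified illustration and not the CVTree implementation: it uses raw k-string counts and omits the background subtraction that CVTree applies before comparing vectors. The function names and the toy sequences are hypothetical.

```python
import math
import random
from collections import Counter


def composition_vector(proteome, k=7):
    """Count all overlapping k-strings across the protein sequences of one species."""
    counts = Counter()
    for seq in proteome:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def cv_distance(a, b):
    """Distance (1 - C) / 2, where C is the cosine correlation of two count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # an empty vector is treated as uncorrelated
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


def bootstrap_proteome(proteome, rng):
    """Sample proteins with replacement to create a perturbed proteome of equal size."""
    return [rng.choice(proteome) for _ in proteome]


# Toy example with k=3 so that the short sequences yield k-strings.
rng = random.Random(0)
species_a = ["MKVLAAGIT", "MKVLSAGIT"]
species_b = ["MKVLAAGIS", "WWWWWW"]
replicate_distances = [
    cv_distance(
        composition_vector(bootstrap_proteome(species_a, rng), k=3),
        composition_vector(bootstrap_proteome(species_b, rng), k=3),
    )
    for _ in range(100)  # 100 bootstrap runs, as in the text
]
```

In a full analysis, each bootstrap run would produce a complete distance matrix and tree, and the bootstrap score of a branch would be the fraction of runs in which it recurs.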
Protein Domain Estimation
Predicted proteins of this genome were scanned against all known Pfam (version 27) and InterPro (version 43) protein domain collections. Pfam domains were predicted using HMMER 3.0, removing overlapping clans. In order to compare protein-coding gene content across fungal species, we constructed a Python script to carry out the following tasks. The InterPro database was searched for Pfam identifiers corresponding to InterPro identifiers of interest. For each Pfam identifier, pfam_scan.pl (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/), a wrapper for HMMER 3.0, was run to find the matching proteins in the genomes of interest. To analyse the subfamily structure of each Pfam family’s member proteins in the genomes, the corresponding protein sequences were collected and clustered with mcl based on the E-value matrix of an all-vs-all BLASTP search. The E-value matrix was thresholded prior to clustering. mcl clustering has a single major parameter that defines the granularity of the clustering, i.e. the inflation value. mcl clustering was run over the range of possible inflation values. For each inflation value, a sensitivity and a specificity were calculated for the clustering as previously described [32,79]. To calculate these, other secondary Pfam matches were determined for the member proteins of the Pfam under study, and the most variable secondary Pfam was selected for the sensitivity and specificity calculations. Sensitivity and specificity were centred, and the inflation value corresponding to their minimum difference was selected to obtain a single subfamily clustering for each Pfam. An R script using the APE package within the R computing environment was used to plot and process the result tables. The program code for the analysis is available at https://github.com/fahad-syed/ProSol.git.
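The final selection step, choosing one inflation value per Pfam family, might look like the sketch below. It is not taken from the published ProSol code. It assumes that "centred" means subtracting the mean of each series across the tested inflation values, and the function name and example numbers are purely illustrative.

```python
from statistics import mean


def select_inflation(results):
    """Pick the inflation value where centred sensitivity and specificity are closest.

    `results` maps each tested inflation value to a (sensitivity, specificity) pair.
    Assumption: centring means subtracting the mean of each series.
    """
    if not results:
        raise ValueError("no clustering results supplied")
    inflations = sorted(results)
    sens_mean = mean(results[i][0] for i in inflations)
    spec_mean = mean(results[i][1] for i in inflations)

    def gap(inflation):
        sens, spec = results[inflation]
        return abs((sens - sens_mean) - (spec - spec_mean))

    # min() returns the first minimum, so ties go to the lowest inflation value.
    return min(inflations, key=gap)


# Illustrative values only: coarse clusterings (low inflation) tend to be
# sensitive but unspecific, fine clusterings (high inflation) the reverse.
example_results = {
    1.2: (0.95, 0.40),
    2.0: (0.80, 0.70),
    3.0: (0.60, 0.85),
    4.0: (0.45, 0.95),
}
chosen_inflation = select_inflation(example_results)
```

The criterion picks the granularity at which neither measure dominates, which avoids committing to a single fixed inflation value for families of very different size and diversity.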
Identifications and Classifications of CAZyme Domains
All putative proteins were compared to the entries in the CAZy database [80,81] using BLASTP. The proteins with E-values smaller than 0.1 were further screened by a combination of BLAST searches against individual protein modules belonging to the following classes in the CAZy database (http://www.cazy.org/): auxiliary activities (AA), glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), carbohydrate esterases (CE) and carbohydrate-binding modules (CBM). HMMER 3.0 was used to query a collection of custom-made hidden Markov model (HMM) profiles constructed for each CAZy family. All identified proteins were then manually curated and, whenever possible, assigned to a subfamily within a family.
Classification of MFS-Type and Sugar Transporters
For a more precise classification of the S. brevicaulis genes annotated according to Pfam as MFS-type or sugar transporters (328 genes in total), the Transporter Classification Database (TCDB) was used. In addition, the 159 transporter genes of N. crassa with the same Pfam annotations were also classified using TCDB. To this end, sequence similarity searches were performed against the TCDB for each gene using BLASTP with default parameters. To ensure a certain level of stringency, only hits with E-values of 1e-10 or below were considered reliable. If a BLAST search met this precondition, the corresponding gene was classified into the same category as the TCDB homolog with the best E-value. When all TCDB hits for a gene had E-values above the threshold, the gene was not categorized. Some genes had hits with low E-values in diverse categories. In these cases, the respective genes were flagged as uncertain, but still categorized for further analysis.
Moreover, the ten most highly expressed genes were further analyzed by performing a BLASTP sequence similarity search against the RefSeq database (NCBI) with default parameters to identify homologs with a descriptive annotation that are not present in TCDB.
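The TCDB classification rule described above can be expressed compactly. The sketch below is an illustration of the stated rule, not the authors' code. The `Hit` and `Classification` types, the function name, and the category strings in the usage note are hypothetical, and parsing of the BLASTP output is assumed to have happened already.

```python
from typing import List, NamedTuple, Optional

E_VALUE_CUTOFF = 1e-10  # hits at or below this E-value count as reliable


class Hit(NamedTuple):
    tc_category: str  # TCDB category of the matched entry
    evalue: float     # BLASTP E-value of the match


class Classification(NamedTuple):
    category: Optional[str]  # None when no hit passes the cutoff
    uncertain: bool          # True when reliable hits span several categories


def classify_transporter(hits: List[Hit]) -> Classification:
    """Assign a gene to the category of its best reliable TCDB hit."""
    reliable = [hit for hit in hits if hit.evalue <= E_VALUE_CUTOFF]
    if not reliable:
        return Classification(None, False)
    best = min(reliable, key=lambda hit: hit.evalue)
    uncertain = len({hit.tc_category for hit in reliable}) > 1
    return Classification(best.tc_category, uncertain)
```

For example, a gene with hits `Hit("2.A.1.1", 1e-50)` and `Hit("2.A.1.2", 1e-20)` would be assigned to the first category and flagged as uncertain, whereas a gene whose only hit has an E-value of 1e-5 would remain uncategorized.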
RNA Isolation, Sequencing and RNA-Seq Analyses
Fungal strain M26 was cultivated in WSP-30 medium for 7 days at 200 rpm in the dark. RNA was isolated using previously described methods [13,14,66]. RNA sequencing was performed using Illumina HiSeq™ 2000 at the Beijing Genome Institute (BGI) (Shenzhen, China). A total of 17,452,507 Illumina reads were obtained for S. brevicaulis. Raw reads were mapped to the predicted genes using the RNA-Seq mapping tool of the CLC Bio Genomics Workbench, and relative expression levels were calculated as Reads Per Kilobase of transcript per Million mapped reads (RPKM).
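For reference, RPKM normalizes a gene's read count by both transcript length and sequencing depth. The sketch below shows the standard formula only. It is not the CLC Bio implementation, the function name is hypothetical, and the total must be the number of mapped reads, which is not necessarily the 17,452,507 raw reads reported above.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and mapped read total must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
```

As a worked case, 500 reads mapped to a 2,000 bp transcript in a library of 10 million mapped reads give 500 / (2 kb × 10 M) = 25 RPKM.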
Detection of Bioactive Encoding Genes and Their Clusters
Initially, putative genes encoding proteins that produce bioactive compounds were identified using BLAST with an E-value < 1e−3. Subsequently, the genome was analysed for putative clusters using SMURF and antiSMASH, and the predicted clusters were further examined manually in combination with the RNA-Seq data. The functional domains of PKSs and NRPSs were identified as previously described, using a combination of tools, namely antiSMASH, the NCBI Conserved Domain Database, InterPro and the PKS/NRPS Analysis Web-site.
S1 Fig. Summary of ribosomal proteins expressed in top 3 tiers in the mutant M26.
S2 Fig. Summary of 18 secondary metabolite clusters from S. brevicaulis and their homologs in different fungal and bacterial genomes.
Two clusters, namely clusters 11 and 16, have no homologs in known fungal genomes.
S1 Table. Overview of repeat contents in the S. brevicaulis genome.
S2 Table. Summary of genome annotation of the S. brevicaulis genome.
S3 Table. Pfam domain annotation of S. brevicaulis genome wide peptides.
S4 Table. Overview of transcription factors derived from protein domain analysis.
S5 Table. Comparison of protein domains of S. brevicaulis with selected fungi.
S6 Table. Overview of the CAZyome of S. brevicaulis.
S7 Table. Comparison of the CAZymes of S. brevicaulis and selected fungi.
S8 Table. Overview of transporter genes of S. brevicaulis.
S9 Table. Summary of the transcriptome of mutant M26 of S. brevicaulis, which was generated by UV mutagenesis, obtained using Illumina-based RNA sequencing.
Tiers are defined in Table 4.
S10 Table. Quality control and base reports of Illumina reads.
The research of the project MARINE FUNGI leading to these results has received funding from the European Union Seventh Framework Program (FP7/2007–2013 under grant agreement number 265926). We thank Hanna Schmidt for DNA and RNA isolation and Linda Paun for providing strain M26. We also thank Genotypic Technology (Bangalore, India) for providing Ion Torrent-based genome sequencing as a gift, and Chandan Goswami for editing the final version of this manuscript.
Conceived and designed the experiments: AK FK. Performed the experiments: AK BH MFS MA SP. Analyzed the data: AK BH ER MFS MA NT JPB JLS SP FK. Contributed reagents/materials/analysis tools: AK FK. Wrote the paper: AK BH ER JPB JLS SP FK.
- 1. Jones EBG (2011) Are there more marine fungi to be described? Botanica Marina 54: 343–354.
- 2. Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B (2011) How many species are there on Earth and in the ocean? PLoS biology 9: e1001127–e1001127. pmid:21886479
- 3. Konig GM, Kehraus S, Seibert SF, Abdel-Lateff A, Muller D (2006) Natural products from marine organisms and their associated microbes. Chembiochem 7: 229–238. pmid:16247831
- 4. Ebada SS, Proksch P (2013) Bioactive secondary metabolites from marine-derived fungi. In: Kim SK, editor. Marine Pharmacognosy:Trends and Applications: CRC Press Taylor & Francis Group, LLC, Boca Raton. pp. 27–51.
- 5. Saleem M, Ali MS, Hussain S, Jabbar A, Ashraf M, et al. (2007) Marine natural products of fungal origin. Nat Prod Rep 24: 1142–1152. pmid:17898901
- 6. Bugni TS, Ireland CM (2004) Marine-derived fungi: a chemically and biologically diverse group of microorganisms. Nat Prod Rep 21: 143–163. pmid:15039840
- 7. Rateb ME, Ebel R (2011) Secondary metabolites of fungi from marine habitats. Nat Prod Rep 28: 290–344. pmid:21229157
- 8. Newton GG, Abraham EP (1955) Cephalosporin C, a new antibiotic containing sulphur and D-alpha-aminoadipic acid. Nature 175: 548. pmid:14370161
- 9. Yu Z, Lang G, Kajahn I, Schmaljohann R, Imhoff JF (2008) Scopularides A and B, cyclodepsipeptides from a marine sponge-derived fungus, Scopulariopsis brevicaulis. J Nat Prod 71: 1052–1054. pmid:18412398
- 10. Imhoff JF, Kajahn I, Lang G, Wiese J, Peters A (2010) Production and use of antitumoral, antibiotic and insecticidal cyclodepsipeptides (WO 2010/142258).
- 11. Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11: 31–46. pmid:19997069
- 12. Culligan EP, Sleator RD, Marchesi JR, Hill C (2013) Metagenomics and novel gene discovery: Promise and potential for novel therapeutics. Virulence 5: 1–14.
- 13. Nowrousian M, Stajich JE, Chu M, Engh I, Espagne E, et al. (2010) De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis. PLoS genetics 6: e1000891–e1000891. pmid:20386741
- 14. Traeger S, Altegoer F, Freitag M, Gabaldon T, Kempken F, et al. (2013) The Genome and Development-Dependent Transcriptomes of Pyronema confluens: A Window into Fungal Evolution. PLoS Genetics 9: e1003820–e1003820. pmid:24068976
- 15. Kumar A, Congiu L, Lindström L, Piiroinen S, Vidotto M, et al. (2014) Sequencing, De Novo Assembly and Annotation of the Colorado Potato Beetle, Leptinotarsa decemlineata, Transcriptome. PLoS ONE 9: e86012–e86012. pmid:24465841
- 16. Vidotto M, Grapputo A, Boscari E, Barbisan F, Coppe A, et al. (2013) Transcriptome sequencing and de novo annotation of the critically endangered Adriatic sturgeon. BMC genomics 14: 407–407. pmid:23773438
- 17. Wiese J, Ohlendorf B, Blumel M, Schmaljohann R, Imhoff JF (2011) Phylogenetic identification of fungi isolated from the marine sponge Tethya aurantium and identification of their secondary metabolites. Mar Drugs 9: 561–585. pmid:21731550
- 18. Brakhage AA (2013) Regulation of fungal secondary metabolism. Nature reviews Microbiology 11: 21–32. pmid:23178386
- 19. Levasseur A, Lomascolo A, Chabrol O, Ruiz-Duenas FJ, Boukhris-Uzan E, et al. (2014) The genome of the white-rot fungus Pycnoporus cinnabarinus: a basidiomycete model with a versatile arsenal for lignocellulosic biomass breakdown. BMC Genomics 15: 486. pmid:24942338
- 20. Cox R, Mirkin SM (1997) Characteristic enrichment of DNA repeats in different genomes. Proc Natl Acad Sci U S A 94: 5237–5242. pmid:9144221
- 21. Hancock JM (2002) Genome size and the accumulation of simple sequence repeats: implications of new data from genome sequencing projects. Genetica 115: 93–103. pmid:12188051
- 22. Linda P, Kempken F (2015) Fungal Transposable Elements; van den Berg MA, Maruthachalam K, editors: Springer International Publishing Switzerland. 79–96 p.
- 23. Ohm RA, Feau N, Henrissat B, Schoch CL, Horwitz BA, et al. (2012) Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathog 8: e1003037. pmid:23236275
- 24. Martin F, Kohler A, Murat C, Balestrini R, Coutinho PM, et al. (2010) Perigord black truffle genome uncovers evolutionary origins and mechanisms of symbiosis. Nature 464: 1033–1038. pmid:20348908
- 25. Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, et al. (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36: 3420–3435. pmid:18445632
- 26. Cuenca-Estrella M, Gomez-Lopez A, Mellado E, Buitrago MJ, Monzon A, et al. (2003) Scopulariopsis brevicaulis, a fungal pathogen resistant to broad-spectrum antifungal agents. Antimicrob Agents Chemother 47: 2339–2341. pmid:12821493
- 27. Xu Z, Hao B (2009) CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Research 37: W174–W178. pmid:19398429
- 28. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, et al. (2014) Pfam: the protein families database. Nucleic Acids Res 42: D222–230. pmid:24288371
- 29. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, et al. (2012) InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 40: D306–312. pmid:22096229
- 30. Pao SS, Paulsen IT, Saier MH Jr., (1998) Major facilitator superfamily. Microbiol Mol Biol Rev 62: 1–34. pmid:9529885
- 31. Walmsley AR, Barrett MP, Bringaud F, Gould GW (1998) Sugar transporters from bacteria, parasites and mammals: structure-activity relationships. Trends Biochem Sci 23: 476–481. pmid:9868370
- 32. Arvas M, Kivioja T, Mitchell A, Saloheimo M, Ussery D, et al. (2007) Comparison of protein coding gene contents of the fungal phyla Pezizomycotina and Saccharomycotina. BMC Genomics 8: 325. pmid:17868481
- 33. Poggeler S, Kuck U (2004) A WD40 repeat protein regulates fungal cell differentiation and can be replaced functionally by the mammalian homologue striatin. Eukaryot Cell 3: 232–240. pmid:14871953
- 34. Harris PV, Welner D, McFarland KC, Re E, Navarro Poulsen JC, et al. (2010) Stimulation of lignocellulosic biomass hydrolysis by proteins of glycoside hydrolase family 61: structure and function of a large, enigmatic family. Biochemistry 49: 3305–3316. pmid:20230050
- 35. Phillips CM, Beeson WT, Cate JH, Marletta MA (2011) Cellobiose dehydrogenase and a copper-dependent polysaccharide monooxygenase potentiate cellulose degradation by Neurospora crassa. ACS Chem Biol 6: 1399–1406. pmid:22004347
- 36. Bey M, Zhou S, Poidevin L, Henrissat B, Coutinho PM, et al. (2013) Cello-oligosaccharide oxidation reveals differences between two lytic polysaccharide monooxygenases (family GH61) from Podospora anserina. Appl Environ Microbiol 79: 488–496. pmid:23124232
- 37. Hemsworth GR, Henrissat B, Davies GJ, Walton PH (2014) Discovery and characterization of a new family of lytic polysaccharide monooxygenases. Nat Chem Biol 10: 122–126. pmid:24362702
- 38. Vu VV, Beeson WT, Span EA, Farquhar ER, Marletta MA (2014) A family of starch-active polysaccharide monooxygenases. Proc Natl Acad Sci U S A 111: 13822–13827. pmid:25201969
- 39. Lo Leggio L, Simmons TJ, Poulsen JC, Frandsen KE, Hemsworth GR, et al. (2015) Structure and boosting activity of a starch-degrading lytic polysaccharide monooxygenase. Nat Commun 6: 5961. pmid:25608804
- 40. Saier MH Jr., Tran CV, Barabote RD (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res 34: D181–186. pmid:16381841
- 41. Schippers KJ, Sipkema D, Osinga R, Smidt H, Pomponi SA, et al. (2012) Cultivation of sponges, sponge cells and symbionts: achievements and future prospects. Adv Mar Biol 62: 273–337. pmid:22664125
- 42. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402. pmid:9254694
- 43. Bayry J, Aimanianda V, Guijarro JI, Sunde M, Latge JP (2012) Hydrophobins—unique fungal proteins. PLoS Pathog 8: e1002700. pmid:22693445
- 44. Plett JM, Gibon J, Kohler A, Duffy K, Hoegger PJ, et al. (2012) Phylogenetic, genomic organization and expression analysis of hydrophobin genes in the ectomycorrhizal basidiomycete Laccaria bicolor. Fungal Genet Biol 49: 199–209. pmid:22293303
- 45. Dubey MK, Jensen DF, Karlsson M (2014) Hydrophobins are required for conidial hydrophobicity and plant root colonization in the fungal biocontrol agent Clonostachys rosea. BMC Microbiol 14: 18. pmid:24483277
- 46. Kis-Papo T, Weig AR, Riley R, Persoh D, Salamov A, et al. (2014) Genomic adaptations of the halophilic Dead Sea filamentous fungus Eurotium rubrum. Nat Commun 5: 3745. pmid:24811710
- 47. Keller NP, Turner G, Bennett JW (2005) Fungal secondary metabolism—from biochemistry to genomics. Nat Rev Microbiol 3: 937–947. pmid:16322742
- 48. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, et al. (2013) antiSMASH 2.0—a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 41: W204–212. pmid:23737449
- 49. Kramer A, Paun L, Imhoff JF, Kempken F, Labes A (2014) Development and validation of a fast and optimized screening method for enhanced production of secondary metabolites using the marine Scopulariopsis brevicaulis strain LF580 producing anti-cancer active scopularide A and B. PLoS One 9: e103320. pmid:25079364
- 50. Tamminen A, Kramer A, Labes A, Wiebe MG (2014) Production of scopularide A in submerged culture with Scopulariopsis brevicaulis. Microb Cell Fact 13: 89. pmid:24943257
- 51. Lukassen MB, Saei W, Sondergaard TE, Tamminen A, Kumar A, et al. (2015) Identification of the Scopularide Biosynthetic Gene Cluster in Scopulariopsis brevicaulis. Mar Drugs 13: 4331–4343. pmid:26184239
- 52. Yu ZG, Lang G, Kajahn I, Schmaljohann R, Imhoff JF (2008) Scopularides A and B, cyclodepsipeptides from a marine sponge-derived fungus, Scopulariopsis brevicaulis. Journal of Natural Products 71: 1052–1054. pmid:18412398
- 53. Kuck U, Bloemendal S, Teichert I (2014) Putting fungi to work: harvesting a cornucopia of drugs, toxins, and antibiotics. PLoS Pathog 10: e1003950. pmid:24626260
- 54. Tobiasen C, Aahman J, Ravnholt KS, Bjerrum MJ, Grell MN, et al. (2007) Nonribosomal peptide synthetase (NPS) genes in Fusarium graminearum, F. culmorum and F. pseudograminearium and identification of NPS2 as the producer of ferricrocin. Current Genetics 51: 43–58. pmid:17043871
- 55. Sørensen JL, Knudsen M, Hansen FT, Olesen C, Fuertes PR, et al. (2014) Fungal NRPS-dependent siderophores: from function to prediction. In: Martín J-F, Garcia-Estrada C, Zeilinger S, editors. Biosynthesis and Molecular Genetics of Fungal Secondary Metabolites: Springer.
- 56. Sørensen JL, Sondergaard TE, Covarelli L, Fuertes PR, Hansen FT, et al. (2014) Identification of the biosynthetic gene clusters for the lipopeptides fusaristatin A and W493 B in Fusarium graminearum and F. pseudograminearum. Journal of Natural Products 77: 2619–2625. pmid:25412204
- 57. Chiang YM, Szewczyk E, Nayak T, Davidson AD, Sanchez JF, et al. (2008) Molecular genetic mining of the Aspergillus secondary metabolome: Discovery of the emericellamide biosynthetic pathway. Chemistry & Biology 15: 527–532.
- 58. Duressa D, Anchieta A, Chen D, Klimes A, Garcia-Pedrajas MD, et al. (2013) RNA-seq analyses of gene expression in the microsclerotia of Verticillium dahliae. BMC Genomics 14: 607. pmid:24015849
- 59. Kramer A, Beck HC, Kumar A, Kristensen LP, Imhoff JF, et al. (2015) Proteomic analysis of anti-cancerous scopularide production by a marine Microascus brevicaulis strain and its UV-mutant. PLoS one (under revision).
- 60. Fraser JA, Heitman J (2004) Evolution of fungal sex chromosomes. Mol Microbiol 51: 299–306. pmid:14756773
- 61. Debuchy R, Turgeon BG (2006) Mating-type structure, evolution, and function in euascomycetes. In: Kües U, Fischer R, editors. Growth, differentiation and sexuality. Berlin, Heidelberg: Springer. pp. 293–323.
- 62. Dyer PS (2007) Sexual reproduction and significance of MAT in the aspergilli. In: Heitman J, Kronstad JW, Taylor JW, Casselton LA, editors. Sex in fungi. Washington, D.C.: ASM Press. pp. 123–142.
- 63. Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, et al. (2005) Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature 438: 1105–1115. pmid:16372000
- 64. Rydholm C, Dyer PS, Lutzoni F (2007) DNA sequence characterization and molecular evolution of MAT1 and MAT2 mating-type loci of the self-compatible ascomycete mold Neosartorya fischeri. Eukaryot Cell 6: 868–874. pmid:17384199
- 65. Wickerham LJ (1951) Taxonomy of yeasts: US Dept. of Agriculture.
- 66. Kempken F, Kuck U (1996) restless, an active Ac-like transposon from the fungus Tolypocladium inflatum: structure, expression, and alternative RNA splicing. Mol Cell Biol 16: 6563–6572. pmid:8887685
- 67. Kollath-Leiss K, Bonniger C, Sardar P, Kempken F (2014) BEM46 shows eisosomal localization and association with tryptophan-derived auxin pathway in Neurospora crassa. Eukaryot Cell 13: 1051–1063. pmid:24928924
- 68. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380. pmid:16056220
- 69. Knudsen T, Knudsen B (2013) CLC Genomics Workbench 6. Available: http://www.clcbio.com. Accessed on 2013 Sept 20.
- 70. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M (2010) KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38: D355–360. pmid:19880382
- 71. Wang H, Xu Z, Gao L, Hao B (2009) A fungal phylogeny based on 82 complete genomes using the composition vector method. BMC Evolutionary Biology 9.
- 72. King N, Westbrook MJ, Young SL, Kuo A, Abedin M, et al. (2008) The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451: 783–788. pmid:18273011
- 73. Qi J, Wang B, Hao BI (2004) Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach. J Mol Evol 58: 1–11. pmid:14743310
- 74. Paradis E, Claude J, Strimmer K (2004) APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics (Oxford, England) 20: 289–290.
- 75. R Development Core Team (2013) R: A language and environment for statistical computing. Vienna, Austria.: R Foundation for Statistical computing.
- 76. Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39: W29–37. pmid:21593126
- 77. Enright AJ, Van Dongen S, Ouzounis CA (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 30: 1575–1584. pmid:11917018
- 78. Apeltsin L, Morris JH, Babbitt PC, Ferrin TE (2011) Improving the quality of protein similarity network clustering algorithms using the network edge weight distribution. Bioinformatics 27: 326–333. pmid:21118823
- 79. Veeramachaneni V, Makalowski W (2004) Visualizing sequence similarity of protein families. Genome Res 14: 1160–1169. pmid:15140831
- 80. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42: D490–495. pmid:24270786
- 81. Levasseur A, Drula E, Lombard V, Coutinho PM, Henrissat B (2013) Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary redox enzymes. Biotechnol Biofuels 6: 41. pmid:23514094
- 82. Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, et al. (2010) SMURF: Genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol 47: 736–741. pmid:20554054
- 83. Hansen FT, Gardiner DM, Lysøe E, Fuertes PR, Tudzynski B, et al. (2014) An update to polyketide synthase and nonribosomal synthetase genes and nomenclature in Fusarium. Fungal Genetics and Biology in press.
- 84. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, et al. (2011) CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Research 39: D225–D229. pmid:21109532
- 85. Bachmann BO, Ravel J (2009) Methods for in silico prediction of microbial polyketide and nonribosomal peptide biosynthetic pathways from DNA sequence data. In: Hopwood DA, editor. Complex enzymes in microbial natural product biosynthesis Methods in enzymology. San Diego, CA, USA: Elsevier academic press inc.