De Novo Assembly and Genome Analyses of the Marine-Derived Scopulariopsis brevicaulis Strain LF580 Unravels Life-Style Traits and Anticancerous Scopularide Biosynthetic Gene Cluster

The marine-derived Scopulariopsis brevicaulis strain LF580 produces scopularides A and B, which have anticancerous properties. We carried out genome sequencing using three next-generation DNA sequencing methods. De novo hybrid assembly yielded 621 scaffolds with a total size of 32.2 Mb and 16,298 putative gene models. We identified a large non-ribosomal peptide synthetase gene (nrps1) and a supporting pks2 gene in the same biosynthetic gene cluster. This cluster and the genes within it are functionally active, as confirmed by RNA-Seq. Characterization of carbohydrate-active enzymes and major facilitator superfamily (MFS)-type transporters leads us to postulate that S. brevicaulis originated from a soil fungus that came into contact with the marine sponge Tethya aurantium. This sponge seems to provide shelter to the fungus and a micro-environment suitable for its survival in the ocean. This study also provides a platform for further investigations of the life-style of S. brevicaulis and of its secondary metabolites.


Introduction
of VTT Technical Research Centre of Finland Ltd we emphasize that VTT as a whole is a not-for-profit organisation (http://www.vttresearch.com/about-us/co-operation-withvtt/mission-and-mode-of-operations for further information). Biocomputing Platforms Ltd is the current employer of MFS and had no involvement in the study. During the time of the study MFS was employed by VTT Technical Research Centre of Finland Ltd.

genome, such as repeat contents, mating type loci, carbohydrate-active enzymes, MFS-type transporters and performed a protein domain analysis.

General Genome Features
We sequenced the genome of the marine-derived S. brevicaulis using three different sequencing methods, namely Roche 454, Illumina HiSeq 2000 and Ion Torrent. Using Roche 454 pyrosequencing alone, we obtained a 32.2 Mb assembly of 935 contigs with an N50 of 88 kb, which were further joined into 699 scaffolds with an N50 of 116.7 kb. The short reads of Illumina and Ion Torrent gave smaller N50 values and larger numbers of scaffolds: 67.5 kb (2,605 scaffolds) and 26.3 kb (12,119 scaffolds), respectively. A hybrid assembly using all three types of reads yielded an N50 of 131.8 kb with 623 scaffolds. This indicates that Roche 454 alone performs well for fungal genome assembly, and that combining more than one method is the better choice. This is also reflected by data from a recent genome assembly of the white-rot fungus Pycnoporus cinnabarinus [19]. We identified 16,298 putative genes in the assembled genome (Table 1). The number of identified genes is rather high when compared to other ascomycetes, which may contain about 10,000 to 12,000 genes (Fig 1). The average intron length for this genome is 129.4 bp, which is well within the range of known fungal intron sizes.
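The assemblies above are compared by their N50 values. As a reminder of what that statistic measures, the following minimal sketch computes N50 from a list of scaffold lengths; the function name and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (not data from this study).
toy_scaffolds = [400, 300, 200, 100, 50, 50]
print(n50(toy_scaffolds))  # prints 300
```

A larger N50 at a similar total assembly size, as seen for the hybrid assembly, means that the same sequence is contained in fewer, longer scaffolds.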

Repeat Elements in S. brevicaulis Genome
Repeat elements constitute 419,240 bp or 1.33% of the assembled genome of S. brevicaulis. Tandem repeat sequences account for 0.75% of the total genome size, transposable elements (TEs) for 0.35% and low-complexity regions for 0.20% (Table 2 and S1 Table). Low-complexity regions are regions of biased composition and regions enriched in imperfect direct and inverted repeats [20,21]. Retroelements make up about 0.18% of the S. brevicaulis genome. Among these, retrotransposons with long terminal repeats (LTRs) form the majority (0.16%). Class II DNA transposons comprise 0.17% of the genome, and the majority of them belong to the Tc1-IS630-Pogo family. Fungal genomes are generally known to possess a low content of only 1-4% of transposable elements [22]. Only a few fungal groups have a higher number of repeats, such as several species of Dothideomycetes [23] and Tuber melanosporum, a Pezizomycetes species [24]. However, these fungi typically show large expansions of genome size, as in Tuber melanosporum, which has a genome size of 125 Mb [24]. For further details see a recent review [22].

Genome Annotation and Phylogenetic Analysis
Functional annotation is critical for understanding the genomic data of new species and is supported by Gene Ontology (GO) [25]. GO helps to characterize genes, transcripts and proteins of many organisms in terms of biological processes (BP), cellular components (CC), and molecular functions (MF) [25]. We used this approach for the functional annotation of S. brevicaulis proteins with the Blast2GO suite [25]. The derived S. brevicaulis proteins were assigned to three functional groups based on GO terminology: BP, CC and MF (S2 Table). We traced 5,159 proteins to BP terms (Fig 2) with the following five top categories: 761 related to oxidation-reduction processes, 485 related to trans-membrane transport, 423 related to regulation of transcription, 318 related to mycelium development, and 187 related to methylation. Among the BP annotations, we found that S. brevicaulis is equipped with genes and proteins required for pathogenesis (Fig 2). This can be explained by the fact that this opportunistic fungus is a pathogen of immunocompromised humans and other animals [26]. Similarly, 1,566 proteins were assigned to CC terms (Fig 2), with the five top components being: 570 related to integral membrane proteins, 285 related to different protein complexes, 109 related to ribosomal proteins, 107 related to extracellular region proteins, and 92 related to proteins of the nuclear lumen. Finally, 4,129 proteins were linked to MF terms (Fig 2), with the five top categories as follows: 675 related to zinc ion binding, 541 related to ATP binding, 340 related to sequence-specific DNA binding transcription factors, 190 related to hydrolase activity (hydrolyzing O-glycosyl compounds), and 189 related to methyltransferase activity. All GO terms in these three categories are listed in S2 Table.
During the Blast2GO-based annotation process, we were able to annotate 9,340 genes (57.31%), while 6,958 genes (42.69%) remained non-annotated, as summarized in S2 Table. Homology-based annotation suggests that S. brevicaulis belongs to the class Sordariomycetes and is most closely related to Nectria and Fusarium species (Fig 3A); however, it does not group within that clade but seems to be somewhat distinct. To determine the phylogenetic position of this fungus more precisely, we performed a genome-wide phylogenetic analysis using CVtree [27]. We found that S. brevicaulis diverged early from other representative Sordariomycete fungi such as Verticillium, Glomerella, Colletotrichum, Nectria, Fusarium, Metarhizium, Trichoderma, Magnaporthe and Neurospora (Fig 3B).

Protein Domains of S. brevicaulis
Protein domains are independently foldable structural units that reflect evolutionary conservation and contain at least one protein motif. This implies that proteins carrying common domains may have similar functions. Hence, domains are an important resource for scanning new genomes for putative proteins with similar functions. Two state-of-the-art databases, Pfam [28] and InterPro [29], are used for protein domain analysis. This analysis is helpful for better annotation of new genomes. We found a total of 10,458 deduced protein sequences of S. brevicaulis associated with eukaryotic protein domains (S3 Table); the top 20 Pfam domains are summarized in Fig 4. Among these are two transporter domains: 221 proteins harbor a major facilitator superfamily/MFS_1 domain (PF07690.11), and 107 proteins contain a sugar (and other) transporter/sugar_tr domain (PF00083.19). These transporters are generally single-polypeptide secondary carriers involved in the transport of sugars and other small solutes in response to chemiosmotic ion gradients [30,31].
Two transcription factor domains are also in this list, with 205 fungal-specific transcription factor domains/Fungal_trans (PF04082.13) and 112 fungal Zn(2)-Cys(6) binuclear cluster domains/Zn_clus (PF00172.13). These proteins serve as transcriptional regulators. A comparison of all transcription factor domains suggested that these two fungal transcription factor domains are highly expanded in selected ascomycetes (S4 Table), as shown previously [32]. We detected 112 G-beta repeat/WD40 (PF00400.27) domains, which may be involved in signal transduction. Additionally, WD40 domains also regulate fungal cell differentiation processes [33]. We performed comparative protein domain analyses using selected fungal genomes (S5 Table). All protein domains of the S. brevicaulis genome are in accordance with those of known ascomycete genomes.

Plant Biomass Associated Metabolism Evident from Comparative Analyses of Carbohydrate Active Enzymes
The type of association between sponges and fungi and the corresponding ecological function remain unclear, and little evidence is available on fungal adaptation to sponges (if any). Studying the carbohydrate-active enzyme (CAZy) profile could provide interesting information on the main families represented in the S. brevicaulis genome and perhaps reveal its substrate preference and its nutritional relationship with the sponge.
The carbohydrate portion of land plants is intimately linked to lignin, and auxiliary activities (AA) are needed to give glycoside hydrolases (GH) access to their substrates, so that plant-modifying or plant-degrading fungi can penetrate the cell wall and reach the carbohydrate energy source. Considering the AA families acting on lignin, S. brevicaulis has a poor set of laccase-like oxidases (AA1) and peroxidases (AA2), but a substantial number of enzymes of the glucose-methanol-choline (GMC) superfamily, i.e. 23 AA3 members comprising one cellobiose dehydrogenase (CDH, AA3_1), 19 putative aryl alcohol oxidases and glucose oxidases (AA3_2), and three alcohol oxidases (AA3_3). In addition, the low number of glyoxal oxidases (AA5), which, like members of the GMC family, provide H2O2, suggests that the fungus does not possess a strong ligninolytic capacity. In contrast, other oxidative enzymes targeting the carbohydrate portion are well represented. For instance, there are four gluco-oligosaccharide oxidases (AA7) and 27 potential lytic polysaccharide mono-oxygenases (LPMOs), which oxidatively cleave the glycosidic chains on the crystalline surface of cellulose, chitin or starch (AA9, AA11 and AA13, respectively). LPMOs create entry points for hydrolytic cellulases, chitinases or amylases. Their recent discovery opened a new route to accelerate biomass degradation in biotechnological applications [34]. Phillips et al. [35] and Bey et al. [36] demonstrated that AA9s and CDH (AA3_1) of N. crassa and of Pycnoporus cinnabarinus, respectively, act in concert to cleave cellulose oxidatively. LPMOs of families AA11 and AA13, recently identified in N. crassa, Aspergillus nidulans and Aspergillus oryzae [37-39], are also represented in the S. brevicaulis genome, suggesting that the fungus could be able to degrade a large variety of plant substrates.

Analyses of MFS-Type and Sugar Transporters Also Support Plant Biomass Associated Metabolism
Taking into account the entire CAZyme repertoire, it is clear that S. brevicaulis has a metabolism capable of breaking down plant biomass. The same picture emerges when S. brevicaulis proteins predicted to harbor either an MFS domain (PF07690.11) or a sugar (and other) transporter domain (PF00083.19) are compared to the corresponding transporter complement of N. crassa, a representative plant biomass saprophyte. N. crassa has only about half as many transporters encoded in its genome as S. brevicaulis (159 vs. 328 with the same Pfam annotations). Yet an overall similar distribution of the transporters can be observed across the categories defined by the Transporter Classification Database (TCDB; [40]) (Fig 5). Transporters in fungi are notoriously under-characterized, and thus clear annotations are difficult, but the comparison indicates that the two transporter families linked to sugar uptake (2.A.1.1 with 102 vs. 37 members in S. brevicaulis and N. crassa, respectively) and to the uptake/transport of small charged solutes and metabolites (2.A.1.14 with 94 vs. 26 members) are overrepresented in S. brevicaulis as compared to N. crassa, suggesting that this fungus is able to utilize a broadened substrate spectrum. This feature could have been helpful in the transition from a soil fungus [26] to a marine sponge habitat. As Tethya aurantium (http://www.marlin.ac.uk/index.php, species ID 4450) grows on rocks and stones in the shallow sub-littoral, it is likely that the sponge has taken up fungal spores that drifted in from nearby shores. The sponges may have acted as a spore trap or a shelter. Since S. brevicaulis is able to act as a pathogen of humans, associated with onychomycosis [26], it may also be able to dwell in a sponge. Therefore, the sponge may have created a suitable micro-environment for a terrestrial fungus that could adapt to the sea salt environment and find nutritional resources. It is known that other fungi from sponges are closely related to fungi from terrestrial sources and are generally able to cope with media containing the salt concentrations found in marine environments [41]. Alternatively, it may happen that marine sponge-associated fungi are able to survive without any knowledge of their hosts. It is beyond the scope of this manuscript to explore further details of fungal-sponge relationships. Nevertheless, our work paves the way for genomic investigations of such marine fungal strains.

Fig 5 (partial caption). B. Distribution of TCDB categories [40] according to the number of classified MFS-type and sugar transporters of S. brevicaulis. C. Distribution of TCDB categories [40] according to the combined RPKMs of the assigned MFS-type and sugar transporters. The size of each category is presented as a percentage of the total number of RPKMs. All categories with less than two percent were grouped together ("other"). The respective values for these categories are presented enlarged in the bar to the right.
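The family-level comparison with N. crassa can be checked with simple arithmetic. The sketch below uses only the counts quoted in the text (328 vs. 159 transporters in total, 102 vs. 37 in family 2.A.1.1, and 94 vs. 26 in family 2.A.1.14); the dictionary layout and function names are illustrative assumptions.

```python
# Transporter counts quoted in the text; "total" is the number of proteins
# with an MFS or sugar transporter Pfam annotation in each genome.
counts = {
    "S. brevicaulis": {"total": 328, "2.A.1.1": 102, "2.A.1.14": 94},
    "N. crassa": {"total": 159, "2.A.1.1": 37, "2.A.1.14": 26},
}


def family_share(species, family):
    """Fraction of a species' transporter complement that falls into one family."""
    entry = counts[species]
    return entry[family] / entry["total"]


def fold_difference(key):
    """Ratio of S. brevicaulis to N. crassa counts for a family (or for 'total')."""
    return counts["S. brevicaulis"][key] / counts["N. crassa"][key]


for key in ("total", "2.A.1.1", "2.A.1.14"):
    print(f"{key}: {fold_difference(key):.2f}-fold")
for family in ("2.A.1.1", "2.A.1.14"):
    for species in counts:
        print(f"{family} in {species}: {family_share(species, family):.1%}")
```

With these numbers, both families grow by more than the roughly two-fold increase of the whole transporter complement (about 2.8-fold and 3.6-fold), so their share of all classified transporters is also larger in S. brevicaulis.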

Characterization of Gene Content and Expression Using RNA-Seq
The S. brevicaulis LF580 genome contains over 16,000 genes, which is on the higher side for known ascomycetes (Fig 1). It is interesting to see how many of these are expressed under a single condition. To evaluate this, we extracted RNA of S. brevicaulis strain M26 growing in WSP30 medium (see Materials and Methods section), which also supports production of scopularides A and B. We performed RNA sequencing using Illumina HiSeq 2000. The resulting reads were mapped to the putative genes of the assembled S. brevicaulis genome. A total of 14,724 genes were found to be expressed in this analysis, which represents 90% of the entire gene complement. These expressed genes were classified into 10 tiers based on their reads per kilobase of transcript per million mapped reads (RPKM) values (Table 3 and S8 Table). Tier #1 has 120 genes with RPKM values >1000, which accounts for 0.8% of all expressed genes, while 26% (3,832 genes) were detected with very low transcript quantities, with RPKM values higher than 0 and lower than 1.0 (marked by red font or blue shade in S9 Table, respectively); these were all placed into tier #10 (non-expressed genes are marked by yellow shade in S9 Table). To further evaluate highly expressed genes in the mutant M26, we examined selected genes and their expression patterns tier-wise according to their RPKM values. In the following, we provide some vignettes of the most highly expressed genes in the UV mutant M26.
Regarding MFS-type transporters, the accumulated expression per category broadly follows their TCDB classification distribution (compare Fig 5B and 5C), with one notable exception. Class 2.A.1.12 (the Sialate:H+ Symporter family; dark green) is greatly overrepresented in terms of transcript abundance (only 2 genes, but 5.9% of the total transporter-specific transcripts) because g12790.t1 is the second most highly expressed transporter in the genome (Table 4). A homology search by BLAST [42] suggests that g12790.t1 encodes a carboxylic acid transporter, such as for lactate or pyruvate uptake, substrates that should have been abundant in the rich medium in which S. brevicaulis was grown. An analysis of the remaining genes in the list of the top 10 transcribed transporters suggests that these collectively help to satisfy some of the major nutritional requirements of the fungus, such as for carbon and nitrogen as well as vitamins. Sources for these are carbohydrates (hexoses such as glucose, pentoses and other polyols; g14394.t1, g3025.t1, g3159.t1, and g6510.t1), small organic nitrogenous compounds such as allantoate (g10354.t1), and important nutrients such as the B-vitamins niacin (g116.t1 and g12121.t1) and (potentially) biotin (g10354.t1).
In summary, transcripts for about 90% of the genes in this genome could be detected. This rather high value indicates that S. brevicaulis has a higher number of expressed genes than most other Ascomycetes.

Overview of Bioactive Compounds Encoding Genes
The S. brevicaulis genome has 16 genes encoding non-ribosomal peptide synthetases (NRPSs) (Fig 6), three of which (NRPS1-3) encode enzymes with a multi-modular organization containing more than one condensation domain. This modular architecture is known to be specific for fungal NRPSs [47]. The domain organization of the putative NRPS and PKS proteins is shown in Fig 6. Additionally, we identified six full-length polyketide synthase (PKS) genes, one fatty acid synthase (FAS) gene and three putative terpene-encoding genes in the genome (Table 5). Additional single-domain enzymes such as reductases and cytochrome P450 monooxygenases were also identified but not taken into further consideration. All these genes are localized in 18 different clusters (Fig 7 and S2 Fig), which include four NRPS clusters, six PKS clusters, and five other clusters that contain the NRPS6-NRPS16 genes. Since the NRPSs encoded by these genes are not modular in nature, the antiSMASH tool [48] places them separately from the other clusters. A single cluster was identified on scaffold477, which carries the NRPS1 and PKS2 genes in the N-terminal 78 kb region (Fig 8) and is composed of contig264 and contig358. The supporting genes of this cluster and their expression values are shown in Fig 8. Our data indicate that this gene cluster is involved in scopularide production; indeed, it is actively expressed under conditions supporting the production of scopularides A and B [49,50]. The nrps1 gene (g12932) is the best candidate gene for production of the cyclic lipopeptide scopularide [51], which consists of five amino acids (glycine, L-valine, D-leucine, L-alanine and L-phenylalanine) and a reduced carbon chain [52]. Its production scheme is shown in Fig 8B. The reduced carbon chain (3-hydroxy-methyldecanoyl) may be derived from the product of the pks2 gene.
This is further supported by the fact that the two genes (nrps1 and pks2) are localized in a single cluster on scaffold477. This cluster has a high degree of similarity with clusters in the genomes of . This may suggest horizontal gene transfer from bacteria to fungi. Indeed, horizontal gene transfer is considered to be a major source of metabolite diversity in fungi [53]. The nrps2 gene (g8056) encodes a homologue of the synthetase responsible for production of the iron-chelating siderophore ferricrocin (SidC) [54], which is found in numerous fungi [55]. The third multi-modular NRPS, encoded by nrps3 (g5523), contains four adenylation domains, but its product is currently unknown. The gene was not expressed under the examined conditions, and BLASTP analyses did not identify orthologs with known products.
Five of the six PKS proteins (PKS1-5) contain the reducing domains dehydratase, enoylreductase and ketoreductase. The only actively expressed PKS gene was pks2 (g14542), which has possible orthologs in Aspergillus nidulans (AN2547), F. graminearum (PKS6; FGSG_08208) and F. pseudograminearum (PKS40, FPSE_09183). The encoded PKSs are involved in production of the lipopeptides emericellamide, fusaristatin and W493, respectively [56,57], each consisting of a reduced carbon chain provided by the PKS, which the NRPS combines with three to seven amino acids. The resulting product is then released from the NRPS by cyclization. The pks6 gene (g13622), on the other hand, encodes a non-reducing PKS, and BLASTP analysis against GenBank showed that it shares similarities with the mycelium pigment synthase and 71% identity with a PKS (VDAG_00190) that has been proposed to be involved in the biosynthesis of melanin in Verticillium dahliae [58]. Hence, PKS6 could be involved in pigment biosynthesis in S. brevicaulis. These pks genes are type I PKS genes and are localized in 5 different clusters (Fig 7 and S2 Fig).
Cluster 9 is the only cluster (Fig 7 and S2 Fig) with a type III PKS, which appears to be a chalcone and stilbene synthase-like enzyme, as the key enzyme shows 70% identity with a homologous protein from Colletotrichum higginsianum (GenBank ID CCF34076.1). We also identified three genes encoding aristolochene synthase (g9860.t1), geranylgeranyl diphosphate synthase (g13546.t1) and squalene synthase (g5738.t1), which form two clusters (Fig 7 and S2 Fig) on scaffold440 and scaffold446.
At present, the secondary metabolites produced by many of these proteins are unknown, as is typical for many of the fungi studied. As genome sequencing of fungi has become affordable, we expect more and more fungal genomes to become available in the public databases, which will lead to a better picture of homologous gene clusters and their final products. This opens opportunities for other researchers to use the genome-wide information of this fungus to explore the potential of these genes and their clusters. In addition to this analysis, a separate study characterized scopularide-producing proteins using iTRAQ-based proteomics analysis [59].
Upon scanning the expression profiles based on RNA-Seq data, we found that all of the mating type genes of S. brevicaulis are expressed. The MAT1-1-1 gene has the highest expression among the three mating type genes, followed by MAT1-1-3 and MAT1-1-2 (Fig 9B). The flanking SLA2 gene has a particularly high expression, which is 10-fold higher than that of the APN2 gene. Overall, we report that the conserved genes of the mating type loci of S. brevicaulis are expressed.
Taken together, the presence of three MAT1-1 genes and the absence of a MAT1-2 gene in the S. brevicaulis genome corroborate that S. brevicaulis LF580 is a MAT1-1 strain. Additionally, all three MAT1-1 genes appear to be functional, because they show expression in the RNA-Seq datasets.

Fig 7 (partial caption). [...] by scaffold and positions on the scaffold, and different colors illustrate different types of clusters. Biosynthetic genes are the key genes (such as nrps or pks genes) and the main supporting genes, such as a cytochrome P450 gene, as per antiSMASH [48] guidelines. Similarly, other genes are any other genes in the cluster that are not key genes, regulatory genes (such as a transcription factor or suppressor) or transporters (such as an ABC transporter); these are marked in grey shade. doi:10.1371/journal.pone.0140398.g007

Fig 8. Overview of scopularide producing mechanism. A. Summary of the scopularide-producing NRPS1/PKS2 cluster of S. brevicaulis. This cluster is localized on scaffold477 in a region of 78 kb at the 3' end of this scaffold (scaffold size 280 kb). The cluster is active, as different flanking genes (marked in blue) are expressed in the RNA-Seq experiment on the UV mutant. Top homologous clusters are found in both fungi and bacteria, which hints that this cluster might have originated via horizontal gene transfer from bacteria. P-Cu amine oxidase: peroxisomal copper amine oxidase; Tri101: trichothecene 3-O-acetyltransferase; EutQ: ethanolamine utilization protein like EutQ. B. Scheme of scopularide generation by NRPS1 and PKS2 of S. brevicaulis. Modified from Lukassen et al. [51]. doi:10.1371/journal.pone.0140398.g008

Conclusions
Our article presents the draft genome assembly of S. brevicaulis strain LF580, which was isolated from a marine environment. Using three different sequencing methods, the genome was assembled to a size of 32.2 Mb harboring 16,298 putative genes. We identified 18 gene clusters for secondary metabolite production, which appear to express secondary metabolite enzymes. These include a cluster with the NRPS1 and PKS2 genes, which together synthesize scopularides with anticancerous properties. In summary, by combining genomic and transcriptomic data, we have compiled new genetic and expression information for a marine-derived strain of S. brevicaulis. Moreover, we analysed the obtained genome data for clues explaining the necessary life-style changes.

Methods
Collection of Fungal Strain, Cultivation, and DNA Isolation
The S. brevicaulis LF580 strain was cultivated as previously described [49]. The strain was obtained from the fungal collection of the Kiel Center for Marine Natural Products as cryo-conserved material. Originally, this strain was isolated from the inner tissue of the marine sponge Tethya aurantium. The fungus was cultivated on solid WSP30 medium, which is a variant of Wickerham medium (with the following composition: 1% glucose, 0.5% soy peptone, 0.3% malt extract, 0.3% yeast extract, 3% NaCl) [65]. S. brevicaulis M26 [47] was provided by Linda Paun (Kiel). Genomic DNA from S. brevicaulis was prepared by following a modification of previously published methods [66,67]. Mycelium was frozen in liquid nitrogen, pulverized, and incubated in an equal volume of lysis buffer (10 mM Tris-HCl, 1 mM EDTA, 100 mM NaCl, 2% SDS, pH 8.0). After centrifugation, the supernatant was treated with RNase and afterwards with an equal volume of phenol/chloroform (1:1). Ion Torrent sequencing was carried out with 20 μg genomic DNA at Genotypic Technology (Bangalore, India), and 630 Mb of Ion Torrent reads were generated with an average length of 119 bp (S11 Table). The whole genome sequencing and RNA-Seq data for S. brevicaulis are publicly available under BioSample accession ID SAMN03764504 and the corresponding BioProject accession ID PRJNA288424.

Genome Assembly, Repeat Detection, Gene Prediction and Annotation Analyses
Roche 454 reads were assembled into contigs using the Newbler assembler [68]. Several genome assemblies were performed with the de Bruijn graph-based de novo assembler in the CLCBio Genomic workbench [69], using the Illumina reads, the Ion Torrent reads, and all reads together for the hybrid assembly. Scaffolding of the contigs generated by the respective assemblies was carried out using the genome finishing module of the CLCBio Genomic workbench [69]. Repeat elements were predicted using the RepeatMasker and RepeatProteinMasker software programs (Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-4.0.0 1996-2013 http://www.repeatmasker.org) with the fungal transposon species library (database version 20120418) as input. Gene prediction was performed using the Augustus gene prediction tool with Aspergillus niger as the training dataset. This prediction was compared with other gene prediction tools. Predicted genes were annotated using BLAST homology searches [42] with an E-value cutoff of 1e-3, supported by the BLAST2GO tool [25]. Predicted coding regions were annotated using BLAST [42] by comparison with the Kyoto Encyclopedia of Genes and Genomes (KEGG) [70], Swiss-Prot, TrEMBL, Gene Ontology (GO), and non-redundant (NR) databases.

Genome-Wide Phylogenetic Relationship
In order to confirm the phylogenetic position of the species under study, we reconstructed a phylogenetic tree using CVtree [27]. CVtree is an alignment-free method based on composition vectors and hence does not require the selection of specific genes for phylogeny reconstruction. The only parameter required by the method is k, which was set to 7 [71]. We used the fully predicted proteomes of 67 fungi, and the choanoflagellate Monosiga brevicollis as an outgroup [72]. Bootstrap scores for the phylogeny were calculated as in [73] by randomly sampling the proteome of each species, with replacement, to create a novel perturbed proteome for each of the 100 bootstrap runs. A representative subset of the 68 species was plotted using the APE package [74] within the R computing environment [75].
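To illustrate the idea behind an alignment-free composition vector distance, the following minimal sketch builds a background-corrected k-string vector from a proteome and compares two such vectors. It is a simplified illustration and not the CVtree implementation: it only considers k-strings that actually occur in the proteome, it uses toy sequences with k = 3 rather than the k = 7 used in this study, and all function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(proteins, k):
    """Relative frequencies of all overlapping k-strings in a list of protein sequences."""
    counts = Counter(seq[i:i + k] for seq in proteins for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def composition_vector(proteins, k):
    """Relative deviation of observed k-string frequencies from a Markov-type background.

    The background for a k-string is predicted from its (k-1)-string prefix
    and suffix and its (k-2)-string middle part.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    f_k = kmer_frequencies(proteins, k)
    f_k1 = kmer_frequencies(proteins, k - 1)
    f_k2 = kmer_frequencies(proteins, k - 2)
    vector = {}
    for kmer, observed in f_k.items():
        expected = f_k1[kmer[:-1]] * f_k1[kmer[1:]] / f_k2[kmer[1:-1]]
        vector[kmer] = (observed - expected) / expected
    return vector


def cv_distance(vec_a, vec_b):
    """Distance D = (1 - C) / 2, where C is the cosine correlation of two vectors."""
    dot = sum(value * vec_b.get(kmer, 0.0) for kmer, value in vec_a.items())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("composition vector has zero length")
    return (1 - dot / (norm_a * norm_b)) / 2


# Toy proteomes (made-up sequences, not data from this study).
proteome_a = ["MKVLMKVA"]
proteome_b = ["MKVLAAGI", "GIVKVLA"]
vec_a = composition_vector(proteome_a, k=3)
vec_b = composition_vector(proteome_b, k=3)
print(cv_distance(vec_a, vec_b))
```

A pairwise distance matrix of this kind can then be passed to a distance-based tree-building method. A bootstrap replicate in the sense described above would resample the protein list of each species with replacement before the vectors are computed.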

Protein Domain Estimation
Predicted proteins of this genome were scanned against all known Pfam (version 27) [28] and InterPro (version 43) [29] protein domain collections. Pfam domains were predicted using HMMER 3.0 [76], removing overlapping clans. In order to compare protein-coding gene content across fungal species, we constructed a Python script to carry out the following tasks. The InterPro database [29] was searched for Pfam [28] identifiers corresponding to InterPro identifiers of interest. For each Pfam [28] identifier, pfam_scan.pl (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/), a wrapper for HMMER 3.0 [76], was run to find the matching proteins in the genomes of interest. To analyse the subfamily structure of each Pfam family's member proteins in the genomes, the corresponding protein sequences were collected and clustered with mcl [77] based on the E-value matrix of an all-vs-all BLASTP [42] search. The E-value matrix was thresholded prior to clustering [78]. The mcl clustering has a single major parameter that defines the granularity of the clustering, i.e. the inflation value. mcl clustering was run over the range of possible inflation values. For each inflation value, sensitivity and specificity were calculated for the clustering as previously described [32,79]. In order to calculate these, other secondary Pfam matches were determined for the member proteins of the Pfam under study, and the most variable secondary Pfam was selected for the sensitivity and specificity calculations. Sensitivity and specificity were centred, and the inflation value corresponding to their minimum difference was selected to obtain a single subfamily clustering for each Pfam. An R script using the APE package [74] within the R computing environment [75] was used to plot and process the result tables. The program code for the analysis is available at https://github.com/fahad-syed/ProSol.git.
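The selection of a single inflation value can be sketched as follows. This is an illustrative reading of the rule described above and not the code from the ProSol repository: it assumes that "centred" means subtracting the mean across all tested inflation values, and the function names and the toy sensitivity and specificity values are made up for the example.

```python
def centre(values):
    """Subtract the mean so that the values are centred around zero (assumed meaning of 'centred')."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pick_inflation(inflations, sensitivities, specificities):
    """Return the inflation value at which centred sensitivity and specificity differ least."""
    if not (len(inflations) == len(sensitivities) == len(specificities) > 0):
        raise ValueError("inputs must be non-empty and of equal length")
    sens_c = centre(sensitivities)
    spec_c = centre(specificities)
    gaps = [abs(a - b) for a, b in zip(sens_c, spec_c)]
    best = min(range(len(gaps)), key=gaps.__getitem__)
    return inflations[best]


# Toy values: sensitivity falls and specificity rises as clusters become finer.
inflations = [1.2, 2.0, 3.0, 4.0, 5.0]
sensitivities = [0.95, 0.90, 0.80, 0.65, 0.50]
specificities = [0.40, 0.60, 0.78, 0.88, 0.94]
print(pick_inflation(inflations, sensitivities, specificities))  # prints 3.0
```

The rule picks the granularity at which neither measure is favoured over the other, which gives one subfamily clustering per Pfam family without manual tuning.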

Identifications and Classifications of CAZyme Domains
All putative proteins were compared to the entries in the CAZy database [80,81] using BLASTP [42]. The proteins with E-values smaller than 0.1 were further screened by a combination of BLAST searches against individual protein modules belonging to the following classes: auxiliary activities (AA), glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), carbohydrate esterases (CE) and carbohydrate-binding modules (CBM) in the CAZy database (http://www.cazy.org/). HMMER 3.0 [76] was used to query a collection of custom-made hidden Markov model (HMM) profiles constructed for each CAZy family. All identified proteins were then manually curated and, whenever possible, assigned to a subfamily within a family.

Classification of MFS-Type and Sugar Transporters
For a more precise classification of the S. brevicaulis genes annotated according to Pfam as MFS-type or sugar transporters (328 genes in total), the Transporter Classification Database (TCDB) was used [40]. In addition, the 159 transporter genes of N. crassa with the same Pfam annotations were also classified using TCDB [40]. To this end, sequence similarity searches were performed against the TCDB for each gene using BLASTP [42] with default parameters. To ensure a certain level of stringency, only hits with an E-value of 1e-10 or below were considered reliable. If a BLAST [42] search met this precondition, the corresponding gene was classified into the same category as the TCDB homolog with the best E-value. When the E-value threshold was exceeded for all TCDB results, the gene was not categorized. Some TCDB results exhibited low E-values and diverse categories. In these cases, the respective genes were flagged as uncertain, but still categorized for further analysis.
Moreover, the ten most highly expressed genes were further analyzed by performing a BLASTP [42] sequence similarity search against the RefSeq database (NCBI) with default parameters to identify homologs not present in TCDB with a descriptive annotation.
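The TCDB classification rule described in this section can be summarized in a short sketch. The E-value cutoff of 1e-10 and the best-hit rule come from the text; everything else is an assumption made for illustration, including the input format (gene, TC number, E-value), the truncation of five-part TC numbers to four-part categories such as 2.A.1.1, the rule that a gene is flagged as uncertain when its reliable hits span more than one category, and the gene names and hit values in the example.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-10  # hits at or below this E-value count as reliable


def category_of(tc_number):
    """Truncate a TC number such as '2.A.1.1.5' to its category '2.A.1.1' (assumed convention)."""
    return ".".join(tc_number.split(".")[:4])


def classify(hits):
    """Map each gene to (category or None, uncertain flag).

    hits: iterable of (gene_id, tc_number, evalue) tuples, e.g. parsed from
    tabular BLASTP output against TCDB.
    """
    by_gene = defaultdict(list)
    for gene, tc_number, evalue in hits:
        by_gene[gene].append((evalue, tc_number))
    result = {}
    for gene, gene_hits in by_gene.items():
        reliable = sorted(h for h in gene_hits if h[0] <= EVALUE_CUTOFF)
        if not reliable:
            result[gene] = (None, False)  # threshold exceeded for all hits
            continue
        best_category = category_of(reliable[0][1])  # hit with the best E-value
        categories = {category_of(tc) for _, tc in reliable}
        result[gene] = (best_category, len(categories) > 1)
    return result


# Hypothetical hits for three made-up genes.
example_hits = [
    ("geneA", "2.A.1.1.5", 1e-50),
    ("geneA", "2.A.1.1.9", 1e-30),
    ("geneB", "2.A.1.14.3", 1e-40),
    ("geneB", "2.A.1.1.2", 1e-12),
    ("geneC", "2.A.1.12.1", 1e-5),
]
print(classify(example_hits))
```

In this toy input, geneA is classified unambiguously, geneB is classified but flagged as uncertain, and geneC stays uncategorized because its only hit does not pass the cutoff.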

RNA Isolation, Sequencing and RNA-Seq Analyses
Cultivation of the fungal strain M26 was done in WSP30 medium for 7 days at 200 rpm in the dark. RNA was isolated using previously described methods [13,14,66]. RNA sequencing was performed using Illumina HiSeq™ 2000 at the Beijing Genome Institute (BGI) (Shenzhen, China). A total of 17,452,507 Illumina reads were obtained for S. brevicaulis. Raw reads were mapped to predicted genes using the RNA-Seq mapping tool of CLC Bio Genomic workbench [69], and relative expression levels were calculated as Reads Per Kilobase of transcript per Million mapped reads (RPKM).
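For reference, RPKM normalizes the read count of a gene by both transcript length and sequencing depth. The following minimal sketch shows the standard calculation; the function name and the example numbers are illustrative assumptions and do not reproduce the internal implementation of the mapping software used here.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads / (length in kb * mapped reads in millions)
         = reads * 1e9 / (length in bp * mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # prints 25.0
```

Because of this double normalization, a long and a short transcript with the same RPKM value are covered at a similar read density, which is what makes RPKM-based tiers comparable across genes.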

Detection of Bioactive Encoding Genes and Their Clusters
Initially, putative genes encoding proteins that produce bioactive compounds were identified using BLAST [42] with an E-value < 1e-3. Subsequently, the genome was analysed using SMURF [82] and antiSMASH [48] for putative clusters, which were further examined manually in combination with the RNA-Seq data. The functional domains of PKSs and NRPSs were identified as previously described [83], using a combination of tools, namely antiSMASH [48], the NCBI Conserved Domain Database [84], InterPro [29] and the PKS/NRPS Analysis Web-site [85].