A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is derived almost exclusively from a few model organisms, including A. nidulans and A. fumigatus, and little is known about rare pathogens. In this study, we performed a genomics-based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, including the sterigmatocystin biosynthesis pathway that is commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters that were possibly acquired by horizontal gene transfer were identified in Aspergillus for the first time, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but far fewer with other Aspergilli such as A. niger and A. oryzae. These findings should help to clarify the diversity and evolution of SM biosynthesis pathways in the genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in the clinic.


Introduction
Secondary metabolism in fungi produces a variety of chemical compounds including toxins and antibiotics. These secondary metabolites (SMs) are not necessary for fungal growth or conidiation, but sometimes confer pathogenicity, virulence and fungal adaptation to the environment [1,2]. To date, many secondary metabolism biosynthesis pathways (SMBPs) have been characterized in fungi. These pathways commonly contain 'backbone enzymes', i.e., polyketide synthase (PKS), nonribosomal peptide synthetase (NRPS) or dimethylallyl transferase (DMAT), for generation of the carbon skeleton. They also encode several other kinds of enzymes that modify the intermediate products, e.g., hydroxylase, O-methyltransferase and cytochrome P450. Genes involved in the biosynthesis of a particular metabolite are usually organized in clusters and are often co-regulated. For example, the biosynthetic pathway for the well-known metabolites aflatoxin (AF) and sterigmatocystin (ST) contains about 26 genes, including a cluster-specific transcription factor, AflR, that positively regulates the expression of about half the genes in the cluster [3][4][5].
Based on these common features of SM biosynthetic gene clusters, many bioinformatics tools have been developed and widely used to characterize the genetic basis of SM biosynthesis. These tools, of which SMURF and antiSMASH are the most frequently used, are complementary to chemical and genetic approaches and have proven very useful in unraveling the biosynthesis pathways of novel compounds. They are especially valuable in the post-genomic era, as fungal genome sequencing projects are rapidly increasing in number [6,7].
More than twenty Aspergillus genomes have been sequenced, including the main pathogens for invasive pulmonary aspergillosis (IPA) such as A. fumigatus, A. flavus and A. terreus. However, the rarely found IPA pathogens, such as A. alliaceus, A. ustus and A. udagawae [8], have not been well studied at the genome level. A. ustus belongs to the section Usti and is a mostly free-living microorganism, but it is also an opportunistic pathogen of immunocompromised patients, causing infection in lungs, eyes and hands [9][10][11][12]. In contrast to the extensive information available about other Aspergillus species, including A. fumigatus (the major cause of invasive pulmonary aspergillosis [13]) and A. nidulans (the model organism for studying a wide range of topics including development, cell cycle control and metabolism [14]), the biology and genetics of A. ustus are largely unknown. For a better understanding of the genetics of this rare pathogen, we sequenced the genome of a free-living A. ustus strain and performed a comparative analysis with special attention to its SMBPs.

Genome sequencing and gene annotation
The A. ustus strain sequenced in this work, A. ustus 3.3904, was provided by the China General Microbiological Culture Collection Center (Beijing, China). Genomic DNA was prepared using the CTAB method as described previously [15]. Both a paired-end library (ca. 300 bp insert) and a mate-pair library (ca. 3 kb insert) were constructed and sequenced on the HiSeq 2000 platform (Illumina, San Diego, CA, USA). Sequences were checked and assembled in the CLC Genomics Workbench (CLC Bio, Aarhus, Denmark). All reads were trimmed with default parameters and were then assembled by the de novo assembly module (word size 20). The draft assembled A. ustus genome has been deposited in GenBank under the accession number JOMC00000000.
Protein-coding genes in A. ustus were identified by the AUGUSTUS software using the default parameters, with the gene set of A. nidulans as the training data [16]. These predicted genes were primarily annotated by homology search against the non-redundant (nr) protein database (National Center for Biotechnology Information, www.ncbi.nlm.nih.gov) using BLASTP (E-value<1e-3, identity>25%, query coverage>50%) [17].
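The homology cutoffs stated above can be illustrated with a minimal sketch. It assumes the BLASTP results have already been parsed into records; the `Hit` class, its field names and the example identifiers are hypothetical and are not part of the original pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLASTP hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    identity: float        # percent identity, 0-100
    query_coverage: float  # percent of the query covered, 0-100


def passes_cutoffs(hit, max_evalue=1e-3, min_identity=25.0, min_qcov=50.0):
    """Apply the three cutoffs given in the text (all strict inequalities)."""
    return (hit.evalue < max_evalue
            and hit.identity > min_identity
            and hit.query_coverage > min_qcov)


def filter_hits(hits, **cutoffs):
    """Keep only the hits that satisfy every cutoff."""
    return [h for h in hits if passes_cutoffs(h, **cutoffs)]


# Illustrative data only; gene and protein names are made up.
example_hits = [
    Hit("gene_0001", "protA", 1e-50, 72.0, 95.0),  # passes
    Hit("gene_0002", "protB", 1e-2, 40.0, 80.0),   # E-value too high
    Hit("gene_0003", "protC", 1e-20, 22.0, 90.0),  # identity too low
    Hit("gene_0004", "protD", 1e-20, 60.0, 30.0),  # coverage too low
]
kept = filter_hits(example_hits)
```

The stricter thresholds used elsewhere in the Methods could be applied by passing different keyword arguments, e.g. `filter_hits(hits, max_evalue=1e-10, min_identity=50.0)`.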
The genome information of 21 sequenced Aspergillus species was downloaded from AspGD (www.aspergillusgenome.org), including their predicted proteomes and the constructed orthologous groups (S1 Table). A comparative analysis of the predicted proteomes of A. ustus and the collected Aspergillus species was performed using BLASTP (E-value<1e-3, identity>25%, query coverage>50%). Gene ontology annotations were transferred to A. ustus genes when at least two of their best hits belonged to the same constructed orthologous group. Reciprocal best hits between A. ustus and A. nidulans were collected under strict parameters (E-value<1e-10, identity>50%, query coverage>50%).
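The reciprocal best hit step can be sketched as follows. This is an illustrative outline rather than the pipeline actually used: it assumes that hits have already been filtered by the strict cutoffs and that the best hit is the one with the highest bit score, which the text does not specify. All gene identifiers are hypothetical.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bitscore).
    Return {query: subject} keeping the highest-scoring subject per query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Illustrative data only: u* stand for A. ustus genes, n* for A. nidulans genes.
ustus_vs_nidulans = [("u1", "n1", 200.0), ("u1", "n2", 150.0), ("u2", "n2", 90.0)]
nidulans_vs_ustus = [("n1", "u1", 210.0), ("n2", "u1", 140.0), ("n2", "u2", 80.0)]
pairs = reciprocal_best_hits(ustus_vs_nidulans, nidulans_vs_ustus)
```

In this toy example only `u1` and `n1` pick each other as best hit; `u2` prefers `n2`, but `n2` prefers `u1`, so that pair is not reciprocal.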

Identification of SM biosynthetic gene clusters
SM biosynthetic gene clusters in A. ustus 3.3904 were identified in two steps. First, SM biosynthetic gene clusters in A. ustus were predicted by SMURF [18]. Second, these clusters were extended by three protein-coding genes on both sides in the genome and were then compared to the annotated clusters of A. nidulans, A. niger and A. oryzae using BLASTP (E-value<1e-10, identity>50%, query coverage>50%). If a newly added gene mapped to the same genomic region as the SMURF-predicted cluster, or was assigned GO functions associated with SM biosynthesis [19], it was accepted as a component of the finally determined A. ustus SMBP, and the second step was repeated.
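The iterative extension can be sketched as a loop over an ordered list of genes. This is a simplified illustration under several assumptions: a cluster is represented by inclusive start and end indices on one scaffold, the acceptance test (BLASTP mapping or GO annotation) is abstracted into a caller-supplied `accept` function, and the boundary is moved to the outermost accepted gene within the three-gene window. None of these names or choices come from the original pipeline.

```python
def extend_cluster(genes, start, end, accept, flank=3):
    """Extend a cluster [start, end] (inclusive indices into `genes`).

    On each pass, up to `flank` genes on either side are examined; the
    boundary moves to the outermost gene for which `accept` returns True.
    Passes repeat until neither boundary moves.
    """
    changed = True
    while changed:
        changed = False
        # Left window, scanned from the farthest candidate inward.
        for i in range(max(0, start - flank), start):
            if accept(genes[i]):
                start = i
                changed = True
                break
        # Right window, scanned from the farthest candidate inward.
        for j in range(min(len(genes) - 1, end + flank), end, -1):
            if accept(genes[j]):
                end = j
                changed = True
                break
    return start, end


# Illustrative data only: twelve genes "a".."l"; the predicted core is f-g.
scaffold = list("abcdefghijkl")
sm_related = {"d", "i", "k"}  # hypothetical genes passing the acceptance test
new_start, new_end = extend_cluster(scaffold, 5, 6, lambda g: g in sm_related)
```

Here the first pass extends the cluster to `d` and `i`, the second pass reaches `k` on the right, and the third pass finds nothing new, so the loop stops.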

Phylogenetic analysis
Phylogeny of Aspergillus species was investigated by analyzing their conserved proteins, which were collected by parsing the result of homology search between A. ustus and other Aspergillus species (E<1e-10, identity>80%, and the aligned regions covered >80% of both query and subject sequences) (S2 Table). ClustalW2 was used to align the concatenated sequences of selected proteins [20], and the alignment was subsequently used in neighbor-joining tree construction by MEGA 6 [21]. The phylogenetic tree was tested by bootstrap analysis with 1,000 replicates.
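The bootstrap test resamples alignment columns with replacement and rebuilds the tree for each replicate. The sketch below shows only the resampling step, assuming the concatenated alignment is held as a dictionary of equal-length strings; tree construction itself was done in MEGA 6 and is not reproduced here. Function names, the seed and the toy sequences are illustrative assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that replicates are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, replicates=1000, seed=0):
    """Generate `replicates` pseudo-alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(replicates)]


# Illustrative toy alignment; real input would be the concatenated proteins.
toy = {"A": "MKVLA", "B": "MKILA", "C": "MRVLG"}
reps = bootstrap_replicates(toy, replicates=10, seed=1)
```

Each replicate has the same dimensions as the input, and every column in it is a copy of some original column, which is what allows the support for each branch to be counted across replicates.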

Identification of horizontally acquired clusters
Putative horizontal transfer of SM biosynthetic gene clusters to A. ustus was detected by a strategy similar to HGTector [22], i.e., a gene cluster predicted to be horizontally acquired had to meet the following criteria: first, the cluster was absent in the closely related Aspergillus species; second, the cluster was found in distant organisms and the homologous gene pairs shared pronounced sequence similarity (identity>50%).
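The two criteria can be expressed as a simple predicate. The sketch below is not HGTector itself; it assumes that, for each cluster, a summary of homologous-cluster matches per organism is already available, and the data layout, function name and example values are hypothetical.

```python
def is_hgt_candidate(cluster_matches, close_relatives, min_identity=50.0):
    """Decide whether a cluster meets both criteria from the text.

    cluster_matches: dict mapping organism -> percent identity of the
        homologous cluster genes found there (organisms without a
        homologous cluster are simply absent from the dict).
    close_relatives: set of closely related Aspergillus species.
    """
    # Criterion 1: no homologous cluster in any close relative.
    present_in_close = any(org in cluster_matches for org in close_relatives)
    # Criterion 2: a distant organism carries the cluster with identity > cutoff.
    similar_in_distant = any(
        identity > min_identity
        for org, identity in cluster_matches.items()
        if org not in close_relatives
    )
    return (not present_in_close) and similar_in_distant


# Illustrative values only; they are not results from the study.
relatives = {"A. nidulans", "A. versicolor", "A. sydowii"}
example_a = {"P. aethiopicum": 66.0}                       # candidate
example_b = {"A. nidulans": 80.0, "P. aethiopicum": 66.0}  # present in a relative
example_c = {"M. acridum": 40.0}                           # similarity too low
```

Using strict inequality for the identity cutoff mirrors the "identity>50%" wording above.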

Results and Discussion
Genome assembly and annotation
A total of 65 Mbp short reads were generated by sequencing the paired-end and mate-pair libraries. Approximately 94% of the short reads were assembled into 770 scaffolds (>500 bp) that represented the A. ustus 3.3904 chromosomes, with an average G+C content of 50.5%, a total length of 38.3 Mbp and an average length of 50 kbp. The chromosome-encoded genes were predicted by an ab initio method and by comparative genomics, which yielded a total of 13,143 predicted genes, a gene density comparable with that of other Aspergillus species [23] (Table 1). Homologs were found in the nr database for 93% of A. ustus protein-coding genes, and about 75% of the genes had their best hits in other Aspergillus species.
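The reported summary figures are mutually consistent: 38.3 Mbp divided by 770 scaffolds gives roughly 49.7 kbp, in line with the stated average of about 50 kbp. A minimal sketch of how such statistics could be computed from scaffold sequences is given below; the function name, return layout and toy sequences are assumptions made for illustration, and only the >500 bp length filter is taken from the text.

```python
def assembly_stats(scaffolds, min_length=500):
    """Summarize scaffolds longer than `min_length` bp.

    scaffolds: iterable of nucleotide strings.
    Returns scaffold count, total length, mean length and G+C percentage.
    """
    kept = [s.upper() for s in scaffolds if len(s) > min_length]
    total = sum(len(s) for s in kept)
    gc = sum(s.count("G") + s.count("C") for s in kept)
    return {
        "scaffolds": len(kept),
        "total_bp": total,
        "mean_bp": total / len(kept) if kept else 0.0,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Toy input: two 600 bp scaffolds are kept, one 200 bp scaffold is dropped.
toy_stats = assembly_stats(["GC" * 300, "AT" * 300, "ACGT" * 50])

# Consistency check on the figures reported in the text.
reported_mean_bp = 38.3e6 / 770
```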

Comparison of predicted proteomes
A phylogenetic analysis using several housekeeping genes, including beta-tubulin and cytochrome oxidase subunit I (cox1), has revealed the close relationship between A. ustus and the Nidulantes section species A. nidulans (which is also referred to as Emericella nidulans for its sexual form) [24]. In this study, a comparative analysis of the predicted proteomes was performed between A. ustus 3.3904 and the other 21 Aspergillus species, showing that 69% and 89% of the proteins encoded by A. niger (strain CBS 513.88) and A. clavatus (strain NRRL 1) (cutoff identity 25% and query coverage 50%), respectively, had homologs in A. ustus, with the other species scattered in between. Interestingly, further analysis found that A. nidulans, A. sydowii and A. versicolor shared a large number of highly similar genes (identity>75%) with A. ustus, which is consistent with their close relationships in phylogeny revealed by conserved genome content (Fig. 1).
A. nidulans, the species most closely related to A. ustus 3.3904 among the investigated Aspergilli, has been extensively studied for SM biosynthesis, and its genome has been well annotated [14]. A comparative analysis between A. ustus and A. nidulans revealed a pronounced similarity in gene content: 70% of the A. nidulans gene set had reciprocal best hits (RBH) in A. ustus. A detailed comparison of their SM biosynthesis was subsequently performed to identify the recent evolution of the SMBPs (the results are described in the next sections).

Identification of SM biosynthetic gene clusters in A. ustus
SM biosynthetic gene clusters have been well annotated in genome-wide studies of some Aspergillus species, and the numbers vary greatly, for example, 81 for A. niger but only 39 for A. fumigatus [19]. In this study, a total of 52 SM biosynthetic gene clusters and partial clusters were predicted in A. ustus 3.3904 by bioinformatics analysis and comparative analysis (Table 2). More than half of these clusters contained genes encoding PKS/PKS-like enzymes, while the others encoded NRPS/NRPS-like enzymes (18) or DMAT (6), for generating the SM carbon skeletons. Although the products of most clusters in A. ustus remain unknown, some of them were predicted from the conservation of the SMBP components in the genus Aspergillus, including cluster14 (monodictyphenone), cluster39 (sterigmatocystin, ST), cluster47 (emericellamide), cluster49 (ferricrocin) and cluster50 (asperthecin).
Moreover, some SMBPs in A. ustus 3.3904 were characterized in Aspergillus species for the first time, such as the biosynthesis pathway of viridicatumtoxin (VRT) (cluster27). VRT is a tetracycline-type antibiotic that inhibits the growth of methicillin-resistant and quinolone-resistant Staphylococcus aureus with high activity [25]. Until recently, the vrt cluster had been characterized only in Penicillium aethiopicum, where it includes vrtA-L and two regulators (vrtR1 and vrtR2) and was assumed to be a recent acquisition by horizontal gene transfer [26]. In this study, a homology search of this cluster against the nr database (BLASTP with cutoffs of 50% identity and 50% query coverage) identified another vrt cluster, encoded in the genome of an insect pathogen, Metarhizium acridum [27]. In A. ustus 3.3904, we found a vrt gene cluster containing 19 genes (AUSM27_001-019), most of which shared sequence identities of 56-76% and 62-71% with their counterparts in P. aethiopicum and M. acridum, respectively. The possible regulatory components of the A. ustus vrt cluster diverged much more than the other components: the vrtR1 gene (ca. 55% identity) was not embedded in the vrt gene cluster, and VrtR2 showed low sequence identity (ca. 36%) to its orthologs. Notably, the vrt cluster of A. ustus 3.3904 lacks the genes encoding VrtE (cytochrome P450) and VrtF (O-methyltransferase); thus, further study is necessary to determine the end products. Some organic compounds produced by another A. ustus strain, KMM 4640, have been determined by chemical methods, including patulin (PAT), ST and cladosporin [28]. Among them, the PAT biosynthetic pathway has been characterized in some other Aspergilli and Penicillia, such as Aspergillus clavatus and Penicillium expansum [29]. However, a homology search in A. ustus 3.3904 did not find any potential gene cluster encoding enzymes associated with PAT biosynthesis (cutoff E-value<1e-3, identity>25% and query coverage>50%), suggesting that production of PAT is not a common feature of A. ustus.

Comparative analysis of SM biosynthetic gene clusters in Aspergillus species
The increasing availability of Aspergillus genomes has led to a rapid identification of SMBPs in recent years and has subsequently revealed that only a small proportion of SMBPs are conserved even between closely related species [19]. In this work, a comparative analysis of SM biosynthetic gene clusters was performed between A. ustus and three other well-annotated Aspergillus species, i.e., A. nidulans, A. niger and A. oryzae. It revealed that A. ustus shared the greatest number of SMBPs with A. nidulans, such as cluster2 (ivo), cluster18 (pki), cluster32 (pkf) and several other gene clusters of unknown function (S3 Table). Far fewer SM biosynthetic gene clusters were conserved in A. niger or A. oryzae, but interestingly, cluster35 (function unknown) was highly conserved among all four species. We also searched for the A. ustus SM biosynthetic gene clusters in other sequenced Aspergillus species and found that five of them were specific to A. ustus, which suggests horizontal acquisition. Similar gene clusters for four of them, i.e., clusters 27, 29, 44 and 45, were found in P. aethiopicum, Trichoderma virens or Talaromyces marneffei (S4 Table), suggesting that penicillia are an important reservoir of SM biosynthetic gene clusters for A. ustus.
In addition, the gene organization of particular SM clusters in different fungi also shows remarkable variation. Although the gene organization does not appear to be crucial for SM biosynthesis [30], it provides another way to understand fungal evolution. As most of the SMBP components have not been well characterized, the st cluster, which is commonly found in Aspergillus, and the vrt cluster, which was found in Aspergillus for the first time, were used as examples to show the variation in the organization of SM clusters (Fig. 2).
The st clusters of A. ustus and another three closely related species (A. versicolor, A. nidulans and A. ochraceoroseus) were compared. The contents of the st clusters in these species are almost the same, and the st genes share amino acid identities of 46-87% between A. ustus and the other species. Interestingly, the structure of the st cluster in A. ustus was most like that of A. versicolor rather than like that of A. nidulans, with a slight variation in the stcD location, i.e., in the other three Aspergillus species, stcC and stcD are commonly coupled and located between aflV and aflD, with the structure aflV-stcC-stcD-aflD. However, in A. ustus, only stcC was found between aflV and aflD, while stcD was located at the end of the st cluster.
In contrast, the variation in the organization of the vrt clusters in A. ustus, P. aethiopicum and M. acridum is remarkable in that only small-scale conservation was found, such as vrtJ-vrtI-vrtA-vrtB. In addition, the vrt cluster in A. ustus contains seven additional genes between vrtH and vrtR2, including two PKS-like genes that share ca. 60% identity with their homologs in M. acridum. Notably, in M. acridum, the orthologs of these two PKS-like genes (EFY92556 and EFY92553) and a laccase-encoding gene (EFY92552) are also clustered on the chromosome. The flanking genomic regions of the A. ustus vrt cluster were analyzed, but no syntenic region was found in the genomes of other Aspergillus species. These findings indicated that the vrt cluster in A. ustus might result from two independent evolutionary events: an insertion of a segment into the vrt cluster, which possibly occurred in an organism outside the genus Aspergillus, and a subsequent HGT of the newly formed vrt cluster into A. ustus, which very possibly took place after the divergence of A. ustus from the other Aspergilli.

Concluding Remarks and Perspective
An increasing amount of knowledge about human fungal pathogens is becoming available, largely derived from genome sequencing projects. In this study, we sequenced and annotated the genome of a rare human pathogen, A. ustus, and revealed the phylogenetic relationship between A. ustus and other Aspergilli at the whole-genome level. Interestingly, although A. ustus and A. nidulans are the most closely related species and share a large number of highly similar genes, their genome sizes differ considerably (about 38 Mbp and 30 Mbp, respectively). This difference is partly explained by the incomplete state of the A. ustus genome, but it also suggests large-scale gene gain/loss during recent adaptive evolution. SMBPs were systematically analyzed in A. ustus 3.3904 and 52 SM gene clusters were found, including the vrt cluster that was found in Aspergillus for the first time. In addition, extensive differences in the SMBPs were identified, from the product profile to the gene content and organization of particular SM clusters, showing that SM biosynthesis is evolving rapidly in Aspergillus, which might be associated with the complex environments these fungi confront. Invasive fungal infections, including IPA, plague a large number of immunocompromised patients. Early diagnosis of IPA is of great benefit to prognosis; however, it remains a challenge for the methods currently used in clinics. Some of these methods are time consuming (e.g., blood or sputum culture) or lack both sensitivity and specificity (e.g., computed tomography (CT) and the beta-D-glucan/galactomannan test). Therefore, new molecular methods, such as microarray and PCR assays, are anticipated to provide higher sensitivity and speed.
For example, PCR based methods have been greatly improved by the efforts of EAPCRI (Working Group European Aspergillus PCR Initiative), which aims to develop a standard for Aspergillus PCR methodology (www.eapcri.eu), and a recent study has revealed that combined sequencing of multiple genomic sites would improve the performance [31]. In this sense, the generation of more Aspergillus genome information remains important and would be of great help in screening for new biomarkers for Aspergillus identification in clinics.
Supporting Information S1