Whole genome comparison of Aspergillus flavus L-morphotype strain NRRL 3357 (type) and S-morphotype strain AF70

Aspergillus flavus is a saprophytic fungus that infects corn, peanuts, tree nuts and other agriculturally important crops. Once the crop is infected the fungus has the potential to secrete one or more mycotoxins, the most carcinogenic of which is aflatoxin. Aflatoxin contaminated crops are deemed unfit for human or animal consumption, which results in both food and economic losses. Within A. flavus, two morphotypes exist: the S strains (small sclerotia) and L strains (large sclerotia). Significant morphological and physiological differences exist between the two morphotypes. For example, the S-morphotypes produces sclerotia that are smaller (< 400 μm), greater in quantity, and contain higher concentrations of aflatoxin than the L-morphotypes (>400 μm). The morphotypes also differ in pigmentation, pH homeostasis in culture and the number of spores produced. Here we report the first full genome sequence of an A. flavus S morphotype, strain AF70. We provide a comprehensive comparison of the A. flavus S-morphotype genome sequence with a previously sequenced genome of an L-morphotype strain (NRRL 3357), including an in-depth analysis of secondary metabolic clusters and the identification SNPs within their aflatoxin gene clusters.


Phenotypic comparison of A. flavus morphotypes
For obtaining images of fungal phenotype, 2x10 5 fresh spores from strains AF70 and NRRL 3357 were spot plated on YGT agar plates and grown for 6 days at 30˚C. Stereoscope images were taken on a Zeiss Stereoscope Discovery.v20 using an Axiocam MRc5 digital camera system (Zeiss, Oberkochen, Germany). To determine spore production, 1x10 6 fresh spores were point inoculated on YGT and V8 media plates, and grown for 9 days at 30˚C in 12 hr light/ dark cycles. Three representative plugs were removed from each plate and added to 500 μl 0.02% Triton-X 100 (TTX-100). A hemocytometer (Hausser Scientific, Horsham, PA) was used for counting spores, and in every case >100 spores were counted. Three independent plates for each strain and medium were used.
To determine sclerotium diameter, 2x10 5 fresh spores from each strain were point inoculated on YGT plates. After 6 days of growth on YGT agar, sclerotia were harvested with 0.01% TTX-100 and rinsed onto filter paper (Whatman #4, Whatman Ltd.). Three independent plates for each strain were used. Sclerotium diameter was measured by averaging two crossmeasurements. Measurements were conducted using Axiovision 4.8 image analysis software (Zeiss).
To determine if both morphotype representatives are of the same VCG, the method of Horn et al. [32] was used with some modification. Spores were inoculated onto 0.5X V8 plates. After 5 to 7 days of growth, five agar plugs containing spores were cut and put into sterilized glass vials containing 2.5 ml deionized water. Twenty microliters of each spore suspension was then inoculated onto the center of multiple plates containing a single type of selection medium. Between days 5-14, mutations were identified as a cloudy, thick growth that outgrew the wild-type. Observed mutants were then transferred to MIT agar plates for further selection. After approximately 3 days of growth, 1 plug from the outer growth was then transferred to a nitrogen-selecting medium, where vegetative compatibility was determined by pairing complementary mutants (nia-D, nir-A or cnx).

Aflatoxin extraction and ultra-performance liquid chromatography (UPLC) analysis
After growth of mycelia on solid supplemented YGT media, 220 ml of 2:1 acetone:water was added and shaken vigorously for 16 hours, then allowed to separate. The supernatant was extracted with 150 ml methylene chloride, which was then evaporated under filtered nitrogen gas. Each dried sample extract was resuspended in 1 ml acetonitrile, then transferred to a 2 ml, 0.45 μm nylon filter centrifuge tube (Spin-X; Corning Inc., Corning, NY) and centrifuged at 14000 rpm for 1 min. Injections (1 μl) of filtered extract were analyzed by UPLC. AFB 1 and AFB 2 analysis was performed with modifications to a procedure previously described [33]. UPLC analyses were performed with a Waters Acquity H-Class System combined with an Acquity fluorescence (FLR) detector (Waters, Milford, MA). Sample extract (1 μl) was injected for separation through an Acquity BEH C18 column (1.7 um, 2.1 x 50 mm) at 30˚C. Run time was 3 min with an elution flow rate of 0.4 ml/min and isocratic mobile phase consisting of methanol:water (40:60). AFB 1 detection wavelength was 365 nm (excitation) and 440 nm (emission) and retention time was 2.1 min. A calibration curve with high linearity (R 2 = 0.9993) was constructed for AFB 1 from a series of diluted standards (Sigma-Aldrich). Statistical analysis conducted using Graphpad Prism 7 (Graphpad Software Inc).
Sequencing and assembly of the A. flavus AF70 genome DNA was extracted from 10 g of A. flavus AF70 mycelia harvested after 24 h growth in PDB (Becton, Dickinson and Co., Franklin Lakes, NJ) and continuous shaking at 30˚C. Briefly, mycelia were ground to a fine powder in liquid N 2 , then subjected to CTAB extraction (1% CTAB [mixed alkyltrimethyl-ammonium bromide], 200mM Tris-HCl pH8.0, 0.8M NaCl, 1% B-mercaptoethanol, 10mM EDTA), followed by Proteinase K (500ug/ml) digestion, two rounds of phenol:chloroform extraction and ethanol precipitation [34]. Long strands of visible DNA were removed using a glass rod, and shipped frozen to J. Craig Venter Institute, Bethesda, MD. The sample quality was assayed using an agarose gel and an Agilent 2100 bioanalyzer (Santa Clara, CA). Sequencing of strain AF70 was performed using single end 100 bp reads in the Illumina GA-II (Illumina, San Diego, CA). A total of 20,255,792 reads were obtained equivalent to roughly 70X sequence coverage. The assembly was done using CLC Genomics Workbench de novo assembler (v4.3). Raw Reads are available at the sequence read archive, SRA accession: SRP131888 (www.ncbi.nih.gov/sra).

Genome annotation
Gene predictions were performed using the MAKER pipeline (version 2.31.4) [35]. First, repetitive elements were identified in the initial AF70 genome sequence by MAKER using the RepeatMasker [36] algorithm in conjunction with the RepBase [37] repeat library for fungi and MAKER's built-in repeat database. To train the gene prediction algorithm Augustus [38], Trinity was used to assemble a transcriptome from in-house generated RNA-seq experiments for A. flavus AF70 [23]. MAKER was then run using the assembled transcriptome, the predicted proteins from A. oryzae, A. flavus NRRL 3357, and A. nidulans (with the "protein2genome" and "est2genome" options set to 1), and fungal protein sequences from the SWISS-Pro database to develop a set of predicted genes. These genes were filtered using maker2zzf based on how well they fit the EST evidence, and narrowed down to about 1,200 genes which were then used to train the first iteration of Augustus. MAKER then produced ab initio gene predictions from the repeat-masked AF70 genomic sequence using the GeneMark [39] and trained Augustus. Gene prediction was also performed on the AF70 sequence using the PASA pipeline [40]. These sets of predictions were included in subsequent runs of MAKER through the "pred_gff" option. Augustus was retrained using results from the second run of MAKER, and MAKER was run a third time. All runs of MAKER were performed with the "single_exon" set to 1 and "correct_est_fusion" set to 1. Gene predictions that had no overlapping EST or protein homology evidence were included in the final set of predictions if they were found to have an InterPro domain when examined using InterProScan. The final annotations were manually edited using WebApollo (version 2013- [11][12][13][14][15][16][17][18][19][20][21][22]. The sequence and annotation file was uploaded to NCBI under Genbank Assembly Accession GCA_000952835.1. The genome sequence and predicted proteins for A. oryzae and A. nidulans were obtained from AspGD (www. aspergillusgenome.org, accessed 9/15/14). The sequence for NRRL 3357 was obtained from NCBI (accessed 4/1/14).

Comparative analyses of A. flavus morphotypes
For construction of Venn Diagrams, coding sequences were aligned using blastn. Sequences with coverage > 50%, identity over 70% and an e-value < 1e-10 were considered orthologous. For Gene Ontology Enrichment Analysis, the R package GOSeq was used [41]. Construction of the phylogenetic tree involved orthologous proteins that were detected using Proteinortho [42], aligned with Muscle [43], and concatenated into a 1.1 Mb amino acid alignment using GBLOCKS [44]. The tree was produced using RAxML with A. zonatus as the outgroup taxon. The inset shows closely related species with expanded branch lengths. For secondary metabolic cluster prediction, the Antibiotics-Secondary Metabolite Analysis Shell (anti-SMASH) program was used [45]. Default parameters were used except for the incorporation of the Cluster-Finder algorithm. Anti-SMASH is designed to predict 43 categories of gene clusters (e.g. Type 1-3 polyketide synthases [PKS], Non-ribosomal peptide-synthetases [Nrps], terpenes, etc.), thus it generally provides a relatively comprehensive list of cluster predictions. Examination of the clusters within the AF70 and NRRL 3357 genomes was undertaken using MultiGeneBlast [46] in tblastn mode. Each of the known eukaryotic cluster entries from the MIBiG database (accessed 7/1/17) [47] was used as a query against the AF70 and NRRL 3357 genomes. The entry with the most genes was chosen for clusters that had redundant entries. Variant analysis involved creating simulated short reads of the AF70 genome using bbmap [48], mapping those reads to the NRRL 3357 genome using BWA [49], calling variants using Freebayes [50], and annotating variants using SNPeff [51]. Although the total number of orthologous genes will be different than the method described above, the pipeline provides a more accurate estimation of SNP content and is in keeping with best practices [52].
We also utilized nearly 70 kb of genomic sequence from the aflatoxin gene clusters of AF70 (A. flavus S-morphotype) and NRRL 3357 (A. flavus L-morphotype) to compare the quantities and types of single nucleotide polymorphisms (SNPs) that differentiated their biosynthetic pathways. The aflatoxin cluster sequences for both strains were acquired from the NCBI database and uploaded as FASTA files to the sequence analysis program Sequencher (Gene Codes Corporation, Michigan). To accurately identify the start and end of each gene and intergenic region, within the aflatoxin cluster, we also added sequences from individual A. flavus cluster genes to the alignment. Once the cluster sequences were aligned, and the correct orientation of individual cluster genes was verified, the process of scrolling along the cluster sequences began. For each gene and intergenic region, observations were noted for the numbers of transition/transversion SNPs as well as deletions within each strain's cluster sequence. Another comparison involved multi-locus sequence typing (MLST) for NRRL3357 and AF70. Alignments for each of six genomic regions were prepared using Sequencher and then exported as Nexus files: two aflatoxin cluster intergenic regions (aflM/aflN and aflW/aflX), and four noncluster regions (amdS, trpC, MAT1-1 and mfs). These loci have proven, in previous MLST studies for A. flavus, to accurately segregate individuals within a population in lieu of vegetative compatibility group (VCG) testing [53]. The Nexus files were then imported into a suite of nucleotide analysis programs (SNAP) that are used for population inferences [54]. Using one of the workbench programs, SNAP: Combine, we concatenated the six different alignments to make a single sequence alignment. Another program (SNAP: Map) was used to collapse the sequences into haplotypes using the parameters of recoded indels and excluded infinite sites violations. The resulting map file indicates whether or not each strain examined is considered an "individual" based on its haplotype designation [55].

Morphological characterization of A. flavus morphotypes
The phenotypic differences exhibited between AF70 and NRRL 3357 are relevant for contextually analyzing a genomic comparison. While some characteristics have been reported previously [7], we wanted a direct and current comparison of their morphological characteristics, aflatoxin production and diameter of sclerotia. Although equal amounts of spore suspension from each strain were plated on YGT, the AF70 (S-morphotype) strain demonstrated slower growth, greater quantities of sclerotia, and a lack of olive-green conidia, compared to the NRRL 3357 L-morphotype strain (Fig 1A). AF70 contained significantly higher spore content (Fig 1B, left and center) and cross-sectional measurements of sclerotia indicated the average size of sclerotia was approximately 400 um, whereas the average diameter for NRRL 3357 sclerotia was just under 1000 um (Fig 1B, right). AFB 1 and B 2 levels were 4-fold higher and 3.5-fold higher, respectively, for S-morphotype AF70 than for L-morphotype NRRL 3357 ( Fig 1C).
Collectively, our morphological comparisons of AF70 and NRRL 3357 support general observations relating to both morphotypes. It was also determined, through our VCG assays, that strains NRRL 3357 and AF70 are not vegetatively compatible (S1 Fig), which is consistent with previous findings [56].

Genomic comparison of AF70 and NRRL 3357
Of the approximately 13,500 genes predicted to exist in NRRL 3357 and AF70, 13,118 genes were predicted to be orthologous based on blastn results of the coding sequences versus the genome based on 70% identity, 50% query coverage, and an E value <1x10 -10 (Fig 2). Also based on this criterion, 472 genes were found to be unique to AF70, while 367 genes were found to be unique to NRRL 3357. Using Proteinortho to determine the number of orthologues indicated 2,397 coding sequences unique to NRRL 3357 and 2,039 unique to AF70. However, it should be pointed out that differences in annotation could contribute significantly to this determination. Phylogenetic comparison including multiple Aspergillus species, and using concatenated amino acid sequence alignment, indicates NRRL 3357 shares highest sequence identity with A. oryzae (Fig 2B). Also observed was that both of the examined A. flavus morphotypes share a common ancestor. Gene enrichment analysis of the genes unique to each morphotype was conducted to determine if broad categories of genes are present/absent in either genome. Genes unique to strain AF70 were enriched in cytochrome P450 mono-oxygenases (GO:0016705 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen) ( Table 1 and S1 Table). Three of the identified genes are part of a unique gene cluster (AFLA70_220g001820, AFLA70_220g001850, AFLA70_220g001880).

Secondary metabolic gene clusters in A. flavus AF70 and NRRL 3357
The metabolic clusters within A. flavus strains AF70 and NRRL 3357 were identified by anti-SMASH, and alignment of each of the PKS, PKS-NRPS hybrid, NRPS and dimethylallyl tryptophan synthetase (DMAT) enzymes was conducted to illustrate the identity of the domains within each enzyme (Fig 3A-3C). The genomes of AF70 and NRRL 3357 were queried for known eukaryotic biosynthetic clusters using tblastn with MultiGeneBlast, which searches for each gene individually and scores the hits based on their proximity to hits from other genes   Table 2). The criterion for presuming that a cluster was "present" in either strain was that 50% of the genes in the MIBiG cluster were present [57]. The results indicate that 22 characterized clusters are present in both of our examined A. flavus morphotypes, consisting of the well-characterized aflatoxin gene cluster, as well as the aflavarin, aflatrem, and cyclopiazonic acid gene clusters. Nine clusters were predicted to be unique to S-morphotype AF70, including the agriculturally relevant carcinogen ochratoxin. To our knowledge, ochratoxin cluster genes have not been previously identified or characterized in A. flavus strains, thus AF70 may provide such an opportunity. Further study would be necessary to determine if even a minimal amount of ochratoxin production is detectable in AF70, and if not, then identify the cause for its non-production. As well, it may be important to determine if this cluster was inherited through horizontal gene transfer from a closely-related ochratoxigenic species such as A. alliaceus [58]. Six clusters are putatively unique to NRRL 3357.

Identification of high impact SNPs found in secondary metabolic gene clusters
An analysis of A. flavus strains AF70 and NRRL 3357 genomes allowed us to summarize and identify high impact SNPs. This analysis provides a foundation to describe the phenotypic differences between strains, such as differences in aflatoxin or sclerotium production. For this analysis, 13,403 the genomes were aligned, and SNPs were classified as having either high, moderate or low impact. High impact polymorphisms include exon deletions, premature stop codons and frameshift mutations. Moderate impacts consist of mutations occurring at the 3' end of the gene, or mutations resulting in a change in amino acid. Low impact mutations are less likely to affect protein function. Our findings indicate that a majority of the genes (11,888) contain between 1 and 10 SNPs (Table 3) in NRRL 3357 The A. flavus S-(AF70) and L-strain (NRRL 3357) morphotypes differ in their production of aflatoxin (Fig 1). To examine if this is potentially due to structural differences between their aflatoxin clusters, we further examined those SNPs identified in their respective coding sequences. Table 4 and S2 Table detail the SNPs present in the aflatoxin gene clusters for both morphotypes examined.
To analyze the entire aflatoxin cluster, including intergenic regions, Fig 4 shows the results of our comparative polymorphism analysis between the aflatoxin gene clusters of NRRL 3357 and AF70, for which we found 1192 transition or transversion mutations (1.78% of the aflatoxin gene cluster) that differentiated the two morphotypes. The cluster of NRRL 3357 alone contained 116 deletions (0.17%), while the cluster of AF70 contained 742 deletions (1.11%). In total, more than 2000 SNPs within the aflatoxin cluster alignment could be used to differentiate these strains based on genotype. The greatest span of cluster alignment (9825 bp) having the lowest percentage of SNPs (0.35%) was found to encompass the start of the aflC/aflD intergenic region through the end of the aflA/aflB intergenic region. The second greatest span and second lowest percentage (7146 bp and 0.39%, respectively) encompassed the aflR gene through the end of the aflJ/aflE intergenic region. The intergenic region with the highest percentage (14.15%) of SNPs per span of genomic sequence (1039 bp) was aflL/aflI, and the gene with the highest percentage of SNPs (9.06%) per span of genomic sequence (1335 bp) was aflO. The highest concentration of deletions for both strains was found to reside in the aflF/ aflU (norB/cypA) regions of their aflatoxin clusters. This region in L-morphotype NRRL 3357 exhibited 34 deleted bases not observed in AF70, and the S-morphotype AF70 exhibited 619 deleted bases not observed in NRRL 3357. Since both isolates are of the same mating type (MAT1-1), the presence of SNPs would be required to help differentiate the sequences, but their respective sequences were identical so our MLST analysis was based on five genomic loci. It has been reported that the inability of A. flavus to produce G-aflatoxins is based on partial or complete deletion of the aflU gene, and that the amount of deletion observed in the aflF/ aflU region will correlate with sclerotial morphology [59]. For example, the L-strain genotype for the aflF/aflU region results in an amplicon that is approximately 1 kb in size, while the S- strain genotype results in amplicon size of approximately 300 bp. Therefore, based on the report of Ehrlich and co-workers [59], NRRL 3357 is an L-morphotype strain and AF70 is an S-morphotype strain, which supported these strains being individuals and not sharing a VCG. Our MLST findings further corroborate the VCG results for these strains. VCGs in filamentous fungi are considered to be determined based on specific heterokaryon incompatibility (het) loci [60], and the identities of the relevant loci have not been identified in A. flavus.
In conclusion, the sequence and phenotypic differences quantified here between these A. flavus morphotypes are significant factors to consider in experimental design. In addition to being visibly different morphotypes, these model organisms produce significantly different levels of toxins. Here we describe additional differences in secondary metabolic gene cluster profiles, and identify several high impact SNPs within their aflatoxin gene clusters that could account for their differences in toxin production. This information should contribute to the further use and development of these strains in examining both fungal biology and their pathogenic potential.
Supporting information S1 Fig. VCG assay for A. flavus strains AF70 (S-morphotype) and NRRL 3357 (L-morphotype). The results here illustrate a lack of cleft formation between mycelia of AF70 and NRRL