Discovery of new mycoviral genomes within publicly available fungal transcriptomic datasets

The distribution and diversity of RNA viruses in fungi is incompletely understood due to the often cryptic nature of mycoviral infections and the focused study of primarily pathogenic and/or economically important fungi. As most viruses that are known to infect fungi possess either single-stranded or double-stranded RNA genomes, transcriptomic data provides the opportunity to query for viruses in diverse fungal samples without any a priori knowledge of virus infection. Here we describe a systematic survey of all transcriptomic datasets from fungi belonging to the subphylum Pezizomycotina. Using a simple but effective computational pipeline that uses reads discarded during normal RNA-seq analyses, followed by identification of a viral RNA-dependent RNA polymerase (RdRP) motif in de novo assembled contigs, 59 viruses from 44 different fungi were identified. Among the viruses identified, 89% were determined to be new species and 68% are, to our knowledge, the first virus described from the fungal species. Comprehensive analyses of both nucleotide and inferred protein sequences characterize the phylogenetic relationships between these viruses and the known set of mycoviral sequences and support the classification of up to four new families and two new genera. Thus the results provide a deeper understanding of the scope of mycoviral diversity while also increasing the distribution of fungal hosts. Further, this study demonstrates the suitability of analyzing RNA-seq data to facilitate rapid discovery of new viruses.


Introduction
Advances in low-cost, high-throughput sequencing technologies has revolutionized our ability to query the molecular world. Transcriptome projects are of particular use by providing targeted information about gene expression and improving the annotation of genome projects. Indeed, a number of "1K" transcriptome projects are underway, including the 1000 Plant Genome project, Fish-T1K project, 1K Insect Transcriptome Evolution (1KITE) project, and 1000 Fungal Genome and Transcriptome project. These large scale, collaborative projects have contributed to the vast amount of sequencing data available. As of February 2018, more than 16,000 petabases have been deposited in the Short Read Archive (SRA) at NCBI, with over 6,000 petabases available as open-access data.
Beyond revealing gene expression patterns within the target organism, these datasets are a source for additional insights as novel RNA sequences, specifically viral genomic sequences, have been identified within them. In an early example, analysis of RNA-seq data from a soybean cyst nematode revealed the presence of four different RNA viruses [1] . A more broad scale analysis of 66 non-angiosperm RNA-seq datasets, produced as part of the 1000 Plant Genome project, identified viral sequences in 28 plant species [2] . Additionally, the first viral sequences for any member of the family of spiders Nephilidae were identified through a secondary analysis of the RNA-seq data produced during the spider genome sequencing project [3] .
While over 200 mycoviruses have been described, the full depth of the mycoviral landscape remains to be determined as certains biases have impacted the discovery of mycoviral sequences. Commonly infections of fungi appear cryptic, having little to no apparent impact on fungal health, thus occluding the presence of the virus. Additionally, many of the fungi studied are important human or plant pathogens, but this represents only a fraction of the known diversity of fungi. Advancements in mycovirus discoveries have been aided by screening the transcriptomes of various collections of fungi; study systems include those fungi associated with the seagrass Posidonia oceanica [4] , five common plant-pathogenic fungal species [5] , and soybean phyllosphere phytobiomes [6] .
In the biosphere, fungi are not found in isolation; instead, they are members of complex environments that include members from some or all other kingdoms of life. Characterizing the responses and contributions that fungi make to these environments also includes a full understanding of the dynamics at play within the fungi. We hypothesized that mycoviral infections are more pervasive than currently known, and as such, undertook a systematic approach to identify new mycoviral species by querying the publicly available fungal transcriptomic datasets at the Short Read Archive hosted at NCBI. Our bioinformatic approach provides a robust method for the identification of new viral species from raw sequencing data as we identified 59 viral genomes, 53 of which are described here for the first time.

Results and Discussion
Viral RNA-dependent RNA polymerases (RdRP) are essential enzymes in the genomes of RNA viruses that have no DNA stage. The domain organization of this protein is described as a right hand, where three subdomains are the palm, fingers, and thumb [7] . While some of the motifs found within these subdomains are conserved among all polymerase enzymes, specific structures and motifs are found only in RNA-dependent RNA polymerases. In addition to unique signatures within the sequences themselves, virus-like RdRPs are not encoded in the DNA genomes of eukaryotic organisms, thus viral RdRPs can be specifically targeted as a marker for sequence-based analyses.
We previously determined that contigs assembled from fungal RNA-seq data can be queried for viral RdRP-specific domains [8] . As such, we undertook a systematic analysis of all RNA-seq datasets from Pezizomycotina fungi (division: Ascomycota; subkingdom: Dikarya) available at the Short Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) on July 21, 2017. An initial list of 1,127 unique BioProjects was identified, which was manually curated to identify 569 BioProjects suitable for this analysis (S1 Table). Samples were excluded for a variety of reasons including: individually listed BioProjects were determined to belong to larger groups of experiments using the same fungal strain and were therefore combined and a single representative sample was selected; samples containing a mix of fungi and RNA from other kingdoms were excluded as the host of any putative viral sequences could not be determined; samples where the read sequences were too short for assembly or unavailable for download.

Virus-like RdRP Discovery Pipeline
A computational pipeline was created to automate the steps following selection of a raw reads SRA file available and identification of putative viral RdRP sequences. The majority of the steps used publicly available programs in conjunction with a small number of custom Perl scripts. The overall structure of the pipeline was optimized to efficiently work within a HTCondor-managed system including the use of the DAGMan (Directed Acyclic Graph Manager) meta-scheduler to coordinate the required dependencies between pipeline steps.
Due to the highly specific nature of viral RdRP protein sequences, the key feature of this pipeline was the use of a customized Pfam database which included the viral RdRP families: RdRP_1, RdRP_2, RdRP_3, RdRP_4, RdRP_5, and Mitovir_RNA_pol. To err on the side of over-inclusiveness, HMMER hmmscan was run with an e-value cut-off of 10.0, allowing for the identification of any sequence with limited similarity to the viral protein domains. Putative RdRP sequence(s) were identified in a total of 1067 contigs from 284 of the 569 samples. The actual number of viruses was expected to be less than this total number; reasons include a false positive match to the viral RdRP signatures, contigs that are the forward and reverse strand sequence of a single virus, or contigs that are partial matches along a single virus sequence. To confirm similarity to known viral sequences, contigs were evaluated via a BLASTX search versus the 'nr' database at NCBI followed by manual evaluation and curation of contigs with hits to known mycoviral sequences.

Summary of identified viral sequences
In total, 59 complete, RNA mycoviral genomes were identified in 47 of the 569 SRA samples analyzed. A global analysis of the sequencing data available and the resultant virus sequences along with the set of known mycoviruses available at Virus-HostDB (version March 23, 2018, http://www.genome.jp/virushostdb ) demonstrates a number of patterns and trends (Fig 1). There is an uneven distribution of data being generated within the Pezizomycotina subphylum whereby certain Classes are more abundantly represented than others (illustrated by the width of the band going from "Fungal Class" to "Fungal Genus"). For instance, the Sordariomycetes and Eurotiomycetes host 77% of the sequencing projects while the remaining eight Classes host the other 23% of datasets. Unsurprisingly, fungi known to be key pathogens of plants and humans are present within the Sordariomycetes and Eurotiomycetes, including the well-studied genera Fusarium , Aspergillus , and Penicillium . Indeed, seven genera alone account for 51% of all sequencing projects analyzed in this study, with the remaining 49% of datasets spread among 176 other genera.
Virus discovery as a function of number of sequencing projects favors the less data-rich Classes of fungi. While Sordariomycetes and Eurotiomycetes host 61% of the viruses identified in this study, the discovery rate is 17%. Conversely, the less-studied classes Leotiomycetes, and Pezizomycetes had virus discovery rates of 21%, and 26% respectively. Clearly, however, a certain threshold of data is required for virus discovery, as the Classes Xylonomycetes, Orbiliomycetes, and Lecanoromycetes had a combined total of seven sequencing projects but only one virus was identified.
Comparing the distribution of the known set of Pezizomycotina-hosted mycoviruses with those identified in this study demonstrates similar distribution patterns. Members of the Sordariomycetes and Eurotiomycetes Classes are host to 58% of known mycoviruses, with Leotiomycetes being also abundant. Gains in under-represented Classes are observed, as only two mycoviruses were previously known to be hosted by a Pezizomycetes fungus, and five viruses were identified in this study. Additionally, the first virus from the Class Orbiliomycetes was found in this study.
Classification of the virus sequences identified in this study reveals that 19% of viruses are from unclassified viral families. This is similar to the known set of mycoviruses, where 18% of viruses are also unclassified. Further, dsRNA viruses are the most abundant category within the two viral datasets, with 63% and 54% belonging to those identified in this study and the known set respectively. The ratio of dsRNA to ssRNA viruses in this study is higher than the known dataset, indicating an under-identification of ssRNA viruses (18% versus 27%, this study and known dataset respectively). This result may indicate that the RNA preparation methods for RNA-seq experiments are biased against ssRNA viruses.
An in-depth analysis and characterization of each of the 59 viral sequences is described below (Table 1). Viral genomes have been divided into two major categories: dsRNA viruses and (+)ssRNA viruses then further separated by viral family and, where applicable, genus. In total, viruses were identified from eight of the currently recognized virus families and as such are

Double stranded RNA (dsRNA) Viruses
A total of 34 viruses were categorized as dsRNA viruses due the identity of the top BLASTP match. To further characterize the relationship of these sequences, a phylogenetic tree based on the RdRP amino acid sequences from the 34 identified viruses and the top BLASTP matches was constructed (Fig 2). Globally, five groups are visible, where three are known viral families ( Totiviridae, Partitiviridae, and Chrysoviridae ) and two represent potentially new families.
Properties characteristic of each group of viruses are described in detail below.

Non-segmented RNA Virus (BbNRV1)
Recently, dsRNA viruses have been identified from various fungi that encode two ORFs within the single-segment genome, where the RdRP sequences have more similarity to the RdRPs of partitiviruses (multi-segment genomes) than totiviruses (mono-segment genomes). The virus Beauveria bassiana non-segmented dsRNA virus 1 (BbNRV1), assembled using the RNA-seq data from a transcriptome experiment of the highly virulent isolate B. bassiana ARSEF 8028 [9] , groups with these previously identified viruses (Fig 2). A second virus from this group was also identified in this analysis: Colletotrichum higginsianum non-segmented RNA virus 1 (ChNRV1; this virus served as an internal control for the virus discovery pipeline as it was previously identified via a similar approach from two different Colletotrichum higginsianum RNA-seq datasets [8] . BbNRV1 and ChNRV1, together with four previously identified viruses, form a strongly supported group that is separate from other characterized fungal viruses (Fig 2). It has been noted by multiple researchers that these fungal viruses share similarities with a family of plant viruses, the Amalgaviridae , specifically the mono-segmented genome structure and partitivirus-like RdRP sequence. Therefore, a phylogenetic analysis of the RdRP protein sequence was performed which demonstrates that, while the mono-segmented genome structure is shared, this new clade of fungal viruses is also distinct from the plant amalgaviruses ( Fig 3A). Indeed, the amalgavirus group of plant viruses branches off before the BbNRV1-related and Partitivirus clades. A percent identity matrix generated with RdRP and CP coding sequences further demonstrates a separation between this new group and existing virus families ( Fig 3B). For the RdRP pairwise comparisons, 50% identity easily differentiates the three clades from each other. Additionally, the CP sequence comparisons demonstrate shared similarities within the three clades and less than 20% similarity is shared between the clades.
The genome structure of viruses within this group consists of two predicted ORFs surrounded by variable length UTRs (Fig 3C). The 5'UTR lengths fall into two distinct ranges: less than 40 nt for Penicillium janczewskii Beauveria bassiana-like virus 1 (PjBblV1) [4] , ChNRV1 [8] , and UvHNND1 [10] , or between 260 to 320 nt for the three viruses isolated from B. bassiana [11,12] and Alternaria longipes dsRNA virus 1 (AlV1-HN28) [13] . The protein encoded by ORF1 is predicted to be from 314 to 394 aa and in a +1 frame relative to the protein encoded by ORF2. The intergenic sequence lengths are generally less than 100 nt, with the exception of AlV1-HN28. ORF2 nucleotide sequences range from 1578 to 1773 nt, and encoded proteins, which encode an RdRP domain, range in size from 525 to 590 aa in length. A slippery site sequence identified between ORF1 and ORF2 of ChNRV1 suggested that an ORF1-ORF2 fusion protein may be made via a -1 ribosomal frameshift; indeed, trypsin digest followed by mass spectrometry of a protein near the predicted weight of the fusion protein returned signatures for both the ORF1 and ORF2 protein sequences [8] . Numerous attempts to isolate virus particles definitely produced by a virus within this group remain unsuccessful [4,8,11] .
Due to the shared genome structure and partitivirus-like RdRP protein sequence, early viruses from this group were tentatively grouped with the plant Amalgaviridae family. However, with the identification of additional genome sequences, a more in depth phylogenetic analysis demonstrates the mycoviruses described here are separate. A new genus, "Unirnavirus", has recently been suggested to reflect the single, unified genome segment that contains both ORFs [12] . As percent identity is a common criteria for species demarcation, the pairwise comparisons here provide possible thresholds for determining new species within this group of viruses; excluding the three viruses isolated from different strains of B. bassiana , a threshold of 75% identity in the RdRP protein sequence and 70% in the ORF1 coding sequence would be appropriate criteria to distinguish between the viruses isolated from unique fungal hosts.

Alternaviridae
Initially, a single segment encoding a single gene with an RdRP domain was identified from an Aspergillus heteromorphus sequencing sample (PRJNA250969; unpublished). BLAST alignment to 'nr' identified five related RdRP mycoviral sequences from viruses with multi-segmented genomes. Fusarium graminearum alternavirus 1 (FgAV1) includes at least one additional genomic segment while Alternaria alternata virus 1 (AaV1) [14] and Aspergillus foetidus dsRNA mycovirus (AfdsV) [15] each contain a total of four genomic segments. Using the additional sequences from these related viruses, two more contigs were identified from the Aspergillus heteromorphus Trinity assembly. Thus, the AheAV1 genome contains at least three genomic segments, the largest of which encodes the RdRP enzyme and the two smaller segments that encode proteins of unknown function (Table 1). Recently it has been proposed that a new family be established, " Alternaviridae " [15] we support the formation of this family, and the requisite genus ( Alternavirus ) and propose that AheAV1 is also a member of this new mycoviral family.
Previously it was observed that members of this clade group together, but separate from other families [14,15] . A phylogenetic analysis using the RdRP coding sequence demonstrates that the group containing AheAV1 forms a well supported clade that appears related to the Chrysoviridae and Totiviridae families (Fig 2).  (Fig 2). Indeed, a pairwise percent identity comparison of both the RdRP and HP1 protein sequences demonstrates the very high identity among the viruses from the two Fusarium and also from the two Aspergillus viruses (Fig 4A). This analysis also illustrates a clear delineation between alternaviruses and chrysoviruses.
A common feature to viral genomes with multiple segments is the presence of a conserved sequence within the 5' UTRs of each fragment. Sequence conservation was described within the UTRs of the other putative alternaviruses, including AfV [15] and AaV1 [14] . Alignment of the 5'UTR sequences of the three genomic segments of AheAV1 demonstrates sequence conservation ( Fig 4C). Alignment of the source RNA-seq reads to the viral genome segments reveals a high amount of total reads, with over four million reads mapped across the three segments ( Fig 4B). This represents nearly 25% of the reads used for the analysis (reads that did not align to the fungal genome) and over 5% of the total reads from the sample. In an effort to identify a possible fourth genomic fragment for AheAV1, both the 5'UTR conserved sequence and the high read count were used as criteria to further query the Trinity assembled contigs however no additional contigs were identified.
An alignment of the RdRP protein sequences from the five alternavirus genomes reveals the presence of the expected eight conserved domains found within viral RdRP proteins ( Fig 4D).
Certain differences are shared among these viruses that are unique when compared to known viral sequences. Specifically, the triad within domain VI has an alanine (ADD) instead of the nearly universally conserved glycine (GDD).

Totiviridae
Ten of the viral sequences identified in this study are similar to other known members of the Totiviridae family, nine within the genus Victorivirus and one within the genus Totivirus .

Totiviruses
Hortaea werneckii totivirus 1 (HwTV1), the first viral sequence described for this fungal species to our knowledge (PRJNA356640; [16] ), is a member of the Totivirus genus (Fig 2). Species demarcation within Totivirus requires biological characterization of the virus within a host cell; however, isolation of a virus from a distinct host species and less than 50% sequence identity of the RdRP at the protein level is a proxy for identifying a separate species [17] . HwTV1 was identified from a new fungal host, H. werneckii , and shares less than 44% sequence similarity to the most closely related known totiviruses (S1A Fig

Victoriviruses
The remaining nine viruses in the Totiviridae family were found within two branches of the and CzTV1 immediately precedes an octanucleotide sequence (AGGGUUCC) which has been observed in the 5'UTR of some other victoriviruses including Alternaria arborescens victorivirus 1 (AaVV1) [24] and Epichoë festucae virus 1 (EfV1) [25] . The peak in coverage of CnTV1 immediately precedes an octanucleotide sequence slightly different than previously reported (CGGCCUCC). A similar peak in coverage was not observed for AhoTV1, which also contains the AAGGGUUCC octamer sequence within the 5'UTR, although this viral sequence had the lowest average coverage. The genome structure diagrams drawn above the coverage heat maps highlight features of the viruses that are further described below.
Unlike the frame-shifting observed in totiviruses, the RdRP of Victoriviruses is expressed as a separate protein from the coat protein [26] via a termination-reinitiation strategy [27] . The AUG start codon either overlaps the stop codon of the upstream of ORF1 or slightly precedes it, and is nearly always found in the -1 frame relative to ORF1 [21] . The tetranucleotide sequence, A UG A , is observed for PdTV2, PdTV3, CzTV1, and AspTV1. For CnTV1, TmTV1, and ToTV1, the last nucleotide of the ORF1 stop codon overlaps the ORF2 start codon, creating the pentamer sequences UG A UG , UA A UG , and UA A UG respectively. In each of these seven instances, there is the expected -1 frameshift between ORF1 and ORF2. For CeTV1 and CcTV1, a two nucleotide spacer is observed between the start codon of ORF2 and the stop codon of ORF1 ( AUG CA UGA and AUG AA UAA respectively) that is similar to the sequences observed in SsRV1 and SsRV2 ( AUG AA UAA and AUG AG UAA respectively) [28] .
Thus ORF2 of CeTV1 and CcTV1, as well as SsRV1 and SsRV2, is found in the +1 frame relative to ORF1.
A characteristic of the coat protein sequences of victoriviruses is an abundance of alanine, glycine, and proline (A,G,P) residues, particularly towards the C-terminal end. An analysis of between the CP protein sequences from victorivirus species and totivirus species illustrates a generally elevated percentage of these three amino acids across the entire CP coding sequence of victoriviruses in comparison to totiviruses, with a notable increase observed in the two bins closest to the C-terminus of the protein (S2B Fig). The same pattern is also observed for all nine proposed victorivirus sequences identified in this study.
These analyses demonstrate that these nine sequences share conserved characteristics found in members of the genus Victorivirus . Criteria for species demarcation within this genus are fungal host species and percent identity of the RdRP and CP amino acid sequences [17] , with a cutoff of 60% identity for both. The CP and RdRP sequences of CnTV1, TmTV1, and

Partitiviridae
Five genera have been assigned to the family Partitiviridae , along with fifteen viruses unassigned to a genus [30] . Genome structure of this family requires two distinct genome segments, each containing a single ORF: dsRNA1 and dsRNA2. The RdRP is found on dsRNA1 while the CP is found on dsRNA2, and highly conserved sequences can be found in the 5'UTR of both viral genome segments [30] . To date, viruses isolated from fungi have been classified into one of three of the Partitiviridae genera: Alphapartitivirus , Betapartitiviruse , and Gammapartitivirus . In this study, sixteen viruses were identified that clearly group among the known partitiviruses (Fig 2).

Alphapartitivirus
Five viruses identified in this study phylogenetically group within the Alphapartitivirus genus:  (Table 1), which is within the described range for betapartitiviruses of 2.1 to 2.4 kb [30] . As observed with the majority of the alphapartitiviruses described above, the average coverage per nucleotide for the genomic segments of FpPV1-2516 and TcPV1 was higher for the dsRNA2 segment that encodes the CP than the RdRP segment (S4B Fig).

Gammapartitivirus
Five viruses identified in this analysis grouped with members of the genus Gammapartitivirus  were identified that are phylogenetically related to an emerging group of partitiviruses that are separate from, but related to, the genus Gammapartitivirus (Fig 2); this group has been tentatively named Epsilonpartitivirus [41] . Identification of the PaPlV sequences was expected as this sample was RNA-seq data from Cryphonectria parasitica infected with PaPlV, previously isolated from P. aurantiogriseum by the same researchers [41] . To distinguish this viral genome from that isolated from the original host, the virus has been named PaPV1-cp. Also, the slight modification of the virus name, from partiti-like to partitivirus, demonstrates a recent improved understanding that this is a partitivirus following the identification of a RNA segment encoding a putative CP [41] .
Members of this proposed genus have been identified from both fungal and invertebrate hosts, although currently the viral sequences form two sister clades separated by kingdom ( Fig   2). Genome sizes of the dsRNA segments of putative epsilonpartitiviruses are from 1.7 to 1.9 kb for dsRNA1 and 1.5 to 1.8 kb for dsRNA2 (Table 1). Using a pairwise comparison analysis, a value of 85% identity for the RdRP protein sequence and 75% for the CP sequence would currently be appropriately stringent for species demarcation of viruses isolated from different fungal hosts (Fig 5A).
Observed here, sequences isolated from the PaPlV1-infected C. parasitica sequencing sample contain a number of SNPs in comparison to the source PaPlV sequence; these changes may reflect an adaptive response by the virus to a new fungal host [41] . A positive effect on fungal growth is observed whereby the virus confers a positive effect on fungal growth of C.
parasitica in the presence of 2% salts that has not yet been confirmed in the original host P.
aurantiogriseum . Read coverage along the viral genome segments isolated from C. parasitica is on average low, 18x for dsRNA1 and 5x for dsRNA1 (Fig 5B), similar to the published report [41] .
Both of the viral genome segments of CePV1 share the greatest percent identity with PaPlV1, but below the threshold suggested above, at 84% and 73% for the RdRP and CP respectively (Fig 5A). Similar to other partitiviruses, a multiple sequence alignment of the 5'UTR sequences of the two genome segments of CePV1 reveals regions of sequence conservation (Fig 5C).
These four viruses share many sequence-based similarities. A 13-nt highly conserved sequence (TCACAAYATYAYA) is present in the 5'UTR sequence of both dsRNA genome segments from all four viruses (Fig 5F). Additionally, a 10-nt conserved sequence (TYTCATTARR) is found within the 3'UTR sequence of both dsRNA segments of all four viruses ( Fig 5G). While these four viruses contain the six conserved motifs common to RdRP proteins from other members of the Partitiviridae , including the GDD triad in motif VI, there are distinct differences that are shared that distinguish these viruses from the existing genera ( Fig 5D).
Further, the range of lengths for the RdRP protein, from 569 to 572 aa, and CP protein, from 453 to 501 aa (Table 1), are unique ranges among the currently described genera [30] . A pairwise percent identity analysis of the RdRP and CP sequences from the four viruses suggests a threshold for species demarcation at 90% for the RdRP sequence and 80% for the CP sequence, similar to other genera within the family (Fig 5A).
Alignment of the RNA-seq reads to the viral genomes demonstrates that dsRNA2 has a higher average coverage per nucleotide for both viruses, with approximately 2x more coverage observed between the segments of PbPV1 and 7x between the segments of DcPV1 (Fig 5E). PRJNA250734) (Fig 2).
Phylogenetically, chrysoviruses are grouped into two distinct clusters, where cluster 1 contains members with either three or four segmented genomes, while cluster 2 contains viruses with either four or five segments [45] . The two viral sequences identified in this study, BbCV1 and PrCV1, belong with cluster 1 chrysoviruses; both genomes are comprised of four genomic segments and the RdRP sequences phylogenetically group with the ICTV-recognized members within cluster 1, Isaria javanica chrysovirus 1 (IjCV1) [47] and Penicillium chrysogenum virus (PcCV1) [48] (Fig 2). Further, as detailed below, we recommend that these are both new viral species within the genus Chrysovirus .
To our knowledge, these are the first chrysoviruses identified for both of these fungal hosts, and indeed the first virus from P. raistrickii described, thus satisfying the host-based criteria for species demarcation. Following identification of the RdRP-encoding contig, the identity of the other three segments was determined via a BLASTP search using protein sequences of known chrysoviruses against the predicted protein sequences of the Trinity contigs. The combined length of dsRNAs for both BbCV1 and PrCV1 are 12.46 and 12.56 kb respectively (Table 1), within the observed range of other 11.5 to 12.8 kb of other chrysoviruses [45] . In general, chrysovirus genome segments are numbered by size, however segment 4 of PrCV1 is slightly longer than segment 3 (Table 1), similar to segment 4 of Amasya cherry disease associated chrysovirus (AcdaCV) which is also larger than its respective segment 3 [49] .  These are the first viruses, to our knowledge, identified from both G. esculenta and the morel mushroom, M. importuna .

Alphaendornavirus
A phylogenetic analysis of the RdRP motif from related endornaviruses and the viruses identified in this study reveals MiEV2 and GeEV1 form a well supported clade with other fungal endornaviruses, including Ceratobasidium endornavirus A (CEVA) [53] , and Rhizoctonia solani endornavirus 2 (RsEV2) [54] , that is related to, but separate from alphaendornaviruses isolated from plant hosts, including the type species Oryza sativa endornavirus (OsEV) [55] (Fig 6A).
The genome sizes of MiEV2 and GeEV1 are 15.3 and 14.5 kb respectively, well within the observed range of other Alphaendornaviruses (Table 1) The two criteria for species demarcation within the genus Alphaendornavirus are difference in host and an overall nucleotide sequence identity below 75%. Both viruses described here satisfy these criteria. Indeed, nucleotide identity among Endornaviruses is generally below 50%.
An analysis of the protein sequences of the RdRP motif and the viral helicase motif reveal similar results, as sequences are between 39% and 67% identical for the RdRP sequences and

Betaendornavirus
The two other endornaviruses identified were MiEV1 and MiEV3, which group with the betaendornaviruses. MiEV3, in particular, forms a strong phylogenetic relationship with other characterized betaendornaviruses, while MiEV1 is the outermost member of this clade (Fig 6A).
Unexpectedly, the genome sizes of both viruses are more than 25% larger than other betaendornaviruses; MiEV1 has a genome size of 16.5 kb while MiEV3 has a genome size of 14.3 kb (Table 1)

Tobamo-like virus: ArTlV1
A virus sequence was identified in an Acidomyces richmondensis sample (PRJNA250470; unpublished) with a single segment genome sequence, 10,291 nt in length, that encoded four predicted ORFs ( Table 1). The gene product from ORF2 contains the RdRP domain ( Fig 7B) and a BLAST-P search of the 'nr' database with this sequence identified two related, full length mycoviruses, Macrophomina phaseolina tobamo-like virus 1 (MpTlV1) [5] and Podosphaera prunicola tobamo-like virus 1 (PpTlV1) [56] . As such, this newly identified virus has been named

Acidomyces richmondensis tobamo-like virus 1 (ArTlV1).
For all three mycoviral sequences the first and largest ORF contains a viral methyltransferase and viral helicase domain and ORF2 encodes the RdRP domain (Fig 7B).
ORF3 and ORF4 in MpTlV1 have been putatively identified as encoding a movement protein and coat protein respectively [5] .
A phylogenetic analysis of the RdRP motif from the three mycoviruses compared to select members of the families Virgaviridae , Bromoviridae and genus Idaeovirus demonstrated that ArTlV1 is more closely related to the two previously identified mycoviruses than known plant viruses ( Fig 7A). Further, the group of mycoviruses form a sister clade to the Virgaviridae viruses. Of the viruses currently characterized from this family, only members of the genus Tobamovirus have a non-segmented genome similar to ArTlV1 and relatives; other members have multipartite genomes [57] .
To further characterize this emerging group of mycoviruses, a percent identity analysis was undertaken, comparing the amino acid sequences from the RdRP and viral helicase motifs to members of the Virgaviridae family. The RdRP motif sequences among the mycoviruses share 56 to 63% identity with one another, but less than 44% identity with Virgaviridae viruses ( Fig   7C). Further, sequences from the viral helicase domain were 59 to 62% identical among mycoviruses, and less than 40% shared identity outside of this clade. Due to these values, it appears that all three viruses are unique viral sequences.
Finally, the RNA-seq data was aligned to the ArTlV1 genome sequence, which revealed uniform coverage along the length of the genome, averaging 1,167 reads per nucleotide ( Fig   7B).
To date all viruses of the Virgaviridae family infect a plant host, although many members are transmitted by nematodes or plasmodiophorids, a group of parasitic microscopic organisms unrelated to fungi or oomycetes [57] . Tobacco mosaic virus (TMV), the best characterized Tobamovirus , can infect, replicate, and persist in the ascomycota fungi Colletotrichum acutatum [58] . Further, a virus from the Bromoviridae family, Cucumber mosaic virus (CMV) was shown to naturally infect the phytopathogenic fungi Rhizoctonia solani and as well as replicate within Valsa mali but not C. parasitica , and F. graminearum [59] . Due to the close contact between phytopathogenic fungi and plants a change in host range of a tobamo-like virus is plausible.
Interestingly, the A. richomondensis strain analyzed in this study was isolated from acid mine drainage biofilms [60] and is not known to be phytopathogenic. Acid mine drainage biofilms are complex communities comprised of different fungi along with bacteria, archaea and bacteriophages [61,62] . The role of the mycovirus identified in this study in such an extreme and heterogeneous environment remains to be determined.

Ambiguiviridae
Four viral genome sequences were identified that share similarity to unclassified (+)ssRNA Phylogenetic analysis demonstrates the fungal viruses form a well-supported clade that is separate from both the insect and plant viruses (Fig 8A). A pairwise identity analysis of the RdRP motif region from the viruses analyzed in the phylogenetic tree reveals a range of identities between the fungal viruses, from 36% to 63%, while only sharing 25% to 33% identity with the insect viruses and 26% to 32% identity with the members of the Tombusviridae ( Fig   8D). A similar analysis using the ORF1 protein coding sequence, from only the mycoviral genomes, demonstrates 18 to 38% identity among sequences (Fig 8D).
Together these ten fungal viruses are characterized by a single segment genome, 2.6 kb to 4.5 kb in length, that encodes two ORFs, where the second ORF contains the RdRP domain (Table 1). The amber stop codon 'UAG', found at the end of the first ORF, has been suggested to result in readthrough to produce a fusion protein with the downstream RdRP [67] . All members of this group of viruses contain the amber stop codon, and both ORF1 and ORF2 are found in the same frame, a requirement for readthrough. Additionally, all 10 viruses share a number of key differences in residues commonly conserved within (+)ssRNA viruses. For instance, the second aspartic acid residue in motif A, a feature of (+)ssRNA viruses [68] , is instead a glutamic acid (Fig 8C). Additionally, these 10 viruses have a GD N triad in motif C versus the canonical GD D found in (+)ssRNA viruses (Fig 8C). This triad has previously been observed within some (-)ssRNA viruses [68] , although within (+)ssRNA viruses, modification of the GDD to GDN has repeatedly demonstrated a detrimental effect on enzymatic activity [69][70][71] . Finally, within motif D, a positively-charged lysine, common to nearly all RdRP sequences [68,72] , is a nonpolar proline or valine residue (Fig 8C).
BLAST-X analysis of these nucleotide sequences determined they were both related to a group of unclassified, (+)ssRNA viruses, that includes four other mycoviruses. As described below, these two viruses are new members of a recently proposed family, Yadokariviridae [73] .  [75] . The lengths of the 5'UTRs are all greater than 500 nt, except for PdYV1. Attempts to extend the sequence of the 5'UTR of PdYV1 using the RNA-seq data were not successful; however, as the average coverage per nucleotide for PdYV1 was low, just 23x (Fig 9C), additional experiments with the fungal source material may be fruitful. Lengths of the 3'UTRs for all six viruses are also largely similar, between 200 to 400 nt, except for YkV1 which has a 3'UTR of 1,223 nt. A genome segment belonging to a Yadokarivirus that encodes a coat protein has not been identified. Alignment of the RNA-seq reads to the genomes of AhoYV1 and PdYV1 reveals an average coverage per nucleotide of 151x and 23x respectively (Fig 9B-C).
A phylogenetic analysis of the RdRP protein sequence reveals a well supported clade that is most closely related to the Caliciviridae and separate from other known mycoviral families including Hypoviridae and fusariviruses (Fig 9A). AhoYV1 in particular is most closely related to AfSV2; both viruses were isolated from a member of the genus Aspergillus [74] .
As percent identity between virus RdRP sequences is a common criteria for defining new groups of viruses, the RdRP motif sequences from the viruses in Fig 9A were analyzed to identify a potential threshold value. Indeed a threshold between 26 to 35% identity easily classifies these viruses together, but separate from other known (+)ssRNA viruses (Fig 9D). An additional threshold of 95% would to distinguish each sequence evaluated here as a unique viral species; however, as the pairwise percent identity between AhoYV1 and AfSV2 of 94% is somewhat higher then is generally observed for unique species, a lower threshold may be more appropriate. Identification of additional sequences belonging to this group of viruses will further refine these sequence-based characteristics.
Another feature common to these viruses is the presence of at least one additional virus within the fungal host. Co-incident infection is not an uncommon occurrence in fungi; however, it has been demonstrated for two yadokariviruses, AfSV2 and YkV1, that isolated virus particles contain a second virus, related to Totiviruses [74,75] . Further, Zhang and colleagues demonstrated that YkV1 is trans -encapsidated with the capsid protein from the other virus, YnV1 [75] . As caliciviruses and totiviruses share a distant but strongly supported ancestry following diversification of the picorna-like superfamily [76] , and both AfSV2 and YkV1 are present within virions encapsidated by totivirus capsid proteins, it is intriguing to speculate about the mechanisms at play, and the requirements for such an interaction. A totivirus was also identified in the A. homomorphus sample (AhoTV1, discussed above), however only a partitivirus (PdPV1-HSF6) was identified along with PdYV1. For the remaining two members of this group of viruses, it is not possible to know the identity of the putative coat-protein donor, as PaFlV1 was found to co-infect P. aurantiogriseum along with a fusarivirus, totivirus, unclassified virus, and partitivirus [4] and FpMyV2 was identified in a fungal strain that contained 13 other viruses from a variety of families [38] .
Further study of the fungi that host a member of this group of viruses will answer questions regarding the relationship between the capsid-less virus and the capsid-donor, and may provide an interesting system for studying the evolution of a capsid-less virus. Due to the distinct phylogenetic relationship between these six identified and known (+)ssRNA viruses, along with the unique lifestyle, we support the recommendations to create a new virus family, named Yadokariviridae after YkV1, where yadokari , is the Japanese word for hermit crab [73,77] .

Narnaviridae and Ourmia-like virus
Members of the family Narnaviridae have single molecule, positive-strand RNA genomes ranging in length from 2.3 to 2.9 kb and encode an RdRP as the only polypeptide [78] . There are currently two genera within this family: Narnavirus and Mitovirus , where mitoviruses are characterized as viruses that replicate within the mitochondria of filamentous fungal hosts [78] .
Indeed, a non-interrupted ORF is only predicted in these viruses when using the mitochondrial codon usage where tryptophan is encoded by UGA [79] . Here we describe five new viruses with similarity to members of the genus Mitovirus from RNA-seq datasets from four fungal hosts:  [81] ).
The five mitoviruses have genome sizes within the expected range (Table 1), and a single ORF is present only when using the mitochondrial codon table. The percent of A+U in the coding strand for these five viruses ranges from 53% to 70%; characteristically, mitoviruses have 62 to 73% A+U [78] , although the percent observed in recently identified viruses is as low as 53% [5] . The predicted protein within each genome contains a Mitovir_RNA_pol domain, with a length ranging from 652 to 716 aa. Alignment of the RNA-seq reads to the viral genomes demonstrates end to end coverage for each virus, with average coverage per nucleotide ranging from 51x for OsMV1 to 540x for LjMV1 (S8D Fig). A phylogenetic analysis of the RdRP coding sequence demonstrates strong support for all five newly identified viruses among members of the genus Mitovirus (Fig 7B). Criteria for species demarcation within the genus Mitovirus is not yet full defined, however, viruses with greater than 90% identity within the RdRP protein sequences are considered the same species [20] . Here we observe that StMV1, CfMV1, OsMV1, OsMV2, and LjMV1 have less than 52% identity to any currently available mitovirus sequence (S8B Fig). Taken together, the above characteristics demonstrate that these five viruses are new species within the genus Mitovirus .
Ourmiaviruses are also positive-strand RNA viruses, composed of three genome segments that encode an RdRP, movement protein, and coat protein, isolated from plant hosts [19,82] .
Despite usage of standard eukaryotic codons, phylogenetically the RdRP protein sequence is most closely related to that of Narnavirus [19] . Recently viruses have been identified from fungal hosts with sequence similarity to that of the plant-hosted ourmiaviruses. One such virus was identified in this study: Aspergillus neoniger ourmia-like virus 1 (AnOlV1) (PRJNA250996; unpublished). The single segment, 2.7 kb genome encodes a single polyprotein that contains an RdRP motif ( Table 1)

Hypoviridae
The Hypoviridae family consists of positive sense, single-stranded RNA viruses that are approximately 9.1 to 12.7 kb in length, containing either one long ORF or two ORFs [85] .
Further, these viruses are capsidless and use a host-derived, trans-Golgi network for genome replication [86,87] . Infection by Hypoviruses can often result in reduced fungal virulence, a hallmark symptom of early members of the Hypoviridae family.
Hypoviruses identified from four transcriptomic datasets are described here: Betahypovirus [92] , where CHV1 and CHV2 are classified in the former and CHV3 and CHV4 in the later [85] . A phylogenetic analysis using the RdRP protein sequence of the four viruses identified here, along with known and related hypoviruses , revealed that FgHV1, ShHFV1, and TaHV1 groups with the Alphahypoviruses while StHV1 was found with the Betahypoviruses (Fig   7C). A further analysis of these viral sequences, described below within the context of the two recently proposed genera, determined that ShHV1, StHV1, and TaHV1 are new species within the Hypovirus genus.

MiRV1
In addition to the three endornaviruses and one fusarivirus described above, a fifth viral sequence was identified in the M. importuna sample (PRJNA372858; unpublished). The single-segment sequence was determined to be 10,132 nt in length, with a single predicted ORF of 3,295 aa ( (RsRV-1,RsRV-2, RsRV-3) [105] . Both MiRV1 and SsRV-L contain a single ORF with methyltransferase, helicase and RdRP domains, and neither contain peptidase motifs [104] .
Previous phylogenetic analyses of SsRV-L and the three viruses from R. solani demonstrated a relationship between these viruses and members of the alpha-like virus superfamily, specifically viruses from the Hepeviridae and Alphatetraviridae families [104,105] .
Endornaviridae , whose members infect both fungi and plants, are also a family of alpha-like viruses [106] , but appear to be more distantly related to this emerging group of mycoviruses.
Expanding the phylogenetic analysis to include MiRV1 confirms that MiRV1 is a member of this new clade of mycoviruses (S11A Fig). Identification of additional related viral sequences will help elucidate the relationship of MiRV1 and the rest of the Alpha-like viruses.

Unknown comovirus-like sequences
An RdRP-containing contig with a top match to Turnip Ringspot Virus (TRV) was identified in sequencing samples from Colletotrichum tofieldiae (;PRJNA287627 [107] ) and Zymoseptoria brevis (PRJNA277175; [108] ). Further inquiry also identified the second genomic segment of TRV, which encodes the movement and coat proteins, from both the C. tofieldiae and Z. brevis

Summary and Conclusions
Here we describe the power of a systematic approach for analyzing public RNA-seq datasets to is only conferred to the plant when the fungi is present and is infected with the virus [112] .
Another phenotype, different from canonical hypovirulence, is the reduction in fungicide resistance observed following viral infection of prochloraz-resistant strain of Penicillium digitatum [113] .
Multipartite relationship also include virus-virus interactions within a host as many fungi are concurrently infected by more than one mycovirus; here we identified over 10 such fungi.
Recently, it was described that a positive effect on accumulation of a viral genome was observed when a second, unrelated virus was present in Rosellinia necatrix [75] . Conversely, coinfection of C. parasitica by a totivirus and a suppressor-deficient hypovirus lead to induction of antiviral defense genes and ultimately inhibition of totivirus replication [114] .
Indeed asymptomatic or latent infections can downplay the active antiviral processes at work within the fungus and the important role a successful antiviral defense has on fungal phenotypes. Small RNA-mediated antiviral defense systems, crucial for preventing a deleterious effect on fungal health, have been identified in Cryphonectria parasitica [115] , Colletotrichum higginsianum [8] , and Sclerotinia sclerotiorum [116] .

Pipeline for identification of viral-like sequences
The output file from the E-utilities search command was manually parsed to create a unique list of all BioProjects and fungal species. Multiple fungal species found under one BioProject number were split into separate entries. Most BioProjects included more than one sample; for example different environmental treatments or wild-type versus mutant. In general, the SRA sample selected for analysis was a wild-type-like strain grown on media. When available, the fungal genome sequence was either downloaded from NCBI or Joint Genome Institute (JGI).
The computational steps for identifying viral sequences start with a downloaded raw reads file from SRA. When a fungal genome sequence was available, RNA-seq reads were aligned to the genome using bowtie2 (version 2.2.9) [118] , and the unaligned reads were captured in a separate file using the flag `--un`. A de novo assembly of these unaligned reads was created using Trinity (v2.1.1) [119] and resultant contigs were analyzed for open reading frames (ORFs) by TransDecoder (version 2.1) [120] . Finally, these predicted protein sequences were queried against a custom database of viral RNA dependent RNA polymerases (RdRPs) using HMMscan (version 3.1b2; [121] ). In general, when the data was paired end, the data present within pair one was sufficient data for identification of viral contigs. Occasionally read depth from the virus genome was low, resulting in fragmented viral contigs. In these instances, the pipeline was rerun using both pair one and pair two data. Due to the nature of the `--un` flag of bowtie2, the pairs were treated as individual single-end datasets, instead of paired-end data. Assembly of the virus from Trichoderma harzianum (PRJNA216008), was particularly recalcitrant and as such, all six SRA paired-end datasets were used in the pipeline for identification of ThAV1.

Contig extension and identification
Contigs identified containing putative viral RdRP domains were further analyzed by first aligning the nucleotide sequence to the 'nr' database at NCBI using BLASTX. A variety of manual annotation steps were required following confirmation of a viral sequence. Some viral genomes, such as those matching the Partitiviridae or Chrysoviridae families, were predicted to contain additional viral genes encoded on separate contigs; these ORFs encode for coat proteins and/or hypothetical proteins. S2 Table summarizes the viral RdRP sequences identified and, where applicable, which known BLASTX-identified sequences were used to query the Trinity assembly for additional contigs.
Occasionally contigs were identified where the predicted ORF(s) extended to either the 5'or 3'-most edge; this indicated that the sequence in hand was truncated. In these instances, the RdRP sequence from the best BLAST match was used to query the Trinity contigs to identify putative viral sequences outside the known RdRP contig. Any contigs identified were then stitched together to create a final sequence with a full-length ORF. Start and stop codons were predicted using ORFinder at NCBI [122] .

Sankey Diagram
The online tool SankeyMATIC [123] was used generate the Sankey Diagram in

Pseudoknot Structures
The online tool DotKnot was used to predict pseudoknots in certain viral genome sequences [124,125] . PseudoViewer 3.0 was used for visualization of the predicted pseudoknots [126] .

Read distribution plots
Distribution of reads along the viral genome segment(s) was visualized by first aligning the reads to the viral genome using Bowtie2, then calculating the hits per nucleotide along the viral sequence(s) using the custom Perl script plotRegion_RNAseq_hitsPerNT.pl. The resulting output

Phylogenetic analysis
To identify related sequences for phylogenetic analysis, the predicted amino acid sequence for the RdRP was used to query the 'nr' database at NCBI. Protein sequences of the top hits, generally with e-values approaching or equal to 0.0, were downloaded. Partial sequences were excluded. Known viral sequences that appeared as the top hit for multiple viruses identified in this study were deduplicated. The complete list of Accession IDs used for phylogenetic analysis is available in Supp. Table S2.
To build the trees, amino acid sequences were aligned with MAFFT [127] , and resultant alignments were degapped and trimmed using the tool trimAL [128] with a gap threshold of 0.9 (allowing for a gap in up to 10% of the sequences at a given position). Maximum likelihood phylogenetic trees were constructed with RAxML version 8.2.4 [129] using the LG model for amino acid substitution. RAxML tree files, with branch lengths, were viewed in Dendroscope version 3.5.7 [130] and exported as PDF for figure preparation.

Percent Identity Matrices
Percent identities among protein coding sequences, either full length or motif-specific, were determined via multiple sequence alignment using Clustal Omega [131] . In instances where two different proteins were evaluated and then combined into one matrix, first the RdRP protein sequences were analyzed; the order of the output in the percent identity matrix was then applied to the second set of protein sequences to be analyzed and the option "Order: Input" was selected in Clustal-Omega. Percent identity matrices were converted to heat map plots using a custom R script. This script along with the input matrices are available through the FigShare FileSet (DOI: 10.6084/m9.figshare.7476359).

Accession numbers
Nucleotide sequences of the mycoviral genomes identified and analyzed in this study were deposited at NCBI; GenBank accession numbers are listed in Table 1.