Genome Assembly of the Fungus Cochliobolus miyabeanus, and Transcriptome Analysis during Early Stages of Infection on American Wildrice (Zizania palustris L.)

The fungus Cochliobolus miyabeanus causes severe leaf spot disease on rice (Oryza sativa) and two North American specialty crops, American wildrice (Zizania palustris) and switchgrass (Panicum virgatum). Despite the importance of C. miyabeanus as a disease-causing agent in wildrice, little is known about either its mechanisms of pathogenicity or host defense responses. To start bridging these gaps, the genome of C. miyabeanus strain TG12bL2 was shotgun sequenced using Illumina technology. The genome assembly consists of 31.79 Mbp in 2,378 scaffolds with an N50 of 74,921 bp. It contains 11,000 predicted genes, of which 94.5% were annotated. Approximately 10% of the predicted genes are expected to encode secreted proteins. The C. miyabeanus genome is rich in carbohydrate-active enzymes and harbors 187 small secreted proteins (SSPs) and some fungal effector homologs. Detoxification systems were represented by a variety of enzymes that could offer protection against plant defense compounds. The non-ribosomal peptide synthetases and polyketide synthases (PKS) present were common to other Cochliobolus species. Additionally, the fungal transcriptome was analyzed in planta at 48 hours after inoculation. A total of 10,674 genes were found to be expressed, some of which are known to be involved in pathogenicity or in the response to host defenses, including hydrophobins, cutinase, cell wall degrading enzymes, enzymes related to reactive oxygen species scavenging, PKS, detoxification systems, SSPs, and a known fungal effector. This work will facilitate future research on C. miyabeanus pathogen-associated molecular patterns and effectors, and the identification of their corresponding wildrice defense mechanisms.


Introduction
Cochliobolus miyabeanus ((Ito & Kuribayashi) Drechsler ex Datur.) (anamorph = Bipolaris oryzae (Breda de Haan) Shoemaker) is a common fungal pathogen worldwide. In the U.S., it has been documented from the north, in North Dakota and Minnesota, to the south, from Florida to Texas, as well as in areas of California [1][2][3][4]. It is a major pathogen of rice (Oryza sativa L.) in all areas of the world where this crop is grown [5]. In addition, it has the potential to cause a severe, yield-limiting leaf spot disease on two North American non-traditional grass crops: switchgrass (Panicum virgatum L.), grown for cellulosic biofuel production [4], and American wildrice (Zizania palustris L.), hypothesized to have originated in North America [6] and grown commercially for its gourmet grain [7]. C. miyabeanus causes fungal brown spot (FBS) on wildrice, which can lead to economically disastrous losses in paddy-grown crops [8,9] and has resulted in a greater reliance on fungicides to achieve profitable yields. In susceptible wildrice, fungal conidia usually germinate by 8 h after deposition on leaves and develop club-shaped appressoria by 18 h. Infection hyphae break through the cuticle, or less frequently enter through stomata, develop under the cuticle, and later invade inter- and intracellular spaces. Symptoms appear about 18 to 48 h after inoculation as brown-purple to dark spots that enlarge over time into oval lesions with brown to tan necrotic centers, frequently surrounded by chlorotic halos [10]. Lesions tend to coalesce, whitening aerial leaves. Stems and sheaths can also be infected, and the weakened stems frequently break, causing considerable kernel loss [3].
To mitigate grain yield reduction, a few wildrice cultivars with improved genetic resistance to FBS have been released [11]; however, the molecular bases of resistance are not known. Further, in contrast to the virulence mechanisms of other Cochliobolus species, the fungal mechanisms of virulence on wildrice have not been broadly studied.
Pathogenicity in many Cochliobolus species is largely due to host-specific toxins (HSTs). The first HST described was victorin, a nonribosomal peptide (NRP) produced by C. victoriae, the causal agent of Victoria blight of oat [12]. It has been proposed that in susceptible oat genotypes carrying the homozygous dominant Vb locus, the toxin binds to a thioredoxin guarded by an NB-LRR protein that in turn triggers apoptosis, facilitating disease by this necrotrophic fungus [13]. C. carbonum race 1 produces HC-toxin, a tetrapeptide that inhibits histone deacetylases involved in DNA repair, modification, and transcription. The TOX2 locus contains genes essential for toxin synthesis, including HTS1, a nonribosomal peptide synthetase (NRPS) [14]; TOXA, a cyclic peptide efflux pump for toxin detoxification [15]; TOXC, a fatty acid synthetase [16]; TOXF, a branched-chain amino acid transaminase [17]; TOXG, an alanine racemase [18]; and TOXE, an atypical regulatory sequence that controls expression of TOXA and TOXC [19]. Race T of C. heterostrophus produces a linear polyketide HST (T-toxin) that forms pores in the inner mitochondrial membrane, causing leakage, in maize carrying the Texas male-sterile cytoplasm gene T-urf13. The complex TOX1 locus comprises the genes for toxin synthesis, found at two unlinked loci, ToxA and ToxB [20]. ToxA contains two monomodular polyketide synthase (PKS) genes required for toxin production and virulence [21,22]. ToxB comprises a decarboxylase (DEC1) and three reductases (RED1, RED2, and RED3) [20,23], of which only RED2 participates in toxin synthesis. Lastly, a putative NRPS associated with virulence on barley was recently uncovered in C. sativus through comparative genomics among Cochliobolus species [24].
C. miyabeanus strains that are pathogenic on common rice do not have unique PKSs or NRPSs and are not known to produce an HST [24]. C. miyabeanus, and other Cochliobolus species, make non-specific phytotoxic cyclic sesquiterpenes commonly known as ophiobolins [25], which are also produced by non-pathogenic fungi [26], suggesting that they have functions other than in interactions with plant hosts. Purified ophiobolins have been found to have a number of effects on plants including inhibiting root growth, stimulating electrolyte leakage from roots, and inducing stomatal opening [26]. Ophiobolins also have antimicrobial activity and cause hyphal deformation [27].
Cochliobolus belongs to the class Dothideomycetes. This class consists of fungi with a wide assortment of lifestyles that live in ecologically diverse environments. Members of this class are thought to have descended from a common ancestor over 280 million years ago, and contemporary species exhibit genomes with macro-, meso-, and microsynteny, variation in genome size attributed to the amount of repeated DNA, and yet conserved gene numbers [28]. Plant pathogens within the Dothideomycetes contain genes with 10 unique Pfam domains and 69 expanded domains that are not present in other plant pathogens. The proteomes of plant pathogens within the Pleosporales are enriched for cysteine-rich small secreted proteins (SSPs) of less than 200 amino acids (aa), some of which are thought to be involved in plant-fungus interactions. Some SSPs are common to all Cochliobolus species, while others are unique to single species [24]. The Dothideomycetes have a vast number of genes for the production of secondary metabolites, including PKSs, NRPSs, and terpene synthases [28]. A comparative study of Cochliobolus plant pathogens showed that some of the secondary metabolites synthesized by NRPS and PKS genes were conserved among Cochliobolus species, while others were unique to a single species. A few of these genes (NPS1, NPS3, and NPS13), which have a discontinuous distribution among species, show complicated patterns of evolution that include expansion, loss, and recombination of adenylation (AMP) domains, which may generate novel plant-toxic peptides [24].
Here, we report the draft genome assembly and catalog of genes of a C. miyabeanus strain originally isolated from infected wildrice as well as the transcriptome of the pathogen during an early time point of wildrice colonization. In our analysis we focus on potential effectors, including small secreted proteins and genes potentially involved in pathogenicity on wildrice, with the objective of gaining a better understanding of the mechanisms of pathogenesis to assist in enhancing genetic resistance in wildrice breeding lines.

Fungal strain and DNA extraction
The C. miyabeanus strain TG12bL2 (hereafter referred to as CmTG12bL2) was isolated from a wildrice leaf with FBS symptoms collected from a paddy in Aitkin, Minnesota, USA, as previously described [29]. For DNA extraction, the fungus was grown on 2% (w/v) water agar (Bacto Agar, DIFCO) for approximately two weeks until spores were produced. Spores collected from four Petri dishes were added to two sterilized 500 ml glass flasks containing 200 ml of liquid minimal medium [30]. The flasks were shaken at 150 rpm for 6 days at room temperature in ambient light; every 48 h, the mycelium was broken apart to promote new growth and transferred to fresh medium. Mycelium was harvested by filtration and freeze dried, and DNA was extracted using the protocol of Raeder and Broda [31]. DNA concentration and quality were measured using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA).

Fungal genome sequencing and de novo assembly
The CmTG12bL2 genome was shotgun sequenced as 101-cycle paired-end reads on an Illumina HiSeq 2000 at the Mayo Clinic, Rochester, Minnesota, USA. A paired-end library was prepared according to the Illumina Preparation Guide (http://www.illumina.com). Briefly, 2 to 5 μg of DNA in 100 μl of 10 mM Tris and 0.1 mM EDTA, pH 8, was fragmented using a Covaris E210 sonicator (Woburn, MA), generating double-stranded DNA fragments with blunt or sticky ends and a modal fragment size of 400 to 500 bp. The ends were repaired using Klenow DNA polymerase and T4 DNA polymerase and phosphorylated using T4 polynucleotide kinase, after which an adenine was added to the 3' ends of the double-stranded DNA using Exo-Minus Klenow DNA polymerase. Paired-end DNA adaptors with a single thymine overhang at the 3' end were ligated to the DNA fragments, and the resulting constructs were separated on a 2% agarose gel. DNA fragments of approximately 500 bp were excised from the gel using GeneCatcher tips and purified using Qiagen Gel Extraction Kits (Qiagen, Valencia, CA). The adapter-modified DNA fragments were enriched by 12 cycles of PCR using primers PE 1.0 and PE 2.0. The concentration and size distribution of the library were determined with a DNA 1000 chip on the Agilent 2100 Bioanalyzer (Agilent Technologies). The library was loaded onto an indexed lane of a paired-end flow cell and sequenced as paired-end indexed reads on an Illumina HiSeq 2000 using TruSeq SBS sequencing kit version 1 and HiSeq data collection v. 1.1.37.0 software. Base calling was performed using Illumina's RTA v. 1.7.45.0. Read quality control was performed in Galaxy (https://galaxyproject.org/).
The fungal genome was de novo assembled using Velvet v.1.2.10 [32]. The hash length (k-mer) ranged from 31 to 91 in two-nucleotide increments. Contigs were merged into scaffolds using paired-end information, with the coverage cutoff and expected coverage set to 'auto' and an average insert length of 385 bp. The optimal hash length was selected based on maximum N50 length, large k-mer specificity, and high coverage, without loss of overall genome or scaffold length. Parameters for the selected assembly were further refined following software recommendations [33] by setting the expected coverage (76.56×), to discriminate unique genomic regions from repeats, and the coverage cutoff (38.28×), to eliminate errors due to low coverage.
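The k-mer sweep and N50-based selection described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scaffold lengths for each candidate hash length are already available (the `assemblies` mapping is hypothetical), it ranks assemblies only by N50 with total length as a tiebreaker (k-mer specificity and coverage, also used in the text, are not modeled), and it takes the coverage cutoff as half the expected k-mer coverage, which matches the values reported here (38.28× = 76.56×/2). The k-mer coverage conversion follows the formula given in the Velvet manual.

```python
"""Sketch: N50, k-mer sweep selection, and Velvet-style coverage settings."""

# Hash lengths 31, 33, ..., 91 in two-nucleotide steps, as in the sweep above
K_VALUES = list(range(31, 92, 2))


def n50(lengths):
    """Length L such that scaffolds of length >= L contain at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty assembly")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def pick_k(assemblies):
    """Choose the hash length with the largest N50; ties broken by total length.

    `assemblies` is a hypothetical mapping {k: [scaffold lengths]}.
    """
    return max(assemblies, key=lambda k: (n50(assemblies[k]), sum(assemblies[k])))


def kmer_coverage(nucleotide_cov, read_len, k):
    """Velvet k-mer coverage: Ck = C * (L - k + 1) / L."""
    return nucleotide_cov * (read_len - k + 1) / read_len


def coverage_cutoff(expected_kmer_cov):
    """Assumed rule of thumb: cutoff at half the expected k-mer coverage."""
    return expected_kmer_cov / 2


if __name__ == "__main__":
    toy = {31: [10, 10, 10], 33: [25, 5]}  # toy data, not from this study
    print(pick_k(toy), n50(toy[33]))       # 33 25
    print(coverage_cutoff(76.56))
```

In practice each entry of `assemblies` would come from a separate Velvet run, and the final `-exp_cov` and `-cov_cutoff` values would be passed back to the assembler for the chosen k.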

Protein-coding sequence prediction, annotation, and validation
Protein-coding sequences were predicted using GeneMark-ES v.2.3c [34], which employs an intrinsic ab initio algorithm with unsupervised training. The algorithm features an enhanced intron sub-model that accommodates sequences with and without the branch point sites found in several fungi (Ascomycota, Basidiomycota, and Zygomycota). Proteins were annotated against the NCBI non-redundant (NCBI_nr) (http://www.ncbi.nlm.nih.gov) and HMMER (http://hmmer.janelia.org) databases; the latter was also used to identify protein profiles (Pfam). For general blastp analyses, an e-value cutoff of 1E-5 was used, except for detecting candidate effectors, for which a less stringent cutoff (1E-2) was permitted. Proteins with a signal peptide or anchor peptide were predicted by SignalP 4.1 [35]. Small secreted proteins (SSPs) were defined as secreted proteins of less than 200 aa without transmembrane domains. Cysteine-rich SSPs in the CmTG12bL2 genome were identified as previously described [28]. HCSSPs (SSPs with high cysteine content) were defined both as in [28], as SSPs with at least twice the average cysteine content of the CmTG12bL2 proteome, and as in [24], as SSPs containing more than 2% cysteines.
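The SSP and HCSSP criteria above can be expressed as simple filters. The sketch below is illustrative only: it assumes signal-peptide calls (e.g., from SignalP) and transmembrane-domain counts are already attached to each protein, and it interprets "average cysteine content of the proteome" as the pooled cysteine frequency (total cysteines divided by total residues), which is one possible reading of [28]. The `Protein` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    seq: str                  # amino acid sequence
    has_signal_peptide: bool  # e.g., parsed from SignalP output (assumed precomputed)
    n_tm_domains: int         # transmembrane helices from a TM predictor (assumed)


def is_ssp(p, max_len=200):
    """Secreted, no transmembrane domains, shorter than max_len residues."""
    return p.has_signal_peptide and p.n_tm_domains == 0 and len(p.seq) < max_len


def cys_fraction(seq):
    """Fraction of residues that are cysteine."""
    return seq.upper().count("C") / len(seq) if seq else 0.0


def classify_ssps(proteome):
    """Return (SSP names, HCSSPs by the 2x-average rule, HCSSPs by the >2% rule)."""
    total_cys = sum(p.seq.upper().count("C") for p in proteome)
    total_res = sum(len(p.seq) for p in proteome)
    pooled = total_cys / total_res  # assumed reading of "average cysteine content"
    ssps = [p for p in proteome if is_ssp(p)]
    twice_avg = {p.name for p in ssps if cys_fraction(p.seq) >= 2 * pooled}
    over_2pct = {p.name for p in ssps if cys_fraction(p.seq) > 0.02}
    return [p.name for p in ssps], twice_avg, over_2pct
```

Keeping the two HCSSP rules as separate sets makes it easy to report how far the [28] and [24] definitions overlap.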
The CmTG12bL2 predicted genes were validated using three strategies. First, the protein set was compared to a core set of 458 highly conserved proteins [36]. This core consisted of highly homologous proteins present in six eukaryote genomes (Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Schizosaccharomyces pombe) and was selected from the KOG (clusters of euKaryotic Orthologous Groups) database. Second, the genomes of CmTG12bL2 and C. miyabeanus WK-1C (ATCC 44560 v1.0) (http://genome.jgi.doe.gov) were compared using Gepard [37]. Third, the genes were compared to the fungal transcriptome at 48 h post-inoculation, assembled and identified as described below. We adopted the following nomenclature throughout the manuscript: "Cm" refers to a gene, "CM" to a protein, and "t_Cm" to a transcript; the number that follows each prefix is the same for a gene, its protein, and its transcript.

Identifying carbohydrate-active enzymes (CAZymes)
For the detection of encoded CAZymes, each CmTG12bL2 protein model was compared (blastp) to the proteins listed in the Carbohydrate-Active Enzymes database (www.cazy.org) [38]. CmTG12bL2 protein models with over 50% identity along their length to a protein in the CAZy database were directly assigned to the same family (or subfamily, when relevant). Proteins with less than 50% identity to a CAZy protein were all manually inspected for conserved features such as catalytic residues. For multimodular CAZymes, sequence alignments were performed with the isolated functional domains [39]. The same methods were used for all fungi whose proteins were compared to those of CmTG12bL2. Putative roles of the enzymes were inferred from annotations of proteins returned by searches against the profile HMM database (hmmscan) and the protein sequence database (phmmer) using an e-value cutoff of 1E-33.
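The identity-based assignment rule can be sketched as a small filter over blastp tabular output (`-outfmt 6`, whose first three columns are query ID, subject ID, and percent identity). This is a hypothetical illustration: the `subject_family` lookup from CAZy sequence IDs to family labels is assumed to be built separately, only the best hit per query is considered, and BLAST percent identity is computed over the aligned region, so it only approximates identity "along the length" unless alignment coverage is also checked.

```python
def parse_blast_tab(lines):
    """Yield (query, subject, percent identity) from blastp -outfmt 6 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[2])


def assign_cazy(hits, subject_family, min_identity=50.0):
    """Assign families from best hits above min_identity; return the rest for manual review."""
    best = {}
    for query, subject, pident in hits:
        if query not in best or pident > best[query][1]:
            best[query] = (subject, pident)
    assigned, manual = {}, []
    for query, (subject, pident) in best.items():
        if pident > min_identity:
            assigned[query] = subject_family[subject]  # family or subfamily label
        else:
            manual.append(query)  # needs inspection of catalytic residues
    return assigned, sorted(manual)
```

Proteins returned in the manual list correspond to the "less than 50% identity" branch in the text, where conserved catalytic residues are checked by hand.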
Nonribosomal peptide synthetase and polyketide synthase identification and phylogenetic analysis
NRPSs and PKSs were identified by comparing the CmTG12bL2 proteins to the well-curated C. heterostrophus dataset (Ch_NPSs) [40,41] using an in-house script, and to other NRPSs and PKSs in NCBI_nr (blastp, 1E-5). Additional motifs in proteins with AMP domains and ketoacyl synthase (KS) domains were identified using HMMER hmmscan (with default settings unless otherwise specified). A degenerate AMP domain in the predicted CmTG12bL2 protein CM_1661, homologous to Ch_NPS6, was identified through manual inspection [42]. AMP domains of CM_3163 were compared to the homologous domains of C. heterostrophus NPS1, NPS3, and NPS13. Finally, putative NRPSs and PKSs from CmTG12bL2 were compared (blastp) to proteins of C. miyabeanus WK-1C. A phylogenetic analysis of the CmTG12bL2 AMP domains was done using the methods described by Bushley and Turgeon [40] to compare the conservation and relationships of these AMP domains with those identified in the reference species C. heterostrophus C5, C. heterostrophus C4, and the rice isolate C. miyabeanus WK-1C [24]. Several AMP domains from NRPSs in other Ascomycete species that encode known chemical products were included as well. Acyl-adenylating enzymes (acyl-CoA synthetases, CPS1, long-chain fatty acids, acyl-CoA ligases, and ochratoxins) from C. heterostrophus and other ascomycetes were used as outgroups. Protein sequences were aligned using MAFFT, and a maximum likelihood phylogenetic tree was constructed in RAxML using the best protein model determined by ProtTest (RtREV+F), as in [24,40].
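Locating AMP and KS domains from hmmscan results can be sketched by parsing the per-domain table that hmmscan writes with `--domtblout`. In that format, whitespace-separated columns 1 and 4 hold the profile and query names, column 13 the independent E-value, and columns 18 and 19 the alignment coordinates on the query. The Pfam profile names `AMP-binding` and `ketoacyl-synt` and the 1E-5 domain E-value threshold are assumptions for illustration, not settings reported in this study. Because acyl-CoA ligases also carry AMP-binding domains, a hit alone does not make a protein an NRPS, which is why such enzymes serve as outgroups in the phylogeny.

```python
from collections import defaultdict

# Assumed Pfam profile names for the marker domains
MARKERS = {"AMP-binding": "AMP", "ketoacyl-synt": "KS"}


def parse_domtblout(lines, max_ievalue=1e-5):
    """Return {protein: [(marker, ali_from, ali_to), ...]} sorted by position."""
    found = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 22)  # the last column (description) may contain spaces
        profile, protein = cols[0], cols[3]
        i_evalue = float(cols[12])
        if profile in MARKERS and i_evalue <= max_ievalue:
            found[protein].append((MARKERS[profile], int(cols[17]), int(cols[18])))
    return {p: sorted(hits, key=lambda h: h[1]) for p, hits in found.items()}


def domain_counts(found):
    """Count AMP and KS domains per protein (e.g., to flag multimodular NRPSs)."""
    return {p: {m: sum(1 for h in hits if h[0] == m) for m in ("AMP", "KS")}
            for p, hits in found.items()}
```

Sorting hits by their start coordinate preserves the N-to-C order of the modules, which is what a manual comparison with homologous C. heterostrophus domains would need.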

C. miyabeanus transcriptome assembly and analysis
Total RNA was extracted with the RNeasy Mini Kit (Qiagen Inc, Valencia, CA) according to the manufacturer's instructions and treated with Ambion DNase I (Life Technologies, Carlsbad, CA) to eliminate traces of genomic DNA. Only samples collected at 48 hours after inoculation (hai) were sequenced for the transcriptome analysis. Samples were prepared for sequencing following the Illumina TruSeq RNA Sample Preparation Guide and the manufacturer's recommendations (http://www.illumina.com). Illumina library preparation was performed as previously described [43]. The samples were barcoded, multiplexed, and sequenced as 50-cycle single-end reads on an Illumina HiSeq 2000 at the Biomedical Genomics Center at the University of Minnesota. Read quality control was carried out in Galaxy using the Tuxedo suite. Only reads with an average Q score ≥ 30 were aligned to the CmTG12bL2 draft genome for transcriptome assembly using TopHat v1.4.1 [44] implemented in Galaxy.
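The average-quality read filter can be sketched as below. It assumes four-line FASTQ records with Phred+33 quality encoding (Sanger/Illumina 1.8+); older Illumina pipelines used Phred+64, so the `offset` parameter must match the actual data. The threshold of 30 follows the text; the function names are hypothetical, and this is not the Galaxy tool that was used.

```python
def mean_phred(qual, offset=33):
    """Mean Phred quality of an ASCII-encoded quality string."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_fastq(lines, min_mean_q=30.0, offset=33):
    """Yield (header, seq, plus, qual) records whose mean quality is >= min_mean_q.

    Records are read four lines at a time; an incomplete trailing record is dropped.
    """
    it = (line.rstrip("\n") for line in lines)
    for header, seq, plus, qual in zip(it, it, it, it):
        if mean_phred(qual, offset) >= min_mean_q:
            yield header, seq, plus, qual
```

Passing an open file handle as `lines` streams the reads without loading the whole file into memory.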
The Cufflinks suite of tools [45] was used for transcript assembly and quantification. Transcript abundance was estimated as RPKM (Reads Per Kilobase of transcript per Million mapped reads) [46]. RPKM values were log2 transformed, and the data distribution was visualized using JMPin (SAS Institute Inc., Cary, NC, USA). Experimental variability of log2 RPKM between biological replicates was estimated using Pearson's correlation coefficient for the 9,960 transcripts present in all replicates. For subsequent data analysis, only transcripts found in at least three of four replicates were used. Transcripts were functionally categorized using Blast2GO (www.blast2go.com) against the NCBI_nr and Swiss-Prot/InterPro databases. To identify overrepresented GO terms, an enrichment analysis of the 10% most abundant transcripts was performed using Fisher's exact test with a false discovery rate (FDR) of 0.05, as implemented in Blast2GO [47], with the whole set of transcripts as the reference.
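The quantities used above can be sketched in plain Python: RPKM as reads × 10^9 / (transcript length in bp × total mapped reads), Pearson's r between log2-transformed replicates, the three-of-four replicate filter, and a one-sided Fisher's exact (hypergeometric) test for GO-term overrepresentation with Benjamini-Hochberg FDR adjustment. This is an illustrative sketch under those assumptions, not the Cufflinks or Blast2GO implementation: Cufflinks estimates abundance with a statistical model rather than this simple count ratio, Blast2GO's own test options may differ, and all function names are hypothetical.

```python
import math


def rpkm(reads, length_bp, total_mapped):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)


def log2_values(values):
    return [math.log2(v) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def keep_in_min_replicates(table, min_reps=3):
    """Keep transcripts detected (RPKM > 0) in at least min_reps replicates.

    `table` maps transcript -> list of per-replicate RPKM values (None if absent).
    """
    return {t: v for t, v in table.items()
            if sum(1 for x in v if x is not None and x > 0) >= min_reps}


def fisher_overrep_p(k, n, K, N):
    """One-sided P(X >= k), X ~ Hypergeometric(N total, K annotated, n in test set)."""
    total = math.comb(N, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

Under this sketch, a GO term would be reported as enriched when its adjusted value is at most 0.05, the FDR used in this study.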

Quantitative RT-PCR (qRT-PCR) experiments
qRT-PCR experiments were done to validate the transcriptome results at 48 hai and to compare expression levels with those at an additional time point in planta (24 hai) and in the fungus grown in vitro. cDNA synthesis for all treatments was carried out with the iScript cDNA synthesis kit (Bio-Rad Laboratories Inc., Hercules, CA). CmTG12bL2-specific primers for the selected genes Ecp6 (Cm_2799), CYP53 (Cm_8068), salicylate hydroxylase (Cm_9653), β-1,4-endoglucanase (Cm_1858), β-1,4-endoxylanase (Cm_8238), and β-1,4-glucosidase (Cm_28) were designed using the SciTools software at IDT (http://www.idtdna.com/scitools/Applications/RealTimePCR). Gene expression levels in planta were compared to those of the fungus grown in vitro (reference sample). qRT-PCR reactions were done in 96-well plates using an Applied Biosystems 7500/7500 Fast Real-Time PCR system (Applied Biosystems, Foster City, CA). Each 20 μL reaction contained 10 μL iTAQ™ Universal SYBR Green Supermix (Bio-Rad Laboratories Inc.), 0.1 μL of each primer diluted to 0.1 μM, 3 μL of cDNA template, and 6.8 μL of nuclease-free water. PCR conditions were 95°C for 2 min, followed by 40 cycles of 95°C for 3 s and 60°C for 30 s; a melt curve was then generated with 95°C for 15 s, 60°C for 1 min, 95°C for 15 s, and 60°C for 15 s. Relative gene expression was quantified with the delta-delta Ct (ΔΔCt) method, as computed by the Applied Biosystems software, with normalization against an endogenous control (glyceraldehyde-3-phosphate dehydrogenase gene). Two to three biological replicates, each with at least two technical replicates, were analyzed per gene. Primer efficiencies for the genes analyzed were 100% ± 10% (SD). Data are presented as the log2 of relative gene expression (fold change), with expression values in the reference sample set to 0 (log2 of 1). Significance of differential gene expression between treatments was tested using t-tests with a significance threshold of 0.05.
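The relative quantification reported here can be illustrated with the standard ΔΔCt calculation (Livak method), which assumes amplification efficiencies near 100%, consistent with the 100% ± 10% reported above. With GAPDH as the endogenous control and the in vitro culture as the calibrator, ΔCt = Ct(target) − Ct(GAPDH), ΔΔCt = ΔCt(in planta) − ΔCt(in vitro), and log2 fold change = −ΔΔCt. This sketch averages technical replicate Ct values before subtraction; the Ct values shown are invented, and this is not necessarily the exact procedure of the Applied Biosystems software.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Mean Ct of the target gene minus mean Ct of the endogenous control."""
    return mean(ct_target) - mean(ct_reference)


def log2_fold_change(sample_target, sample_ref, calib_target, calib_ref):
    """log2 relative expression = -ddCt, assuming ~100% primer efficiency."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calib_target, calib_ref)
    return -ddct


# Toy Ct values (not data from this study): target vs. GAPDH, in planta vs. in vitro
lfc = log2_fold_change([22.0, 22.5], [18.0, 18.5], [26.0, 26.5], [18.0, 18.5])
print(lfc, 2 ** lfc)  # 4.0 16.0
```

Applying the function to the reference sample against itself returns 0, matching the convention that in vitro expression is set to log2 of 1.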

C. miyabeanus fungal genome assembly, gene prediction, and validation
Read statistics and quality are summarized in S1 Table. High-quality reads were assembled with various k-mer lengths (S1 Fig). The final assembly comprised 31.79 Mbp (Table 1) in 2,378 scaffolds with an N50 value of 74,921 bp. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession LNFW00000000. The version described in this paper is version LNFW01000000.
The total number of protein-coding sequences initially predicted by GeneMark-ES was 12,142. After discarding 1,142 partial genes (those with internal stop codons or lacking either initiation or stop codons), 11,000 coding sequences remained for further analyses. The average G+C content was 52.63% for predicted genes, 52.74% for exons, and 45.69% for introns, versus 50.5% for the entire genome. Additional genome assembly features are shown in Table 1. Searching (blastp, 1E-05) the 11,000 predicted proteins against the NCBI_nr database resulted in 10,395 matches (94.5%), of which 92.3% were annotated as hypothetical proteins, mostly from other Cochliobolus species (92.2%). A total of 7,567 motifs were identified when comparing (blastp, 1E-05) all predicted proteins to collections of protein profiles (http://hmmer.janelia.org/). CmTG12bL2 gene models were also validated (blastp, 1E-05) against a core dataset of 458 highly conserved eukaryotic proteins [36]. All but two of the 458 core proteins had at least one predicted CmTG12bL2 homolog (S2 Table). The two core proteins that could not be directly identified were KOG1291 (Mn2+ and Fe2+ transporters of the NRAMP family) and KOG2749 (mRNA cleavage and polyadenylation factor IA/II complex); the latter matched a partial protein. Additionally, two CmTG12bL2 proteins, CM_8436 and CM_2839, each matched two core proteins, but only one match per CmTG12bL2 protein (the one with the highest percentage of identity) is presented. Thus, 99.1% of the core dataset could be identified in our predicted protein set. The CmTG12bL2 genome assembly was compared to the C. miyabeanus WK-1C genome (http://genome.jgi.doe.gov). The ordered scaffolds showed good pairwise alignment (S3 Fig), validating the assembly over relatively long distances. An additional validation was done by mapping fungal transcripts against the draft genome: overall, 97.6% of the predicted genes that were expressed at 48 hai in at least three biological replicates were mapped to the genome sequence.

Genes involved in plant-pathogen interactions
Comparative genome analyses identified a number of genes in CmTG12bL2 previously shown to be required for pathogenicity in other Cochliobolus species (Table 2). For instance, the C. heterostrophus CGB1 gene encodes a signaling G-protein β subunit involved in conidia production and female fertility, as well as appressorium and mycelium pigmentation, mycelium morphogenesis, and virulence [48]. The putative CmTG12bL2 homolog has eight exons and seven introns and encodes a polypeptide with two conserved domains: a WD40 domain and a phosducin-like domain, which is a cytosolic regulator of G-proteins from the thioredoxin-like superfamily. The C. heterostrophus CPS1 gene encodes an acyl-CoA ligase-like protein that is required for normal virulence on maize; because it is present in several pathogenic and saprophytic Ascomycetes, CPS1 may also have a role in stress tolerance [49]. The CmTG12bL2 homolog contains two large AMP-binding domains and a DMAP1-binding domain at its amino terminus. The HDC1 to HDC4 genes of C. carbonum encode histone deacetylases [50,51]. HDC1 is required for virulence (penetration efficiency) on maize, normal conidia size, and growth on complex polysaccharides [50]. The CmTG12bL2 homolog is identical to the 505 aa sequence of HDC1 but is longer at the carboxyl terminus. An 880 aa CmTG12bL2 protein was 99% identical to the protein kinase ccSNF1 from C. carbonum, including the predicted nuclear localization signal [52]. ccSNF1 regulates the expression of extracellular fungal enzymes for the degradation and uptake of carbohydrates and is required for full virulence on maize. A homolog of CreA, a DNA-binding transcriptional repressor of fungal plant cell wall-degrading enzymes [53] that is putatively regulated by ccSNF1 in C. carbonum, is also present in the CmTG12bL2 genome. The BKM1 gene of a rice isolate of C. miyabeanus has a homolog in CmTG12bL2; disruption of BKM1 causes defective mycelia and colony growth and loss of conidiation and of pathogenicity on rice leaves [54]. A homolog of Ch_NPS6, which is required for virulence on rice and for insensitivity to hydrogen peroxide [55], is also present in the CmTG12bL2 genome.
The CmTG12bL2 genome also has genes encoding homologs of proteins that contribute to disease but are not required for pathogenicity (Table 2). An example is a predicted homolog of C. heterostrophus CHAP1, a redox-regulated transcription factor necessary for activating genes for resistance to oxidative stress, although it is not involved in virulence on maize [56]. The CmTG12bL2 genome encodes homologs of five C. carbonum cell wall-degrading enzymes, endopolygalacturonase (PGN1), exo-α-1,4-polygalacturonase (PGX1), two β-1,4-xylanases (XYL2 and XYL3), and glucohydrolase (cellulase; CEL2), which proved dispensable because strains with mutations in those genes were still pathogenic on maize [57,58,59,60]. Additionally, a CmTG12bL2 protein was similar to C. carbonum ALP1, a trypsin-like serine protease that is not required for pathogenicity on maize [61]. Several of the CmTG12bL2 genes are expected to have similar functions in the C. miyabeanus-wildrice interaction, particularly NPS6, which is responsible for extracellular siderophore biosynthesis, and BKM1, both of which are involved in virulence/pathogenicity in the rice isolate of C. miyabeanus [54,55].

Small secreted proteins (SSPs)
The fungal secretome is essential for interactions with the surrounding environment and potential hosts. The SSP group includes cysteine-rich secreted proteins (candidate effectors) that could interfere with plant defense mechanisms. The CmTG12bL2 proteome includes a total of 187 proteins of less than 200 aa with signal peptides and without transmembrane domains. This number of SSPs is comparable to that found in other Dothideomycetes, although lower than in other Cochliobolus species [28] (S3 Table). Most CmTG12bL2 SSPs (95.7%) are similar to hypothetical proteins from other Cochliobolus species in the NCBI_nr database, including C. miyabeanus WK-1C. SSP features (S3 Table) are within or close to the ranges found in Cochliobolus species and other Dothideomycetes [24,28]. The percentage of SSPs with Pfam domains in CmTG12bL2 (13.9%) is higher than, albeit close to, that found in other Dothideomycetes. Examples of proteins with motifs identified among CmTG12bL2 SSPs are hydrophobins, hydrophobic peptides with predicted functions as surfactants [62]; CAZymes, essential for the hydrolysis of plant cell wall carbohydrates; and proteins with carbohydrate-binding modules (described elsewhere in the manuscript). Others included a fungal-specific CFEM protein with eight conserved cysteines (some CFEM proteins have been implicated in fungal pathogenicity [63]); a protein with an Hce2 domain, which constitutes the mature part of the Ecp2 effector protein from the tomato pathogen Cladosporium fulvum (detailed in the next section); and a protein similar to Asp f 13-like from C. lunatus that contains a cerato-platanin domain. Proteins with this domain are phytotoxic, causing cell necrosis and inducing plant defenses [64], but also have structural functions in the fungal cell wall. Additional SSPs identified included a clathrin adaptor complex small chain protein involved in the clathrin-mediated pathway for endocytosis of small molecules and a peptidase inhibitor I9 (S4 Table).

Effector-like secreted proteins
Some of the effector-like proteins present in CmTG12bL2 are similar to proteins known to be involved in pathogenicity or to interfere with plant defense mechanisms. One example is CM_6804, which has a pathogen effector Hce2 motif and is a putative necrosis-inducing factor. This protein is similar to extracellular protein 2 (Ecp2) of the biotrophic pathogen Cladosporium fulvum (S4 Fig), which is secreted into the apoplast during colonization of tomato leaves and is considered to have a role in virulence [65]. Cm_6804 is 488 bp long, with two exons and one intron, and encodes a 142 aa peptide with a predicted signal peptide of 19 aa. Similar proteins have been found in Mycosphaerella graminicola and M. fijiensis [66], Gibberella zeae, C. sativus ND90Pr, and C. heterostrophus (S4 Fig), where they could play an important role in pathogenesis.
The CM_2799 protein is similar to C. fulvum Ecp6, which is secreted into the apoplast of tomato leaves during colonization and is involved in virulence [67]. Ecp6 homologs are found in many other plant pathogenic and saprophytic fungi [67,68]. The three Ecp6 LysM domains act as carbohydrate-binding modules with affinity for chitin tri-, penta-, and hexa-oligosaccharides and compete with chitin-binding plant receptors, thus preventing the initiation of defense responses by the plant immune system [69]. CM_2799 has three typical LysM domains (pfam01476), with a cysteine at the beginning and end of each domain (S4 Fig) that could form intramolecular disulfide bonds. In addition, the protein has a cysteine at the C-terminus. Similar peptides are found in other fungi [69], including the hemibiotrophs C. sativus ND90Pr, Colletotrichum truncatum, Setosphaeria turcica, and Mycosphaerella graminicola and the necrotroph Cochliobolus heterostrophus [70,71].
Two C. miyabeanus proteins, CM_2749 and CM_6024, are similar to the Nep1-like protein family (necrosis- and ethylene-inducing peptide-like proteins, or NLPs). In some fungal pathogens, NLPs are expressed in planta at the transition between the biotrophic and necrotrophic phases in dicot hosts [72,73], or at the end of the symptomless phase in some monocot hosts [74]. NLPs elicit a hypersensitive-like response in dicots [75]. In monocots, however, NLPs do not appear to induce similar responses, and disruption of NLP genes does not affect pathogenicity or virulence; instead, a role in microbial inhibition has been suggested. CM_6024 homologs are present in C. sativus and C. heterostrophus and also in Colletotrichum higginsianum, where a necrosis-inducing protein (ChNLP1) is expressed only during the switch to the necrotrophic phase in Nicotiana benthamiana [71].
Other larger secreted proteins with potential roles in pathogenicity included predicted cutinases, peptidases, fungal transporters belonging to the major facilitator superfamily (MFS), and ATP-binding cassette (ABC) proteins, in addition to CAZymes. Some of these are described in further detail below.

Carbohydrate-active enzymes (CAZymes)
Fungal plant pathogens need access to nutrients located mostly in host cell protoplasts. Thus, they secrete a wide array of proteins to breach plant barriers (i.e., the cuticle and cell walls) and degrade complex carbohydrates for carbon acquisition. The CmTG12bL2 genome encodes 604 carbohydrate-binding and catalytic protein modules distributed among 530 protein-coding genes. These are involved in the binding, modification, and breakdown of plant carbohydrates; fungal cell wall biosynthesis, remodeling, and turnover; and N- and O-glycoprotein synthesis and processing. The number of CAZymes identified was in the upper range of those found in other Ascomycetes (Table 3): substantially higher than in Saccharomyces cerevisiae, the obligate biotroph Blumeria graminis, the hemibiotroph Mycosphaerella graminicola, and the necrotrophs Pyrenophora tritici-repentis, P. teres f. teres, Leptosphaeria maculans, and Alternaria brassicicola, but close to the number found in the hemibiotrophs Gibberella zeae and Magnaporthe oryzae and the necrotrophs Phaeosphaeria nodorum SN15 and Cochliobolus heterostrophus. Excluding CAZy auxiliary activity modules, the number and variety of CmTG12bL2 CAZymes are within the range of those found in other Cochliobolus species [28].
We identified 69, 29, three, and nine families of glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL), and carbohydrate esterases (CE), respectively, in addition to 12 families of auxiliary activity (AA) enzymes [76] in the CmTG12bL2 genome. There are 13 families of non-catalytic carbohydrate binding modules (CBMs) (S5 Table) in addition to five proteins with distant similarity to plant expansins. More than 30 CmTG12bL2 genes had combined catalytic and non-catalytic CAZy modules.
To infect and obtain nutrients from wildrice tissues, C. miyabeanus needs to breach the cuticle and the cell walls, which are composed of cellulose, cross-linking glycans (hemicellulose), and pectin. Numerous CAZymes capable of degrading the backbone chains of these polymers, and of removing the substituent residues as required for complete and efficient degradation, were predicted through electronic annotation (blastp, 1E-22) (S6 Table). A schematic representation of the possible functions of some CmTG12bL2 CAZymes that could allow the fungus to breach the structural barriers of wildrice tissues and acquire nutrients to sustain plant colonization is presented in Fig 1 [85].
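The annotation step above filtered blastp hits by a significance cutoff. The sketch below shows one way to apply such a cutoff to BLAST+ tabular output (`-outfmt 6`, whose default 12 columns put the E-value and bit score in columns 11 and 12). It assumes that 1E-22 is an E-value threshold and that only the best-scoring hit per query is kept; the source does not state the output format or hit-selection rule, and the query and subject identifiers below are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Column indices in BLAST+ "-outfmt 6" default tabular output:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, SSEQID, EVALUE, BITSCORE = 0, 1, 10, 11


def best_hits(lines: Iterable[str], max_evalue: float = 1e-22) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for the highest-scoring hit
    of each query whose E-value does not exceed max_evalue."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[QSEQID], fields[SSEQID]
        evalue, bits = float(fields[EVALUE]), float(fields[BITSCORE])
        if evalue > max_evalue:
            continue  # hit is not significant under the cutoff
        if query not in best or bits > best[query][2]:
            best[query] = (subject, evalue, bits)
    return best


# Hypothetical hits for two toy queries; geneB is dropped because 5e-10 > 1e-22.
example = [
    "geneA\tcellulase_1\t65.2\t300\t90\t3\t1\t300\t1\t298\t2e-80\t290.0",
    "geneA\txylanase_7\t40.1\t250\t140\t8\t5\t250\t3\t245\t1e-30\t120.0",
    "geneB\tlyase_3\t28.0\t120\t80\t4\t1\t120\t1\t118\t5e-10\t60.0",
]
print(best_hits(example))
```

Keeping one best hit per query is a common convention for transferring annotations; a pipeline that keeps all hits below the cutoff would simply collect lists instead.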

Nonribosomal peptide synthetases and polyketide synthases
Fungi produce a plethora of natural products, including NRPs, polyketides, terpenoids, and alkaloids. These play important roles in growth and development, reproduction, response to oxidative stress, and pathogenicity toward plants and microorganisms [86]. NRPs are synthesized through a ribosome-independent pathway by mono- or multimodular NRPSs, in which each module is composed of a set of core domains. The minimal module consists of an adenylation (AMP) domain, a peptidyl carrier protein (P) or thiolation (T) domain, and a condensation (C) domain, whose biochemical functions are described elsewhere [86]. In some enzymes, the C domain is replaced by a thioesterase or an NAD(P)H-dependent reductase domain [40]. Additional domains for epimerization and methylation may be present, adding diversity to the final peptide [87]. Previous phylogenetic analysis indicated that mono- and bimodular subfamilies have more conserved domain structures and may have originated earlier than enzymes with more modules [40].
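As a rough illustration of the module logic described above, the sketch below counts modules in an ordered list of predicted domains (one adenylation domain per module) and applies the mono-, bi-, and multimodular labels used in the text. The single-letter domain labels, the `NRPSArchitecture` class, and the toy architectures are hypothetical; this is not the annotation pipeline used for CmTG12bL2.

```python
from dataclasses import dataclass
from typing import List

# Illustrative domain labels (an assumption, not the paper's annotation scheme):
# A = adenylation, T = thiolation/peptidyl carrier, C = condensation,
# E = epimerization, M = methylation, R = NAD(P)H-dependent reductase,
# TE = thioesterase.
KNOWN_DOMAINS = {"A", "T", "C", "E", "M", "R", "TE"}


@dataclass
class NRPSArchitecture:
    name: str
    domains: List[str]  # ordered from N- to C-terminus

    def __post_init__(self) -> None:
        unknown = set(self.domains) - KNOWN_DOMAINS
        if unknown:
            raise ValueError(f"unrecognized domain labels: {sorted(unknown)}")

    def modules(self) -> List[List[str]]:
        """Split the domain list so that each module starts at an A domain.

        Domains preceding the first A (e.g. a leading C) are attached to the
        first module.
        """
        groups: List[List[str]] = []
        leading: List[str] = []
        for domain in self.domains:
            if domain == "A":
                groups.append([domain])
            elif groups:
                groups[-1].append(domain)
            else:
                leading.append(domain)
        if groups and leading:
            groups[0] = leading + groups[0]
        return groups

    def classify(self) -> str:
        n = len(self.modules())  # one module per adenylation domain
        if n == 0:
            return "no adenylation domain"
        if n == 1:
            return "monomodular"
        if n == 2:
            return "bimodular"
        return f"multimodular ({n} modules)"

    def carriers_complete(self) -> bool:
        """True if every module has a T (carrier) domain after its A domain."""
        return all("T" in m[m.index("A") + 1:] for m in self.modules())


# Hypothetical architectures, not real CmTG12bL2 proteins.
for arch in (
    NRPSArchitecture("toy_mono", ["A", "T", "R"]),
    NRPSArchitecture("toy_tetra", ["A", "T", "C", "A", "T", "E", "C", "A", "T", "C", "A", "T", "C"]),
):
    print(arch.name, arch.classify(), arch.carriers_complete())
```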
In the CmTG12bL2 genome we identified one α-aminoadipate reductase (AAR) and twelve putative NRPSs (Table 4), including proteins with homology to Ch_NPS2, Ch_NPS4, Ch_NPS6, and Ch_NPS10. These four are the most conserved NRPSs among Dothideomycetes and are involved in virulence, morphology, cell surface hydrophobicity, sensitivity to stresses, and fitness [88]. CM_4358 has a monomodular structure similar to that of Ch_NPS10, with A, T, and NAD-binding domains and a short-chain dehydrogenase, which confers substrate specificity and provides a catalytic site. Mutations in Ch_NPS10 affect colony morphology and tolerance to oxidative stress [86], and Ch_NPS10 has the highest rate of conservation among 18 Dothideomycetes [28]. CM_8122 is a multimodular enzyme with a structural organization similar to that of Ch_NPS2, which catalyzes the synthesis of an intracellular siderophore (ferricrocin) involved in iron storage; Ch_NPS2 is essential for the development of asci and ascospores in C. heterostrophus [89] but is not required for virulence on maize. CM_10529 is highly similar to Ch_NPS4, which is also well conserved among filamentous ascomycetes, is involved in cell surface hydrophobicity in C. heterostrophus, Alternaria brassicicola, and Gibberella zeae, and also plays a role in conidial cell wall development and germination rate in A. brassicicola [90]. CM_1661 is highly similar to Ch_NPS6 (Table 4), which is responsible for an extracellular siderophore; NPS6 in C. heterostrophus and its homologs in a rice-infecting strain of C. miyabeanus and in other filamentous ascomycetes are involved in virulence. In C. heterostrophus, NPS6 also confers tolerance to oxidative stress [42] and is thought to supply iron to the fungus during host colonization [55].
CM_194 and CM_5231 are similar to Ch_NPS12, but CM_5231 has a lower percent identity, lacks the ferric reductase-like membrane component (pfam01794), and instead has a spore coat protein U domain (pfam05229) usually found in bacterial NRPSs. CM_10197 has homology to the hybrid NRPS/PKS protein Ch_NPS7/PKS24 (Table 5). CM_10647 resembles an AAR; AARs are phylogenetically close to multimodular NRPSs and are involved in lysine biosynthesis in fungi [40]. CM_8006 is similar to Ch_NPS3, a multimodular NRPS that is less conserved among Cochliobolus species. Three additional NRPS-like proteins (CM_9450, CM_4038, and CM_6570) are similar to hypothetical proteins of C. heterostrophus C5 and C. miyabeanus WK-1C. Each of the four AMP domains of the CmTG12bL2 NPS2 and NPS4 homologs grouped with the corresponding domain in the other three Cochliobolus isolates. This was also the case for the AMP domains of the CmTG12bL2 homolog of the less conserved multimodular NPS3. The monomodular NPS11 was not present in CmTG12bL2, nor was it found in C. miyabeanus WK-1C [24]. However, the single NPS11 AMP domain of both C. heterostrophus strains grouped with the first AMP domain (AMP1_2) of two bimodular NRPSs: GliP, responsible for gliotoxin production in A. fumigatus (EAL88817), and SirP, responsible for sirodesmin production in Leptosphaeria maculans.
Fig 1 (caption, partial): ...polysaccharides and substituted moieties by pectinolytic enzymes. Homogalacturonan: endo- and exopolygalacturonase, pectin and pectate lyase, pectin acetylesterase, pectin methylesterase; Xylogalacturonan: endo- and exogalacturonase, β-xylosidase; Rhamnogalacturonan: rhamnogalacturonase, rhamnogalacturonan lyase, rhamnogalacturonan acetylesterase, α-L-rhamnosidase, β-1,4-galactosidase, β-1,4-galactanase, β-glucuronidase, α-L-arabinofuranosidase, α-L-arabinanase, β-1,6-galactosidase, β-1,3-galactanase, α-L-fucosidase. The glycan symbol nomenclature used was according to [85].
Table 5.
In Cochliobolus species, NPS1, NPS3, and NPS13 proteins are discontinuously distributed. Their AMP domains display duplications/deletions, fusions, and/or recombinations that are thought to give rise to novel NRPSs within these expanded clades [24]. CM_3163 is a tetramodular protein that belongs to this expanded group and has a homolog in C. miyabeanus WK-1C (W6YWH5). We analyzed the placement of its AMP domains relative to those in C. miyabeanus WK-1C and the two C. heterostrophus strains (C4 and C5) (S5 Fig). Each AMP domain of the NPS1/NPS3/NPS13 NRPS-like proteins from the two C. miyabeanus isolates clustered together. Additionally, AMP1_4 of CM_3163 grouped with AMP1_1 of the pseudogene Ch_NPS13 with maximum bootstrap support (100%), while AMP2_4 and AMP3_4 grouped with AMP2_4 of Ch_NPS3, and AMP4_4 with AMP3_4 of Ch_NPS3.
Most fungal polyketides are synthesized by type I PKSs, which are monomodular enzymes whose multiple catalytic domains carry out repeated (iterative) rounds of biosynthesis. These rapidly evolving proteins catalyze the condensation of coenzyme A (CoA)-thioesterified carboxylic acids, such as acetyl- and malonyl-CoA, to form carbon chains or cyclic structures of varying length [41]. A minimal PKS module consists of a β-ketosynthase (KS), an acyltransferase (AT), and an acyl carrier protein (ACP) or phosphopantetheine attachment site (PP), whose roles have been described previously [41]. Additional domains with specific functions, namely β-ketoreductase (KR), dehydratase (DH), enoyl reductase (ER), methyltransferase (ME), and thioesterase (TE), modify or cyclize the polyketide. Based on the domains present, PKSs can be subdivided into reducing and non-reducing enzymes, the latter lacking some or all of the reducing domains (KR, DH, and ER) [41].
Nineteen PKSs were identified in the CmTG12bL2 genome. Of these, 15 matched the C. heterostrophus C4 PKS set (Ch_PKS): ten (Ch_PKS3, Ch_PKS5, Ch_PKS9, Ch_PKS12, Ch_PKS14, Ch_PKS15, Ch_PKS18, Ch_PKS19, Ch_PKS23, and Ch_PKS24) that are common to all Cochliobolus species, and five more (Ch_PKS6, Ch_PKS7, Ch_PKS8, Ch_PKS21, and Ch_PKS22) with a discontinuous distribution among species of the genus. The remaining four are homologs of PKS6, PKS13, PKS14, and PKS16 of C. sativus (Table 5). Domain predictions, excluding PKS24, indicate that eight PKSs have the seven ancestral domains of fully reducing type I PKSs, nine are partially reducing (lacking one or more DH, KR, or ER domains), and one is non-reducing. Two of the 21 PKSs reported for C. miyabeanus WK-1C, including the duplicated and expanded PKS14, were not detected in the CmTG12bL2 genome. Overall, the numbers of PKSs and NRPSs in the CmTG12bL2 genome were similar to those in other members of the Pezizomycotina, and the organization of their structural domains is identical or similar to that in other Cochliobolus species.
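The three-way grouping used above (fully reducing, partially reducing, non-reducing) can be written as a rule on the set of predicted domains. The sketch below assumes that a PKS is fully reducing when KR, DH, and ER are all present, partially reducing when only some are, and non-reducing when none are, and it treats a phosphopantetheine attachment site (PP) as equivalent to an ACP. The protein names and domain sets are hypothetical, and the sketch ignores domain order and the optional ME and TE domains.

```python
from collections import Counter
from typing import Dict, Iterable, List

MINIMAL = {"KS", "AT", "ACP"}   # minimal PKS module
REDUCING = {"KR", "DH", "ER"}   # optional reducing domains
ALIASES = {"PP": "ACP"}         # treat a PPT attachment site as the carrier


def classify_pks(domains: Iterable[str]) -> str:
    """Assign a PKS to a reduction class from its set of predicted domains."""
    present = {ALIASES.get(d, d) for d in domains}
    if not MINIMAL <= present:
        return "incomplete"  # lacks KS, AT, or a carrier
    reducing = present & REDUCING
    if reducing == REDUCING:
        return "fully reducing"
    if reducing:
        return "partially reducing"
    return "non-reducing"


def tally(architectures: Dict[str, List[str]]) -> Counter:
    """Count how many PKSs fall into each class."""
    return Counter(classify_pks(d) for d in architectures.values())


# Hypothetical domain sets for three toy PKSs.
toy = {
    "pksA": ["KS", "AT", "DH", "ME", "ER", "KR", "ACP"],
    "pksB": ["KS", "AT", "DH", "KR", "PP"],
    "pksC": ["KS", "AT", "ACP", "TE"],
}
print(tally(toy))
```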

Predicted transporter and detoxification systems
Cytochrome P450 monooxygenases (CYPs) are heme-thiolate enzymes involved in the degradation and detoxification of xenobiotics [91]. CYPs chemically modify lipophilic compounds from primary and secondary metabolism to create more hydrophilic derivatives. We identified 113 CYPs in the CmTG12bL2 genome (S7 Table), a number similar to that in other members of the Pezizomycotina [92]. Examples of enzymes predicted to participate in primary metabolism include eburicol 14-α demethylase (CM_837), a member of the CYP51 family; a sterol C-22 desaturase from the fungal-specific CYP61 family (CM_7374), involved in membrane ergosterol biosynthesis; members of the CYP52 family (CM_10978, CM_255, and CM_1621) that catalyze the initial degradation of n-alkanes and fatty acids; and a CYP56 family member (CM_6988) that participates in formation of N,N'-bisformyl dityrosine, a spore cell wall component [91]. The CmTG12bL2 genome encodes CYPs from families known to be involved in the biosynthesis of aflatoxins (CYP59 and CYP62), fumonisins (CYP65 and CYP505), trichothecenes (CYP65, CYP68, and CYP526), and gibberellins (CYP68) (S7 Table). Other CYPs could be involved in the detoxification and degradation of xenobiotics. For example, CM_10633 is similar to PDA6-1, one of the pisatin demethylase genes of Nectria haematococca (CYP57), which detoxifies pisatin, the pterocarpan phytoalexin produced by Pisum sativum [93]. CM_8068 is a likely ortholog of CYP53A15 from C. lunatus, a benzoate para-hydroxylase with O-demethylase activity that catalyzes the degradation of benzoic acid and its derivatives as well as natural toxins [94]. CM_6446, a CYP504A member, and CM_5407, from family CYP504B, are similar to phacA, a phenylacetate 2-hydroxylase from Aspergillus nidulans [95], and to phacB, a 3-hydroxyphenylacetate 6-hydroxylase [96], respectively. Both participate in phenylacetate degradation to Krebs cycle intermediates.
Fungal transporters provide resistance to a variety of drugs, exogenous mycotoxins, and fungicides, and contribute to pathogenicity by exporting mycotoxins out of fungal cells [97]. In the CmTG12bL2 genome, as in other Ascomycetes, the major facilitator superfamily (MFS) and the ATP-binding cassette (ABC) superfamily, with 279 and 46 genes, respectively, account for most of the transporters identified. The CmTG12bL2 MFS transporters are distributed in 22 families (Fig 2). Five families contain the majority of the predicted MFS proteins: the sugar porter (SP), the Drug:H+ antiporter-1 (12 Spanner) (DHA1), the Drug:H+ antiporter-2 (14 Spanner) (DHA2), the monocarboxylate porter (MCP), and the anion:cation symporter (ACS). Examples of MFS transporters found include a monosaccharide transporter similar to MstC, the low-affinity glucose:H+ symporter from A. niger [98]; a high-affinity xylose-proton symporter, GXS1 [99]; a protein similar to Flr1 of Saccharomyces cerevisiae, an MFS transporter involved in resistance to the antifungal drug fluconazole and several chemically unrelated drugs [100]; and a nicotinic acid permease and an H+:biotin symporter involved in vitamin uptake [101].
Transporters within the ABC superfamily typically have two homologous halves, each containing a nucleotide-binding domain (NBD) and six transmembrane-spanning helices that form the transmembrane domain (TMD6); however, half-size transporters are also found [97,102]. Most of the CmTG12bL2 ABC transporters are in the ABC1 and ABC2 superfamilies (Fig 3), mostly within families that participate in toxin efflux [103]. Predicted CmTG12bL2 proteins are similar to those in the multidrug resistance exporter family ABCB (e.g., Mdr1 from A. fumigatus, which confers resistance to antifungal compounds [97]) and in the pleiotropic drug resistance family ABCG (e.g., the AtrB transporter of A. nidulans, associated with resistance to cycloheximide, a variety of fungicides, and toxins including antifungal plant compounds [102]). Additional CmTG12bL2 proteins within the ABCG family are similar to CDR1 and CDR2 of Candida albicans, which play a role in resistance to antifungal azoles, and to STS1 (suppressor of sporidesmin toxicity) of Saccharomyces cerevisiae, which confers resistance to sporidesmin and other drugs such as cycloheximide [104]. Putative homologs of the metal resistance protein YCF1 from S. cerevisiae, in the ABCC family, are also present.

C. miyabeanus transcriptome in planta
As expected, the vast majority of the reads generated were of plant origin. When all reads were mapped to the CmTG12bL2 genome, fungal reads accounted for 1.6% to 3.0% of the total, depending on the replicate. Pairwise correlation coefficients for log2-transformed RPKM values between biological replicates ranged from r = 0.87 to r = 0.93, indicating low variation between samples. The average number of CmTG12bL2 transcripts detected at 48 hai across replicates was 10,787 ± 70; the 10,674 transcripts present in at least three of the four replicates were selected for further analyses. The averaged log2-transformed expression values followed a double exponential distribution (data not shown). For the enrichment analysis (Fisher test), we investigated only the transcripts (1,039) located in the three upper quartiles of the distribution.
We also found 32 highly expressed SSP-encoding genes, including five that were over-represented and belonged to the GO categories "extracellular region", "structural molecule activity", "heterocyclic compound binding", and "catalytic activity". Among the effector homologs, only t_Cm_2799, a putative Ecp6 homolog, was over-represented.
No NRPS or PKS genes were over-represented in the transcriptome at 48 hai, and only PKS7 was among the most highly expressed genes.
Among CYPs, seven over-represented transcripts were associated with metabolic processes and with cell and heterocyclic compound binding. These included monooxygenases from the CYP504 family (t_Cm_6446), putatively involved in phenylacetate degradation; CYP53 (t_Cm_8068), a benzoate para-hydroxylase; transcripts of the CYP532 family, associated with xenobiotic metabolism; and members of families CYP552, CYP68, and CYP52.
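The quantities in the paragraph above (RPKM, correlation of log2-transformed values between replicates, and the at-least-three-of-four-replicates filter) can be sketched in a few functions. RPKM follows the standard definition, reads × 10^9 / (transcript length in bp × total mapped reads); the pseudocount of 1 before the log2 transform, the use of RPKM > 0 as "present", and the toy values are assumptions, since the text does not specify them.

```python
import math
from typing import Dict, List, Sequence, Set


def rpkm(read_count: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped)


def log2_transform(values: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    # The pseudocount (an assumption) keeps zero-expression values finite.
    return [math.log2(v + pseudocount) for v in values]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def present_in_min_replicates(expr: Dict[str, Sequence[float]], min_reps: int = 3) -> Set[str]:
    """Genes with RPKM > 0 (assumed detection rule) in at least min_reps replicates."""
    return {g for g, vals in expr.items() if sum(v > 0 for v in vals) >= min_reps}


# Toy RPKM values for three genes across four biological replicates.
expr = {
    "gene1": [12.0, 15.0, 11.0, 14.0],
    "gene2": [0.0, 3.0, 0.0, 2.5],
    "gene3": [80.0, 95.0, 70.0, 0.0],
}
kept = present_in_min_replicates(expr)
rep1 = log2_transform([v[0] for v in expr.values()])
rep2 = log2_transform([v[1] for v in expr.values()])
print(sorted(kept), round(pearson_r(rep1, rep2), 3))
```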
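The over-representation calls above rest on a Fisher test. For enrichment of a single GO term, a one-sided Fisher exact test reduces to the upper tail of the hypergeometric distribution, which the sketch below computes with exact integer arithmetic. Using the 1,039 selected transcripts as the test set and the 10,674 expressed transcripts as the background is an assumption about how the test was set up, the term counts in the usage line are hypothetical, and the text does not say whether or how p-values were corrected for multiple testing.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher exact test (hypergeometric upper tail).

    k: selected transcripts annotated with the term
    n: total selected transcripts
    K: background transcripts annotated with the term
    N: total background transcripts
    Returns P(X >= k) when n transcripts are drawn without replacement.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical term: 30 of 1,039 selected vs. 120 of 10,674 background transcripts.
print(enrichment_p(30, 1039, 120, 10674))
```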

qRT-PCR experiments
Transcript accumulation at 48 hai of the six genes selected from the enriched pool (Ecp6, CYP53, salicylate hydroxylase, β-1,4-endoglucanase, β-1,4-glucosidase, and β-1,4-endoxylanase) was validated by qRT-PCR. The relative expression of these genes in planta at both 24 hai and 48 hai was always significantly higher than in vitro. In addition, expression levels of CYP53, β-1,4-endoxylanase, and β-1,4-glucosidase were similar at the two time points, while those of Ecp6 and salicylate hydroxylase were higher at 48 hai.
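Relative quantification of qRT-PCR data is often done with the 2^(−ΔΔCt) method. The text does not name the method, reference gene, or amplification efficiency used, so the sketch below is an assumption: it normalizes the target Ct to a reference gene, uses the in vitro sample as the calibrator, and assumes perfect doubling per cycle (efficiency 2.0) unless told otherwise. The Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_sample: Sequence[float], ref_sample: Sequence[float],
                     target_calibrator: Sequence[float], ref_calibrator: Sequence[float],
                     efficiency: float = 2.0) -> float:
    """Relative expression of a target gene (sample vs. calibrator) by 2^-ddCt.

    Each argument is a list of replicate Ct values; their means are used.
    efficiency = 2.0 assumes the amplicon doubles every cycle.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)              # normalize to reference gene
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)  # same for calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return efficiency ** (-dd_ct)


# Hypothetical Ct values: target gene in planta (48 hai) vs. in vitro culture.
fc = fold_change_ddct([22.1, 22.3], [18.0, 18.2], [27.0, 27.2], [18.1, 18.1])
print(round(fc, 1))
```

A fold change above 1 means higher relative expression in planta than in vitro; significance testing between conditions would be a separate step.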

Discussion
Here, we report the de novo genome assembly of a strain of C. miyabeanus, CmTG12bL2, isolated from American wildrice. We also assembled and studied the fungal transcriptome during the initial phases of plant infection and colonization. C. miyabeanus is found infecting Oryza species all over the world [5], and in North America it has the potential to cause yield-limiting disease on American wildrice [3] and switchgrass [4]. Despite its importance, relatively little is known about its mechanisms of pathogenicity or about host-pathogen interactions [105]. Unlike other Cochliobolus pathogens, C. miyabeanus does not appear to use HSTs during plant colonization [24,106]. In our transcriptome analysis, we focused on validating the gene repertoire from the genome analysis and on identifying transcripts potentially involved in pathogenicity or produced in response to host defenses.
The  [24,28]. The length of our assembly is similar to that of the C. miyabeanus WK-1C reference genome (31.4 Mb) downloaded from the JGI website (January, 2014; http://genome.jgi.doe.gov/Cocmi1/Cocmi1.download.html) and reported by Condon et al. [106]. Importantly, 92% of the CmTG12bL2 predicted proteins are putative homologs of those in WK-1C, including NRPSs, PKSs, and most of the SSPs. Most of the proteins that did not match those in the rice isolate were hypothetical proteins, while the rest (1%) carried known domains and could be associated with DNA replication (DNA polymerases, exonucleases) and repair (Rad51 proteins, endonucleases), retrotransposon activities (retrotransposon gag proteins, integrases, and reverse transcriptases), transcription factors (zinc finger, CCHC- and C2H2-type), and proteins of unknown function (DUF domains). The CmTG12bL2 genome shares with other Cochliobolus species a set of proteins involved in plant-pathogen interaction, including pathogenicity and virulence. Although most of them were expressed, some at high levels (e.g., the trypsin-like serine protease Alp1 [61]), none was statistically over-represented in the transcriptome at 48 hai.
Overall, a relatively large number of the CmTG12bL2 genes (~10%) were annotated as potentially secreted. Many of these genes have putative roles in host recognition and/or pathogenicity and were highly expressed or statistically over-represented in the transcriptome analysis. Examples include hydrophobins, which are involved in masking spores from host recognition and in sensing the host surface [62], and predicted cutinases, which hydrolyze cutin, the polyester of hydroxy fatty acids that is the main constituent of the plant cuticle. The CmTG12bL2 genome harbored a substantial number of CAZyme-encoding genes that could facilitate ingress into and colonization of wildrice tissues, and more than 8% of their transcripts were in the enriched transcriptome pool. These included enzymes that degrade cellulose, such as endoglucanases, cellobiohydrolases, and β-glucosidases, together with lytic polysaccharide monooxygenases of family AA9, which are responsible for oxidative cleavage of cellulose. A few AA9 CAZymes are linked to a non-catalytic CBM1, credited with providing more efficient binding to cellulose [83]. Others were enzymes able to depolymerize other plant cell wall glycans, particularly xylan, such as xylanases and β-galactosidases, as well as acetylxylan and feruloyl esterases. The upregulation of some of these genes in planta was validated by qRT-PCR. Additionally, a transcript that could be involved in sucrose degradation was highly expressed; fungi use plant sucrose derivatives as carbon or energy sources. Transcripts of predicted MFS transporters putatively involved in sugar uptake and transfer were over-represented, including a putative homolog of glucose/xylose symporter 1 (GXS1) [99], a low-affinity glucose transporter (MstC) [98], and a sugar and polyol transporter (SPT1); these could facilitate growth and adaptation to a variety of nutrient conditions.
A relatively small fraction of the fungal secretome (17.5%) consisted of small cysteine-rich proteins of less than 200 aa. SSPs are important for pathogenicity because they can act as effectors that promote disease and/or alter host defense mechanisms [107]. Over 96% of all SSPs matched proteins of the rice C. miyabeanus isolate WK-1C. At 48 hai, 17% of SSPs were highly expressed, but only a few were over-represented in the transcriptome enrichment analysis. Those with known domains include SSPs predicted to be involved in ribosomal activities (guanyl-specific ribonuclease Pb1; translation initiation factor SUI1), assistance in protein folding (peptidyl-prolyl cis-trans isomerase FKBP10), and cell wall stability and resistance to antifungal agents (cell wall mannoprotein PIR3). One particular SSP contained a cerato-platanin (CP) domain. Proteins with CP domains are preferentially located in fungal cell walls and participate in growth and development in many fungi [108,109,110]. CP proteins also act in plant-fungal interactions, not only as elicitors of plant defense mechanisms but also as effectors, because they bind chitin and its oligomers and can thus help the fungus avoid detection by PAMP-triggered immunity (PTI) during plant invasion [110]. A putative homolog of the extracellular effector Cf_Ecp6 of Cladosporium fulvum [67] was over-represented at 48 hai, and its higher expression in planta than in vitro was confirmed by qRT-PCR. Cf_Ecp6 has high affinity for short chitin oligosaccharides and competes efficiently with extracellular plant receptors for degraded chitin fragments, thereby preventing activation of PTI in plant hosts [69]. In our transcriptome analysis at 48 hai, we identified plant transcripts (data not shown) with very high homology to OsCERK1, the rice chitin elicitor receptor kinase 1.
OsCERK1 forms a hetero-oligomeric receptor complex with CEBiP, a chitin-binding glycoprotein receptor; upon recognition of chitin fragments, this complex triggers the production of reactive oxygen species and diterpenoid phytoalexins and the expression of basal defense genes in rice [111,112]. Although a transcript similar to CEBiP was not identified at this time point, four other wildrice LysM-containing proteins (receptor-like proteins) were expressed. Thus, C. miyabeanus might avoid eliciting wildrice defense mechanisms by sequestering chitin fragments.
Metabolites synthesized by NRPSs and PKSs are important in fungal fitness, response to environmental cues, and interaction with other organisms [86]. In other Cochliobolus species, many of these metabolites are HSTs [12,14,21,22,24]. A subset of NRPS genes is discontinuously distributed among Cochliobolus species and could evolve rapidly, through recombination, rearrangement, and gain or loss of AMP domains [40], to produce new phytotoxins. For instance, a protein of this class found in C. sativus pathotype 2 is thought to play a major role in virulence on barley cultivar Bowman [24]. One CmTG12bL2 protein (CM_3163) belongs to the rapidly evolving and expandable NPS1/NPS3/NPS13 group and has a homolog in strain WK-1C. In the phylogenetic analysis, the adenylation domains of CM_3163 were closely related to AMP1_1 of Ch_NPS13 (CmTG12bL2 AMP1_4), AMP2_4 of Ch_NPS3 (CmTG12bL2 AMP2_4 and AMP3_4), and AMP3_4 of Ch_NPS3 (CmTG12bL2 AMP4_4). Both duplication of portions of NRPS genes (as in NRPS3) and either recombination or gene fusion (as in NRPS13) may have generated the two genes unique to C. miyabeanus strains. The AMP domains of the monomodular NPS11 from C. heterostrophus and of a monomodular NRPS-like gene from C. miyabeanus clustered separately, the former with AMP1_2 and the latter with AMP2_2 of the bimodular NRPSs GliP of A. fumigatus and SirP of Leptosphaeria maculans; this suggests differential retention of AMP domains from an ancestral bimodular NRPS gene.
Overall, 19 PKSs were detected within the CmTG12bL2 predicted protein set, including the 10 PKSs common to all Cochliobolus species. However, the CmTG12bL2 isolate has only one of the two PKS14 genes found in C. miyabeanus WK-1C [24]. The numbers of NRPSs and PKSs found in our assembly are within the ranges reported for Cochliobolus species [24]. The wildrice isolate appears to contain the same suite of NRPSs as, and a similar number of PKSs to, the rice isolate C. miyabeanus WK-1C [24], with no additional unique genes.
Cytochrome P450 monooxygenases have been associated with fungal adaptation to new niches because they catalyze the degradation of chemical substances, and they might facilitate pathogenesis [92]. A reduction in phenolic compound content has been reported in rice leaves at 48 hai with a pathogenic strain of C. miyabeanus [113]. Three enzymes with homology to benzoate 4-monooxygenases (CYP548, CYP552, and CYP583) and another similar to a benzoate para-hydroxylase of family CYP53 were highly expressed during CmTG12bL2 colonization of a wildrice cultivar with improved resistance to FBS. A putative ortholog from C. heterostrophus, BPH, was upregulated during maize infection, suggesting that maize defenses could involve benzoate biosynthesis [114]. In C. lunatus, a similar protein (bph) [94] was shown to be a key enzyme in benzoate detoxification in a dose- and time-dependent manner. Thus, some CmTG12bL2 cytochrome P450 monooxygenases could participate in degrading phenolic compounds synthesized by wildrice during fungal infection. A putative salicylate hydroxylase was upregulated at 24 hai and 48 hai. Similar proteins in Fusarium graminearum degrade salicylic acid, a signaling molecule necessary for plant defense [115]. Salicylic acid-induced transcripts were found in the wildrice transcriptome (data not shown) and could be related to defense against the fungus. Identifying fungal genes implicated in the detoxification or inhibition of host compounds is important for identifying the plant defense mechanisms that counter fungal colonization.
Our results are in agreement with a proteomic study of C. miyabeanus during infection of rice [105] and with the Bipolaris sorghicola transcriptome during sorghum infection [116]. In particular, the findings of fungal oxidative stress activity; expression of CAZymes such as α-L-arabinofuranosidase, xylanase, glucanase, and acetylxylan esterase, and of LysM domain-containing proteins; and expression of cutinase and Alp1 transcripts suggest commonalities in virulence mechanisms among Cochliobolus/Bipolaris species infecting monocots. The CmTG12bL2 genome and transcriptome assemblies and analyses contribute to a better understanding of fungal pathogenicity and open new avenues for targeted mutagenesis in this pathosystem (e.g., cysteine-rich SSPs, putative effector molecules, and CYPs). They also offer genomic resources for interpreting the transcriptome of C. miyabeanus-infected wildrice.
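The SSP set discussed above was restricted to secreted proteins under 200 aa, and the S3 Table legend gives two alternative criteria for calling them high-cysteine SSPs (HCSSPs): a cysteine content at least twice the proteome-wide average, as in [28], or above 2%, as in [24]. The sketch below applies both rules. It assumes that signal-peptide prediction and Phobius-based removal of transmembrane proteins were done upstream, reads "general average" as cysteine frequency pooled over all residues, and uses made-up sequences.

```python
from typing import Iterable


def cysteine_fraction(seq: str) -> float:
    """Fraction of residues in seq that are cysteine."""
    seq = seq.upper()
    return seq.count("C") / len(seq)


def proteome_mean_cysteine(proteome: Iterable[str]) -> float:
    """Cysteine fraction pooled over all residues of a proteome (an assumed
    reading of 'general average of cysteines')."""
    total_c = total_len = 0
    for seq in proteome:
        total_c += seq.upper().count("C")
        total_len += len(seq)
    return total_c / total_len


def is_hcssp(seq: str, mean_cys: float, rule: str = "twice_mean", max_len: int = 200) -> bool:
    """Apply one of the two HCSSP criteria to a protein already predicted to be
    secreted and free of transmembrane regions."""
    if len(seq) >= max_len:
        return False  # SSPs here are restricted to proteins under 200 aa
    frac = cysteine_fraction(seq)
    if rule == "twice_mean":  # at least twice the proteome average, as in [28]
        return frac >= 2 * mean_cys
    if rule == "over_2pct":  # more than 2% cysteine, as in [24]
        return frac > 0.02
    raise ValueError(f"unknown rule: {rule!r}")


# Made-up sequences: a 50-aa protein with 3 cysteines (6%) and one with none.
small_cys = "MKC" + "A" * 37 + "CC" + "A" * 8
no_cys = "M" + "A" * 49
mean_cys = proteome_mean_cysteine([small_cys, no_cys, "MKLV" * 50, "MACDEFGHIKLMNPQRSTVW"])
print(is_hcssp(small_cys, mean_cys), is_hcssp(no_cys, mean_cys, rule="over_2pct"))
```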

Conclusions
The genome of Cochliobolus miyabeanus isolate CmTG12bL2, pathogenic on wildrice, was sequenced using Illumina short-read technology, and its transcriptome was analyzed in planta after infection. Proteins involved in ROS scavenging, plant tissue degradation and carbohydrate binding, SSPs, a known effector, and detoxification systems were expressed in the infected leaves. This study advances our understanding of fungal pathogenicity on American wildrice, a distant relative of rice. The predicted gene and protein sets will facilitate targeted mutagenesis to infer the functions of pathogenicity and effector genes, as well as comparative transcriptomics of Cochliobolus pathogens colonizing different hosts (i.e., wildrice, switchgrass, and common rice). They may also help refine host breeding strategies for developing more effective genetic resistance against C. miyabeanus.
Table. Basic statistics and quality measures of the Cochliobolus miyabeanus TG12bL2 sequencing process. (a) The overall % GC of all bases in the sequences followed a normal distribution overlapping the theoretical distribution; consistently, there were no overrepresented sequences or k-mers. (b) Quality scores across all bases sequenced. (c) Percentage of duplicated sequences relative to unique sequences, indicating that some sequences had 10 or more duplicates. (DOCX)
S2 Table. Best hits of C. miyabeanus TG12bL2 genes to the CEGMA (Core Eukaryotic Gene Mapping Approach) gene list from six eukaryotic genomes: Homo sapiens (Hs), Drosophila melanogaster, Caenorhabditis elegans (CE), Arabidopsis thaliana (At), Saccharomyces cerevisiae (Y), and Schizosaccharomyces pombe (SP).
Function. Cellular processes and signaling: M: Cell wall/membrane/envelope biogenesis; O: Posttranslational modification, protein turnover, chaperones; T: Signal transduction mechanisms; U: Intracellular trafficking, secretion, and vesicular transport; V: Defense mechanisms; Y: Nuclear structure; Z: Cytoskeleton. Information storage and processing: A: RNA processing and modification; B: Chromatin structure and dynamics; J: Translation, ribosomal structure and biogenesis; K: Transcription; L: Replication, recombination and repair. Metabolism: C: Energy; D: Cell cycle control, cell division, chromosome partitioning; E: Amino acid transport and metabolism; G: Carbohydrate transport and metabolism; H: Coenzyme transport and metabolism; I: Lipid transport and metabolism; P: Inorganic ion transport and metabolism; Q: Secondary metabolites biosynthesis, transport and catabolism. Poorly characterized: R: General function prediction only; S: Function unknown. (XLSX)
S3 Table. Comparison of small secreted proteins in C. miyabeanus CmTG12bL2 to those reported for eighteen Dothideomycete genomes. (a) HMMER hmmscan was used to compare protein sequences against collections of profiles (Pfam protein families only). Proteins with transmembrane regions as detected by Phobius were removed from the analysis. (b) Values of HCSSPs calculated as in [28] (at least twice the general average of cysteines in the CmTG12bL2 proteome), and (c) as in [24] (over 2% cysteines). All SSPs were under 200 amino acids in length. (XLSX)
S4 Table. Pfam domains identified in C. miyabeanus TG12bL2 small secreted proteins. Pfam domains were identified using HMMER software (http://hmmer.janelia.org) with the hmmscan program and a gathering cutoff to compare protein sequences against collections of profiles (Pfam protein families only). Proteins with transmembrane regions as detected by Phobius were removed from the analysis. (XLSX)
S5