Transcriptome-wide analysis reveals the progress of Cordyceps militaris subculture degeneration

Background The entomopathogenic mushroom Cordyceps militaris is an important medicinal and food resource owing to its various medicinal components and pharmacological effects. However, the high frequency of strain degeneration during subculture seriously restricts the large-scale production of C. militaris, and the mechanism underlying strain degeneration remains unclear. In this study, we artificially cultured C. militaris for six generations and compared changes during fruiting body growth. The transcriptome of six generations of C. militaris strains were sequenced with the Illumine Hiseq4000. Results The subcultured C. militaris strains degenerated beginning at the third generation, with incomplete fruiting body growth beginning at the fourth generation. Over 9,015 unigenes and 731 new genes were identified. In addition, 35,323 alternative splicing (AS) events were detected in all samples, and more AS events occurred in the second, fourth and sixth generations. Compared with the first generation, the third generation (degenerated strain) included 2,498 differentially expressed genes (DEGs) including 1,729 up-regulated and 769 down-regulated genes. This number was higher than the number of DEGs in the second (1,892 DEGs), fourth (2,006 DEGs), fifth (2,273 DEGs) and sixth (2,188 DEGs) generations. Validation of RNA-seq by qRT-PCR showed that the expression patterns of 51 DEGs were in accordance with the transcriptome data. Conclusion Our results suggest that the mechanism of C. militaris strain degeneration is associated with gene involved in toxin biosynthesis, energy metabolism, and DNA methylation and chromosome remodeling.


Introduction
Cordyceps militaris, a tonic herb used in traditional Chinese medicine, is an economically valuable, edible entomopathogenic mushroom that is used clinically as a substitute of Cordyceps

Strain and sample preparation
Wild-type of C. militaris, designated YCC, was collected from Maoshan hilly area (at altitude 267.5 m) without specific permissions, Jiangsu province, China and isolated by tissue separation in the asexual phase by the Laboratory of Hi-Tech Processing of Sericultural Resources, Sericultural Research Institute, Chinese Academy of Agricultural Sciences, China. For the collection of different generations following asexual development, the strain was inoculated onto potato dextrose agar (PDA) plates (20% potato, 2% dextrose, 1.5% agar and 1% peptone, w/v) and incubated at 23˚C on a shaker at 150 rpm for 7 d in the dark. Mycelia were inoculated into rice medium, cultivated in the dark at 23˚C for 10 d, and then kept at 23˚C under a 17:7 h dark/light cycle for fruiting body production [18]. The mycelia and fruiting body of this strain were subcultured for the acquisition of second, third, fourth, fifth, and sixth generations, designated YCCZ2, YCCZ3, YCCZ4, YCCZ5, and YCCZ6, respectively. Mycelium from all six generations were collected for DNA and RNA extraction.

RNA extraction and sequencing
Total RNA from each sample was extracted using RNeasy Plant Mini kit (Qiagen Co. LtD., Beijing, China) and the residual genomic DNA was digested using RNase-free DNase I according to the manufacturer's protocol. Poly (A) mRNA was enriched using oligo (dT) beads and fragmented using fragmentation buffer. Finally, 100 ng purified and enriched mRNA was used to construct a cDNA library for each sample using NEBNext1 Ultra TM RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA). cDNA fragments of 200 bps (± 25 bps) were selected and purified by gel-electrophoresis and used as templates for amplification with PCR for end-repair and poly (A) addition. Purified library products were evaluated using the Agilent 2200 Tape Station and Qubit 2.0 software (Life Technologies, Carlsbad, CA, USA). RNA-Seq was performed at Gene Denovo Co., Ltd. (Shenzhen, China) using the HiSeq TM 4000 (Illumina, San Diego, CA, USA).

Transcriptome assembly and function annotation
Prior to sequencing, raw data were filtered to produce high-quality clean data. Clean reads were then assembled de novo into longer contigs based on overlapping regions using the Trinity platform (http://trinityrnaseq.sourceforge.net/) [19]. All subsequent analyses were performed using clean data. For annotation, clean data were mapped to the C. militaris genomic data (BioProject acession no. PRJNA225510) from the NCBI transcriptome reference database.
All splice junction sites of the same gene (! 5 reads) were determined and compared with the reference splice junction sites ( 1 base) using TopHat align to distinguish new alternative splicing (AS) events. DEGs were filtered and identified according to the edgeR criteria (http:// www.Bioconductor.org/packages/release/bioc/html/edgeR.html). Functional annotation (GO terms) were downloaded from Uniprot database (http://www.uniprot.org/uniprot). For GO enrichment analysis, GO annotations for each DEG were retrieved by mapping to GO terms in the database at http://www.geneontology.org. For KEGG pathway analysis, KEGG orthology terms for DEGs were retrieved from the KEGG pathway database (http://www.genome.jp/ kegg/). Cluster analysis of gene expression patterns was performed using cluster software [20] and Java Treeview software [21]. All expressed genes in the current transcriptomes were annotated based on BLAST homology searches and searched against the Swiss-Prot and TrEMBL databases by double-direction BLAST. To further explore the function of DEGs in different samples, KEGG enrichment analyses were performed using hypergeometric tests with the Blastall software.
Validation of RNA-Seq by quantitative real-time PCR qRT-PCR was carried out to validate the quality of RNA-Seq data. Total RNA was extracted as described above. Each RNA sample was treated with RNase-free DNase I (TaKaRa, Shiga, Japan) following the manufacturer's protocol to remove any residual genomic DNA. DNase Itreated RNA (2 mg) was subjected to reverse transcriptase using oligo (dT) primer and Prime-Script TM Reverse Transcriptase (TaKaRa, Shiga, Japan) according to the manufacturer's protocol. Total RNA (2 μg) was used to synthesize cDNA with PrimeScript™ RT reagent Kit (Perfect Real Time, TaKaRa, Japan). The C. militaris housekeeping gene, glyceraldehyde-phosphate dehydrogenase (Cmgapdh), was used as an internal control for normalization. Primers for the qRT-PCR of 51 DEGs were designed with Premier 6.0 software and are shown in S1 Table. Three biological replicates were performed per sample.

Data analysis
Sequence similarity was analyzed using Blast software. Multiple sequence alignment analysis was carried out with Clustalx W software. All experiments were repeated at least three times, and the results are expressed as mean ± S.D. Statistical evaluation was done using ANOVA (SPSS 18.0), with p < 0.05 considered significant.

Changes in the morphological characteristics of the C. militaris fruiting body in different generations
The growth of C. militaris fruiting body is an important morphological characteristics in evaluating strain degeneration during subculture. C. militaris was subcultured in rice medium for six generations, and changes in morphological characteristics of fruiting body were observed after 40 days of cultivation (Fig 1). In the third generation, the fruiting body became stronger and shorter, while the size of fruiting body was significantly reduced in the fourth generation. The strain appeared to be completely degenerated in the fifth generation, and only mycelia grew in the sixth generation. As shown in Fig 1, the subculture of C. militaris induced strain degeneration beginning in the third generation.

Transcriptome sequencing and assembly
As shown in Table 1, the average number of clean reads per sample was 53,118,184. After quality filtering, over 6×10 9 bps of clean data was obtained across the different samples (S2 Table),  and more than 98% HQ clean reads were identified (S1 Fig). About 90% reads were mapped to the C. militaris genome sequence (S3 Table). In total, 9,651 genes were detected in all transcriptomes, including 9,015 known genes and 731 novel genes. The number of new gene in each generation from YCCZ1 to YCCZ6 was 668, 705, 699, 704, 702 and 699, respectively ( Table 1). The six samples showed similar matching results, with~90% of reads matching the predicted genes in the genome and~70% being unique matches (S2 Fig). The above results revealed that our sample data were reasonable with good correlations among biological replicates and that sequencing data could be used for following analyses.  (Fig 2A). The most DEGs were thus observed in the third generation, when the C. militaris fruiting body began to experience abnormal growth (Fig 1), indicating that these DEGs may play key roles in strain degeneration. In addition, a scatter plot was protracted based on DEGs analyses. Compared with YCCZ1, difference in expression levels of the DEGs in YCCZ3 and YCCZ5 was significantly higher than those in YCCZ1 and YCCZ2 (Fig 2B-Fig 2F). Furthermore, a correlation heat map (CHM) was constructed and sample cluster analysis was performed based on DEGs in different samples. The heat map of significant DEGs shows the relationship of samples YCCZ1 to YCCZ6 (Fig 3). Taken together, all results suggest that these specific DEGs may be involved in the strain degeneration progress.

Analyses of differentially expressed genes (DEGs)
GO and KEGG pathway analyses of AS genes GO analysis was performed to summarize and explore the functional categories of the genes that underwent AS in different samples. GO terms were annotated for a total of 5,014 AS genes in YCCZ1, 6,230 in YCCZ2, 5,638 in YCCZ3, 6,170 in YCCZ4, 5,951 in YCCZ5 and 6,320 in YCCZ6 (S3 Fig). The Intergenic (10,863) and IR (5,222) AS events predominated, accounting for 30.8% and 14.8%, respectively, of all AS events. The number of AS genes in YCCZ2, YCCZ4 and YCCZ6 was higher than those in other samples. The proportions of enriched GO terms among AS genes were globally similar between YCCZ1 (wild-type) and YCCZ5 (degenerated) based on biological process, molecular function, and cellular component ontolopies (Fig 4). For the biological processes category, genes with AS events were predominantly enriched in GO terms that were relevant to cellular process and metabolic process. For the molecular function category, most AS genes were annotated with the term binding, followed by the term catalytic activity. Moreover, 6,944 AS genes were annotated with KEGG pathways information ( Table 2, S4  Fig). Enriched pathway analysis showed that these genes were significantly enriched in 17 pathways mainly relating to antigen processing and presentation, cell adhesion molecules, phagosome, endocytosis, and natural killer cell-mediated cytotoxicity.

Validation of DEGs by qRT-PCR
The expression of 51 DEGs was validated using qPCR, as shown in Fig 5. According to qRT-PCR, the DEG expression levels were consistent with those of obtained using RNA-Seq. Among these genes, 32 genes were up-regulated and 13 were down-regulated. Functional relative analysis shown that 15 of these genes are involved in toxin biosynthesis and detoxification (Fig 5C), 14 genes are related to carbohydrate and energy metabolism, nine genes are involved in DNA methylation and chromosome remodeling (Fig 5A), eight genes are involved in biosynthesis of secondary metabolites (Fig 5B), and six genes are involved in fruiting body formation, sexual development and light-induced brown film formation (Fig 5D). These results suggest that C. militaris strain degeneration involves genes related to toxin biosynthesis, energy metabolism, DNA methylation, and chromosome remodeling.

Discussion
C. militaris,a model fungus belonging to the Ascomycetes,is an entomogenous fungus that forms a fruiting body on several types of media, such as silkworm pupa, solid rice medium, germinated soybean medium, and soybean broth. It also forms mycelia during submerged fermentation, producing several bioactive substances [19,22].
Irreversible C. militaris strain degeneration is a common phenomenon that occurs with high frequency during the process of subculture and preservation. Such degeneration seriously affects large-scale industrialized production of C. militaris, leading to considerable economic losses [23]. There are many factors that may lead to fungal degeneration, including gene mutations, changes in mating type, DNA methylation, and fungal and viral infections [24]. Our previous work indicated that mutations in the 18S rRNA gene and mating-type (MAT) region, as well as down-regulated of CmMAT gene expression levels, may play important roles in C. militaris degeneration [25]. Quantitative real-time PCR analysis detected no CmMAT1-2-1 gene expression in the degenerated strain. Expression levels of the CmMAT1-1-1 and CmMAT1-1-2 genes were significantly down-regulated to only 7.5% and 4.4%, respectively, that of the wildtype strain.
The C. militaris genome (147 × coverage) is predicted to have a total genome size of 32.2 Mb and over 9,684 genes [17]. In this study, 9,746 genes were detected, including 731 novel genes, which fulfilled the criterion for it being referred to as the C. militaris genome, suggesting that the genome size and gene number in C. militaris is similar to those of Metarhizium  anisopliae (10,582 genes) [26] and Metarhizium acridum (9,849 genes) [27]. In addition, over 5,000 novel AS events of seven AS types were detected in each of the subcultured strains, indicating significantly higher rates of AS than in that observed in the wild-type strain. AS is an important mechanism for regulating gene expression and generating proteome diversity [28,29]. Previous studies have shown that AS plays a decisive role in the generation of receptor diversity and regulation of growth and development [30]. Many genetic diseases have been closely linked to higher than normal rates of AS [31]. Therefore, we speculated the higher AS rates in filamentous fungi might be involved in the progression of strain degeneration. Gene mutation [32], methylation modification [33], and DNA recombination can alter a strain's genotype and can induce strain degeneration. In this study, we characterized and manually annotated the functions of four DEGs involved in methylation modification and in DNA recombination and repair. Methyltransferase type 11 (CCM_00251) and 6-O-methylguanine-DNA methyltransferase (CCM_02107) were significantly up-regulated in later generations and are crucial for genome stability, preventing mismatch during DNA replication and transcription [34,35]. Two DNA repair genes, DNA excision repair protein (CCM_00249) and DNA double-strand break repair/VJ recombinationXRCC4 (CCM_04567), were also up-regulated in later generations. Degenerated strains exhibit traits that are generally heritable, including significantly reduced sporulation, inability to form normal fruiting bodies, and significantly reduced secondary metabolites contents. Fruiting body and light-induced brown film formation areimportant characteristics that are altered in C. militaris strain degeneration [14]. Here, we analyzed the DEGs involved in fruiting body and light-induced brown film formation and found that the expression of α-1,3-mannosyltransferaseAlg3 (CCM_04231) was up-regulated while that of sexual development activator VeA (CCM_04531)was down-regulated. The expression of lightinduced brown film formation-related genes phosducin I (CCM_05237) and phosducin II (CCM_09459) were significantly up-regulated beginning at the second generation.
Toxic substances such as mycotoxins, active oxygen and other nonnutritional xenobiotic compounds, naturally exist in the environment and can be harmful to organisms [40]. Some scholars have speculated that accumulation of toxin and active oxygen molecules may cause fungal strain degeneration [14]. Here, the expression of several genes involved in detoxification was up-regulated during strain degeneration, including that encoding streptothricinacetyltransferase (CCM_00152), which degrades streptothricin [41]; gamma-glutamyltranspeptidase (CCM_02065), which reduces glutathione [41]; MFS multidrug transporter (CCM_ 02974), which confers resistance to antibiotics [42]; glutathione S-transferase (CCM_03461), which processes pesticides, heavy metals, fluoride and eliminating reactive oxygen species (ROS) [43,44]; 30 kDa heat shock protein (CCM_06821), which is involved in stress response, signal transduction, and xenobiotic compounds metabolism [45]; and alcohol dehydrogenase (CCM_09345), which ameliorates the harmful effects of high concentration of ethanol [46]. However, expression of the gene encoding heavy metal tolerance protein (CCM_06182), which binds heavy metals ions, was down-regulated during strain degeneration [47]. In addition, levels of non-nutritional xenobiotic compounds increased during strain degeneration induced by subculture.
The subcultured C. militaris strains degenerated beginning at the third generation, with incomplete fruiting body growth beginning at the fourth generation. Transcriptome analysis on the progress of C. militaris subculture degeneration showed that over 9,015 unigenes and 731 new genes were identified, and 35,323 alternative splicing (AS) events were detected. Compared with the first generation, the third generation (degenerated strain) included 2,498 differentially expressed genes (DEGs) including 1,729 up-regulated and 769 down-regulated genes. Validation of RNA-seq by qRT-PCR showed that the expression patterns of 51 DEGs were in accordance with the transcriptome data. In summary, our results indicated the mechanism of C. militaris strain degeneration is associated with gene involved in toxin biosynthesis, energy metabolism, DNA methylation, and chromosome remodeling.