De Novo Assembly of Auricularia polytricha Transcriptome Using Illumina Sequencing for Gene Discovery and SSR Marker Identification

Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom with a gelatinous and modality-specific fruiting body, is in high demand in Asia due to its nutritional and medicinal properties. Illumina Solexa sequenceing technology was used to generate very large transcript sequences from the mycelium and the mature fruiting body of A. polytricha for gene discovery and molecular marker development. De novo assembly generated 36,483 ESTs with an N50 length of 636 bp. A total of 28,108 ESTs demonstrated significant hits with known proteins in the nr database, and 94.03% of the annotated ESTs showed the greatest similarity to A. delicata, a related species of A. polytricha. Functional categorization of the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways revealed the conservation of genes involved in various biological processes in A. polytricha. Gene expression profile analysis indicated that a total of 2,057 ESTs were differentially expressed, including 1,020 ESTs that were up-regulated in the mycelium and 1,037 up-regulated in the fruiting body. Functional enrichment showed that the ESTs associated with biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. The expression patterns of homologous transcription factors indicated that the molecular mechanisms of fruiting body formation and development were not exactly the same as for other agarics. Interestingly, an EST encoding tyrosinase was significantly up-regulated in the fruiting body, indicating that melanins accumulated during the processes of the formation of the black-brown color of the fruiting body in A. polytricha development. In addition, a total of 1,715 potential SSRs were detected in this transcriptome. The transcriptome analysis of A. polytricha provides valuable sequence resources and numerous molecular markers to facilitate further functional genomics studies and genetic researches on this fungus.


Introduction
Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom rich in various nutrients such as polysaccharides, edible fibers and proteins, is widely distributed in Asia, tropical America, and other regions around the world. A. polytricha has been a common culinary ingredient in East Asia for hundreds years. This species is in high demand in Asia due to its nutritional and medicinal properties [1].
A. polytricha is classified within the Basidiomycota, Agaricomycetes, Incertae sedis, Auriculariales, Auriculariaceae, Auricularia [2]. The fruiting bodies of A. polytricha lacks the obvious tissue differentiation of pileus, gill and stipe, which is distinct from other Agaricomycetes fungi such as Schizophyllum commune, Coprinopsis cinerea, Agaricus bisporus and Lentinula edodes, and the fruiting bodies are able to survive for a long period of drought. Despite the unique appearance in the fruiting body of A. polytricha, it is developed from filamentous mycelium like other mushrooms. Unfortunately, insufficient genomic information is available for A. polytricha, which has limited our understanding of the molecular mechanism underlying the formation and development of its special fruiting body.
In recent years, next-generation sequencing techniques (such as Roche 454, Illumina Solexa GA and ABI SOLID) have emerged as a cutting-edge approach for high-throughput sequence determination, providing powerful and cost-efficient tools for advanced research in many areas, including genome sequencing; genomewide profiling of epigenetic marks and chromatin structure using ChIP-seq, methyl-seq and DNase-seq; metagenomics studies; and de novo transcriptome sequencing for non-model organisms [3,4]. Transcriptomic information is used in a wide range of biological studies and provides fundamental insight into biological processes and applications, such as profiles of the gene expression levels during different space-time conditions, gene discovery and molecular marker mining [5,6,7]. Several studies have reported the genome or transcriptome sequencing of various mushrooms, including S. commune [8], C. cinerea [9], L. edodes [10,11], Ganoderma lucidum [12,13], Agrocybe aegerita [14] and Cordyceps militaris [15,16]. Fruiting body formation is one of the most complex developmental processes in mushrooms [17]. Transcriptomes and comparative expression profiles from different developmental stages have been generated to explore many genes associated with fruiting body formation. A set of transcription factors (TFs) acting downstream of the mating-type loci has been identified in the model mushroom S. commune, and a model for the regulation of mushroom development has been constructed based on the phenotype variations that occur with different mutations of TFs [8,18].
Due to the deficiency of information on the complex molecular mechanisms of A. polytricha development, the objective of the current study was to use the Illumina sequencing technology to determine the transcriptomes of mycelium and fruiting body of A. polytricha. A total of 36,483 ESTs were obtained by de novo assembly, and an analysis of the differentially expressed genes (DEGs) revealed candidate genes involved in the fruiting body formation of A. polytricha, including homologs of TFs which play important roles in the fruiting body development of other mushrooms. The assembled, annotated transcriptome dataset and gene expression profiles provide an invaluable genomic resource for future research in A. polytricha. Furthermore, the ESTbased SSR markers identified in this study can be applied to analyze the genetic diversity and to construct a transcript map of A. polytricha, which will be essential for accelerating mushroom breeding programs in the future.

Mushroom Strains and Cultivation
The dikaryotic A. polytricha strain APM2-16 is a hybrid of two monokaryotic strains, App7 and M2S16. App7 was isolated from the wild dikaryotic strain APTJ6101 by protoplast regeneration, and M2S16 was derived from the cultivated strain MHJY002 by single-spore isolation.
The mycelia of App7, M2S16 and APM2-16 were grown on liquid complete yeast medium (CYM) separately for one week at room temperature [19]. APM2-16 was cultivated on packs of growth medium consisting of 80% sawdust, 19% wheat bran and 1% CaSO 4 in the dark at 25uC. When the packs were fully covered with mycelia, the culture surfaces were wounded using a knife, and all of the packs were transferred to conditions that induced fruiting (light, 15-25uC, .85% humidity). Some tissues like brains, consisted with many 1-3 mm primordia, grown on the wounded position after 5-7 d. The next two or three days, the primordia grown up to 3-5 mm young fruiting bodies, which resembled small cups. Then the following 7-15 d, the cup-like fruiting bodies were growing up and expanding to form ear-like fruiting bodies, which were more than 100 mm in longitudinal diameter and containing many mature basidiospores. The samples containing liquid cultured mycelia, primordia, young fruiting bodies and mature fruiting bodies were immediately frozen in liquid nitrogen and stored at 280uC until RNA extraction.

cDNA Library Preparation and Illumina Sequencing
Total RNA from dikaryotic mycelium and adult fruiting body of APM2-16 was extracted using TRIzol Reagent (Invitrogen Life Technologies, USA) according to the manufacturer's protocol. The integrity of the RNA was detected by agarose gel electrophoresis and the concentration was determined using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). The RNA quality of the dikaryotic mycelia and adult fruiting bodies for RNA-seq was further verified using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). A total of 20 mg RNA was equally isolated from mycelia and adult fruiting bodies for mRNA isolation, cDNA library construction and sequencing according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The two cDNA libraries were sequenced at Beijing Genomics Institute (BGI)-Shenzhen, Shenzhen, China. The raw Illumina sequencing data for the mycelium (accession number: SRX319468) and adult fruiting body (accession number: SRX319472) generated in this study were submitted to the NCBI Sequence Read Archive database (SRA).

De novo Assembly of Sequencing Reads
Before assembly, the 90-bp raw paired-end reads were filtered to obtain the high-quality clean reads by removing adaptors, lowquality sequences (reads with unknown bases ''N'') and reads with more than 20% low quality bases (quality value #10). The clean reads from mycelium and fruiting body were mixed together and de novo assembly of the clean reads was performed using short reads assembling program Trinity [20]. Briefly, clean reads with a certain overlap length were combined to form longer fragments without N (contigs). The individual reads were assigned to the respective contigs by paired-end mapping. Trinity was able to detect contigs from the same transcript and determine the distances between these contigs. Finally, Trinity connected these contigs into sequences that could not be extended on either end, which were called transcripts. When multiple samples from the same species were sequenced, the transcripts assembly should be taken into further process of sequences splicing and redundancy removing. After clustering with BLASTx, the transcripts were divided into two classes: clusters (similarity .70%) and singletons. The clusters were the transcripts produced by alternative splicing or homologous genes. The ESTs containing the longest sequence of every cluster and all of the singletons were used for further bioinformatic analysis.

Gene Annotation and Analysis
The sequence orientations of the ESTs were determined by BLASTx alignment (E value ,10 25 ) against the NCBI non-   above databases, ESTScan was used to predict the sequence direction [21]. For functional annotation, EST sequences were aligned by BLASTx to the above protein databases (E value ,10 25 ). The BLAST results of the best hit were extracted for EST description. Based on the results of nr annotation, the Blast2GO program was employed to obtain GO annotation according to molecular function, biological process and cellular component [22]. The EST sequences were also aligned to the COG database to predict and classify their possible functions [23]. The KEGG database was used to study the metabolic pathways in A. polytricha [24].
For the identification of TF families represented in the A. Polythricha transcriptome, the ESTs were searched against all the TF protein sequences in the Fungal Transcription Factor Database (FTFD, http://ftfd.snu.ac.kr) using BLASTx with an E-value cut-off of ,10 25 [25]. To identify TF families, the protein sequences of all ESTs were functionally annotated using Inter-ProScan [26].

Analysis and Experimental Verification of Differentially Expressed Genes
The clean reads from dikaryotic mycelium and mature fruiting body were mapped on the ESTs respectively. If less than 10 reads from this two different cDNA libraries were mapped back on to one EST, this EST was considered to be expressed in low abudance and did not analyze the diversity of expression level among two developmental stages further. The RPKM (reads per kilobase per million reads) was applied to measure the gene expression levels [27]. The calculated gene expression was used to compare the differences in gene expression between mycelium and fruiting body. The probability of an EST being expressed equally between the two samples was calculated, and the false discovery rate (FDR) method was introduced to determine the threshold pvalue in multiple tests. An FDR #0.001 and an absolute value of the log2 ratio $2 were used as thresholds to determine the significance of gene expression differences between the two developmental stages [28]. For functional enrichment analysis, the p-values for a significant overrepresentation of a particular GO category were calculated using the hypergeometric distribution. Pathway enrichment analysis was performed similarly using the KEGG database.
To assess the reliability of the sequencing-based approach in identifying differentially expressed genes, semi-quantitative RT-PCR was performed to detect gene expression levels using specific primers. Approximately 2 mg of total RNA from mycelium and fruiting body, treated with DNaseI (Fermentas), was used as a template to synthesize first-strand cDNA by reverse transcriptase M-MLV (TaKaRa). The gene-specific primers (Table S1) were designed based on the concerned EST sequences. Actin (comp28675_c0) and GAPDH (comp26162_c0) primers were used to standardize RNA samples for each RT-PCR. The expression of each gene was confirmed by performing independent PCR reaction three times.

Gene Expression Profiles of TFs in Different Developmental Stages
Some TF genes have been determined their important roles in other mushrooms. The homologous TF genes in A. delicata genome was found using BLASTp. According to the functional annotation result of A. polytricha transcriptome, the ESTs with top hit against the corresponding protein of A. delicata were considered the homologous TF genes. The ESTs found in the same BLAST results or with high homology to one another were selected as allelic variants of different parts or alternative spliced transcripts of the same gene. In order to determine the function of homologous TF genes of A. polytricha in the development of fruiting body, the gene expression profile of these genes were estimated using semiquantitative RT-PCR. Total RNA was isolated from the liquidcultured mycelia of App7, M2S16 and APM2-16, as well as the primordia, young fruiting bodies and mature fruiting bodies of APM2-16. About 2 mg of total RNA of six different developmental stages was reverse-transcribed by reverse transcriptase M-MLV (TaKaRa). The gene-specific primers (Table S1) were designed based on one of the EST sequences of homologous TFs. Actin (comp28675_c0) and GAPDH (comp26162_c0) primers were amplified as endogenous loading control for testing the validity of template preparation. The expression of each gene was confirmed in at least three rounds of independent RT-PCR reactions.

In silico SSR Identification
The software SSR Locator was employed to identify mono-to hexa-nucleotide SSR motifs [29]. Mono-nucleotide repeats of more than 15, di-nucleotide repeats of more than 8, tri-nucleotide repeats of more than 5, tetra-nucleotide repeats of more than 4 and penta-and hexa-nucleotide repeats of more than 3 were utilized as search criteria for SSRs.

Sequencing of A. polytricha Transcriptome and De Novo Assembly
For the purpose of generating an overview of the A. polytricha gene expression profiles of the different developmental stages, RNA was extracted from A. polytricha dikaryotic mycelia and mature fruiting bodies. High-throughput sequencing was per-formed with the Illumina HiSeq 2000 sequencing platform. After quality filtering, 10,130,920 and 10,434,094 clean paired-end sequence reads were obtained from the mycelium and fruiting body, respectively, with the Q20 percentage over 98% (Table 1). Then, using Trinity software, a total of 20,565,014 high-quality reads (consisting of 1,804,546,499 bp) were assembled into 524,847 contigs. The contigs were further assembled into 45,830 transcripts. After clustering with BLASTx, 36,483 ESTs were assembled from the A. polytricha transcriptome, including 5,022 distinct clusters and 31,461 singletons ( Table 2). The mean length of ESTs was 531 bp and the N50 length was 636 bp ( Table 2). The contig size distribution revealed that more than half of the ESTs (32,602; 89.4%) were between 200 and 1000 bp in length ( Table 2). The GC content of A. polytricha ESTs was 61.31%, which was much higher than L. edodes (48.31%) and A. aegerita (approximately 53%) [11,14]. GC content analysis revealed that A. polytricha transcripts have a high GC content similar to the Auricularia delicata genome (58.6%).
Without a reference genome for assembly, an approximation of transcriptome coverage was made. The genome size of A. delicata, an affinis species of A. polytricha, was 75 Mb. The genome sizes of sequenced species in Agaricomycetes were ranged from 25.04 Mb (Piriformospora indica) to 100.24 Mb (Leucoagaricus gongylophorus) (NCBI, http://www.ncbi.nlm.nih.gov). Based on the hypothesis that the genome size of A. polytricha was ranged from 25 Mb to 100 Mb, approximately 1.8 Gb of transcriptome sequence data likely represented 18 to 72-fold coverage of the A. polytricha genome. The de novo assembly of lower eukaryotic transcriptomes is straightforward and more than 30-fold coverage is required for the full lengths of most yeast transcripts, whereas the de novo assembly of higher eukaryotic transcriptomes is much more challenging because of the larger sizes of the datasets and the difficulties involved in identifying alternatively spliced variants [30,31]. Furthermore, a reference-based assembly can reconstruct full-length transcripts with less than ten-fold coverage [32]. Therefore, higher coverage and genomic sequencing of A. polytricha should be conducted for a more complete assembly of the transcriptome. In spite of the limitation in coverage, a large set of A. polytricha gene fragments can provide adequate information for further gene investigation and identification of molecular markers.

Functional Annotation and Characterization of A. polytricha Transcripts
A total of 28,108 (77.04%) ESTs exhibited a significant hit with known proteins in the nr database and matched 13,273 unique protein accessions. It is possible that the ESTs matched to the same protein accessions, represented different alleles or alternative splicing events for one gene in A. polytricha. This result indicated that the Illumina paired-end sequencing project generated more than 13,273 genes of A. polytricha. Homology analysis indicated that 94.04% of annotated ESTs showed the greatest similarity to the known proteins of Aricularia, including A. delicata (26,430), A. auricula-judae (2) and A. polytricha (2). Additionally, 854 (3.04%) annotated ESTs were best matched to sequences from other species of Agaricomycetes, such as Gloeophyllum trabeum (86), Serpula lacrymans (69), Trametes versicolor (49), C. cinerea (32), S. commune (41), Laccaria bicolor (31) and so on (Table S2). Despite the limited protein information of A. polytricha in nr database, the genomic sequencing results for A. delicata provided much useful information for transcriptome annotation in this study. Interestingly, the remaining annotated ESTs (720) showed significant hits with various species belonging to other groups of fungi, plants and animals. However, 9,095 (32.96%) ESTs did not show significant similarity with any other proteins in nr database, and may represent A. polytricha-specific genes.

Functional Classification
GO is an international standardized gene functional classification system. Based on nr annotation, GO assignments were used to classify the functions of A. polytricha transcripts. Out of 28,108 nr hits, a total of 11,513 ESTs were assigned at least one GO term (Table S3). GO has three ontologies: cellular component, molecular function and biological process. WEGO software was applied to perform GO functional classification for the ESTs annotated with GO terms [33]. The 11,513 ESTs were categorized into 47 functional groups (Figure 1). Under the main category of cellular component, cell (2,811 ESTs,24.42%) and cell part (2,811 ESTs, 24.42%) represented the majorities of the category. Binding activity (6,289 ESTs, 54.62%) and catalytic activity (6,491 ESTs, 56.38%) were dominant in the category of molecular function. Among the various biological processes, cellular process (5,635 ESTs, 48.94%) and metabolic process (6,585 ESTs, 57.20%) were most highly represented (Figure 1).
In the COG database, every protein is assumed to be evolved from an ancestor protein. To further evaluate the completeness of the A. polytricha transcriptome and the effectiveness of the annotation process in this study, the annotated sequences were used to search for the genes involved in COG classifications [23]. A total of 9,705 ESTs were assigned to one or more COG Table 3. ESTs used for validation of gene expression profile in A. polytricha transcriptome data. EST (Figure 2). These results were different slightly to those obtained for L. edodes, A. aegerita, C. cinerea, L. bicolor and S. commune [11,14].
To identify the biological pathways in A. polytricha, the 28,108 annotated sequences were mapped to the reference canonical pathways in the KEGG database. Pathway-based analysis helps further elucidate the biological functions and interactions of genes. A total of 2,676 ESTs had significant matches in the KEGG database and were assigned to 239 KEGG pathways (Table S4) (Table S4). These results showed that amino acid, carbohydrate and lipid metabolisms were active in this mushroom. In addition to the genes assigned to the metabolism pathways, 1,124 ESTs were classified into genetic information processing involving transcription, translation, folding, sorting, degradation, replication and repair, and 408 ESTs were sorted into cellular processes comprising cell communication, cell growth and death, cell motility, transport and catabolism. Moreover, a total of 216 ESTs were classified into membrane transport, signal transduction, signaling molecules and interaction. The KEGG annotations provided a valuable resource for investigating specific processes, functions and pathways involved in the A. polytricha vegetative growth of the mycelium and the development of the fruiting body.
Next, ESTs encoding putative TFs were identified by sequence comparison to the known TF gene families of fungi in the Fungal Transcription Factor Database [25]. In total, 1,349 putative TFencoding ESTs, containing 924 ESTs similar to the sequences in FTFD, were identified, representing 3.70% of the A. polytricha transcriptome (Table S5, Table S6). A homology search for   Figure 3, Table S5). According to the InterPro results of A. polytricha transcriptome and A. delicata genome, the distribution of TFs among the various known protein families in A. polytricha and A. delicata were very similar (Figure 3), which differed somewhat from that of S. commune [8]. The A mating-type genes are potential homeobox TFs that contain a DNA-binding motif known as the homeodomain [34]. The matA locus of S. commune consists of two subloci: Aa and Ab. The Aa locus contains two divergently transcribed genes, which encode the Y and Z homeodomain proteins of the HD2 and HD1 classes, respectively, while Ab locus contains four genes encoding HD1 homeodomain proteins and two genes encoding HD2 [8]. Almost all of the eight A mating-type genes had low expression levels in the mycelium and fruiting body of S. Commune [8]. Although no reference genome of A. polytricha could be used to analyze the mating-type loci, some homologous A mating type ESTs were identified from the A. polytricha transcriptome for further investigations.

Analysis of Differentially Expressed Genes
The mycelium and the fruiting body are two main developmental stages of mushrooms. To identify differentially expressed genes (DEGs) between the two developmental stages of A. polytricha, clean reads from dikaryotic mycelium and mature fruiting body were mapped back onto the ESTs respectively. The expression levels for each EST were calculated using the RPKM method [27]. A total of 12,834 (35.18%) ESTs were expressed in low abundance at both developmental stages. Out of 23,649 ESTs, 2,057 ESTs were significantly differentially expressed between the mycelium and fruiting body (FDR #0.001, |log 2 -ratio| $2), including 1,020 ESTs down-regulated and 1,037 up-regulated in the fruiting body ( Figure 4A). Interestingly, 53 ESTs out of 2,057 DEGs were specifically expressed in mycelium and 377 ESTs in fruiting body. The absolute value of the log 2 -ratio ranged from 2.75 to 14.32 (Table S7).

1) Validation of the DEG results by RT-PCR
analysis. To validate expression profiles obtained by RNA-seq analysis, semi-quantitative RT-PCR were conducted to confirm the expression levels of 8 selected ESTs, with comp28675_c0 (actin) and comp26162_c0 (GAPDH) serving as the reference genes ( Figure 4B, Table 3). All of these ESTs were amplified successfully and resulted into a single band. Four ESTs exhibited higher expression level in mycelium, including comp29169_c0, comp29186_c0, comp28312_c0 and comp29269_c0, while the other three were up-regulated in fruiting body, including comp27863_c1, comp10899_c0 and comp27971_c0. Comp21959_c0 exhibited same expression level in mycelium and fruiting body ( Figure 4B). The expression patterns of these ESTs were consistent with the reads abundance of Illumina sequencing, suggesting that the DEG analysis was reliable.
2) Expression profiles of homologous TFs related to mushroom development. The expression profiles of ESTs encoding putative TFs were also investigated. Out of 1,349 putative TF encoding ESTs, 78 (5.78%) DEGs were detected when comparing the expression patterns of the mycelium and   (Table S6). Most differentially expressed ESTs between mycelium and fruiting body were classified into homeobox, C2H2 zinc finger and Zn2Cys6. Notably, all of the DEGs classified into homeobox were up-regulated in fruiting body. Whereas number of down-regulated ESTs encoding conserved protein domain Zn2Cys6 was more than four times that of up-regulated ESTs in fruiting body (Table S5). The homeobox and Zn2Cys6 transcription families might play important roles in the development of the fruiting body. TFs have been identified in some mushrooms, such as A. bisporus, C. cinerea and S. commune [8,9,35]. In S. commune, numerous TFs were found to be differentially expressed between developmental stages, suggesting that TFs were important developmental controls [8]. A set of TF genes, including c2h2, fst3, fst4, hom1, hom2, gat1 and bri1, that are up-regulated during the formation of the primordia and/ or mature mushrooms were inactive, demonstrating their important roles in the mushroom development of S. commune downstream of the mating-type loci [8,18]. These TF genes have orthologs in A. polytricha transcriptome (Table 4). According to the DEG analysis of transcriptome, the ESTs that were homologous with fst3 of S. commune, were significantly down-regulated in the fruiting body and the ESTs homologous with c2h2, fst4, hom1, hom2, gat1 and bri1 in the mycelium were similar to those of the fruiting body (Table 4).  The homologous A. polytricha ESTs of these seven TFs were selected for semi-quantitative RT-PCR to estimate the gene expression profiles in different developmental stages, including the mycelium of two monokaryotic parents App7, M2S16, and the mycelium, primordium, young fruiting body and mature fruiting body of dikaryon hybridization APM2-16 ( Figure 5, Table 4). Except the homolog of c2h2, all of the ESTs were amplified successfully and resulted into same size of bands between two monokaryotic parents. The expression levels of fst3 and hom1 homologs were the highest in the dikaryon mycelium and decreased gradually during the developmental of the fruiting body. The homolog of fst4 exhibited similar expression levels between mycelium, primordium and fruiting body. The homologous ESTs of hom2 and gat1 showed higher expression level in the young fruiting body than that in dikaryotic mycelium. That two different size bands were amplified in App7 and M2S16 using primers for homologous EST of c2h2 might represent a pair of allelic genes. The expression profiles of the candidate allelic genes were different in dikaryotic mycelium, primordioum, young and adult fruiting body. The expression level of bri1 was lower than that of the other six genes and barely expressed in mature fruiting body. The expression profiles of these seven genes in dikaryotic mycelium and fruiting body, were conformed to the transcriptome data. However, the expression profiles of these genes were different from those in S. commune, A. bisporus and L. bicolor [18,35]. This difference suggests that the developmental switches for fruiting body formation in A. polytricha and other agarics involve two distinct processes that occur after two monokaryons fuse to form dikaryons. WC-1 and WC-2, two proteins that form a photoreceptor and TF complex (WCC) and were cloned from the ascomycete Neurospora crassa, bind the promoters of light-regulated genes to activate transcription [36,37]. The dst1 and phrA genes are homologous genes of WC-1, cloned from C. cinerea and L. edodes, respectively [38,39]. Furthermore, it has been shown in L. edodes that the WC-1 homolog PHRA interacts with the WC-2 homolog PHRB in vitro [40]. Searches of the genome databases suggest that S. commune and L. bicolor each have WC-1 and WC-2 homologs [41]. The expression of WC-1 and WC-2 homologous ESTs in dikaryotic mycelium and mature fruiting body of A. polytricha was similar ( Table 4). The results from semi-quantitative RT-PCR showed that no significant differences of the expression level of WC-1 and WC-2 were existed in whole developmental processes of fruiting body, except that in the mycelium of M2S16 which may be caused by the incompatible primers for this parent ( Figure 5B). The expression profile of WC-1 homolog EST was consistent with the expression pattern of phrA but not consistent with dst1 [38,40]. However, the expression profile of WC-2 homolog EST in A. polytricha was different slightly to that of phrB of L. edodes [40]. The deduced PRIB protein of L. edodes, a TF contains a Zn2Cys6 motif and a bZIP motif, was found to be most abundantly transcribed in the primordium and in the early stages of fruiting body formation [42]. In the A. polytricha transcriptome, one homologous EST for PriB was found. The result of semi-quantitative RT-PCR showed that the expression level of A. polytricha PriB in monokaryon was lower than that in different developmental stages of dikaryon ( Figure 5B). The exp1 transcription factor is involved in cap expansion and autolysis in C. cinerea [43]. The expression level of homologous exp1 gene of A. polytricha was higher slightly in young and adult fruiting body than in mycelium and primordium ( Figure 5B). Although the appearance of fruiting bodies for A. polytricha was different to the umbrella-like fruiting body for C. cinerea. the homologous exp1 gene of A. polytricha may also play an important role during the ear-like fruiting body expansion. Another HMG protein, pcc1, plays an important role in promoting clamp cell development during the C. cinerea A-regulated pathway [44]. The homologous pcc1 gene of A. bisporus was more highly expressed in the mushroom than in the mycelium and was downregulated in the fruiting body of L. bicolor [35]. That the expression levels of the pcc1 ortholog in different development stages of A. polytricha were similar indicated that the gene function of pcc1 in different species was different. ( Figure 5B, Table 4). Briefly, no significantly different expression level were existed in the dikaryotic mycelium and fruiting body except fst3, and the expression profiles of this homologous TFs in the course of A. polytricha fruiting-body formation was not completely same to other mushrooms. These results indicated that the formation of ear-like fruiting body of A. polytricha was different to other agaricomycetous mushrooms. However, the phenotypes resulting from the knockdown or knockout of these TFs should be determined to explore their actual functions in A. polytricha mushroom development.
3) Discovery of genes related to fruiting body development by enrichment analysis. To study the functions of DEGs, functional annotation was adopted for these identified genes. GO annotation of DEGs was extracted and subjected to GO functional enrichment analysis. No DEG was found in the clusters of cell junction, nucleoid, symplast, virion, virion part, metallochaperone activity, protein binding transcription factor, receptor, immune system process, locomotion, positive regulation of biological process and reproductive process ( Figure 1). These results indicated that the functions of the ESTs sorted into the twelve above-listed groups might have nothing to do with the development of the fruiting body. The number of ESTs with GO annotations up-regulated in the fruiting body (347) was greater than the number of down-regulated ESTs (303) ( Table 5). GO functional enrichment analysis showed that many DEGs involved in ''heme binding'', ''iron ion binding'', ''electron carrier activity'', ''oxidoreductase activity'', ''oxidation-reduction process'' and ''electron transport'' (Table 5). More ESTs up-regulated in the fruiting body were involved in ''metallopeptidase activity'', ''ammonium transmembrane transporter activity'', ''serine-type peptidase activity'', ''urease activity'', ''endopeptidase activity'', ''ammonium transmembrane transport'', ''cell adhesion'' and ''urea catabolic process''. These findings suggest that the biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. In contrast, the GO categories ''phosphatidylinositol phospholipase C activity'', ''hydrolase activity, acting on glycosyl bonds'', ''oligopeptide transport'' and ''diacylglycerol metabolic process'' were over-represented in the up-regulated ESTs of the mycelium (Table 5). KEGG pathway analysis showed that more DEGs participated to amino acid metabolism were existed in the metabolism group. In the category of cellular processes, more DEGs were involved in transport and catabolism. Signal transduction and translation were two subgroups containing more DEGs belong to environmental information processing and genetic information processing (Table  S4). KEGG enrichment analysis showed that the down-regulated ESTs in the fruiting body were enriched for in carbohydrate metabolism relating to ''Glyoxylate and dicarboxylate metabolism''. Large numbers of up-regulated ESTs in the fruiting body were associated with ''Glyoxylate and dicarboxylate metabolism'', ''fructose and mannose metabolism'' and ''amino sugar and nucleotide sugar metabolism'' under the category of carbohydrate metabolism ( Table 6). The enzymes of ''fructose and mannose metabolism'' and ''amino sugar and nucleotide sugar metabolism'' participate in the biosynthesis of polysaccharides. The monosaccharide compositions of A. auricula polysaccharides were as follows: glucose (72%), mannose (8%), xylose (10%) and fucose (10%) [45]. Genes involved in different pathways might have stage-dependent expression patterns in mushrooms [14]. The polysaccharide components in the mycelium and the fruiting body were different in A. polytricha. Except the ESTs with ''amino sugar and nucleotide sugar metabolism'' and ''fructose and mannose metabolism'' were highly expressed in the fruiting body, more ESTs up-regulated in the fruiting body were involved in ''tyrosine metabolism'' and ''tryptophan metabolism'' (Table 6). These results were in close accordance with the GO enrichment analysis. It indicated that the ESTs participating in these biosynthetic and degradation metabolism processes were essential to fruiting body development and that the analysis of this transcriptome was accurate and reliable.
Amino acid metabolism, including the metabolism of tryptophan and tyrosine, was more active in the fruiting body (Table 6). Furthermore, the pathway of steroid hormone biosynthesis was highly expressed in the fruiting body, whereas terpenoid backbone biosynthesis was highly expressed in the mycelium (Table 6). This difference implies that different metabolites are synthesized in the mycelium and fruiting body of A. polytricha, and the nutrients might be different between these two developmental stages. Interestingly, the up-regulated ESTs in the fruiting body were enriched in the tyrosine metabolism, in which tyrosinase (EC 1.14.18.1) is involved in the biosynthesis of melanins ( Figure 6). One out of two ESTs encoding tyrosinase (EC 1.14.18.1) were significantly up-regulated in the fruiting body. The expression levels of another EST were same in mycelium and fruiting body. Tyrosinase (EC 1.14.18.1), also often known as polyphenol oxidase (PPO) initiating an enzymatic brown reaction leading to the formation of brown or black pigments, is widely distributed in nature [46]. PPO from A. bisporus is liable to progress to enzymatic browning and pigmentation during development, post-harvest storage and pathogen infection [47,48]. In L. edodes, the brown pigmentation of mycelial cells, one of the typical light-responses, is also thought to be attributable to the accumulation of melanin [40]. Thus, the blackbrown color of the A. polytricha fruiting body could be related to the accumulation of melanins. The WC-2 homolog PHRB can bind to the promoter region of the L. edodes tyrosinase gene (Le.tyr) and these two genes are induced by light exposure, suggesting that PHRB can regulate the expression of the Le.tyr gene in a lightdependent manner [40]. Although the color of the mycelium will not become significantly darker under conditions of light compared to dark conditions, the formation and color of the A. polytricha fruiting body will be influenced by the light intensity. Therefore, it is worth further study to determine the functions of Le.phrA, Le.phrB and Le.tyr homologs during the different developmental stages of A. polytricha.

Identification of SSRs
Transcript/EST-based molecular markers are an important resource for the construction of genetic maps, comparative genomics, genetic diversity analysis, functional genetic variation determination and molecular marker-assisted selection in breeding [49,50]. For further assessment of the assembly quality as well as the development of molecular markers, all of the 36,483 ESTs generated in this study were used to mine potential microsatellites. Using the software SSR Locator, a total of 1,715 potential SSRs were detected in 1,502 (4.1%) ESTs of A. polytricha, of which 171 sequences contained more than one SSR, and 57 SSRs were present in compound formation (Table 7). On average, one SSR could be found every 11.3 kb in the transcriptome. The density of microsatellites in A. polytricha was similar to reports for other agarics, such as A. delicata, C. cinerea, L. bicolor and G. lucidum [51,52]. Among the 1,715 potential SSRs, the tri-nucleotide SSRs represented the largest fraction (42.4%), followed by hexanucleotide (30.6%) and penta-nucleotide (14.1%) SSRs. Only a small fraction of tetra-nucleotide (6.1%), di-nucleotide (5.8%) and mono-nucleotide (1.0%) SSRs were identified in this transcriptome (Table 7). Over 50% of tri-and hexa-nucleotides were found in CDSs of the G. lucidum genome, whereas more than half of mono-, di-, tetra-and penta-nucleotides were distributed in intergenic regions [52]. Such a propensity for tri-and hexa-nucleotides in the coding regions may exist to suppress the other categories of SSRs, thus reducing the incidence of frameshift mutations in coding regions caused by nontriplet repeats [53]. The analysis of GC content in motif of microsatellites showed that most of the SSRs were rich in GC except mono-and di-nucleotide (Figure 7). In trinucleotide repeats, the microsaterllites with 50%-100% GC content were 10 times than that with 0-50% GC content. It was indicated that the GC content was more abundant in SSRs of the A. polytricha transcriptome. The identification of SSRs provides a very cost-effective option for the development of functional markers for various genetic studies.

Conclusions
To the best of our knowledge, this study is the first to analyze the difference between the mycelium and the fruiting body of A. polytricha by de novo transcriptome sequencing. Functional annotation and enrichment analysis of DEGs revealed that more ESTs up-regulated in the fruiting body stage were involved in tyrosine and tryptophan metabolism, fructose and mannose metabolism, amino sugar and nucleotide sugar metabolism, steroid hormone biosynthesis. Furthermore, many putative TF ESTs were identified by sequence comparison to known TF gene families of fungi. The expression profiles of some orthologs of TFs related to fruiting body formation in other agarics were investigated. The expression patterns of these TFs were different slightly to other mushrooms. Whether these genes play important role in the course of fruiting body formation should be further analyzed and verified, and extensive studies must be performed to understand the development of A. polytricha. However, numerous SSRs were predicted and can be used for subsequent marker development employed in strain typing, population genetics, phylogenetics, genetic linkage and QTL analysis. As such, the current study provides the largest number of transcripts to date and lays the initial groundwork for developmental and genetic research on A. polytricha.