High-Throughput Sequencing and Characterization of the Small RNA Transcriptome Reveal Features of Novel and Conserved MicroRNAs in Panax ginseng

microRNAs (miRNAs) play vital regulatory roles in many organisms through direct cleavage of transcripts, translational repression, or chromatin modification. Identification of miRNAs has been carried out in various plant species. However, no information is available for miRNAs from Panax ginseng, an economically significant medicinal plant species. Using the next generation high-throughput sequencing technology, we obtained 13,326,328 small RNA reads from the roots, stems, leaves and flowers of P. ginseng. Analysis of these small RNAs revealed the existence of a large, diverse and highly complicated small RNA population in P. ginseng. We identified 73 conserved miRNAs, which could be grouped into 33 families, and 28 non-conserved ones belonging to 9 families. Characterization of P. ginseng miRNA precursors revealed many features, such as production of two miRNAs from distinct regions of a precursor, clusters of two precursors in a transcript, and generation of miRNAs from both sense and antisense transcripts. It suggests the complexity of miRNA production in P. gingseng. Using a computational approach, we predicted for the conserved and non-conserved miRNA families 99 and 31 target genes, respectively, of which eight were experimentally validated. Among all predicted targets, only about 20% are conserved among various plant species, whereas the others appear to be non-conserved, indicating the diversity of miRNA functions. Consistently, many miRNAs exhibited tissue-specific expression patterns. Moreover, we identified five dehydration- and ten heat-responsive miRNAs and found the existence of a crosstalk among some of the stress-responsive miRNAs. Our results provide the first clue to the elucidation of miRNA functions in P. ginseng.


Introduction
Panax ginseng C.A. Meyer, belonging to the genus Panax of the family Araliaceae, is an economically significant medicinal plant species and has attracted a lot of attention for the phytomedicinal use since ancient times [1]. Evidence from laboratory and clinical trials showed that the roots of P. ginseng could play significant roles in immune system modulation, anti-stress, anti-cancer and antidiabetes in human beings [2][3][4][5][6]. The major bioactive ingredients of ginseng roots are triterpene saponins, which are known as ginsenosides. To date, more than 30 ginsenosides have been identified [7]. Among them, Rb1, Rb2, Rc, Rd, Re, Rf, Rg1 and Rg2 are most relevant to the pharmacological effects of ginseng roots [7][8][9]. In addition to ginsenosides, ginseng roots also contain some other bioactive compounds, such as polysaccharides, phenolics, and flavonoids [10][11][12].
P. ginseng is a part shade, low-growing perennial plant native to Asian [7]. It is a tetraploid plant species (2n = 4x = 48) with a large genome. Although the whole genome of P. ginseng has not been decoded, the whole chloroplast genome and a large number of expressed sequence tags (ESTs) have been sequenced [13][14][15][16][17]. The cDNA and gene sequences currently available includes 31,741 unique sequences assembled from 217,519 high quality 454 sequencing reads [13], 11,412 ESTs obtained from normal cDNA library, and about 563 other cDNA and gene sequences (http://www.ncbi.nlm.nih.gov/). The genetic transformation system of P. ginseng has been developed and several ginsenoside biosynthesis-related genes have been characterized through this system. For example, overexpression of the squalene synthase gene PgSS1 in adventitious roots of transgenic P. ginseng resulted in a considerable increase of phytosterol and triterpene saponin (ginsenoside) content [4]. RNA interference (RNAi) of the squalene epoxidase gene PgSQE1 and the dammarenediol synthase gene DDS in transgenic P. ginseng led to the reduction of ginsenoside production [18,19]. It provides great information and tools for studying gene functions and regulatory networks in P. ginseng. microRNAs (miRNAs) are a class of small endogenous noncoding RNAs. They originate from long primary transcripts (pri-miRNAs), which have stem-loop structures [20]. In plants, miRNAs regulate target genes either through direct cleavage of transcripts [21,22] or translational repression at post-transcriptional level [23], or, in some case, by methylation at transcriptional level [24]. It has been shown that miRNAs play important regulatory roles in plant development and stress responses [25][26][27][28][29][30]. For example, miR172 regulates the development of flowers by targeting APETALA2 (AP2) and its homologs with two tandem AP2 DNA-binding domains [31][32][33]. miR160 is involved in root development through the regulation of auxin response factors (ARFs) [32][33][34]. An increasing number of stress-responsive miRNAs are being identified [28,30,35].
Since the first discovery of plant miRNAs from Arabidopsis, plant miRNAs have been intensely studied using experimental and computational approaches [36,37]. The number of miRNAs identified has reached to 4011 according to miRBase release 18 [38]. However, there is no information available for miRNAs from P. ginseng. In this study, we constructed a small RNA library of P. ginseng and carried out high-throughput Illumina sequencing. We identified a total of 101 conserved and non-conserved miRNAs from the small RNA dataset. Computational analysis of miRNAs and their precursor sequences revealed some features of P. ginseng miRNAs. Many non-conserved P. ginseng miRNAs showed tissue-specific expression and some of them responded to dehydration and heat treatments. Consistently, many predicted targets of non-conserved P. ginseng miRNAs are associated with metabolism, signal transduction and stress responses.

Features of P. ginseng Small RNA Population
In order to identify small RNAs in P. ginseng, we generated a small RNA library using pooled RNA isolated from roots, stems, leaves and flowers. The library was then sequenced by the highthroughput Illumina sequencing technology [39]. A total of 13,326,328 raw sequence reads were obtained. After removing adaptors, low quality sequences and those smaller than 18 nt, 12,000,591 clean reads with sizes ranged from 18 to 30 nt were collected (Table S1). They were represented by 1,502,664 unique small RNA sequences, of which 114 were sequenced more than 10,000 times, whereas 1,178,394 (78.4%) and 152,436 (10.1%) were sequenced only one and two times, respectively. It indicates that a large, diverse and highly complicated small RNA population exists in P. ginseng. Additionally, 30,240 small RNAs, representing 1,283,459 reads, could be mapped to the chloroplast genome of P. ginseng using the SOAP2 program with no mismatches allowed [16,40], indicating some small RNAs are probably originated from the chloroplast genome.
Among 12,000,591 clean reads, the majority (71.22%) range from 20 to 24 nt in length, wherein the 21 (20.24%) and 22 (18.03%) nt ones are two major size groups ( Figure 1A). The group of 24 nt clean reads also accounts for a relatively high proportion (13.54%), whereas it is significantly less compared with those identified from Arabidopsis [41], Medicago truncatula [42], Brachypodium distachyon [43], Arachis hypogaea [44], Citrus trifoliate [45], Gossypium hirsutum [46], Oryza sativa [47] and Zea may [48], where the 24 nt group is the most abundant. Therefore, we analyzed the pattern of low molecular weight (LMW) RNA from the pooled total RNA using gel electrophoresis as previously described [49]. Unexpectedly, the results of electrophoresis show that the most abundant group of small RNAs is 24 nt ( Figure 1B). The inconsistency of results from electrophoresis and high-throughput sequencing was also observed in grape, in which a nearly equal number of 21 nt and 24 nt small RNAs was detected by electrophoresis, whereas the number of 21 nt sequence reads are more than five times of the 24 nt reads [50]. The factors causing the inconsistency remain to be elucidated.
Size distribution analysis of the 1,502,664 unique small RNA sequences showed that the group of 24 nt small RNAs was the biggest, which accounted for 38.24% of total unique sequences ( Figure 1C). It suggests that 24 nt small RNAs are the most diverse in P. ginseng, although their average of abundance is less compared with those in many other plant species. The groups of 21 and 22 nt have the highest percentage of small RNAs beginning with a U nucleotide, whereas the 24 nt group enriches with sequences having an A at the 59 end ( Figure 1D). It indicates a great proportion of miRNAs and trans-acting small interfering RNAs (ta-siRNAs) in the 21 and 22 nt groups and significant amount of repeat-associated siRNAs (ra-siRNAs) and natural antisense transcript-derived siRNAs (nat-siRNAs) in the 24 nt group, since miRNAs and ta-siRNAs tend to begin with a U and ra-siRNAs and nat-siRNAs prefer an A at the 59 end [51][52][53][54][55]. It is consistent with features of small RNAs from other plant species, such as Arabidopsis and Populus [54,56]. The diversity of 24 nt ginseng small RNAs may indicate the complexity of P. ginseng genome, although the whole genome has not been decoded.

Identification of Conserved miRNAs
Using the SOAP2 program, we mapped 1727 of 1,502,664 unique small RNA sequences to known miRNA precursors with no more than 2 mismatches [38,40]. Among them, 661 and 41 unique sequences, representing 305,744 and 406 reads, were mapped to mature miRNA and miRNA* regions, respectively, whereas the other 1025 sequences, representing 126,128 reads, were mapped to other regions of known miRNA precursors. Further alignment of the small RNAs mapped to miRNA regions of known precursors allowed us to group the 661 sequences into 73 miRNA candidates, members of 33 conserved miRNA families (Table S2). Similarly, the 41 unique sequences mapped to miRNA* regions were grouped into 22 miRNAs* putatively produced from 9 gene families, including MIR156, MIR166, MIR167, MIR171, MIR172, MIR390, MIR396, MIR482, and MIR4376 (Table S3).
It has been suggested that the expression profiles of miRNA genes can be estimated by high throughput sequencing-based technology [73,74]. Among the 33 conserved miRNA families, MIR166 was sequenced 228,538 times, which accounted for 74.8% of total conserved miRNA reads, suggesting it is the most abundant miRNA family in the P. ginseng tissues analyzed ( Figure 2). The second abundant miRNA family is MIR159, which was sequenced 61,264 times and accounted for 20.0% of total conserved miRNA reads. The other conserved miRNA families are less abundant and each has less than 1.5% of total conserved miRNA reads. Among them, 15 have less than 10 reads in our small RNA data set (Table S2). Compared with other plants, P. ginseng has significantly different expression profiles for some miRNAs. For example, MIR156 is highly expressed in peanut [44], whereas its expression in P. ginseng is low ( Figure 2). These results suggest the differential expression of miRNAs in P. ginseng and indicate the specificity of miRNA expression in different plant species.
We mapped the 73 conserved miRNAs to unigenes of P. ginseng using the SOAP2 program and then analyzed secondary structures of the unigenes using the mfold program [40,75]. As a result, four miRNA precursor sequences were identified. Two of them are precursors of MIR482. The other two generate miR2118 and miR4376 ( Figure S1). Further mapping of the small RNAs to miRNA precursors allowed the identification of three additional miRNA* sequences: one for miR482a, one for miR4376, and the other for both miR482b and miR2118 (Table S3). No precursor sequences for other conserved miRNAs were identified. It could be due to the limited number of P. ginseng unigenes.

Identification of Non-conserved miRNAs
In order to identify non-conserved miRNAs as much as possible, we used two computational approaches. First, we mapped small RNAs to 31,371 unigenes of P. ginseng using the SOAP2 program with no mismatches allowed [40] and analyzed the secondary structures of unigenes with small RNA mapped using the mfold program [75]. Next, we used MIREAP software (http://sourceforge.net/projects/mireap/) to predict precursors from P. ginseng unigenes. In total, we identified 25 miRNA precursors. These precursors produced a total of 28 mature miRNAs. We then submitted the precursors and the corresponding mature miRNAs to miRBase for official names [38]]. Based on sequence similarities, the 25 miRNA precursors were grouped into 9 miRNA gene families, although some members of a family generated distinct mature miRNA sequences (Table 1, Figure S1). Except MIR6135 and MIR6140 families, which have 11 and 4 members, respectively, the other 7 non-conserved miRNA gene families have only one or two members. The results showing few members for non-conserved miRNA families have been previously found in other plant species, such as Arabidopsis [76]. In addition to miRNAs, we identified a miRNA* sequence for the non-conserved miRNA, miR6136b, which verified computational prediction of non-conserved miRNAs (Table 1).

Special Features of miRNA Genes in P. ginseng
The precursors of miR482a and miR2118 cluster in a unigene, PUT-183a-Panax_ginseng-17125 ( Figure 3A). Existence of two precursors is verified by the finding of miR482a* and miR2118* in our small RNA data set. Clustering of miRNAs is a universal phenomenon in animal [77][78][79]. It has also been observed for some plant miRNAs, such as some members of the MIR156 family in wheat and tobacco [80,81], some members of the MIR166 family in Arabidopsis, rice, wheat and soybean [21,81,82], moss miR1219a and miR1219b [83], some member of the MIR395 family in Arabidopsis and wheat [41,81], switchgrass miR2118a and miR2118b [84], loblolly pine miR950a and miR950b [85], and Pinus pinaster miR1314a and miR1314b [73]. Both miR482a and miR2118 are located in the 39 arm of hairpin structures and are unidirectional in the unigene ( Figure 3A). The presence of two miRNA precursors in a unigene indicates that the generated miRNAs are co-regulated and are probably involved in the network of a cellular process [77][78][79]. Consis-tently, miR482a and miR2118 show a high sequence similarity ( Figure 3B), which enable them to regulate same targets and/or different members of a gene family in P. ginseng.

Conserved and Non-conserved Targets of Conserved miRNAs
Plant miRNAs show perfect or near-perfect complementarities to their targets [93,94]. It allows an effective prediction of miRNA targets through computation. To predict the targets of miRNAs in P. ginseng, we used psRNATarget, a useful web server, which searches potential targets using an improved iterative parallel Smith-Waterman algorithm and a weighted scoring schema and calculates penalty scores for mismatched patterns in the miR-NA:mRNA duplexes within a 20-base sequence window [95,96]. With the application of penalty score cutoff threshold 0-3, we identified a total of 99 unigenes from 31,371 assembled P. ginseng unigenes to be targets of 28 conserved miRNA families (Table S4). The number of targets for each miRNA family ranges from one to eight. Each unigene is predicted to be the target of a conserved miRNA except the TIR-NBS disease resistance protein gene, FW1NBNE01A3PPX, which is targeted by pgi-miR482 and pgi-miR1510. No targets were found for MIR160, MIR168, MIR319, MIR894, and MIR1024 in the unigene set. It could be due to the limited number of P. ginseng unigenes or the low expression levels of their targets in the P. ginseng materials used for EST library construction.
Twenty six of the 99 predicted targets are conserved among various plant species, suggesting the conserved regulatory roles of some miRNAs in plants (Table S4). Most of the conserved targets are transcription factor genes, including six squamosa promoterbinding-like protein genes (SPLs) regulated by miR156, four miR167-targeted auxin response factor genes (ARFs), three GRAS family transcription factor genes regulated by miR171, two homeobox-leucine zipper protein genes cleaved by miR166, and each of the NAC-domain protein gene, CCAAT-box binding factor gene, AP2 domain-containing protein gene and growthregulating factor gene targeted by miR164, miR169, miR172, and miR396, respectively. In addition to transcription factor genes, the other seven conserved targets include four F-box family protein genes regulated by miR393 or miR394, an ATP sulfurylase gene cleaved by miR395, a laccase gene targeted by miR397, and a miR2118-regulated TIR-NBS-LRR resistance protein gene. Using the modified 59 RLM-RACE method, we experimentally validated eight conserved targets of six miRNA gene families in P. ginseng, confirming the functional conservation of some miRNAs ( Figure 4, Table S4).
Consistent with previous reports showing species-specific regulatory roles of many conserved miRNAs in various plant species, such as Populus trichocarpa, Pinus taeda and Digitalis purpurea [28,35,85,97], seventy three of the 99 predicted miRNA targets are not conserved in other plants (Table S4). Among them, sixty could be important in plant development and defense responses, whereas the other thirteen are function-unknown. Using the modified 59 RLM-RACE method, we experimentally validated contig2586, which encodes the Ca 2+ -transporting ATPase, to be a target of MIR4376. It suggests that conserved miRNAs may play species-specific functions by the cleavage of non-conserved targets.

Targets of Non-conserved miRNAs
Using the computational approach, we predicted 31 targets for seven of nine non-conserved miRNA gene families ( Table 2). Many of the predicted targets are associated with metabolism, signal transduction and stress responses, and about one-third of the targets encode proteins with unknown functions. These results are consistent with our previous findings from P. trichocarpa and confirm that a significant fraction of non-conserved miRNAs are part of the regulatory networks associated with specific growth conditions or developmental processes [35]. Interestingly, nonconserved miR6139 and miR1439 conserved between rice and P. ginseng were predicted to target the ZIP metal ion transporter gene (PUT-183a-Panax_ginseng-20083) for cleavage ( Table 2 and  Table S4), suggesting both miR6139 and miR1439 are probably involved in the homeostasis of metal ion and play a role in plant response to salt stress [98,99]. Genes to be targeted by more than one miRNAs have been previously observed in other plant species [28]. For instance, five disease resistance protein genes were found to be targets of miR472, miR482 and/or miR1448, and fourteen pentatricopeptide repeat-containing protein (PPR) genes were regulated by both miR475 and miR476 in P. trichocarpa [28]. Because of the capability in cleaving the same target gene, miR6139 and miR1439 are probably members of a regulatory network. Tissue-specific Expression of miRNAs in P. ginseng In order to further predict the functions of miRNAs in P. ginseng, we analyzed the expression of mature miRNAs with known precursor sequences, including conserved miR482a/b, miR2118 and miR4376 and all of non-conserved miRNAs, in leaves, stems, flowers, roots, and embryogenic calli using the poly(A) miRNAbased qRT-PCR method as described [28,35,100,101]. This method verifies the specificity of PCR products by single peak of dissociation curves and can readily discriminate the expression of mature miRNAs having as few as only one nucleotide difference [101]. As a result, the majority of analyzed mature miRNAs were specifically PCR-amplified and many of them showed tissuespecific expression patterns ( Figure 5). For example, miR6135i, miR6143a and miR6143b-5p exhibited the highest expression in embryogenic calli, indicating their potential roles in embryogenic callus development and/or maintenance. On the contrary, miR4376, miR6135e,2/j, miR6136a.1, miR6136a.2, miR6136b, miR6140a, miR6140c, miR6140d and miR6142 showed low expression in embryogenic calli, whereas their expression was relatively high in stems and/or flowers. Differential expression was also found for many other miRNAs analyzed ( Figure 5). It indicates tissue-specific regulation of miRNA expression in P. ginseng. miR482a/b, miR2118 and miR6139 showed very similar expression patterns with the highest in stems, followed by flowers and roots ( Figure 5). Among them, miR482a and miR2118 are generated from two precursors clustering in the same unigene PUT-183a-Panax_ginseng-17125 ( Figure 3A). Similar expression patterns were previously observed for other clustered miRNAs that regulated same genes or different transcripts encoding functionally related proteins [77,102,103]. Consistently, both miR482 and miR2118 were predicted to target genes encoding disease resistance proteins (Table S4). It indicates that miR482a and miR2118 are involved in relevant cellular processes in P. ginseng. miR6135i and miR6135e.2/j are members of a miRNA family. However, their expression patterns are distinct. miR6135i showed the highest expression in embryogenic calli, the lowest expression in roots, and moderate expression in stems and flowers, whereas miR6135e.2/j exhibited the highest expression in flowers, followed by stems, leaves and roots, and showed the lowest expression in embryogenic calli ( Figure 5). It suggests that miR6135i and miR6135e.2/j could be functionally different in P. ginseng, although they belong to the same gene family. Interestingly, the expression patterns were also different between miR6136a.1 and miR6136a.2, two miRNAs generated from distinct regions of a precursor, FW1NBNE01BEGNB ( Figure 5). Similarly, distinct expression patterns were observed for miR6143b-5p and miR6143b-3p produced from the unigene PUT-183a-Panax_ginseng-16660 ( Figure 5). These results indicate that the level of miRNAs is not only associated with the transcription of primary miRNAs but also related to the maturation process and/or degradation rate of mature miRNAs. The expression pattern of miR6136a.2 is similar to miR6136b, a miRNA generated from the antisense transcript of FW1NBNE01BEGNB ( Figure 5). It implies the existence of functional relationship between miR6136a.2 and miR6136b.

Responses of Non-conserved miRNAs to Abiotic Stresses
In vitro cultured embryogenic callus has been proved to be a great model for studying the role of miRNAs in meristem development, embryogenesis and abiotic stress responses in various plant species, such as rice [104,105], Japanese Larch [106], and sweet orange [107]. To test whether non-conserved miRNAs are involved in plant response to environmental stresses, we analyzed the expression patterns of non-conserved miRNAs in embryogenic calli treated with dehydration and heat, two significant stressors experienced during the growth and development of P. ginseng, using the miRNA-specific ploy(T) adaptor RT-PCR method [101]. As a result, we identified five dehydration-and ten heatresponsive miRNAs, which showed more than two-fold changes between at least two time-points treated with dehydration and heat, respectively (Figures 6 and 7A). Among them, miR6135e.2/j, miR6135i, miR6138, miR6140a and miR6143b-3p responded to heat treatment only, whereas the other five, including miR6136b, miR6135k, miR6139, miR6140d, and miR6141, responded to dehydration and heat. It suggests the existence of a crosstalk among some non-conserved miRNAs in response to dehydration and heat stresses.
Due to low expression levels of non-conserved miRNAs, we used the other precise and sensitive miRNA-specific stem-loop RT-PCR method to further verify the expression patterns of ten heat-responsive miRNAs [108,109]. The results suggest that the expression patterns of miRNAs from stem-loop RT-PCR and poly(T) adaptor RT-PCR are largely consistent ( Figure 7B), validating the ploy(T) adaptor RT-PCR method in analysis of miRNA expression. Moreover, we cloned and sequenced some PCR amplicons. All of the miRNAs analyzed were found to be specifically amplified and had integral 39 ends (Table S5). These results verify the specificity of PCR-amplification and suggest the integrity of miRNAs in the samples analyzed.
Among the five dehydration-responsive miRNAs, miR6315k was down-regulated after stress. The expression of miR6136b, miR6139, miR6140d and miR6141 was fluctuated ( Figure 6). Similarly, different expression patterns were observed for ten heatresponsive miRNAs. miR6135i and miR6136b were induced after heat treatment for 6 h. miR6135e.2/j was highly up-regulated at 12 h after stress. However, the expression of miR6135k, miR6138, miR6139 and miR6141 was down-regulated after treatment. The expression of miR6140a, miR6140d and miR6143-3p was fluctuated in response to heat stress (Figure 7). These results imply that the underlying biological functions vary among stressresponsive miRNAs.

Analysis of Other Small RNAs
In order to characterize other small RNAs, we mapped the remaining 1,502,563 unique small RNA sequences to 31,371 P. ginseng EST sequences using the SOAP2 program with no mismatches allowed and then searched for EST sequences with small RNA mapped [40]. Among the 31,371 ESTs, 14,743 have corresponding small RNAs. 77.65% of 14,743 generated less than five small RNAs. About 2% (294 ESTs) produced more than 50 unique small RNAs. It includes 18 with more than 400 unique small RNAs mapped. Among them, eleven produced from rRNAs, three from chloroplast genome, and one from mitochondrion DNA. The other three were predicted to encode ginsenoside biosynthesis-related proteins [110]. It suggests the complexity of small RNAs in P. ginseng. Full-length cDNA cloning and transgenic analysis may help to further characterize the small RNAs.
The availability of whole genome sequence of P. ginseng chloroplast enabled us to further characterize the small RNAs derived from chloroplast [16]. Blast analysis of 1,502,563 unique small RNA sequences against the chloroplast genome showed that 45,971 small RNAs, representing 2,520,635 clean reads, generated hits over the chloroplast genome. Size distribution of clean reads revealed the 22 nt group, which represents 49.69% of the 2,520,635 reads, to be the biggest size group of chloroplast-derived  Figure 8A). The reads of the 24 nt group account for 2.11% of the 2,520,635 reads, which is significantly less compared with the percentage of the 24 nt group in the total 12,000,591 clean reads ( Figure 1A). Size distribution analysis of the 45,971 unique small RNA sequences showed the 21 nt group and the 22 nt group, which account for 20.33% and 17.32% of the total unique small RNA sequences, respectively, to be two major groups ( Figure 8B). Further examination of the chloroplast-derived small RNAs revealed that they were mapped to both sense and antisense orientation in the P. ginseng chloroplast genome ( Figure 8C). There are two clusters of small RNAs abundant in the loci annotated as chloroplast rRNAs, which are located in the inverted repeats (IRa and IRb) of chloroplast genome [16]. These results suggest the diversity of chloroplast-derived small RNAs in P. ginseng. Chloroplast-derived small RNAs have been previously identified in various species, such as tobacco [111] and Cucumis [112]. In consistence with our results, two clusters of small RNAs derived from chloroplast rRNAs were also observed in melon, suggesting chloroplast small RNAs may be highly conserved.

Discussion
The estimated genome of P. ginseng is about 3.12610 3 million letters of genetic code. It is 21.5 and 7.43 times of the genome of Arabidopsis and rice, respectively, indicating that the P. ginseng genome could be much more complicated than many other plant genomes [113,114]. Consistently, we found in this study that a large, diverse and complex small RNA population existed in P. ginseng. Moreover, the population of P. ginseng small RNAs identified is far from exhausted because 88.5% of the total 1,502,664 unique small RNA sequences were sequenced only once or twice. Analysis of the size distribution of P. ginseng small RNAs revealed that the 24 nt group, which mainly contains repeatassociated siRNAs (ra-siRNAs) and transposable element-derived siRNAs (TE-siRNAs), is the most diverse (Figure 1), indicating the existence of a large number of repetitive DNA and transposons in the P. ginseng genome [115][116][117]. Many repetitive elements are located in the heterochromatic regions including centromeres and telomeres. ra-siRNAs generated from heterochromatic regions play significant roles in developing or/and maintaining the repressed chromatin state through DNA methylation and heterochromatic histone modification [55,[118][119][120]. It suggests that the existence of a complex and diverse small RNA population is particularly important to maintain the growth and development of plants, such as P. ginseng, which have a large genome.
From the P. ginseng small RNA data set, we identified a total of 73 conserved and 28 non-conserved miRNAs, which belong to 33 and 9 miRNA families, respectively. However, only four precursors of conserved miRNAs were identified from total 31,371 unigenes, suggesting the EST sequences available are far from sufficient for the identification of miRNA precursors. On the other hand, we were able to identify 25 non-conserved miRNA  precursors from the same unigene set. It indicates that the number of non-conserved miRNAs is probably much larger than that of conserved miRNAs in P. ginseng. Among four precursors for conserved miRNAs, two (MIR482a and MIR2118) cluster in a transcript (Figure 3). Both miR482 and miR2118 were predicted to regulate disease resistance proteins (Table S4). Consistently, the expression patterns of miR482 and miR2118 are similar in the tissues analyzed. It suggests that miR482 and miR2118 are involved in defense responses in P. ginseng through relevant regulatory networks. Among the non-conserved miRNAs, miR6136a.1 and miR6136a.2 are generated from the same precursor FW1NBNE01BEGNB, whereas its antisense transcript produces miR6136b. miR6136a.1 showed the highest expression in flowers, followed by leaves, whereas miR6136a.2 and miR6136b were mainly expressed in stems ( Figure 5). The expression of miR6136b was altered in embryogenic calli treated with dehydration and heat, whereas both miR6136a.1 and miR6136a.2 appeared to be not responsive to the stresses (Figures 6 and 7). These results suggest the complexity of miRNA regulatory networks in P. ginseng.
Environmental factors are critical to plant growth and development. Increasing evidence indicates that miRNAs are involved in plant response to various environmental stresses, such as drought, salinity, cold, nutrient starvation, oxidative stress, mechanical stress and UV-B radiation [28,30,35,100,[121][122][123][124][125][126][127][128][129][130][131][132]. In this work, we examined the expression profiles of non-conserved miRNAs under dehydration and heat treatments and identified 5 dehydration-and 10 heat-responsive P. ginseng miRNAs, which appeared to be an incomplete set of dehydration and heatresponsive miRNAs in P. ginseng, considering that the small RNAs analyzed in this study were isolated from plants growing under normal conditions. More stress-responsive miRNAs could be identified if plant tissues treated with dehydration and heat were used. Additionally, we found that a crosstalk could exist among some miRNAs in ginseng response to dehydration and heat stresses. However, the underlying mechanism remains to be elucidated.

Plant Materials
P. ginseng C.A. Meyer cv. Shizhu was grown in a field nursery at Kuandian, Liaoning province, China. Leaves, stems, flowers and roots were collected from five-year-old plants and frozen in liquid nitrogen for total RNA extraction and small RNA library construction. No specific permits were required for the described field studies. The location of the field nursery is not privately-owned or protected in any way and the field studies did not involve endangered or protected species. Embryogenic callus was induced as described [133]. Briefly, ginseng seeds were harvested and then stratified for about 6 months. Pericarp was removed before sterilization in 70% ethanol for 1 min and 1% sodium hypochlorite for 20 min. After washing with sterile water, cotyledon explants were dissected out from seeds and cultured on MS media containing 1.0 mg.L -1 2,4-D and 3% sucrose in the dark for one month. Explants with embryogenic calli were then cultured at 2462uC under a 16/8-h (light/dark) photoperiod with light supplied by white fluorescent tubes at an intensity of 24 mmol.m -2 .s -1 . Embryogenic calli were maintained by regular 3-week subcultures.

Stress Treatments
For dehydration treatment, embryogenic calli were transferred to a 9 cm Petri dish with a dry sterile filter paper. The Petri dish was sealed with Parafilm and kept in a clean bench for 0, 1, 3, 6, 12 and 24 h at 24uC. For heat treatment, the calli on a callus induction medium were incubated at 37uC in the dark for 0, 1, 3, 6, 12 and 24 h. After each treatment, the calli were collected and blotted briefly with a piece of dry filter paper. Tissues were immediately stored in liquid nitrogen until use.

Small RNA Library Construction and High-throughput Sequencing
Total RNA was extracted from leaves, stems, flowers and roots of P. ginseng using the TRIzol Reagent (Invitrogen, USA). Equal amounts (5 mg) of total RNA from each tissue were pooled. The quality and quantity of total RNA were analyzed using Agilent 2100 ( Figure S2). Small RNA library was prepared as described [134]. Briefly, 10-30 nt small RNAs were purified from a 15% denaturing polyacrylamide gel and then ligated with the 59 and 39 adapters. After being reverse-transcribed by Superscript II reverse transcriptase (Invitrogen, USA), small RNAs were amplified by PCR. High-throughput sequencing of small RNAs was performed using an Illumina G1 Genome Analyzer at BGI-Shenzhen, China.

Electrophoretic Analysis of Small RNAs
Low molecular weight (LMW) RNA from 200 mg pooled total RNA was resolved on a 15% denaturing polyacrylamide gel and stained with ethidium bromide as described by Lu et al. [49]. The pictures were taken with the Bio-Rad Universal Hood II Gel Imager.

Data Sets
The 20621 PlantGDB-assembled unique transcripts (PUTs) of P. ginseng were downloaded from PlantGDB (http://www. plantgdb.org, April 25th, 2011) and further assembled with singletons identified previously by EGassembler using the default parameters (http://egassembler.hgc.jp/) [13,135]. The resulting 31371 unigenes were then used for small RNA mapping. Known mature miRNAs and miRNA precursors were downloaded from miRBase for the analysis of conserved miRNAs in P. ginseng [38].

Computational Identification of Small RNAs
Adapters were removed and low-quantity sequences were filtered from raw small RNA sequence data using the PHRED and CROSS MATCH programs [136]. The 18-30 nt clean small RNAs were mapped to known miRNA precursors using the SOAP2 program with no more than 2 mismatches allowed [40]. Small RNAs with matched precursors were then checked for mature miRNAs and miRNA* manually. For identification of non-conserved miRNAs, two approaches were used. First, small RNAs were mapped to 31371 unigenes of P. ginseng using the SOAP2 program with no mismatches allowed [40]. Unigenes with small RNA mapped were analyzed for secondary structures using the mfold program [75]. Second, small RNAs were mapped to P. ginseng unigenes and miRNAs were predicted using the MIREAP software (http://sourceforge.net/projects/mireap/) with the following parameters applied. It includes minimal miRNA length (18), maximal miRNA length (26), minimal miRNA (reference) length (20), maximal miRNA reference length (24), uniquencess of miRNA (280), maximal energy (-30), minimal space (5), maximal space (450), minimal mature pair (16), maximal mature bulge (3), maximal duplex asymmetry (4), and flank sequence length (20). The resulting hairpin structures from both approaches were then manually checked. The criteria applied to manual miRNA annotation includes (1) at least 2 reads for miRNAs, (2) a single-stranded, hairpin precursor with the free energy (dG) value less than -30, (3) four or fewer mismatched miRNA bases suggested by Meyers et al (2008) [137], (4) no more than two asymmetric bulges within the miRNA/miRNA* duplex, and (5) at least 0.85 of the minimal folding free energy index (MFEI) calculated as described [138].

Computational Prediction, Annotation and Experimental
Validation of miRNA Targets miRNA targets were predicted from the assembled P. ginseng unigene set by the psRNATarget server using default parameters [95]. Unigenes were annotated by blastx analysis against the Nr protein database using default parameters (http://www.ncbi.nlm. nih.gov/). The transcriptional direction of unigenes was determined based on the direction of predicted proteins. Unigenes without a hit in the Nr protein database were not considered as miRNA targets because the transcriptional direction was unknown without further experiments. Experimental validation of miRNA targets were carried out on mRNAs isolated from leaves or pooled tissues of leaves, stems, flowers and roots of P. ginseng using the modified 59 RNA ligase-mediated (RLM)-RACE method as described previously [35]. The nesting and nested primers used for PCR were listed in Table S6.

Quantitative Real-time Reverse Transcription-PCR (qRT-PCR) Analysis of miRNAs
Total RNA was isolated from leaves, stems, flowers, roots and calli using the TRIzol Reagent (Invitrogen, USA) and digested with RNase-free DNase I (Promega, USA) to remove genomic DNA. Two qRT-PCR methods, including the poly(T) adaptor RT-PCR method [101] and the stem-loop RT-PCR method [108,109], were used for miRNA quantification. The poly(T) adaptor RT-PCR method used about 1 mg DNase-treated total RNA as the starting material, which was polyadenylated using the Poly(A) Tailing kit (Ambion, USA) as described previously [101] and then reverse-transcribed into single-strand cDNA in a 20 ml reaction mix with SuperScript III Reverse Transcriptase (Invitrogen, USA) and 0.5 mg Poly(T) adapter (59-GCGAGCA-CAGAATTAATACGACTCACTATAGG(T) 12 VN-39, Ambion, USA). qRT-PCRs were performed in triplicates using the Bio-Rad CFX96 Real-Time PCR System C1000 Thermal Cycler (Bio-Rad, USA). Each PCR reaction was carried out in a final volume of 20 ml containing 10 ml 26SYBR Premix Ex Taq II (TaKaRa, Japan), 0.25 mM each of the miRNA specific forward primer and the universal reverse primer (Table S7), and the cDNA reverse-transcribed from 100 pg total RNA using the following conditions: 95uC for 20 sec, 40 cycles of 95uC for 15 sec, 60uC for 15 sec and 72uC for 15 sec. 5.8 S rRNA was used as an endogenous reference as described previously [28,35,100,101]. Standard deviations were calculated from three PCR replicates. The specificity of PCR products was verified based on single peak of dissociation curves. The relative abundance of miRNAs was determined using the 2 -DDCq method, where Cq represents the threshold cycle [139].
For the stem-loop RT-PCR method, about 100 ng DNA-free total RNA was hybridized with a miRNA-specific stem-loop RT primer. The hybridized miRNA molecules were then reversetranscribed into cDNA as described [108,109]. cDNA was diluted and then PCR-amplified in triplicates using the Bio-Rad CFX96 Real-Time PCR System C1000 Thermal Cycler (Bio-Rad, USA). Each PCR reaction was performed in a final volume of 20 ml containing 10 ml 26SYBR Premix Ex Taq II (TaKaRa, Japan), 0.25 mM each of the miRNA specific forward primer and the universal reverse primer (Table S8), and the cDNA reversetranscribed from 10 ng total RNA using the following conditions: 95uC for 20 sec, 40 cycles of 95uC for 10 sec, 60uC for 10 sec and 72uC for 6 sec. The endogenous reference, 5.8 S rRNA, was analyzed on the cDNA template converted from total RNA using random primers (Invitrogen, USA). In order to verify the specificity of PCR-amplification, some amplicons were cloned and sequenced. Figure S1 Predicted hairpin structures of P. ginseng miRNA precursors. Mature miRNA sequences are indicated in red. miRNA* sequences are indicated in blue. (DOC) Figure S2 The quality and quantity of total RNA used for small RNA library construction. One ml of total RNA were diluted with 1 ml of RNase-free water and then denatured at 70uC for two min. Total RNA was analyzed using Agilent 2100. The results showed that the sample had a RNA integrity number (RIN) of 8.1 and a 28s/18s rRNA ratio of 1.8, suggesting the RNA samples were not degraded. The concentration of original total RNA is 302 ng/ml, which is good for small RNA library construction. (DOC)