Transcriptome-Wide Identification and Characterization of MicroRNAs from Castor Bean (Ricinus communis L.)

Background MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression and play essential roles in numerous developmental and physiological processes. Currently, little information on the transcriptome and tissue-specific expression of miRNAs is available in the model non-edible oilseed crop castor bean (Ricinus communis L.), one of the most important non-edible oilseed crops cultivated worldwide. Recent advances in sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used high-throughput sequencing technologies to identify and characterize the miRNAs in castor bean. Results Five small RNA libraries were constructed for deep sequencing from root tips, leaves, developing seeds (at the initial stage, seed1; and at the fast oil accumulation stage, seed2) and endosperms in castor bean. High-throughput sequencing generated a large number of sequence reads of small RNAs in this study. In total, 86 conserved miRNAs were identified, including 63 known and 23 newly identified. Sixteen miRNA isoform variants in length were found from the conserved miRNAs of castor bean. MiRNAs displayed diverse organ-specific expression levels among five libraries. Combined with criteria for miRNA annotation and a RT-PCR approach, 72 novel miRNAs and their potential precursors were annotated and 20 miRNAs newly identified were validated. In addition, new target candidates for miRNAs newly identified in this study were proposed. Conclusions The current study presents the first high-throughput small RNA sequencing study performed in castor bean to identify its miRNA population. It characterizes and increases the number of miRNAs and their isoforms identified in castor bean. The miRNA expression analysis provides a foundation for understanding castor bean miRNA organ-specific expression patterns. The present study offers an expanded picture of miRNAs for castor bean and other members in the family Euphorbiaceae.


Introduction
The castor bean (Ricinus communis L., Euphorbiaceae, 2 n = 20) is one of most important non-edible oilseed crops and its seed derivatives are often used in aviation oil, lubricants, nylon, dyes, inks, soaps, adhesive and biodiesel. Among all the vegetable oils, seed oil of castor bean is distinctive due to its high level of ricinoleic acid (over 85%), a fatty acid consisting of 18 carbons, a double bond between C9 and C10, and a hydroxyl group attached to C12. In particular, owing to its excellent solubility in ethanol or methanol, seed oil of castor bean was considered as an ideal and unique feedstock for biodiesel production [1][2][3]. Because of its high economic value, castor bean is widely cultivated in tropical, sub-tropical and warm-temperate countries, particularly India, China and Brazil [4]. Due to increased demand for production of castor bean seed oils in many countries, breeding and improvement of varieties are drawing a great attention from breeders [5]. Particularly, genetic improvement of varieties by genetic engi-neering techniques offers great promises in castor bean [6,7]. Enhanced efforts should be paid for elucidating the molecular mechanism underlying the regulation of growth and development.
The microRNAs (miRNAs) are endogenous noncoding small RNAs which play significant roles in the regulation of gene expression. Post-transcriptional gene regulation by miRNAs constitutes one of the most conserved and well characterized gene regulatory mechanisms. In higher plants, miRNAs play significant roles in different developmental stages by regulating gene expression at transcriptional and post-transcriptional levels [8][9][10][11][12]. Identification and characterization of miRNAs and their targets in diverse species has been a major focus in recent years [13][14][15][16]. Although a number of miRNAs have been identified from diverse plants, information on identification and characterization of miRNAs in the family Euphorbiaceae, an important resource plant group, is very limited. So far, the miRNA database miRBase [17][18][19] (Release 19, January 2013, http://www.mirbase. org) contains 63 miRNAs identified from castor bean, 28 miRNAs identified from rubber tree (Heven brasiliensis) and 10 miRNAs identified from Manihot esculenta in Euphorbiaceae. Although 63 miRNAs had been identified from castor bean in the previous study [20], little information on the transcript level and their tissue-specific expression of miRNAs, however, is available in castor bean. Identification and characterization of miRNAs will contribute to the understanding of the molecular basis of regulating developmental and physiological processes in castor bean.
Recently, high-throughput sequencing technologies have been proven to be a powerful strategy to profile miRNA expression pattern and detect novel miRNAs at unprecedented perspectives [21][22][23][24]. In particular, high-throughput sequencing technologies can be reliably used to measure modest changes in miRNA abundance among different samples; such changes are unlikely to be identified by sequencing low numbers of clones (i.e., traditional small RNA library sequencing) or hybridization-based methods such as small RNA blot and miRNA array analyses. Highthroughput sequencing technologies can not only discover novel miRNAs (which produce transcripts in low abundance) due to their ability to generate millions of reads with a determined length, but also characterize their expression among tissues according to their relative abundance. MiRNAs of diverse plants such as maize [25], common bean [26], peanut [27], safflower [28], cucumber [29], soybean [30], cabbage [31], Panax ginseng [32] and Pinus densata [33], have been investigated using high-throughput sequencing technologies in recent years.
In this study, we performed deep sequencing and bioinformatic analyses of caster bean tissues (leaves, roots, developing seeds and endosperms) to identify and characterize conserved and novel miRNAs, as well as expression patterns of miRNAs among different tissues and at different stages of seed development. We expected that the conserved, novel and differentially expressed miRNAs obtained in this study provide a basis for further investigation of the physiological roles of identified miRNAs and the molecular mechanism underlying the regulation of growth and development in castor bean.

Library Construction, Sequencing and Characterization of Small RNA Transcriptomes in Castor Bean
In order to identify and characterize conserved and novel miRNAs in castor bean, we constructed five small RNA libraries from leaves, root tips, developing seeds at the initial stage (seed1) and at the oil fast accumulation stage (seed2) and endosperms, and obtained sequence reads through Solexa high-throughput sequencing technologies. Initially, a total number of 14,259,011 (leaf), 13,467,037 (root tip), 11,423,439 (seed1), 11,334,893 (seed2), 12,955,198 (endosperm) raw reads were obtained. After filtering the low quality reads, adaptor and contaminant sequences, the clean reads were 14,187,024, 13,317,609, 11,098,154, 11,089,507 and 12,553,234 for leaf, root tip, seed1, seed2 and endosperm libraries, respectively. Based on these sequences we analyzed the length distribution and found that among the unique size distribution pattern, most of the reads were distributed between 21 and 24 nt ( Figure 1). This observation was consistent with the typical size of miRNA from Dicer digestion products. Among which, sequences with the length of the 21 nt and 24 nt were shown to be significantly in abundance, specifically, the sequences with length of 21 nt was highest abundance in leaf, root tip and seed1 libraries, accounting for 56.82%, 37.22% and 28.42% of the sequence number, respectively, whereas sequences with the length of 24 nt were the highest abundance in seed2 and endosperm, accounting for 33.35% and 33.17% of the sequence number.
Subsequently, we annotated all the reads fall into the length of 16-26 nt from all the five libraries (including leaf, root, seed1, seed2 and endosperm) and obtained 1,742,976, 2,758,394, 2,411,289, 2,944,394, 3,557,270 unique reads (the sequence of a particular type with non-redundancy) for leaf, root, seed1, seed2 and endosperm libraries, respectively. Among them, non-coding small RNAs annotated (snRNAs, snoRNAs, tRNAs, rRNAs and miRNAs) occupy 7,050,077, 5,569,288, 4,714,941, 3,012,704 and 3,742,076 reads in leaf, root, seed1, seed2 and endosperm, respectively (Table 1). In addition, a small proportion of reads could be mapped to coding sequences, which are likely to be RNA degradation products; a small proportion of reads could be mapped to intron sequences, which are likely to be related to the splicing of the host gene to produce pre-miRNA molecules.

Identification of Conserved miRNAs in Castor Bean
To identify conserved miRNAs in castor bean small RNA libraries, the unique reads (excluded reads mapped to snRNAs, snoRNAs, tRNAs and coding sequence or intron sequence) 11,637,637,9,950,773,7,612,537,8,844,149,9,647,553 from five libraries were subjected to the homolog search against miRBase 19. A number of 6,193,105, 3,677,233, 2,405,407, 2,100,760, and 2,283,045 reads in leaf, root, seed1, seed2 and endosperm libraries respectively, were homologous with known castor bean miRNAs, which accounts for 53.2%, 36.9%, 31.6%, 23.8%, and 23.7% of unique reads from each library, respectively (see Table 1). These observations suggest that known miRNAs are only a small portion and there still may be complicated ingredients in Solexa sequenced data.
In total, 86 conserved miRNAs were detected, covering 26 miRNA families. As shown in Table 2 and Table S1, the most abundant are miR169 (12 members), miR170/171 (nine members), and miR156/157 (eight members). Of the 86 miRNAs, 69 miRNAs were expressed in all five libraries, which accounted for 80.2%; 13 miRNAs (including one miR159/319, nine miR169s, one miR172, two miR399s) were not detected in the leaf library; seven miRNAs (including one miR160, four miR169s, one miR398 and miR2111) were not detected in the root library; ten miRNAs (including nine miR169s and one miR170/171) were not detected in the seed1 library; six miR169s were not detected in the seed2 library; and four miRNAs (including two miR169s, one miR170/171 and one miR399) were not detected in the endosperm tissue.
The sequencing frequencies for miRNAs in the library can be used as an index for estimating the relative abundance of miRNAs. High-throughput sequencing produced a large number of miRNA sequences, allowing us to determine the relative abundance of miRNAs in castor bean; the frequencies of miRNA families varied largely in different libraries, e.g. most members of miRNA156, miRNA167, miRNA168, miRNA535 were abundant in all libraries, whereas members of miRNA160, miRNA169, miRNA319, miRNA393, miRNA395, miRNA398 and miRNA399 were scarce in all libraries (see Table 2), indicating that expression level of miRNAs varies significantly among different miRNA families in castor bean. In addition, most of the miRNA members displayed a tissue-or developmental stagespecific expression, e.g. miR156e has a low expression in leaf and root libraries and a high expression in the seed libraries; the miR156f, miR156g and miR156 h have the highest expression in the leaf library and the lowest expression in seed1 library.
When analyzing the miRNA/miRNA* duplex structure for all conserved miRNAs identified in castor bean, we found that 60 of 86 conserved miRNAs displayed the miRNA/miRNA* duplex structure ( Figure 2 for examples), involving 23 families (see Table 3). In contrast, the abundance of miRNA* is significantly  lower than their reference miRNAs, except for rco-miR171e* and rco-miR408* (which has abundances higher than their references rco-miR171e and rco-miR408).

Identification of miRNA Isoforms
MiRNAs were initially thought to have a specific sequence of a defined length. Identification of miRNAs from different species has revealed that there are variations in pre-miRNA processing, which could result in miRNA isoforms with one or two nucleotide variation in length or structure from the same locus [26]. Ehrhardt et al. (2010) demonstrated that one fifth of the annotated Arabidopsis thaliana miRNAs (miRBase 14) have a stable miRNA isoform of one or two nucleotides longer [34]. Previous studies have revealed that these miRNA isoforms may have functional divergence due to differential associations with AGO proteins [35][36]. To identify miRNA isoforms from our transcriptome data, all   19), allowing at most two mismatches or four nucleotides in length difference. The total number of isoform variants found for each library was subjected to a filter that consisted of choosing variants that had a total number of reads 50% greater than the number of total reads of their reference miRNA previously reported, so that low-abundance and probable non-functional variants were discarded. Compared with the length and sequences of the reference miRNAs identified from castor bean genome based on computational prediction in previous study [20], 16 isoform variants from five libraries were detected totally, involving ten families (miRNAs 156, 167, 171, 319, 393, 395, 396, 398, 399 and 403; see Table 4). In the case of miR156, the isoform variant iso-miR156a-d with the 21A absent was detected from four loci (a, b, c and d); the isoform iso-miR156e with a 59 single nucleotide U/T extension from one locus (e). For the miR167 family, two isoforms with a 39 single nucleotide A (iso-miR167b) extension or G (isomiR167c) deletion were detected from two loci (b and c). In the case of miR319, two isoform variants with a 39 single nucleotide T (iso-miR319a-c) and a 59 single T extension and a 39di-nucleotide TT deletion (iso-miR319d) were detected from different loci. In the case of miR395, the isoform variant iso-miR319a-e with a 39 trinucleotide TCT deletion were detected from all miR395 loci identified (a, b, c, d and e). Similarly, in the case of miR399, the isoform variant with a 39 bi-nucleotide GG deletion (isomiR399bd) was detected from three loci (b, c and d), and the isoform variant with a 39 tri-nucleotide CAG deletion (iso-miR399e) was detected from the e locus. In the cases of miR171 and miR398, two isoform variants (iso-miR171a,b and iso-miR171g, and iso-miR398a and iso-miR398b) with a 59 tri-or tetra-nucleotide addition and a 39 tri-or tetra-nucleotide deletion were detected from different loci. In the other cases such as miR393, miR396 and miR403, isoform variants were produced due to the 1-3 nucleotide addition or deletion in the 3 strand of miRNAs. These results indicated that the isoform variants mainly occurred in several specific miRNA families such as miR156 (isoforms were detected from five loci), miR395 (isoforms were detected from five loci) and miR399 (isoforms were detected from four loci) in castor bean. The variation in length of isoforms identified involved two types: 1) single or several nucleotides addition or deletion in the 39 strand only (such as miR167 and iso-miR167, miR395 and iso-miR395, miR399 and iso-miR399); and 2) single or several nucleotides addition or deletion both in the 59 and 39 strands simultaneously (such as miR156 and iso-miR156, miR171 and iso-miR171, miR398 and iso-miR398).
When inspecting the expression of these isoform variants among five libraries, we unexpectedly found that the expression of these isoforms among different libraries had significant divergence, e.g., in the cases of miR156a-d, miR156e, miR167c and miR171a,b, the variants iso-miR156a-d, iso-miR156e, iso-miR167c and iso-miR171a,b were more highly expressed in all libraries than the rco-miR156a-d, rco-miR156e, rco-miR167c and rco-miR171a,b (Figure 3 for examples); in the case of miR395, rco-miR395a-e had relatively higher expression in the leaf library than its expression in other libraries, whereas the iso-miR395a-e was weakly expressed in all libraries; in the case of miR399, rco-miR399b-d was relatively higherly expressed in root library than other tissues, whereas the rco-miR399b-d was weakly expressed in all libraries; in the case of miR403, rco-miR403a,b was relatively higherly expressed in the leaf library than other libraries, whereas the iso-miR403a,b was higherly expressed in the seed2 library than others; in the case of miR171, rco-miR171g was only present in the root library, whereas the iso-miR171g was present in all libraries except for the endosperm library (see Table 4).

Expression Patterns of miRNAs among Tissues
Preferential expression of a miRNA in specific tissues might provide clues about its physiological function. To investigate the expression patterns of miRNAs among leaf, root, developing seeds and endosperm in castor bean, read count of each identified miRNA was normalized to the total number of miRNA read count in each library. Based on the relative abundance, we found that the expression of certain members within the miRNA families varied greatly in the given tissues, suggesting functional divergence within the family in castor bean. For example, abundance of the miR156 family varied from 122 reads (rco-miR156e) to 322,939 reads (rco-miR156e) in the leaf library, similar to the case for miR167 family varied from 941 reads to 59,219 reads in the seed1 library (see Table S2). These results indicate that miRNA members in one given miRNA family display clearly different expression levels, probably implying their functional divergence. We compared the expressional differentiation of conserved miRNAs identified between the leaf and seed1, root and seed1, seed2 and seed1, and endosperm and seed1, respectively. We found that 49 out of 69 miRNAs detected between the leaf and seed1 were significantly differentially expressed (log2ratio foldchange .1.0 and P value ,0.001, see Figure 4a and Table S2) with 15 miRNAs up-regulated and 34 miRNAs down-regulated in leaf. Similarly, 42 out of 69 miRNAs between the seed1 and root were significantly differentially expressed with 17 miRNAs upregulated and 25 miRNAs down-regulated in root (see Figure 4b and Table S2). When comparing the expressional differentiation of miRNAs between the seed2 and seed1, endosperm and seed1, respectively, we found that 42 out of 65 miRNAs detected between the seed1 and seed2, and 60 out of 68 miRNAs detected between the endosperm and seed1, were significantly differentially expressed (log2ratio fold-change .1.0 and P value ,0.001, see Figure 4c-d and Table S2) with 23 miRNAs up-regulated and 19 miRNAs down-regulated in the seed2, and 23 miRNAs upregulated and 37 miRNAs down-regulated in the endosperm. It is worthy to note that some families such as miR166 and miR165 were of abundance cross the five libraries, whereas many families such as miR160, miR169, miR171, miR395 were lowly expressed in five libraries. Based on their abundance in the libraries, most members of miR156 family were of higher abundance in vegetable tissues (leaf and root), whereas the rco-miR156e had higher expression in developing seeds than in the leaf and root; the members of miR167 and miR164 had obviously preferential expression among tissues (see Table 2 and Table S2). Novel miRNA Detection One of the most important features for high-throughput sequencing is that it can be employed to detect novel miRNAs in small RNA transcriptome [22,37]. In the previous study, 83 miRNAs were predicted based on genome sequences in castor bean and 63 of 83 miRNAs predicted were validated and released in the miRNA database [20]. In this study, remaining unannotated reads (5,444,532, 6,273,540, 5,207,130, 6,743,389 and 7,364,508 from leaf, root, seed1, seeds and endosperm, respectively) were mapped to reference genome of castor bean for identifying the genomic location and retrieving the adjoining sequence to help with secondary structure prediction of a miRNA precursor using the MIREAP pipeline (developed by BGI). The resulting reads, with a characteristic hairpin structure, a maximum free energy of ,25kcal/mol, minimal matched base pairs of miRNA and miRNA* exceeding 16 nt and the sequence length of 20-23 nt, and reads abundance more than 100 at least in one independent library were considered as novel miRNA candidates. As a result, 72 potential miRNA candidates were identified with typical stem-loop structure ( Figure S1), the negative folding free energies ranged from 25.4 to 103 (kcal/mol), and diverse loci in castor bean genome (see Table 5 and Table S3). Of the 72 potential miRNAs, 24 represented both the miRNA and miRNA* and 48 were miRNA*-deficient cases (having only the 59 arm or 39 arm sequences) (see Table 5 and Table S3). Fifty-three of these novel miRNA candidates were expressed in at least two independent libraries, and 19 of these candidates were expressed in a single library. A recently published article proposed precise and strict new miRNA annotation criteria by Meyers et al. [38]. Besides the primary criteria used by Mireap, two elementary requirements are demanded in high-throughput sequencing data analysis: (i) high-throughput sequencing data should represent both the miRNA and miRNA*; and (ii) in miRNA*-deficient cases, isolation and sequencing of the candidate miRNA should come from multiple and independent libraries. Based on these precise criteria, 58 of 72 novel miRNA candidates were categorized as highly confident. Fourteen miRNA candidates identified by Mireap did not meet Meyers et al.'s criteria (see Table 5).  [39,40], we predicted targets of the 95 miRNA candidates (including 23 new conserved and 72 novel miRNAs) using the currently annotated mRNAs of genes in the castor bean (from the CBGD database http://castorbean.jcvi.org). As a result, 80 of 95 miRNA candidates were identified to have their target genes, involving 482 miRNA:target pairs. The function of these target genes were broadly involved in the growth and development process of castor bean. The predicted target genes of these 95 miRNA candidates and their potential functional annotations are listed in Table S4.

Validation of the Putative miRNAs Newly Identified in Castor Bean
To validate the 95 miRNA candidates newly identified by highthroughput sequencing results, RT-PCR analysis was performed according to the method described in ''Materials and Methods''. Using first-strand cDNAs obtained respectively from leaves, root tips and developing seeds, 20 primer pairs showed clean amplification bands for miRNAs PCR products including five conserved miRNA families (rco-miR172bc-d, 396b-c, 482, 827 and 4414) and fifteen novel putative miRNAs (Rco-miR002, Rco-miR006, Rco-miR029, Rco-miR030, Rco-miR032, Rco-miR038, Rco-miR040, Rco-miR043, Rco-miR044, Rco-miR052, Rco-miR053, Rco-miR054, Rco-miR058, Rco-miR064 and Rco-miR068, see Figure 5), suggesting the 20 miRNAs newly identified were validated by RT-PCR amplification. When comparing the abundance of these miRNAs validated in five miRNA libraries, we found these miRNAs were relatively more abundant than other miRNAs newly identified (see Table 2 and Table 5). Those miRNAs newly identified with low abundances were not validated by RT-PCR amplification probably because of their low expression levels in these tissues tested. These RT-PCR results exhibited the same expression profiles as the original highthroughput sequencing results.

Discussion
Although miRNAs have been studied extensively in diverse plant species in these years, limited knowledge is known for plant species in the family Euphorbiaceae. Based on complete genome data of castor bean, the study on a genome-scale computational prediction of miRNAs combined with experimental analysis [20] provided a basis for further characterization and functional analysis of miRNAs in Euphorbiaceae species. The current study  using high-throughput sequencing method greatly enriches our knowledge in identifying miRNAs in castor bean and facilitates more particular and specific miRNA studies castor bean and other members of the family Euphorbiaceae as well. High-throughput sequencing analyses have become one of the major sources supporting miRNA annotations [22][23][24]. This study is the first report on identification and characterization of miRNAs and generates a large number of small RNA sequence reads using high-throughput sequencing techniques in castor bean. Studies to elucidate the number of miRNA molecules sequenced from these small RNA sequence reads are still needed for more accurate small RNA profiling studies. In term of reads, the small RNA libraries sequenced finally yielded a large number of unannotated reads after new miRNA screen in this study. These remaining unannotated reads could remain for further analyzing characterization of siRNA populations in castor bean.
Usually, miRNA isoform variants are considered to be a consequence of inaccuracies in Dicer pre-miRNA processing [41]. However, sequence length variation often have been overlooked, as small variations in the sequence length might not have been thought to alter the function of individual miRNAs, as they are directed to their target genes by base pairing [34]. Recent studies had showed that miRNAs and their isoform variants in length broadly co-existed and these variants might lead to functional differentiation, in particular, when the variation occurs in the 59end and gives rise to a alternation of the miRNA and argonaute (AGO) binding [36,42]. A decrease in abundance of the 21 nt isoform variant reduces miR168 homeostasis and leads to developmental defects in Arabidopsis and sequence length heterogeneity for plant miRNAs often is essential for correct plant development and environmental responses [36]. Although most of the isoform variants identified from the length variant group exhibit 39 heterogeneity, little is known about the biological interest of the variation in length occurring in 39-end of miRNAs. In this study, small RNA sequences from libraries were considered as miRNA isoforms only if they were similar to a reference miRNA identified in miRBase and had a significantly greater number of reads compared to those found for the reference miRNA in all five libraries. From these analyses for isoform identification, 16 miRNA isoforms involving 10 miRNA families were added to the total number of conserved miRNA families identified in castor bean. Six miRNA isoforms displayed 59 heterogeneity and ten displayed 39 heterogeneity. Whether these isoform variants detected in castor bean have functional differentiation and play different regulatory roles in plant growth and environmental responses are yet unknown. The expressional differentiation of these isoform variants and their references among tissues, however, imply their functional divergence, if these isoform variants have their biological interest. In addition, those variant sequences with missing bases and low frequencies produced from high-throughput sequencing could be viewed as degradation products or pyrophosphate sequencing errors. Application of deep sequencing technology can shed considerable novel lights hidden in the small RNA transcriptome data not only for identification of new conserved miRNAs, but also for successful discovery of novel miRNAs with high accuracy and efficiency [41]. Our current study has led to the discovery of 23 new conserved and 72 novel miRNA candidates in castor bean. These new miRNA candidates largely enriched the miRNA database for castor bean and Euphorbiaceae members. However, only seven new conserved and 15 novel miRNAs were validated using experimental RT-PCR method, though 58 of 72 novel miRNA candidates had been categorized as highly confident according to previous strict miRNA annotation criteria, with 35 represented both the miRNA and miRNA*. Most of novel miRNA candidates identified in this study have not been validated. The most likely reason is due to the limit of RT-PCR method when target miRNAs tested have a low expression [23,37]. Thus, validity of these novel miRNA candidates need to be further confirmed.
When comparing the numbers of miRNAs identified using the same high-throughput sequencing approach between rubber tree [43] and castor bean, we found that castor bean appeared to have less conserved miRNAs (86) involving 27 miRNA families than rubber tree which had 115 conserved miRNAs, covering 56 families. Further, we found that all homologs of 27 conserved miRNA families of castor bean in rubber tree, but we did not find any homolog of the 72 novel miRNAs identified from castor bean in other members of Euphorbiaceae including rubber tree [43,44], Jatropha curcas [45] and Manihot esculenta [46], implying that the 72 novel miRNAs detected might represent castor bean speciesspecific miRNAs. Compared to the target genes identified in other plants, rco-miR167, rco-miR172 and rco-miR482 exhibited similar targets to their homologs in Arabidopsis [47] and maize [25]. However, four conserved miRNAs newly identified (including rco-miR396, rco-miR827, rco-miR2111 and rco-miR4414) and most of the novel miRNAs in castor bean displayed speciesspecific targets.
In addition, high-throughput sequencing technologies can serve as a powerful miRNA expression profiling tool to identify the differentially expressed miRNAs, providing the basis for future analysis of miRNA functions and elucidating underlying mechanisms in regulating diverse molecular and physiological pathways [12,37]. In the study, comparison of their expression patterns among different tissues shows that 49, 42, 42 and 60 of 86 conserved miRNAs are significantly differentially expressed between seed1/leaf, seed1/root, seed1/seed2 and seed1/endosperm, respectively. Similarly, many of the miRNA*, isoform variants and novel miRNAs identified in this study presented differential expression patterns among tissues sampled. Although the biological function of miRNAs in castor bean is unclear the expressional differentiation of these miRNAs among tissues provides a clue for further investigation of the physiological roles of miRNAs in castor bean. Castor bean is of an important oilseed crop worldwide, containing significant amounts of lipid and protein. In this study, we searched for miRNAs that might play a function in regulating biological processes related to the biosynthesis of lipid and protein in developing seeds and endosperms. Our results demonstrated that ten miRNAs (rco-miR156f,e, rco-miR159, rco-miR168, rco-miR390a, rco-miR393a, rco-miR396a, rco-miR408, rco-miR003 and rco-miR020) had 21 target genes, which were involved in amino acid metabolism, fatty acid metabolism and lipid metabolism with differential expressions at different stages of seed development. These results imply that the ten miRNAs might have a physiological role in regulating lipid and protein biosynthesis in castor bean. In summary, we have identified and characterized a large number of miRNAs from castor bean, analyzed their expression and predicted the putative targets of these miRNAs. It will be very important to experimentally characterize these miRNAs and their downstream targets, as this will lead to a better understanding of the function relationship and mechanism of miRNAs in the regulation network. In particular, our high-throughput sequencing approach to miRNA discovery suggests that a significant number of novel miRNAs remain to be further analyzed and characterized. The current study is the first report on identification and characterization of miRNA using the high-throughput sequencing approach in castor bean.

Ethics Statement
No specific permits were required for the described field studies. No specific permissions were required for these locations and activities. The location is not privately-owned or protected in any way and the field studies did not involve endangered or protected species.

Sample Preparation and Total RNA Extraction
Seeds of castor bean var. ZB306 elite inbred line (provided kindly by Zibo Academy of Agricultural Sciences, Shandong, China) were cultivated in the greenhouse of Xishuangbanna tropical botanical garden (Kunming branch) with the temperature of day at 24-26uC and night at 18-20uC with the humidity controlled at 60-80%. Leaf tissue was collected from a fully expanded young leaf and root tips were collected, washed and dissected. Immature seeds at two different stages, i.e. seed1 at the initial stage (15 days after pollination) and seed2 at the fast oil accumulation stage (35 days after pollination) of seed development, were collected. Endosperm tissue was dissected from the immature seeds (40 days after pollination). The developing seeds did not start to accumulate TAG at the initial stage (seed1) and fast accumulated TAG at the fast oil accumulation stage (seed2, see Figure S2). Total RNA was extracted from the leaf, root tip, immature seed (seed1 and seed2) and endosperm tissues separately using Trizol (TaKaRa, Dalian, China) following the manufacturer's protocol. The quality of total RNA samples was tested using both the NanoDrop Spectrometer (ND-1000 Spectrophotometer, Peqlab) and agarose gel (1.5%) electrophoresis.

Small RNA Library Construction and Sequencing
Total RNA samples were firstly processed by 15% denaturing polyacrylamide gel electrophoresis (PAGE). The small RNA fragments in the range of 16-30 nt in length were isolated from the gel and purified by sRNAs gel extraction Kit (TaKaRa Bio, Otsu, Japan). Then, the 59 and 39 termini of the small RNA were linked with proprietary adapters sequentially and RT-PCR was performed to amplify RNA to DNA, which can be used as templates to produce sequencing libraries. At last, approximately 20 mg sequencing libraries were produced and Illumina Solexa Genome Analyzer was employed to sequence the generated libraries.

Small RNA Sequencing Analysis
After sequencing, we trimmed the adaptor sequences, filtered out the low quality tags and eliminated contamination of adaptor sequences. Non-coding RNAs including rRNA, tRNA, snRNA and snoRNA were identified by reads alignment to the Pfam 10.1 (http://www.sanger.ac.uk/software/Rfam) and GeneBank databases. After removing non-coding RNAs, the clean small RNA sequences ranging from 16-28 nt were collected and mapped to the castor bean genome for getting the unique reads with abundance and position on the genome using SOAP 2.0 program (http://soap.genomics.org.cn/). The unique RNA sequences that perfectly matched the castor bean genome were subjected to subsequent analysis. Sequence reads overlapping with exons and introns of mRNA were excluded to avoid DNA contamination or mRNA degradation products.

Identification of Conserved, Isoform and Novel miRNAs
In order to determine conserved miRNAs, the trimmed unique reads were aligned against the mature or precursor of conserved castor bean miRNAs in the miRBase [48]. Only the small RNA sequences that perfectly matched known castor bean miRNAs were considered to be conserved miRNAs. To find new conserved miRNAs, the remaining reads were aligned with mature plant miRNA sequences in miRBase allowing at most two mismatches. According to the genomic positions of new conserved miRNA candidates identified, we retrieved the flanking genomic sequences around matched loci to form possible precursors of candidate miRNAs with the Mfold program [49]. Those candidate sequences containing a typical RNA stem-loop with at least 18 bp in matched regions and having folding energy no greater than 218 kcal/mol were considered as new conserved miRNAs. Meanwhile, we inspected stem-loop structures for each miRNAs identified in castor bean and defined the star miRNA sequences based on Dicer-cleavage rules as implemented in the miRDeep software tool [50].
With the purpose of identifying miRNA isoforms, the sequence reads from all libraries that perfectly mapped in the annotated miRNA precursor sequences but not representing annotated miRNA mature and star sequences, were not shifted more than four positions from their original mature or star 59 position and have a total number of reads 50% greater than the total reads of their reference miRNA were considered as isoform miRNAs in castor bean. If no reference miRNA for a variant was previously detected in all libraries, the variant with the highest frequency was considered.
To identify the novel miRNAs, the unannotated reads that were identical to genome sequence were collected and the flanking sequences around matched position were retrieved. The MIREAP pipeline (https://sourceforge.net/projects/mireap/) was used to analyze their characteristic hairpin structure of miRNA precursor. Those reads which could meet criteria including having a characteristic hairpin structure and the Dicer cleavage site with a maximum free energy of ,25kcal/mol, minimal matched base pairs of miRNA and miRNA* exceeding 16 nt, the sequence length of 20-23 nt and the reads abundance .100, were considered as novel miRNAs. The filtered pre-miRNA sequences were folded again using Mfold and checked manually.

Validation of miRNAs Newly Identified
To validate castor bean miRNAs newly identified in this study, a modified oligo (dT) primers RT-PCR approach as described by Fiedler et al. [51] was performed. Briefly, after total miRNAs were extracted from plant tissues, polyA tails to all transcript miRNAs were added, and then transcript miRNAs with polyA tails were reversely transcribed into cDNAs using a set of 12 modified oligo(dT) primers containing a unique sequence tag at the 59 end and two bases at the 39 end. This step reaction converts all miRNAs into cDNAs with ,90bp length. Further, RT-PCR amplification is achieved using a primer specific to the miRNA in interest and a primer specific to the tag.
In our study, total miRNA was isolated from leaves, root tips and developing seeds of castor bean using Plant MicroRNA Extraction Kit (BIOTEKE, Beijing, China), following the manufacturer's instructions. MiRNA reverse transcription reactions were performed using One Step miRNA 1st cDNA Synthesis Kit (HaiGene Biotech, Haerbin, China) in a 20 mL reaction solution containing 1000 ng miRNAs, 4 mL 4x One Step miRNA RT solution, 2 mL 10x miRNA RT Primers, and RNase-free water was used to adjust the total volume of the reverse transcription reaction to 20 mL. The miRNA reverse transcription reactions were incubated in an Eppendorf Mastercycler (Eppendorf North America, Westbury, NY) for 60 min at 37uC, followed by 5 min at 95uC, and then 4uC until further use. For PCR amplification, 86 specific primers were designed based on mature miRNA sequences for amplifying 95 miRNAs new indentified (see Table S4). The RT-PCR reactions were performed in a 10 mL volume containing 1 mL diluted reverse transcription product, 16PCR buffer, 0.2 mM dNTPs, 2.0 U EasyTaq DNA polymerase (TransGen Biotech, Beijing, China), and 0.5 mM specific miRNA primer and universal primer (59-TTACCTAGCGTATCGTT-GAC-39) on Eppendorf Mastercycler. The PCR reaction conditions used were as follows: 2 min at 95uC, followed by 38 cycles of denaturation for 5 s at 95uC, annealing for 5s at 55-60uC, extension for 35s at 70uC, and then 4uC. PCR amplification products were confirmed on 1.5% agarose gel.

Differential Expression Analysis
To investigate the differentially expressed miRNAs among castor bean leaf, root, seed1, seed2 and endosperm, miRNAs considered for this analysis were the conserved miRNAs (Table 2). Firstly, each miRNAs read count was normalized against the total number of miRNA reads in each given sample. Subsequently, the fold-change (log2(sample1/sample2) and P-value were calculated from the normalized expression, and significantly difference of a given miRNA was determined by the P#0.001 and fold-change $1 in two samples.

Prediction of miRNA Targets
The whole genome and transcript databases of castor bean (http://castorbean.jcvi.org/index.php) provide a rich resource for predictions of miRNA targets. The putative target sites of miRNA candidates were identified by aligning the miRNA sequences with the genome and transcript database of castor bean. Allen et al.'s and Schwab et al's criteria [39,40] were used in our analysis, i.e.: each G:U wobble pairing was assigned 0.5 point; each indel was assigned 2.0 points; all other noncanonical Watson-Crick pairings were each assigned 1.0 point; no more than two adjacent mismatches in the miRNA/target duplex with a minimum free energy (MFE) of the miRNA/target duplex 75% greater than the MFE of the miRNA bound to it's perfect complement. Figure S1 The second structures of newly identified 95 miRNAs including 23 conserved (*) miRNAs and 72 novel pre-miRNAs in castor bean. (DOC) Figure S2 Developing seeds of castor bean and lipid (triacylglycerols, TAG) accumulation at two different developmental stages.

Supporting Information
(DOC) Table S1 The conserved miRNAs identified from castor bean and their distribution among miRNA families.

Author Contributions
Conceived and designed the experiments: AL. Performed the experiments: WX QC. Analyzed the data: WX QC. Contributed reagents/materials/ analysis tools: FL. Wrote the paper: AL QC WX.