The Hot Pepper (Capsicum annuum) MicroRNA Transcriptome Reveals Novel and Conserved Targets: A Foundation for Understanding MicroRNA Functional Roles in Hot Pepper

MicroRNAs (miRNAs) are a class of non-coding RNAs approximately 21 nt in length which play important roles in regulating gene expression in plants. Although many miRNA studies have focused on a few model plants, miRNAs and their target genes remain largely unknown in hot pepper (Capsicum annuum), one of the most important crops cultivated worldwide. Here, we employed high-throughput sequencing technology to identify miRNAs in pepper extensively from 10 different libraries, including leaf, stem, root, flower, and six developmental stage fruits. Based on a bioinformatics pipeline, we successfully identified 29 and 35 families of conserved and novel miRNAs, respectively. Northern blot analysis was used to validate further the expression of representative miRNAs and to analyze their tissue-specific or developmental stage-specific expression patterns. Moreover, we computationally predicted miRNA targets, many of which were experimentally confirmed using 5′ rapid amplification of cDNA ends analysis. One of the validated novel targets of miR-396 was a domain rearranged methyltransferase, the major de novo methylation enzyme, involved in RNA-directed DNA methylation in plants. This work provides the first reliable draft of the pepper miRNA transcriptome. It offers an expanded picture of pepper miRNAs in relation to other plants, providing a basis for understanding the functional roles of miRNAs in pepper.


Introduction
MicroRNAs (miRNAs), initially discovered in C. elegans, are non-coding RNAs that can play regulatory roles in animals and plants by negatively affecting gene expression at the posttranscriptional level through mRNA cleavage or translation repression, depending on complementarity between miRNA and the target mRNA. Many studies have shown that miRNAs regulate many biological processes in plants, including developmental phase transition, metabolism, hormone signaling, and stress responses [1][2][3]. Plant miRNAs mostly direct the cleavage of target messages with full complementarity, and their target sites are primarily found in coding regions [1,3], inducing a significantly robust effect on gene regulation. Recent studies have showed that plant miRNAs often repress translation via a slicerindependent mechanism [4,5], and therefore plant miRNA-mediated post-transcriptional gene silencing entails a combination of slicing and translation inhibition.
Plant miRNA genes are distinct noncoding transcriptional units which are transcribed by RNA polymerase II as primary miRNAs and are subsequently cleaved into precursor miRNA (pre-miRNAs) by the RNaseIII-type enzyme DICER-LIKE1 (DCL1) to produce the precursor miRNAs (pre-miRNAs). These pre-miRNAs are further cleaved into miRNA:miRNA* duplexes, again by DCL1, in the nucleus. These miRNA duplexes are stabilized by 29-O-methylation which is catalyzed by Hua Enhancer 1 [6] and transported from the nucleus to the cytoplasm by HASTY [7]. One of the strands is finally incorporated into the RNA-induced silencing complex (RISC), where it negatively regulates target mRNA expression.
There are two major approaches for identifying miRNAs in plants: experimental and bioinformatic approaches. Experimental approaches have included forward genetics, direct cloning, and next generation high-throughput sequencing. High-throughput sequencing technology shows significant promise for small RNA identification and has become commonly available and affordable. A large number of miRNAs have been identified by means of high-throughput sequencing and can be found on an online database (http://www.mirbase.org, release 19.0, August 2012), which currently consists of 21,643 mature miRNAs from 168 species, including 4,677 mature miRNAs from 52 plant species. The majority of miRNAs identified so far have been obtained from only a few model plant species, such as Arabidopsis thaliana (328), Oryza sativa (661), Glycine max (395) and Medicago truncatula (674). Solanaceae, whose annotated miRNAs are still very limited [8][9][10], is one of the largest families in the plant kingdom, consisting of more than 3,000 species [11]. Pepper (Capsicum annuum), which belongs to the Solanaceae family, is one of the most economically important crops cultivated worldwide, especially in South and Central America and Asia [12]. It is essential to understand the function of pepper miRNAs, considering the importance of pepper for food, health products and medicines. The study of the miRNAs in pepper has previously been reported using an in silico approach [13]. However, no research using highthroughput sequencing approaches has been performed on pepper so far.
In this study, we employed high-throughput sequencing technology to sequence and identify pepper miRNAs by taking advantage of an ongoing pepper sequencing project [14]. We were effectively able to identify and characterize 29 and 35 of conserved and novel miRNA families, respectively, from 10 different libraries. We also employed small RNA northern blot analysis to further validate and analyze the expression patterns of many of the representative miRNAs. Additionally, we made a detailed analysis of pepper miRNA targets, many of which were experimentally validated using 59 RACE analysis [15]. Most targets of conserved miRNA were validated, and one of novel targets of conserved miRNAs that we first discovered was domain rearranged methyltransferase (DRM), considered to be one of the most important epigenetic marks in plants. On the other hand, we found that most of non-conserved miRNAs were weakly expressed and tend to lack experimentally validated targets. On the whole, our work serves to expand existing knowledge of pepper miRNAs and to give a better comparison between pepper miRNAs and those found in other plants, especially within Solanaceae.

Construction of small RNA library
A small RNA sequencing library was constructed using the Small RNA Sample Prep Kit (Illumina, CA, US), according to the manufacturer's instructions. Briefly, 5 mg of total RNA from hot pepper tissue samples (Mexican pepper landrace, Criollo de Morelos 334, CM334), including, leaf, stem, root, flower, fruit-1 (6 DAP; DAP for days after pollination), fruit-2 (16 DAP), fruit-MG (36 DAP; MG for mature green), fruit-B (38 DAP), fruit-B5 (43 DAP) and fruit-B10 (48 DAP) were ligated to a 39 adaptor and a 59 adaptor sequentially and then converted to cDNA by RT-PCR. The resulting cDNAs were then amplified by PCR, gel-purified and submitted for Illumina/Solexa sequencing. The GEO accession number for our series is GSE 41654.

MicroRNA identification
A miRNA prediction pipeline was written by Python scripting language. High-quality small RNA reads were obtained from raw reads through filtering out poor quality reads and removing adaptor sequences using FAXTX toolkit [16]. These clean sequences were then queried against non-coding RNAs (rRNA, tRNA, snRNA, snoRNA) from the Rfam database (http://www. sanger.ac.uk/software/Rfam) and the tomato genome database, ITAG 2.3 Release (http://solgenomics.net/organism/ Solanum_lycopersicum/genome) [17]. Any small RNA read matches to these sequences were excluded from further analysis. The reads between 18-26 nucleotide in length were selected and aligned with a draft contig sequence of an ongoing pepper genome project [14] using MicroRazerS program [18]. The contig numbers used for mapping miRNAs are available in Table S5 and S6. The perfect matched sequences were selectively chosen and mapped to the maximum of 25 loci per sequence.
To obtain the precursor sequences, potential miRNA sequences (reads$50) were extended upstream and downstream of 100 to 500 nt with a step size of 100 nt. Each putative precursor sequence was folded using RNAfold from the Vienna RNA software package [19], and the potential miRNA* sequences were selected with a mismatch ratio of 0.3 or less. The region of these putative precursor sequences with the addition of 15 nt marginal sequences were re-folded using RNAfold [19] to check whether miRNA/miRNA* duplex was suited to primary criteria for annotation of plant miRNAs [20]. All of the small RNA reads in the selected putative precursor region were mapped to examine the strand bias whether the reads of sense strands accounted for 90% of total reads. The miRNA candidates were essentially grouped into families by mature sequence similarity and/or loci. By the similarity search with miRBase release 18.0 (http://www. mirbase.org), all members of miRNA candidate families were classified to either the known miRNAs or the novel miRNA candidates. The normalizing factors were calculated using the DESeq library [21] in the R statistical software package (R Development Core Team, 2009).

MicroRNA target prediction
The putative target sites of miRNAs were identified by aligning mature miRNA sequences with a draft genome sequence [14] using TargetFinder, (http://carringtonlab.org/resources/ targetfinder). miRNA targets were computationally predicted essentially as described [22][23][24]. Briefly, potential targets from FASTA searches were scored using a position-dependent, mispair penalty system. Penalties were assessed for mismatches, bulges, and gaps (+1 per position) and G:U pairs (+0.5 per position). Penalties were doubled if the mismatch, bulge, gap, or G:U pair occurred at positions 2 to 13 relative to the 59 end of the miRNAs. Only one single-nucleotide bulge or single-nucleotide gap was allowed, and targets with penalty scores of four or less were considered to be putative miRNA targets (Table S1).

Northern blot analysis
Total RNA was extracted from different tissues using TriReagent (Ambion). A total amount of 20 mg of RNA from leaf, stem, root, inflorescence, fruit and seedling was individually separated in a 15% UREA polyacrylamide gel, electrophoretically transferred to Hybond-NX membrane (GE healthcare), and was chemical cross-linked via 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC) [25]. For labeling reaction of probes, 2 ml of 10 mM oligo, 2 ml of 10X T4 PNK buffer (Takara), 2.5 ml of [c-32 P] ATP, .7000Ci/mmole (,150 mCi/ml), 12.5 ml of dH 2 O and 1 ml of T4 Poly Nucleotide Kinase (Takara) were added in a 20 ml reaction for 1 hour at 37uC. The labeled probes were further purified from unincorporated labels with PERFORMA Spin Columns (Edge Bio) according to the manufacturers' instruction. Probe sequences used for northern blot analysis were shown in Table S2. Hybridization and washing procedures were performed essentially as described [26]. The membranes were exposed to a phosphorimager, and signals were analyzed using BAS-2500 (Fuji).

MicroRNA target validation assays
For miRNA target validation, gene-specific 59 RNA ligasemediated rapid amplification of cDNA ends (59 RLM-RACE) was performed using the GeneRacer Kit (Invitrogen). Five micrograms of total RNA from mixed tissues of seedling, root and flower were ligated to 0.25 mg of the GeneRacer RNA oligo adapter (59-CGACUGGAGCACGAGGACACUGACAUGGACUGAA GGAGUAGAAA-39). The combination of oligo(dT) and random hexamers were then used to prime the first strand of cDNA synthesis in a reverse transcription reaction. The resulting cDNA was PCR-amplified with the GeneRacer 59 primer (59-CGACTGGAGCACGAGGACACTGA-39) and each respective gene-specific primer (shown in Table S3). The PCR product was further amplified by nested PCR using the GeneRacer 59 nested primer (59-GGACACTGACATGGACTGAAG GAGTA-39) and each respective gene-specific primer (shown in Table S3). The final PCR product was gel-purified and finally cloned into pGEM-T Easy vector (Promega) for sequencing.

Small RNA analysis in pepper
To obtain endogenous small RNAs in pepper, we used highthroughput sequencing to generate small RNA sequences from leaf, stem, root, flower, fruit-1, fruit-2, fruit-MG, fruit-B, fruit-B5 and fruit-B10, which yielded a total of 542,938,815 reads (Table 1). After removing adaptor sequences and filtering out low quality tags, a total of 524,201,950 clean reads were obtained, which were 18-26 nt in length (Table 1). After further removal of tRNAs, rRNAs, and snRNAs and snoRNAs, a total of 306,372,865 reads remained.
These small RNA sequences were mapped to draft pepper genome sequences to determine whether they could be considered candidate miRNAs. These candidate miRNAs were further selected based on strict criteria for annotation of plant miRNA [20], including the presence of miRNA* sequences. Consequently, conserved and novel miRNAs were identified and represented in Tables 2 and 3, respectively. Total reads of conserved miRNAs were 15,250,415 whereas those of novel miRNAs were 1,322,036 ( Table 1), suggesting that novel miRNAs are weakly expressed in general. More detailed information regarding the statistics of the small RNA sequences from 10 different libraries is shown in Table 1. Furthermore, the length distribution and 59 end analysis were conducted for the genome-aligned small RNAs ( Figure 1). The majority of small RNAs were 24 nt long and accounted for 46.91% of all small RNAs, followed by 21 nt (14.16%), 22 nt (13.99%) and 23 nt (13.65%). Most of these small RNAs had a 59 terminal U or A, which are indicative of canonical small RNAs [28].

Identification of conserved miRNAs in pepper
With the purpose of identifying conserved miRNAs in pepper, genome-aligned small RNA sequences were BLASTN searched against currently known miRNAs in the miRbase (Release 18), allowing one or two mismatches between sequences. The BLASTN searches identified 128 conserved miRNAs corresponding to 29 miRNA families ( Table 2). Examples of representative hairpin structures are shown in Figure 2 and the full list of secondary structures is provided in Dataset S1.
The sequencing frequencies of miRNAs from ten different libraries might be used to estimate the tissue-or developmental stage-dependent expression patterns and their possible roles. As a result, Illumina/Solexa sequencing revealed that the majority of conserved miRNAs showed tissue-specific or developmental stagespecific expression. For instance, can-miR156d-g had the highest expression in root, and had lower expression in leaf, stem, flower and fruit. can-miR164a-b was expressed more abundantly in red fruits (fruit-B, B5 and B10) than in green fruits (fruit-1, 2 and MG) and other tissues. can-miR156a-c especially showed clear developmental stage-dependent expression patterns; the expression level gradually increased from fruit-1 (early stage of green fruit) to fruit-B10 (late stage of red fruit). The expression level of the can-miR168 family was higher in leaf and lower in other tissues. can-miR390 family had low expression in both green and red fruits, compared to other tissues. However, the expression level of a few miRNAs were similarly high (e.g., can-miR159 and can-miR166) or low (e.g., can-miR171f-g and can-miR172e) in all tissues.

Identification of novel miRNAs in pepper
For the identification of novel miRNAs, the genome-aligned small RNAs, exclusive of conserved miRNAs, were analyzed according to several criteria; first, only the miRNAs of which precursors were folded into stem-loop structures in hairpin prediction were selected for novel miRNA candidates. Precursors of these novel miRNAs had negative folding free energies ranging from 2355.30 to 219.80 according to RNAfold. The average free energy of these novel miRNAs was about 2104. 85, much lower that of other plant miRNA precursors (259.5 kcal/mole in A. thaliana and 271.0 kcal/mol in O. sativa). We next applied baseparing criteria [20] between the miRNA and the other arm of the hairpin; (1) mismatched miRNA bases are four or fewer, (2) asymmetric bulges are two or less in size, and (3) one or less in frequency. In addition, we investigated whether these novel miRNA candidates contained both miRNA and miRNA* sequences in our sequencing libraries since the presence of miRNA* is strong evidence of precise biogenesis [20]. Finally, miRNAs showing high expression levels with a total frequency of 1,000 or higher were chosen to be studied for more strict identification of novel miRNAs. As a result, we identified 50 novel miRNAs that belong to 35 families ( Table 3). The largest family was can-miR-n019 with four members, and most of the novel miRNAs were mapped to a single locus in the pepper genome,   contrary to multiple loci of conserved families ( Table 3). Some of the representative secondary structures of their precursors were shown in Figure 3 and the full list of secondary structures was provided in Dataset S2. Classification of a large number of miRNAs from highthroughput sequencing data is usually faced with difficulty in some cases where multiple miRNAs accumulate from the same precursor. For example, there is a case where miRNA and miRNA* species are sequenced approximately at the same level, as is the case for ath-miR832 [29]. In the present study, likewise, can-miR-n009-5p and can-miR-n009-3p accumulated to approximately equal levels in most tissues examined, except the leaf ( Table 3).
The sequencing frequencies of these novel miRNAs also showed that their expression had clear tissue-specific or developmental stage-specific patterns. For example, can-miR-n001 was sequenced in more than 1,000 reads in stem, root and flower, but not sequenced in leaf. can-miR-n002 was most abundantly in leaf (70,358 reads; the highest frequency of novel miRNAs in our libraries), and was moderately found in stem and root and was lowest in flower and fruit. can-miR-n026 had a similar expression pattern; abundant expression in leaf, moderate expression in stem and root, and low expression in flower and fruit. In contrast, can-miR-n006 was expressed abundantly in fruit and rarely in leaf, stem, root and flower. However, several novel miRNAs had similar expression patterns in all tissues (e.g., can-miR-n004), similar to some of the conserved miRNA families such as can-miR159, 160, 166, 171 and 172.

Expression patterns of miRNAs in pepper
It has been reported that a large number of miRNAs in plants have tissue-specific and developmental stage-specific expression patterns [26,[30][31][32], some of which might have a wide range of crucial roles during development and stress adaptation [3,33]. Since it has been reported that there were biases inherent in nextgeneration sequencing (NGS) technologies [34], northern blot analysis was expected to reveal more accurate in vivo expression patterns. Furthermore, detection of miRNAs by northern blot analysis could provide more direct evidence for their expression. Therefore, in this study, we performed northern blot analysis to observe tissue-specific or developmental stage-specific expression patterns in different tissues; leaf (L), stem (S), root (R), inflorescence (I), fruit (F) and seedling (Se).
Among the 29 conserved miRNA families, 19 miRNA families were tested and 15 were detected as discrete bands ( Figure 4). Four miRNA families (can-miR162, can-miR390, can-miR408, and can-miR477) were not detected by northern blot analysis. Expression levels of these miRNAs appeared not high enough to be detected by northern blot analysis [35]. As a result, tissuespecific and developmental stage-specific expression patterns were  Table 2. Conserved miRNAs in pepper.   observed for some of conserved miRNAs. For instance, can-miR156 was expressed abundantly in the seedling, moderately in the root, inflorescence and leaf, but weakly in the stem and fruit ( Figure 4A). Similarly, tissue-specific or developmental stagespecific expressions were observed in can-miR164, can-miR166, can-miR167, can-miR394, can-miR395, can-miR396, can-miR398, and can-miR530 ( Figure 4C, D, E, I, J, K, L and N). The expression level of can-miR164 was high in the stem, root, inflorescence and fruit, and low in the leaf and seedling ( Figure 4C). can-miR166 and can-miR396 were expressed lowly in the leaf and inflorescence, respectively ( Figure 4D, K). can-miR167 was expressed highly in the leaf, inflorescence and fruit and lowly in the stem, root and seedling ( Figure 4E). The expression level of can-miR394 was higher in stem than in other tissues ( Figure 4I). can-miR398 and can-miR-530 accumulated in leaf ( Figure 4L and N), and can-miR395 was exclusively expressed in the inflorescence ( Figure 4J). However, can-miR159, can-miR168, can-miR171, can-miR319, and can-miR403 had no tissue-specific or developmental stage-specific expression patterns (Figure 4B, F, G, H, and M). They appeared to be expressed ubiquitously in all tissues. Some miRNAs, including can-miR167, can-miR319 and can-miR403 appeared as a doublet or triplet on northern blot analysis in all or part of the tissues. The expression profiles obtained by northern blot analysis usually agreed with the sequencing data, as in the cases of can-miR159, can-miR167, can-miR168, can-miR171, can-miR394, can-miR395, can-miR396, can-miR403 and can-miR530 ( Figure 4B, E, F, G, I, J, K, M and N, Table 1). However, we observed a discrepancy between northern blot analysis and sequencing frequencies although it was restricted to one or two tissues. Specifically, can-miR156 and can-miR398 were expressed more abundantly in leaf than stem although they were sequenced much less in leaf than stem ( Figure 4A and L, Table 1). This reverse expression pattern between northern blot and sequencing data was observed for can-miR166 expression in leaf and root ( Figure 4D, Table 1). In addition, can-miR164 expression showed a significant difference between leaf and root in spite of nearly same read number, and can-miR319 expression in leaf was as abundant as in other tissues despite its much lower sequenced frequency than other tissues ( Figure 4C and H, Table 1).
Subsequently, the expression patterns of novel miRNA families were analyzed by northern blot analysis. On the whole, 19 out of 35 novel miRNA families were subjected to northern blot analysis, and all of them (19) were detected. Representative results of the northern blot analysis for 15 novel miRNAs were shown in Figure 5. Overall, the expression level of many miRNAs was similarly high or low in all tissues; can-miR-n003, can-miR-n013, can-miR-n016, can-miR-n027, can-miR-n030, can-miR-n032 and can-miR-n033 ( Figure 5E, M, C, O, D, G and P). Although these seven miRNAs did not show tissue-specific or developmental stage-specific expression patterns, the sequencing frequencies from different libraries suggested that four miRNAs (can-miR-n016, n027, n030 and n032) showed apparent tissue or developmental stage-dependent expression patterns (Table 3). In contrast, eight other miRNAs had tissue-specific or developmental stage-specific expression patterns. Specifically, can-miR-n002 was expressed abundantly in leaf, root and seedling, and weakly in stem, inflorescence and fruit ( Figure 5A). can-miR-n026 was expressed ubiquitously in all of the tissues, but was especially high in fruit ( Figure 5F). can-miR-n004 was more highly expressed in stem, root and inflorescence than leaf, fruit and seedling ( Figure 5H). can-miR-n005 was expressed highest in root, moderately in stem and seedling, and lowest in leaf ( Figure 5I). can-miR-n006 was abundantly expressed in leaf, stem, root and seedling, but not Table 2. Cont.  Table S5.
doi:10.1371/journal.pone.0064238.t002 Table 3. Novel miRNAs in pepper. detected in inflorescence and fruit ( Figure 5J). The expression level of can-miR-n007 was higher in stem and root, and lower in leaf, inflorescence, fruit and seedling ( Figure 5K). The expression of can-miR-n010 was higher in leaf, stem, root and inflorescence, and lower in fruit and seedling ( Figure 5L). can-miR-n017 was expressed moderately in stem and inflorescence, and slightly in leaf, root, fruit and seedling ( Figure 5N). In the case of can-miR-n013 and can-miR-n030, a doublet was observed in all or part of the tissues (Figure 5M and D). Most of evolutionarily young miRNAs, such as species-specific or non-conserved miRNAs, have been reported to be expressed weakly [36,37] whereas highly conserved miRNAs are expressed at a higher level [29,36,38]. This was observed for several novel miRNAs, which showed weak expression in northern blot analysis although they were selected by strict criteria, including a high frequency of 1,000 or higher. In contrast, some of the novel miRNAs were expressed at a high level in northern blot analysis as well ( Figure 5A, H, I, K, L, F and P), suggesting that they could play more roles in pepper-specific biology than the other weakly expressed novel miRNAs. In some cases, there were inconsistencies between expression levels obtained by Illumina/Solexa sequencing and northern blot analysis. As mentioned above, it is possible that sequencing technology could cause biases for certain sequences. Another possibility is that the probes used for northern blot analysis might have captured heterogeneous miRNAs.

Predicted targets of miRNAs in pepper
Assuming that plant miRNAs have a perfect or nearly-perfect match to their target mRNAs, FASTA searches were performed to predict targets of conserved miRNAs, and the potential targets were chosen by a position-dependent penalty scoring system in which predicted targets with penalty scores of four or less were selected. As a result, we identified 334 potential target genes from 26 out of 29 conserved miRNA families (Table S1), and the representative targets for each miRNA family are listed in Table 4.
In general, most of the targets predicted for conserved miRNAs in pepper were transcription factors ( Figure 4). For instance, predicted targets such as SQUAMOSA promoter binding proteins (SBP), auxin response factors (ARF), growth regulating factors (GRF), no apical meristem (NAM) and MYB family belong to transcription factors (Table 4). Therefore, similar to other plants [3], pepper miRNAs appear to play a significant role in plant development and growth.
We were able to predict the functions of miRNAs in pepper based on the functions known in other plants; we found most of the predicted targets of conserved miRNAs in pepper also had a conserved function with miRNA targets in other plants. The miRNA families having conserved targets were: can-miR156, can-miR159, can-miR160, can-miR164, can-miR167, can-miR171, can-miR172, can-miR319, can-miR394, can-miR395, can-miR396, and can-miR482 (Table 4; See Table S1 for total predicted target list). For example, SBP transcription factors, known to regulate flowering time in plants, were previously reported to contain complementary sequence of miR156 in other plants [39]. Our results suggested that SBP transcription factors can be considered as the targets of can-miR156 as well. MYB transcription factors, which are known to regulate meristem formation and seed development [39][40][41][42], are conserved targets of miR159/319 families. We identified homologous transcripts of MYB genes in pepper, which were considered as the target of can-miR159 and can-miR319 families. ARF genes, which are known to play important roles in plant development as activators or repressors of auxin-responsive transcription [43], are conserved targets of miR160/167 families in Arabidopsis [39,40]. Our results Table 3. Cont.  also indicate that ARFs are presumed to be targeted by the can-miR160 and can-miR167 families. The miR482-mediated regulation of NBS-LRR disease resistance proteins, recognizing specific pathogen effectors and trigger resistance responses, are conserved in several plant species [44][45][46]. Our results suggest that NBS-LRR disease resistance proteins in pepper were also presumed to be targeted by can-miR482 families. Similarly, many other targets, including GRF, F-box, TCP transcription factors, and NAM are potential targets of can-miR396, can-miR394, can-miR319, and can-miR164, respectively. Among 35 novel miRNA families, we identified a total of 57 potential target genes from 19 novel miRNA families (Table 5). Unlike conserved miRNAs, the targets of novel miRNAs were not enriched in transcription factors; only a target, WRKY transcription factor, was predicted. The largest group of predicted targets was F-box family proteins, predicted to be targeted by can-miR-  n002 and can-miR-n005, both of which showed similar expression patterns in terms of high expression levels in root and seeding (see Figure 5). The second largest group was composed of the following two groups: (i) disease resistance-related genes such as verticillium wilt disease resistance proteins, late blight resistance proteins and NBS-LRR type disease resistance proteins; (ii) protein kinase genes. Among the protein kinase group, leucine-rich repeat protein kinases were known to be critical components of PTI (PAMP-triggered immunity), one of the plant immune systems triggered by pathogen-associated molecules [47,48]. Therefore, miRNAs predicted to target these kinases may also be involved in the immune system of pepper through cooperation with other miRNAs related to regulate disease resistance gene expression in times with and without pathogen infection. The fourth largest group was GDSL-lipase like chlorogenate-dependent caffeoyltransferase precursors, known to have multifunctional properties and be related in lipid metabolism [49].

Validation of miRNA-directed target cleavage
It is necessary to experimentally validate whether computationally predicted targets are actually regulated by miRNAs in pepper by means of miRNA-directed cleavage of these targets because plant miRNAs regulate their target genes mainly by cleaving them [3,15]. Therefore, 59 RACE, universally used for detecting miRNA-directed target cleavage, was carried out for some of the predicted targets of conserved miRNAs; SBP, ARF, NAM, F-box, MYB, Argonuate, TCP, and sulphate transporter ( Figure 6). We noticed that one of the SBP transcription factor was targeted only by can-miR156d-g ( Figure 6A), whereas another SBP transcription factor was targeted both by can-miR156a-c and can-miR156d-g ( Figure 6B). In addition, two cleavage sites of the SBP transcription factor were mapped on the mRNA target, owing to a single length difference between can-miR156a-c and can-miR156d-g ( Figure 6B). Likewise, the miRNA-directed target cleavages of ARF, NAM, F-box, MYB, Argonuate, TCP, and sulphate transporter were validated as well ( Figure 6C, D, E, F, G, H, I, J and K).
We also carried out 59 RACE analysis for many of the novel miRNA targets. Of the 57 predicted targets of novel miRNAs, 26  targets were tested. Five of them were validated, but most of the remaining targets (21) could not be (Table S4). Overall, we noticed that most of the predicted targets of novel miRNAs had highpenalty scores (49 out of 58, 86% had penalty scores of 3 or 4) (i.e., low sequence complementarity to miRNAs), especially for targets which could not be validated. On the contrary, the validated targets generally had lower penalty scores (i.e., higher sequence complementarity to miRNAs) ( Table S4). The novel miRNAdirected target cleavage was validated for F-box family proteins and the GDSL lipase-like chlorogenate-dependent caffeoyltransferase precursor (Figure 7). F-box family proteins are known to play a role in protein degradation as one component of SCF complex [50] and also associated with regulation of cell cycle and signal transduction [51]. Among the predicted novel miRNA targets, F-box family proteins were the largest group (Table 5) and were predicted to be targeted by can-miR-n002 and can-miR-n005, both of which had a similar expression pattern; abundant expression in root and seedling ( Figure 5). Additionally, F-box family proteins share higher sequence complementarity (i.e., low penalty score) to can-miR-002 and can-miR-005. Consequently, we obtained positive results of four F-box family proteins ( Figure 7A, B, C and D). GDSL lipase-like chlorogenate-dependent caffeoyltransferase precursor, another validated target, had a single nucleotide bulge on can-miR-n026, but the miRNAmediated cleavage was still observed ( Figure 7E).
In summary, we experimentally validated that conserved and novel miRNAs in pepper negatively regulate the expression of many transcription factors (SBP, ARF, MYB, NAM, F-box and TCP), small RNA biogenesis protein (AGO), sulphate transporter and GDSL lipase-like chlorogenate-dependent caffeoyltransferase precursor by the miRNA-directed cleavage mechanism (Figure 6 and 7).

Predicted novel targets of conserved miRNAs
In our target prediction analysis, some putative target genes of conserved miRNAs were assumed to be a pepper-specific. Unlike conserved targets, the novel pepper-specific targets were not enriched in transcription factors, but included mRNAs encoding the gypsy/Ty-3 retroelement polyprotein, splicing factor, DNA methyltransferase, helicase protein, and copper-transporter, indicating that the corresponding novel targets of conserved miRNAs may be involved in specific biological processes in pepper (Table  S1).  Among them, one target gene is worth describing specifically. Some DNA methyltransferase genes were predicted to be targeted by miR396 in pepper, and these putative target genes were validated by the 59 RACE assay (Figure 8). In order to make a detailed analysis of these target genes, the predicted target protein sequence was submitted as a query in the HHpred server (http:// toolkit.tuebingen.mpg.de/hhpred) to search for a putative domain [52]. We identified a C-terminal catalytic domain showing sequence similarity to mammalian DNA methyltransferase 3A (DNMT3) and unique N-terminal ubiquitin associated (UBA) domains which are specifically present in DRM methyltransferase in plants (Figure 8). Since DRM methyltransferase is the plant ortholog of DNMT3 and has a unique N-terminal UBA domain [53], we assumed that this is the predicted target of a DNA methyltransferase belonging to the DRM protein family, the principle de novo methyltransferase in plants.
Furthermore, we performed a BLAST search against other plants genome databases to investigate patterns of conservation and phylogenetic relationships among many plant species. As a result, we found that protein sequences for DRM methyltransferase were highly conserved (shown as red in Figure S1 and closely related to other plants ( Figure S2). We also found that miR396 families are extensively conserved through many other plant species ( Figure S3A). Therefore, total RNAs from tobacco, tomato, potato and pepper were subjected to northern blot analysis to assess expression of miR396. Notably, we found that expression of miR396 was higher in tomato and pepper than that in tobacco and potato ( Figure S3B).
In particular, with the help of computational target prediction, we were also able to identify some DRM methyltransferase genes sharing high sequence complementarity to the miR396 in other plant species, including tomato, potato and tobacco, implying the possibility that miR396 targeting is conserved in Solanaceae ( Figure S3C). To see whether the function of miR396, which targets DRM methyltransferase genes, is conserved in Solanaceae, we experimentally tested miR396-directed cleavage of putative targets using the 59 RACE assay. As a result, we found that similar miR396 -DRM methyltransferase pathway should exist in tomato ( Figure S3D), but we failed to amplify a major PCR product as a diagnostic for miR396-directed cleavage in tobacco and potato (data not shown). As mentioned above, the northern blot analysis revealed that the expression of miR396 was higher in tomato and pepper than in tobacco and potato. Moreover, DRM methyltransferase genes shares higher sequence complementarity to miR396 (i.e., lower penalty score) in pepper and tomato than those in tobacco and potato. These might only explain the obvious reasons why DRM methyltransferase genes were only targeted by miR396 in pepper and in tomato. However, these observations still require further investigation to determine why miR396 shares conserved targets in pepper and tomato, but not in tobacco and potato, all of which belong to Solanaceae.
Taken together, at least in pepper and tomato, we assumed that the miR396 family members are possibly involved in de novo cytosine methylation, which is tightly linked to transcriptional silencing of repetitive sequences, including transposons and other mobile DNA elements.

A potential link between miRNA expression and fruit development
We noticed that some miRNAs, whose targets were validated experimentally in this study, exhibited prominent changes in expression levels (based on the normalized sequencing reads) during fruit development stages (i.e., from fruit-1 to fruit-B10). In particular, it is worth pointing out that the accumulation of can-miR156a-c substantially increased during the fruit development of pepper ( Figure 9A). In Arabidopsis, miR156 regulates SBP transcription factors, and thus, modulates the vegetative phase transition through its temporal expression [103]. Similarly, among the novel miRNAs, the expression of can-miR-n002a-c, whose target was validated as the F-box protein, was up-regulated upon development from early stage to late stage of fruits ( Figure 9B).
To elucidate the potential regulatory roles of these miRNAs in pepper fruit development, we performed qRT-PCR analysis of the target mRNAs, including the SBP-transcription factor and F-box protein. As a result, we found that expression of these two target mRNAs gradually decreased in general during fruit development and hence was negatively correlated with the expression of their corresponding miRNAs ( Figure 9A and 9B). The validation of miRNA-directed cleavage of these target mRNAs, combined with the results of qRT-PCR analysis, likely suggests that some miRNAs in pepper may play a role in fruit development.   Tomato has emerged as a useful model for fleshy fruit development and climacteric fruit ripening [54]. In tomato, the SBP-box gene residing at the colorless non-ripening (CNR) locus, which plays vital roles in fruit ripening [55] was shown to be targeted by miR156 [56]. The over-expression of miR156 resulted in down-regulation of CNR expression and caused the red fruit color lighter than wild-type [57], suggesting that the expression of the CNR mRNA is, in part, under the control of miR156 expression. In pepper, the SBP-CNR mRNA, a homologous gene to tomato, was predicted to be a target of miR156 as well (Table  S1), but we did not find negative correlation of the expression levels between miR156 and SBP-CNR mRNA (data not shown).
The two F-box genes, EBF1 and EBF2 were shown to function in ethylene response by regulating the ubiquitin 26S proteasomedependent EIN3/EIL turnover [58], and in tomato, co-silencing of both SI-EBFs (EBF-1 and -2) resulted in ethylene-associated phenotype [59]; however, none of the miRNAs has previously been linked to these members of the F-box family so far, which is consistent with our miRNA target prediction results. Nevertheless, our results here provide a possible link that other members of SBPbox or the F-box family are under the control of miRNAs during the fruit ripening in pepper.

Discussion
The study of plant miRNAs from non-model species and their functional roles are still in their infancy. However, many recent studies have demonstrated that plant miRNAs are involved in a variety of functional roles in developmental and morphogenetic processes [60][61][62][63]. Even though pepper is the most cultivated spice and one of the most economically significant vegetable crops worldwide, systematic study of pepper miRNAs has been reported scarcely. A previous study of pepper miRNAs using an in silico approach [13] was able to identify a very limited number of miRNAs. Many previous studies employed expressed sequence tag (EST) analysis approaches to identify miRNAs from other species [64,65], especially for species whose genomes were poorly understood. There are often a limited number of ESTs available in databases, and therefore a great proportion of protein-coding genes are missed, which makes effective computational prediction of miRNAs and their target genes difficult. Here using draft genome sequences, we have extensively identified a large number of miRNAs in pepper from 10 different libraries. This work therefore provides the first reliable draft of the pepper miRNA transcriptome. We successfully differentiated our miRNAs from other small RNAs by extensive investigation of pre-miRNA foldback structures. From the 10 different libraries, we found that many pepper miRNAs exhibited tissue-specific expression, as is commonly inferred [66]. We further validated and observed these patterns in the northern blot analysis as well.

Conserved miRNAs in pepper
In Arabidopsis, ath-miR156, which is involved in floral development and phase change by targeting SBP transcription factors, is one of the most abundantly expressed miRNAs [24]. Expression of can-miR156 reached its highest level at the seedling stage as previously reported in other plants [67][68][69], suggesting it regulates juvenile-to-adult phase transition in pepper. Previous studies suggested that over-expression of ath-miR156 affected phase transition from vegetative growth to reproductive growth [70][71][72], including a decrease in apical dominance [70,73,74], which is in full agreement with our expression analysis where can-miR156 was barely expressed in stem and fruit. The 59 RACE assay confirmed that some members of SBP transcription factor families are regulated by the miRNA-directed cleavage mechanism, depending on sequence specificity of can-miR156. Similar to can-miR156, can-miR164 was temporally regulated because its expression was low in the seedling and high in adult tissues except for the leaf. Expression of ath-miR164 negatively regulates cell death by repressing the NAC transcription factor ORESARA1 (ORE1), a positive regulator of cell death [75,76]. In addition, decreased expression of ath-miR164 leads to increased expression of ORE1 with leaf age [76]. This correlates with our expression analysis where the expression of can-miR164 was barely detectable in adult leaves, suggesting an analogous role in repressing ORE1 in pepper. ath-miR166, which targets the HD-ZIP gene family, is conserved in many land plants [77]. In Arabidopsis, over-expression of ath-miR166 resulted in seedling arrest and diverse phenotypic changes, especially in floral structure and in shoot apical meristem (SAM) formation [78,79]. We found that can-miR166 was highly expressed in flower followed by stem and root, and weakly in the seedling and leaf which was closely associated with their over-expression consequences.
In Arabidopsis, ath-miR159 and ath-miR319 share 17 identical nucleotides in their 21 nt mature miRNA sequences, and they both play significant roles in plant development, morphogenesis and reproduction [80,81]. Computational prediction suggests that ath-miR159 and ath-miR319 might potentially regulate both MYB and TCP transcription factors [80,81] owing to their sequence similarity. However, when studied in vivo, over-expression of ath-miR159 specifically down-regulated MYB mRNA expression, and over-expression of ath-miR319 specifically downregulated TCP mRNA expression in Arabidopsis [82]. Similarly, in pepper, can-miR159 and can-miR319 are seemingly related due to their similarity in mature miRNA sequences. The 59 RACE assay confirmed that can-miR159 and can-miR319 specifically regulate MYB transcription factors and TCP transcription factors, respectively. This result is consistent with those reported for Arabidopsis. Expression of miR159 is abundant in all tissues and widespread over the whole plant [67]. Similarly, we found expression of can-miR159 is stably expressed in all tissues in pepper as well. However, miR319 is known to be expressed at much lower level and restricted to certain tissues and specific developmental stages in other plants [67], which seems to be different from can-miR319 whose expression is consistently high in all of the tissues tested.
A previous study demonstrated that miR395 was induced under sulfate starvation conditions [83]. Several genes, including sulfate transporter and ATP sulfurylases (APS), both of which are involved in the sulfur assimilation pathway, are regulated by miR395 in Arabidopsis [84,85]. From our target prediction analysis, can-miR395 was also predicted to target sulfate transporter and APS. We tested the can-miR395-directed cleavage of sulfate transporter by the 59 RACE assay. We found that cleavage in the position corresponding to 9 th and 10 th nucleotides within their complementary site was clearly detected while no other cleavage product was found. Although miRNAs are generally known to induce cleavage of their target mRNA between 10 th and 11 th nucleotides, non-canonical cleavage sites have also been reported in plants [24,83,86] and in mammals [87]. Interestingly, a recent study of miR395 in Arabidopsis reported the miR395-directed cleavage of the APS3 mRNA in the position corresponding to 9 th and 10 th nucleotides in their complementary site [84], which is similar to our 59 RACE analysis.

Novel miRNAs in pepper
Many earlier studies repeatedly reported that non-conserved miRNAs are generally less abundant, more divergent, processed less precisely, and tend to lack of experimentally verifiable targets, suggesting that most are neutrally evolving [88][89][90]. Therefore, we expected that identification and characterization of novel miRNAs in pepper might be challenging, as there were many things to consider.
The variation in mature miRNA sequences, especially that of 59 terminal nucleotides, often showed differential Argonaut protein association [91,92], and hence the mature miRNA sequence itself is now considered to be an important determinant of proper sorting into functional Argonaut complex. Therefore, the imprecise excision of the mature miRNAs from the stem-loop precursor could produce a functionally unstable miRNAs. For these reasons, we employed very stringent standard for the identification of novel miRNA candidates, as followed: (1) one of the proofs of miRNA biogenesis; the presence of the miRNA star sequences, forming a miRNA/miRNA* duplex with two nucleotides, 39 overhang; (2) the presence of reliable stem-loop structures with proper folding free-energy; (3) minimal in size and frequency of asymmetric bulges, along with four or fewer mismatched miRNA bases within miRNA/miRNA* duplex; (4) highly expressed small RNAs with frequency of 1,000 or higher in at least one or more from 10 different libraries examined.
As a result, we were able to discover a total of 47 highly reliable novel miRNAs candidates from various pepper tissues. These novel miRNAs had many predicted targets; consequently we performed a 59 RACE assay to reveal miRNAs involved in pepper-specific biology, but most of them could not be validated, which are consistent with previous studies, where most of the nonconserved miRNAs had no experimentally supported targets [9,93], in contrast to the high validation rate of conserved miRNA targets. There are several possible explanations for this observation. (1) The novel miRNA targets that gave negative results upon 59 RACE analysis, other than technical failures, could be false positive prediction owing to the limited genome annotation currently available in pepper, which was the major hindrance to high confidence prediction of miRNA targets. (2) The nonconserved miRNAs and their target genes are not co-expressed in the same tissue during development [88]. (3) The abundance of novel miRNAs is substantially lower than that of conserved miRNAs [36,37], as is observed in our case as well. In addition, we noticed that most of the predicted targets of novel miRNAs had high-penalty score; low sequence complementarity. (4) The nonconserved miRNAs might function mainly through translational inhibition [4,5], which is also supported by the fact that the level of non-conserved miRNA target transcripts were largely unchanged in miRNA biogenesis mutants [22]. (5) Some of the putative novel miRNAs are not bona fide miRNAs. Although we provided one of the major proofs of biogenesis where the perfect miRNA star sequences were found, we still did not demonstrate DCL1 dependency of these novel miRNAs because a dcl1 mutant is not yet available for pepper. To conclude, while none of these possibilities can be ruled out at this point, it is currently difficult to explain the lack of experimental verifiable targets of the majority of non-conserved miRNAs. The other possible regulatory mechanisms behind this still remain to be elucidated.
can-miR396-directed cleavage of DRM methyltransferase Recently, DNMT3 was defined as a potential target of hsa-miR143 in colorectal cancer cells [94]. Also in plants, methyl-transferase2 (MET2), another class of DNA methytransferase, was suspected to be a potential target of ath-miR773 in the mature pollen of Arabidopsis, although ath-miR773-directed cleavage of MET2 could not be validated [95]. Interestingly, in pepper and tomato, we were able to validate miRNA-directed cleavage of DRM methyltransferse, an ortholog of the mammalian methyltransferase DNMT3, which is responsible for the de novo methylation in all sequence contexts [96,97]. In higher eukaryotes, DNA methylation is a conserved epigenetic mechanism which plays a vital role in chromatin organization [97]. Another layer of complexity in gene regulation is added by the fact that DNA methylation in plants is reflected in the complicated interplay between DNA methylation and RNAi as well as histone methylation.
Furthermore, transcriptional gene silencing via RNA-directed DNA methylation (RdDM), in which DNA with sequence identity to heterochromatin siRNA is de novo methylated at its cytosine residues, is also present in plants [96,98]. Previous studies have reported that DRM methyltransferase is not only required for de novo cytosine methylation but also for establishment of RdDM [53,96] whose main function is to suppress the proliferation of transposons to maintain genome stability. Given the importance of DRM methyltransferase as a component of RdDM, our results raise an intriguing possibility that some miRNAs in pepper may be involved in silencing of transposons and other repetitive DNA elements. Most regions of the pepper genome are very rich in constitutive heterochromatin which consists mostly of repetitive DNA sequences and transposable elements [14,99,100]. Through our high-throughput sequencing data, we found that 24 nt siRNAs, which can be produced from heterochromatin repetitive sequences, account for about half of the small RNA transcriptome in pepper (See figure 1). Such a high percentage of 24 nt siRNAs, which are known to be involved in the RdDM pathway and histone modification of repetitive sequences and transposable elements [101,102] as well as miRNA involvement in DRM methyltransferase, might altogether reflect the functional complexity of the pepper genome.

Conclusion
In summary, this is the first study to identify conserved and nonconserved miRNAs in pepper, to analyze their expression in different tissues and to predict their targets, many of which were experimentally validated. Among them, our new finding of miR396-directed cleavage of DRM methyltransferase opens a new avenue in the field of epigenetic regulation in the complex pepper genome. Using high-throughput sequencing approaches, our work is the first to provide a comprehensive view of pepper miRNAs and their targets, establishing a foundation for future studies of miRNAs in pepper and Solanaceae.

Supporting Information
Dataset S1 Full list of hairpin structures in conserved miRNAs.