Small RNA Profile in Moso Bamboo Root and Leaf Obtained by High Definition Adapters

Moso bamboo (Phyllostachy heterocycla cv. pubescens L.) is an economically important fast-growing tree. In order to gain better understanding of gene expression regulation in this important species we used next generation sequencing to profile small RNAs in leaf and roots of young seedlings. Since standard kits to produce cDNA of small RNAs are biased for certain small RNAs, we used High Definition adapters that reduce ligation bias. We identified and experimentally validated five new microRNAs and a few other small non-coding RNAs that were not microRNAs. The biological implication of microRNA expression levels and targets of microRNAs are discussed.


Introduction
Small non-coding RNAs (sRNAs) with the size of 20-24 nucleotides (nts) that are generated by one of the Dicer-like proteins function in transcriptional or post-transcriptional regulation of gene expression [1]. Plant sRNAs show high diversity and one of the best characterized group of sRNAs is the microRNAs (miRNAs), which are mainly 21 nts although some are 20, 22 and 24 mers. Some miRNAs are conserved across all plant species but many newly identified miRNAs are species or family specific. In addition to miRNAs, there are several groups of small interfering RNAs (siRNAs) involved in the regulation of gene expression such as heterochromatic small interfering RNAs (hc-siRNAs) that are usually 24 mers [2], trans-acting small interfering RNAs (ta-siRNAs) that are 21 mers [3] and anti-sense origin siRNAs that are produced from overlapping anti sense transcripts [4]. sRNAs are involved in diverse biological processes including developments and responses to environmental changes [5] thus it is important to characterize sRNAs in non-model but economically important plants.
Bamboo is one of the most important forest resources and belongs to the grass family. There are more than 1500 species of bamboo within 87 genera [6]. Bamboo has relatively long period of vegetative growth varying from a few years to decades, which makes it hard for traditional genetic improvement. The advance in next generation sequencing (NGS) technology opens new opportunities for studying the molecular biology of bamboo. The genomic sequence of Moso bamboo (Phyllostachy heterocycla cv. pubescens L.) has been reported recently and is becoming a valuable resource [7]. Moso bamboo is a large fast-growing woody bamboo and a major bamboo species for timber and food production in Asia. mRNA and miRNA analysis has been carried out using developing culms during fast growing stage [8], where RNA-seq libraries were constructed using the Illumina method. Many conserved miRNAs were found and putative new miRNAs were predicted. However, their expression levels were not confirmed experimentally. In addition, some miRNAs were found in the leaf tissues of ma bamboo (Dendrocalamus latiflorus L.) [9], but the genomic sequence is not available for this species therefore the analysis could not be completed.
High throughput sRNA studies rely on NGS results; however, different NGS platforms often produce different sRNA profiles [10]. The difference has been attributed to ligation bias due to the preference of RNA ligases to join molecules (in this case sRNAs and adapters) that can anneal to each other and form a structure favoured by the ligase [11][12][13]. Therefore libraries generated with different adapters produce different sRNA profiles. Since different NGS platforms or even different versions of the library generation kit for the same platform (such as Illumina v. 1.0, v. 1.5 and TruSeq) contain different adapters, these platforms and kits produce different sRNA profiles because they all rely on adapters with fixed sequences that determine the ligation bias. Adapters with degenerated nucleotides at the ligating ends (called High Definition (HD) adapters) can reduce the ligation bias because sRNAs can anneal to a pool of different sequences instead of a single adapter sequence [12,13]. sRNA libraries generated with HD adapters were found to recover more different sRNA sequences and the abundance of a sRNA sequence read correlated more quantitatively with the real expression level [13]. Here, the HD adapters were used for cloning the sRNAs in the leaves and roots of moso bamboo seedlings. The sRNA profiles were analysed and the expression of the new miRNAs with expressed miRNA* sequences were confirmed by Northern blot analysis.

Plant materials
The seeds of Moso bamboo were soaked in 0.4% KMnSO 4 solution for 6 hrs, rinsed with distilled water five times, and further soaked in distilled water overnight. The soaked seeds were planted in moist vermiculite in a closed container and kept at 25uC under light with intensity of 100-200 mmol?m 22 ?s 21 for two weeks followed by removing the cover film of the container so that the seeds were kept in the same container for about 2 months under the same growing conditions. The seedlings were transferred to pots with turfy soil and grown under the same conditions. The leaf, stem and root tissues were harvested and stored in liquid nitrogen when the seedlings were at 5-6 leaf stage.

RNA extraction and small RNA library construction
Total RNA was extracted by using TRI Reagent Solution (Ambion) following the manufacturer's protocol. Small RNA fractions of the leaf and root total RNA samples were further isolated by using mirVanamiRNA Isolation Kit (Ambion) following the protocol provided by the manufacturer. 2 mg of sRNA from each sample was ligated to 39 and 59 HD adapters [13] by using the ScriptMiner Small RNA-seq Library preparation Kit (Epicenter) following its protocol (but replacing the adapters provided in the kit with HD adapters with identical sequences but containing the four degenerated nucleotides HD tag). The ligated products were then reverse transcribed and PCR amplified. The PCR products expected to contain 20-24 bp cDNA inserts were gel-purified and sequenced on the Illumina HiSeq2500 sequencer.

Northern blot analysis
Northern blot analysis was used to confirm the expression levels of known miRNAs and verify the new miRNAs and some other new sRNAs following the previously published protocol (Lopez-Gomollon et al. 2012). Briefly, 5 mg of total RNA from stem, root and leaf tissues were denatured and loaded into denaturing 16% polyacrylamide gel. The RNA was transferred to Hybond NX (Amersham) nylon membrane through semi-dry electrophoresis transfer system (Bio-Rad), and chemical cross-linking was done at 60uC for 90 minutes by using 1-ethyl-3-(dimethylaminopropyl) carbodiimide (Sigma). The probes were generated by labelling the oligonucleotides that were reverse complementary to the sRNAs of interest with cP 32 -ATP. The membranes were hybridized with the probes at 37uC overnight. The list of the probe sequences used for northern blot is provided in supplementary Table 1.

Bioinformatics analysis
We used version 1.0 of the bamboo genome [7] and the annotations available on the ICBR-CAFNET website (http://202. 127.18.221/bamboo/down.php). We downloaded the bamboo chloroplast genome, accession NC_015817 from (http:// chloroplast.ocean.washington.edu) and used the publicly available annotations [14]. First, the fastq files from NGS were converted to fasta files and sequence reads with no Ns were kept for further analysis. Next, the first 8 nt of the 39 adapters were identified and removed and then four nucleotides on the 59 and 39 ends of the reads were trimmed (which corresponded to the NNNN tags on the HD adapters; see: Sorefan et al. 2012) using the UEA sRNA Table 1. General Information of the libraries. Workbench [15]. The sequence reads were mapped to the bamboo genome and the bamboo chloroplast genome with 0 mis-matches, in non-redundant format, using PatMaN [16]. The abundance of sequenced reads was normalized to the total number of reads in the sample, as indicated in [17]. The differentially expressed reads were identified using offset fold change as described in [18], using an offset of 20. The miRNAs were identified using the miRCat and miRProf tools within the sRNA Workbench [15] and custom made Perl scripts (Strawberry Perl v 5.18.2.1). The miRNA targets were predicted using the target prediction tool within the UEA sRNA Workbench [15]. The rules used for target prediction are based on those suggested by Allen et al. and Schwab et al. [19,20]. Specifically, no more than four mismatches between miRNA and target (G-U bases count as 0.5 mismatches) are allowed, no more than two adjacent mismatches in the miRNA/target duplex are permitted, no adjacent mismatches in positions 2-12 of the miRNA/target duplex (59 of miRNA) and no more than 2.5 mismatches in positions 1-12 are accepted, no mismatches in positions 10-11 of miRNA/target duplex are permitted. In addition, the minimum free energy (MFE) of the miRNA/target duplex should be . = 74% of the MFE of the miRNA bound to its perfect complement.

sRNA libraries from bamboo leaf and root
About 10 million (M) and 18 M raw reads were obtained from deep sequencing of bamboo leaf and root sRNA libraries, respectively. About 9 M (leaf) and 12 M (root) reads contained the 39 adapter preceded with potential sRNAs that were 17-33 nts ( Table 1). The root library contained ,2.9 M unique sequences while the leaf library was made up of ,1.8 M unique sequences, thus the overall sequence complexities (defined as the ratio of unique reads to all reads) for the two libraries were 16-17%. About 83% and 74% of the total reads in leaf and root library were mapped to the published moso bamboo genome (Peng, 2013 #60). Among the genome matching sRNAs, 5.2% of the reads in leaf library and 3.5% of the reads in root library were mapped to coding region. Distribution of read mapping to other genomic regions is shown in Table 1.
Similar to other plant species, size class distribution was bimodal where the most abundant sequences were 24 mers followed by the 21 mers in leaf library. The root library showed a slightly different size class distribution because the most abundant 24 mers were followed by 31 mers and even the 23 mer sRNAs were slightly more abundant than the 21 mers ( Figure 1a). The 24 mer reads showed the highest complexity while the 21 and 31 mer sequences had very low complexity (Figure 1b and 1c). The majority of transposon mapping sRNAs were 24 mers and 40% of unique 24 mer reads mapped to transposons (Supplementary Figure 1).

Chloroplast mapping reads
The sRNAs in leaf and root libraries mapped to the chloroplast genome exhibit very strong sequence, sequence abundance and distribution pattern differences. We found that 25% of the total reads in leaf library were mapped to the chloroplast genome while this percentage was only 0.3% in root libraries (Table 1). Nearly 50% of the 100 most abundant sRNAs were mapped to the chloroplast genome in leaf library but none in root library. Approximately 70% of the chloroplast genome matching reads were mapped to rRNA and tRNA genes, ,23% of the reads mapped to intergenic regions and 5.6% to protein coding regions. The remaining reads were mapped to the junction sequences crossing coding and non-coding sequences and repeat sequences ( Table 2).
The size class distribution of chloroplast matching reads does not show enrichment for any particular size sRNAs (Supplementary Figure 1), but complexity is very low for the rRNA, tRNA and intergenic region mapping sRNAs ( Table 2). The highly accumulated sRNAs were mapped to a few specific loci, which are shown in the sRNA presence plots for the chloroplast genome ( Figure 2). Many sRNA loci were less than 100 bp away from the coding regions such as atpH, rbcL, psaJ, clpP, psbH, rpoA and rps3, and some were about 100-500 bp away from the coding regions of psbZ, ndhJ, atpE, rpl33, rpl22, rpl23 and rps7. Some of these sequences were the footprints of pentatricopeptide repeat (PPR) or PPR-like protein binding sites that were previously experimentally identified or predicted in barley and Arabidopsis [21].
The most abundant sequence among the chloroplast genome matching sRNAs is a 31 nt long sequence: TATCGAGTA-GACCCTGTCGTTGTGAGAATTC. The first nucleotide of this sRNA maps to a position that is 59 nt upstream of the translational start site of rbcL. In barley, that position is the start transcriptional start site and the first ,30 nts of the 59UTR was the predicted binding site of the barley MRL1-PPR protein [21]. This 31 nt sRNA was further detected by northern blot analysis ( Figure 3) only in leaf tissue and it is most likely a PPR protected RNA fragment.

Known miRNAs
Based on sequence similarity to miRNAs in miRBase (http:// www.mirbase.org/; release 20) [22], 22 miRNA families with normalized abundance above 100 were identified (Table 3). In total, 75 miRNA and 24 miRNA* variants of these families were found (Supplementary Table 2). Among them, three miRNAs miR528, miR444 and miR1432 are likely to be monocot specific [23,24]. These miRNA variants were mapped to the bamboo genome and the sequences of their flanking regions were further investigated for potential hairpin structure. This analysis resulted in 141 precursor loci, out of which 13 had both mature and miRNA* sequences ( Figure S2 and Table S3).

New miRNAs identified in Moso bamboo
In addition to the known miRNAs that have been identified in other species, we looked for new miRNAs that have not been described in any species. We searched the two libraries for potential new miRNAs using miRCat [25]. This algorithm identifies potential hairpin structures at the flanking regions of genome mapping sequences, considers the percentage of miRNA and miRNA* reads out of the total number of reads that map to the hairpin sequence (pre-miRNA sequence) and looks whether the expected miRNA* sequence has also been sequenced. Using the first two criteria, we identified seven potential new miRNAs Table 2. General information of the sRNAs mapped to bamboo plastid genome. with total normalized read value in the two libraries above 100 (Table 4 and Supplementary Figure 3). Since the miRNA* sequences of several conserved miRNAs were not present in our libraries, we included the two potential new miRNAs without sequenced miRNA* (New-6 and New-7) in Table 4 because their miRNA* may be present in future sRNA libraries. However, at present those two cannot be classified as miRNAs. The new miRNA, New-1, had high read numbers in both libraries. New-2 was more abundant in the root library while, New-3, New-4 and New-5 were more abundant in the leaf library. These five new miRNAs were also confirmed by Northern blot analysis and their expression profile matched the expected pattern based on the sequencing results ( Figure 3).

miRNA targets
The database of coding sequences from the published bamboo genome was used as input for the identification of putative targets of bamboo miRNAs. Many mRNAs involved in various biochemical procedures were predicted as targets for conserved miRNAs (Table 3, Supplementary Table 5). The majority of the predicted targets of the conserved miRNAs were also conserved and confirmed in other plant species (Table 3).
phe-miRNA-528, -397, -408, -444 and miRNA-1432 are the most differentially expressed miRNAs between the two libraries with higher accumulation in leaf tissues (Supplementary table 4). Among these, phe-miRNA-528, 444 and miRNA-1432 were found to be monocot-specific so far [23,24]. Their predicted targets did not include the ones that were confirmed in other monocots [24,26,27] (Table 3). Nevertheless the putative targets of the two most abundant miRNAs, miRNA-528 and miRNA-397, include several family members of laccase precursors (Supplementary Table 5). In addition, the predicted targets of miRNA-528 and miRNA-408 include several plastocyanin-like domain containing proteins, multicopper oxidase domain containing proteins and polyphenol oxidase.
Using the current rules and coding sequence annotation, no putative targets were predicted for the new miRNAs.

Other groups of sRNAs
We searched our sRNA libraries to identify the conserved miRNAs that target ta-siRNA precursors such as miRNA-390, miRNA-482 and miRNA-828. However, surprisingly we did not find any of these miRNAs in either of the two libraries. Next we tried to find the MIRNA genes for these miRNAs on the genome using maize miRNA-482, tomato miRNA-828 (because miRNA-828 has not been identified in any monocots so far) and bamboo miRNA-390 [8] sequences. We did not find a potential gene for miRNA-828 but it was expected as it seems to be dicot specific.   The mature maize miR-482 sequence mapped to 435 positions on the bamboo genome (with 1 or 2 mis-matches); therefore we did not investigate whether any of them produces miRNA. However, we were able to identify a potential MIRNA-390 gene. Next we tried to detect phe-miRNA-390 by northern blot analysis but our attempt was not successful (Figure 3), therefore we did not look for ta-siRNAs.
In addition to the sRNAs that were mapped to the wellcharacterized non-coding and coding sequences about 37% of top 100 reads in root library and about 11% in leaf library were mapped to intergenic or intron region. Among them, a series of sRNAs ranging from 30-32 nucleotides were mapped to 14 different loci in the bamboo genome. One of them is the most abundant sRNA in the root library which contributes more than 50% of 31 mer reads. The sum of their reads is about 16% of the total genome matching reads in the root library. Using a probe complementary to the most abundant 31 nt sRNA, Northern blot analysis revealed a very strong signal in the root sample ( Figure 3). Accumulation of this sRNA was also confirmed in stem and leaf tissues.

Discussion
In the present study, we have analysed the sRNA profiles in leaf and root tissues of moso bamboo, one of the most important forest grass. Several groups of sRNAs with known and unknown functions were found in the sRNA libraries constructed by using HD adapters. The accumulation of several sRNAs was verified in the bamboo seedlings by northern blot analysis. The expression patterns obtained by Northern blot analysis were consistent with the sequencing results indicating the excellent quantitative correlation between sRNA read numbers and its actual accumulation in two different tissue samples. Thus, the HD adapters based cDNA libraries appeared to be reliable for high through analysis of sRNA populations in plant tissues.

Complex sRNA population in plant tissues
As in most plant sRNA libraries, 21 mer miRNAs and 24 mer siRNAs were very abundant in the bamboo leaf and root sRNA libraries. However, there were also differences between the two tissues. For example, 25% of the sRNAs in the leaf library were mapped to the chloroplast genome while very few reads in the root library were derived from plastid. There is an obvious difference between plastid development and chloroplast activity between the tissues, which is also reflected at the sRNA level. A large number of sRNAs were mapped to non-coding RNAs such as rRNAs and tRNAs and UTRs or intergenic regions with extremely high redundancy. Many of them were mapped to the PPR binding or predicted PPR binding sites based on results in other plant species indicating a valuable resource for future PPR binding site prediction. The different sizes of sequences or accumulation of these sRNAs in the two libraries may reflect the underlying mechanisms in transcription or translation regulation of plastid originated genes in these tissues.
In most previously published papers on sRNA profiling, the proportion of rRNA/tRNA matching sRNAs has been smaller than what we have found. However, the majority of the total RNA consists of rRNAs and tRNAs are also very abundant. Nevertheless it will be interesting to understand in the future why some sRNAs with high abundance were mapped to specific positions of specific rRNAs or tRNAs. In addition, there are some other highly abundant sRNAs that were mapped to intergenic or intron regions and their accumulation levels also varied in the two tissues. Further study of their origins and functions will help to understand sRNA biology in plants.
However, it is unexpected that none of the known and conserved TAS RNA targeting miRNAs were found or detected in the leaf and root tissues of young bamboo seedlings. miRNA-390 was cloned in the culm libraries of moso bamboo with relatively low normalized read value [8], but not other miRNAs such as miRNA482 and miRNA828. These two miRNAs were not found in rice either.
The miRNA profiles are consistent with their biological functions A set of known conserved miRNAs was found in both libraries in high abundance. Their predicted targets were also conserved or confirmed in other plant species. These conserved miRNAs are known to be important for leaf and root development by  [28,29] while miRNA-164 is also required for vegetative shoot development [30,31]. Along with miRNA-319, miRNA-164 control leaf senescence [32,33]. mir319 also regulates cell proliferation and leaf morphogenesis mediated by the conserved miR396-GRF module [34]. Mir159 is essential for leaf morphogenesis [35,36] and miRNA-167 plays a role in the development of adventitious root in both dicots and monocots [37][38][39]. Some of the above conserved miRNAs and other conserved miRNAs such as miRNA-398, -399, -168 and -162 also respond to environmental stresses and regulate nutrient uptake in both leaf and root tissues [26,[40][41][42][43][44][45]. Although these miRNAs were expressed abundantly in both root and leaf tissues, the variants of some miRNA families exhibited tissue preference, which may be correlated with some tissue specific targets and/or tissue specific regulation under environmental stresses. Some miRNAs exhibit quite dramatic expression difference between the two tissues. For example, phe-miRNA-156 was more abundant in roots particularly one of its variants. Over-expressing of tomato miRNA-156 induces the development of high density of airy roots along the stem [46]. The higher accumulation of bamboo miRNA-156 may be correlated with its dense adventitious roots and airy roots along the stem which help with its growth in poor soil. On the other hand miRNA-528, -397 and -408 were expressed at much higher levels in bamboo leaf tissue. miRNA-528 is also very abundant in rice seedlings and young maize leaf tissues [47,48]. It is expressed in maize shoot apex and leaf primordial and vasculature [49]. Through qRT-PCR, miRNA 397 and miRNA 528 were found to most abundant in the leaves of Pinellia pedatisecta S. [50]. The experimentally confirmed target in rice is Os06g37150 encoding an ascorbate oxidase [27], and IAA-alanine resistance protein 1 (IAR1) gene [26]. One confirmed target in maize is copper/zinc superoxide dismutase [44]. However, none of these genes' homologs were predicted to be the targets of bamboo miRNA-528. Instead, phe-miRNA-528 more likely targets polyphenol oxidase genes including the laccase genes, another putative polyphenol oxidase, a putative peroxidase and plastocyanin-like proteins. In the meantime, miRNA-397 and miRNA-408 were predicted to target different laccases and a plastocyanin-like protein and other polyphenol oxidases, respectively. Both polyphenol oxidases and plastocyanin proteins are copper containing oxidases. miRNA-397 has been confirmed to be involved in cell wall development by down-regulating laccases in Arabidopsis and poplar [51]. Both miRNA-528 and miRNA-397 also respond to diverse biotic and abiotic stresses including low copper availability [26,40,41,43]. Thus these three miRNAs that were preferentially expressed in leaf tissues may work together in regulating stress response and copper homeostasis. For example, they may down regulate the expression of copper-containing proteins to accommodate the optimum availability of copper for photosynthesis, in order to store enough energy for rapid growth.