Transcriptome/Degradome-Wide Identification of R. glutinosa miRNAs and Their Targets: The Role of miRNA Activity in the Replanting Disease

Rehmannia glutinosa, a traditional Chinese medicine herb, is unable to grow normally in a soil where the same species has recently been cultivated. The biological basis of this so called “replanting disease” is unknown, but it may involve the action of microRNAs (miRNAs), which are known to be important regulators of plant growth and development. High throughput Solexa/Illumina sequencing was used to generate a transcript library of the R. glutinosa transcriptome and degradome in order to identify possible miRNAs and their targets implicated in the replanting disease. A total of 87,665 unigenes and 589 miRNA families (17 of which have not been identified in plants to date) was identified from the libraries made from a first year (FP) and a second year (SP) crop. A comparison between the FP and SP miRNAs showed that the abundance of eight of the novel and 295 of the known miRNA families differed between the FP and SP plants. Sequencing of the degradome sampled from FP and SP plants led to the identification of 165 transcript targets of 85 of the differentially abundant miRNA families. The interaction of some of these miRNAs with their target(s) is likely to form an important part of the molecular basis of the replanting disease of R. glutinosa.


Introduction
The herbaceous species Rehmannia glutinosa L (Scrophulariaceae) is of some economic importance because extracts from its tuberous roots are medicinally active [1]. Although the plant is perennial, its productivity declines significantly after the first year [2,3], a syndrome described as`replanting disease'', and thought to be due to the release of autotoxins into the soil either in the form of exudate from the roots or of leachate into the soil from above ground residue left after harvest [4,5]. The phenomenon is referred to more generally as ''allelopathic autotoxicity'' [6].
Gene expression can be regulated at multiple levels. A recently discovered mode of post-transcriptional regulation involves microRNAs, small RNA molecules (averaging typically 21-24 nt in length) able to regulate the expression of specific genes by targeting their mRNAs for degradation [7][8][9][10]. The specificity of a given miRNA's target is achieved via its complementarity with the sequence of its target mRNA. An increasing weight of evidence has shown that miRNAs are intimately involved in many adaptive responses to both abiotic and biotic stress [9,11]. Given that allelopathic autotoxicity represents a form of stress, it is possible that miRNAs are involved in the replanting disease of R. glutinosa.
High throughput sequencing platforms have created numerous opportunities for the characterization of genomes and transcrip-tomes, especially in non-model organisms for which the full genome sequence has yet to be acquired [12][13][14][15][16][17]. The genomic resources available for R. glutinosa are very limited [18]. Here we describe the application of the Solexa/Illumina platform to obtain a global view of the R. glutinosa transcriptome, including the population of small RNAs (sRNAs) and the degradation products of mRNA (the ''degradome'') present. In particular, the focus was to identify novel miRNAs in R. glutinosa by transcriptome, to detect those miRNAs which were differentially abundant in plants suffering from the replanting stress by sRNA sequencing and qRT-PCR, and to identify their target mRNAs by degradome analysis. The purpose was to reveal the possible functions of miRNA and their targets in forming of replanted disease in R. glutinosa.

Transcriptome Sequencing and de novo Assembly Analysis
The root and leaf libraries each comprised ,41 million 75bp paired end raw sequence reads, which were deposited at NCBI under the accession number SRX269425 and SRX269426, and this number was reduced to, respectively, ,38 and 39 million Q20 (base quality more than 20) standard reads by applying a stringent quality check (Table A in File S1). Use of the assembly software SOAPdenovo (http://soap.genomics.org.cn/), developed specifically to process short reads, yielded 99,708 unigenes (mean length 348 bp) from the root and 94,544 (mean length 368 bp) from the leaf transcript library. Root and leaf libraries were further assembled and incorporated in an entire R. glutinosa transcriptome. The removal of partially overlapping sequences finally produced a set of 87,665 unigenes (mean length 554 bp), representing in all 41.82 Mbp of sequence (Table 1, Fig. A in File S2). The entire transcriptome was used to analyze R. glutinosa sRNA and degradome libraries.

Sequencing and Analysis of sRNAs
The two sRNA libraries derived from FP and SP material produced, respectively, 17,723,851 and 18,123,606 raw reads. After removal of low-quality and corrupted adaptor sequences (reads,18 nt), a total of 14,658,978 and 15,649,864 clean reads corresponding to 6,798,635 and 7,917,831 unique reads remained for FP and SP libraries ( Table B in File S1). Of the combined reads, 28.40% was FP-specific, 32.94% was SP-specific and 38.66% was present in both libraries (Fig. B in File S2). The length of most of these small RNAs lay in the range 21-24 nt, with 24 nt molecules predominating (Fig. 1).

Identification of known miRNAs
The sRNA sequences were used to interrogate the relevant GenBank database (http://www.ncbi.nih.gov/Genbank/) using Rfam 10.1 software (http://rfam.sanger.ac.uk/), resulting in the annotation of 0.92% of the FP and 0.62% of the SP unique nonprotein-coding sRNAs (rRNA, snRNA, snoRNA and tRNA). The remained sRNA sequences were then compared to those deposited in miRBase 19.0 (www.mirbase.org/), which on the basis of ,3 mismatched nucleotides produced 546,181 (unique 1,913) hits in the FP library, and 314,312 (unique 1,434) in the SP library. These hits accounted for, respectively, 3.73% and 2.00% of the FP and SP sRNA sequence (Table 2). A total of 768 miRNAs, belonging to 572 already documented miRNA families, was identified across the two sRNA libraries. Based on their conservation, these known miRNAs were classified into three (conserved, non-conserved and Rehmannia-specific) groups (Table C in File S1) [19]. The group I was 27 miRNA families (126 members) for that have already been identified in both monocotyledonous and dicotyledonous species.
Some of the miRNAs, notably miR156/157, miR159, miR172, miR166 and miR167, were highly abundant in both libraries, whereas others were present in much lower abundance. The group II, which exists only in no more than ten plant species (miRBase 19.0, http://www.mirbase.org/), was also detected 539 miRNA families (636 members) from R. glutinosa. Most of these miRNAs were of low to moderate abundance. While the copy number of miR2870, miR2937, miR4248 and miR918 was notably higher in the SP than in the FP library, that of miR1860, miR2123, miR2635 and miR948 was higher in the FP library. Finally, the group III of six Rehmannia-specific miRNAs was identified, all of which were present in low abundance in both libraries.

Identification of Novel miRNAs
After removal of the annotated sRNA and known miRNAs, 13,487,758 (6,733,923 unique sequences) FP and 14,852,155 (7,867,648 unique sequences) SP sequences were remained in the libraries (Table 2). 9,946 (3,956 unique sequences) FP and 9,239 (3,782 unique sequences) SP sequences of these sRNAs were mapped onto the R. glutinosa 87,665 unigenes with a view to identifying their targets on the basis of the presence of the diagnostic hairpin structures flanking sequences [20,21]. 41 unique Table 1. Length distribution of assembled Contig, Scaffold, Unigene and All-Unigene in R. glutinosa.

The Differential Abundance of miRNAs in FP and SP Plants
A comparison of miRNA abundance in the FP and SP libraries revealed statistically different levels for 303 (eight Rehmanniaspecific, 295 known) of the 589 families (p-value ,0.01) ( Table D in File S1). In all, 124 of these were more abundant in FP than in  Table 4). The result showed that miRNA expression levels could be altered in replanted R. glutinosa. A randomly chosen subset of 20 of the differentially abundant miRNAs present in moderate abundance was subjected to validation by qRT-PCR. The estimated abundance from the sequencing outcome and from the qRT-PCR analysis was consistent for 18 of the 20 (the exceptions were miR3269 and miR415a) (Fig. 3).

Degradome Sequencing Analysis
A total of ,14.6 million (FP) and ,21.1 million (SP) raw reads of 39 cleavage fragments was obtained. These were reduced after manual editing to, respectively, ,12.5 and 20.0 million clean reads (Table E in File S1), which were then filtered to exclude any structural RNAs (rRNA, tRNA, snRNA and snoRNA). This step removed only a small proportion (2,873 and 6,756, respectively) of the reads. The remaining reads were then aligned with the R. glutinosa transcriptome. In total, 367,290 (37.39%) and 769,336 (38.36%) unique reads from FP and SP degradome libraries, respectively, could be associated with the transcriptome ( Table 5). The CleaveLand pipeline [22,23] was used to identify degraded targets for each of the miRNA families which differed in abundance between FP and SP. The abundance of each sequence was plotted for each unigene target, and the degradation products were grouped into three categories (I-III) according to their relative abundance (File S4). In all, 165 target mRNAs were identified, involving 85 (including 33 up-and 52 down-regulated in the SP sRNA library) miRNA families (Table F in File S1). Among the miRNA families, 50 targeted a single transcript, and the highest number of targets cleaved by a single miRNA was nine (miR157). One mRNA (unigene35337_All), which putatively encodes a squamosa promoter binding protein, was possibly targeted by two miRNAs (miR156 and miR157). Of the 56 targets in the FP degradome library, 35 fell into category I and 12 each into categories II and III; for the SP libraries, 76 of the 137 targets fell into category I, 31 into category II and 30 into category III ( Table 6). Analysis of the targets showed that the cleaved targets were differentially present between the two degradome libraries. These observations suggest that replanting promotes miRNAdriven cleavage in R. glutinosa.

Functional Analysis of miRNA Targets
A BlastX search of the Nr (non-redundant protein sequences) database showed that these miRNA targets shared homology with other plant proteins ( Table F in File S1). The targets of the miRNA families which were more abundant in SP included genes encoding a WD-40 repeat family protein, a polyphenol oxidase, CBL-interacting protein kinase 18, a potassium ion transmembrane transporter 7, a histidine kinase 3B, MYB transcription factor 127 and an aspartic protease. These genes are all involved in plant growth and development. On the other hand, the targets of the miRNA families which were more abundant in FP included genes encoding a heat shock protein, a squamosa promoterbinding protein, a class III HD-Zip protein, AGO1-1 and RNA helicase, genes which are all involved in the stress response. When gene ontology categories were assigned to these targets (Fig. 4), six molecular function categories predominated, with the two most highly represented being ''binding'' and ''catalytic''. Fifteen biological processes were identified, with the two most frequent being ''cellular process'' and ''metabolic process'', followed by ''development process'' and ''response to stimulus''. Finally there were seven major cellular component classes, with the two most abundant being ''cell'' and ''cell part''. This analysis suggested that the miRNA targets were concentrated in biological regulation, response to stimuli, development and metabolic processes.

Transcription Profiling of Selected miRNAs
The temporal pattern of transcription of 16 selected miRNAs (eight of which were more abundant in FP and eight more abundant in SP) was obtained using qRT-PCR (Fig. 5). For each of the eight miRNAs selected as being more abundant in the SP root according to the sequencing data, transcript abundance was uniformly higher in the SP than in the FP root. The abundance of miR2931, miR3951 and miR7811 was particularly high throughout the four month sampling period, while that of miR1851, miR1147, miR160c, miR1861b and miR3512 appeared to be strongly induced only at certain measured times. Similarly, for each of the eight miRNAs selected as being more abundant in the FP root according to the sequencing data, transcript abundance was uniformly lower in the SP than in the FP roots. miR157a, miR167d, miR408a and miR477c were almost undetectable for a three month period, while miR1115, miR165a, miR168a and

Transcriptome-wide Novel miRNAs Identification
Understanding the function of a given miRNA requires the recognition of its target mRNA(s). Little is yet known regarding the spectrum of miRNAs present in R. glutinosa [18,24], particularly because only 1,500 EST sequences have as yet been deposited in GenBank. Here we have added greatly to this number by applying a high throughput sequencing platform, which has also been useful in characterizing the transcrptome in both the root and leaf. The number of miRNA sequences present in R. glutinosa has as a result been greatly expanded over what had been established prior to this study, and now numbers almost 800, of which the great majority (97%) have been identified previously in other plant species. Clearly, high throughput sequencing provides a highly effective means of acquiring miRNA sequence, especially in a species whose genome has yet to be properly characterized [25][26][27][28].

Differential Abundance of miRNAs in FP and SP Plants
The Solexa/Illumina platform is able to provide an estimate of the abundance of individual miRNAs, based on the number of times that a specific sequence is recovered among the millions of sequence reads generated [29,30]. This feature made it possible to recognize differences in the abundance of individual miRNAs in FP and SP plants. In the event, 303 miRNA families behaved differentially, with 124 families more abundant and 179 less abundant in SP than in FP. A number of these miRNAs may be involved in the regulation of the stress response [9,31]. Of particular interest are the 54 families (51 of which also occur in other plants, and three which are specific to R. glutinosa) which were only present in SP. Some of these may well therefore be implicated in the replanting disease.

The Identification of mRNA Targets by Analysis of the Degradome
Degradome sequencing was focused on the 165 targets identified for the 85 miRNA families present in differential abundance in the replanted R. glutinosa. Although there were 303 differentially abundant miRNAs in total, the other 218 miRNA families were not associated with any identifiable cleavage target. Possible reasons for this failure are first that the target was present at too low an abundance to have been recovered [31,32], and second that at least some of these miRNAs are incapable of cleavage, acting instead by translational repression [33], as has been suggested elsewhere [31]. The predicted functions of the 165 targets were associated with growth, development and the stress response, along with certain other processes. Typical symptoms of the replanting disease of R. glutinosa include weak growth, the development of fibrous in preference to tuberous roots, the abnormal expansion of the tubers and precocious flowering and premature, etc [1,6,34]. The interesting question is whether any of these phenotypes could be determined by the action of any or some of the 85 differentially abundant miRNAs.
A hypothetical regulatory mechanism of miRNAs in the replanting disease is presented in Fig. 6. One of the targets of miR160 is an auxin response factor gene (ARF10), the product of which is heavily implicated in the development of the root cap [35]. If the higher abundance of miR160 in SP plants interfered with the production of ARF10, the expectation would be that a greater number of lateral roots would be formed. A target of miR1861 is the gene encoding potassium transporter 7, the product of which is required for, inter alia, the uptake of potassium from the soil [36,37]. An excess presence of miR1861 in SP plants would thus be expected to compromise potassium uptake, and result in the abnormal expansion of tubers, a typical symptom of potassium deficiency. A target of miR2931 is a gene encoding histidine kinase 3B, which is a component of the eukaryotic machinery for signal transduction across the cellular membrane [38]. The down-regulation of this gene would therefore be expected to disturb normal signal transduction. A target of miRNA3951 is a gene encoding a MYB transcription factor, many of which act as major regulators of development, metabolism and the stress response [39,40]. Its miRNA-driven down-regulation in SP plants may well therefore have major growth and developmental consequences. The target of miR7811 is a gene whose product is a member of the SMC (structural maintenance of chromosomes) family. Eukaryotic SMC proteins are the core components of the cohesin complex, which is responsible for sister chromatid and homolog cohesion during mitosis and meiosis [41]. Down-regulation of this gene in SP plants may therefore affect cell division.
Other key targets of the miRNAs more abundant in SP than in FP included genes encoding a transducin family protein/ WD-40 repeat family protein (miR1147), an ALY protein (miR1851) and an ABC transporter family protein (miR3512), all of which are key components of metabolism [42][43][44]. Among the targets of the miRNAs which were less abundant in the SP than in the FP plants were certain genes involved in the determination of flowering time, prematurity and the response to environmental stress. The product of genes encoding squamosa promoter-binding proteins (SPBs) (targeted by miR156/157) are known to regulate flowering time in A. thaliana and their overexpression accelerates flowering [12,45,46]. The reduced abundance of miR156/157 in SP plants implies the maintenance of a high level of SPB, which would tend to promote the shift to reproductive growth; the pleiotropic effect of this would be a reduction in vegetative and root growth. The genes targeted by miR165 included four which encode class III HD-Zip proteins; in the A. thaliana root meristem, these proteins are involved in xylem differentiation [47]. Their enhanced presence in SP plants might be expected to promote the formation of fibrous roots and to affect root expansion. miR167 targets genes encoding ARF6 and ARF8, which act as positive regulators of adventitious root formation [48], so an elevated presence of these proteins would tend to militate against the formation of tubers. miR408 targets a gene encoding a lateral organ boundary domain protein, which is important for lateral root morphogenesis in Arabidopsis [49], rice [50] and maize [51]. Since the abundance of miR408 was lower in SP than in FP plants, one might expect differences in the expression of the the LOB gene, leading to a greater degree of lateral root formation [52], with negative consequences for tuber formation. Several other targets of the reduced abundance miRNAs included a probable thiol methyltransferase 2 (miR1115), AGO 1 (miR168) and an RNA helicase (miR477); each of these gene products are components of the stress response [53,54]. miRNAs regulate many cellular processes. Here, we have shown that they may well be involved in the replanting disease of R. glutinosa.

Plant Material and RNA Isolation
Our experimental plants of R. glutinosa, 'Wen 85-5', were grown in Wen Agricultural Institute, Jiaozuo City, Henan Province, China. The designed growing period was from the period from April 22, 2011 to November 30, 2011. A group of seedlings was grown in the field where R. glutinosa had not been planted for more than 10 years. The other group was grown in the field where the same cultivar had been grown in the previous year (planted on April 15, harvested on November 30, 2010). For convenience of description, we name the former group as first year plants (FP) and the latter group as second year plants (SP). R. glutinosa plants (five independent FP and same number SP) were collected at the tuberous root expansion stage (August 22, 2011). Six sequencing libraries belonging to three types (I, transcriptome, II, sRNA and III, degradome) were constructed in present study (Fig. 7), i.e. two transcriptome libraries, I1 (roots of FP and SP) and I2 (leaves of FP and SP), two sRNA libraries, II1 (roots and leaves of FP) and II2 (roots and leaves of SP), two degradome libraries, III1 (roots and leaves of FP) and III2 (roots and leaves of SP).
Total RNA from each sample was isolated using a TriZOL reagent (TaKaRa Co., Tokyo, Japan) following the manufacturer's instructions and treated with RNase free DNase I (Qiagen). RNA concentrations were measured using a spectrophotometer and integrity was ensured through analysis on a 1.5% (w/v) agarose gel. Those RNA samples were also used in qRT-PCR verification.
For measure of miRNA different expression in different development stages, roots of FP and SP from five independent

Transcriptome Sequencing and de novo Assembly Analysis
Beads with Oligo(dT) were used to isolate poly(A) mRNA after total RNA was collected from each sample. Fragmentation buffer was added for interrupting mRNA to short fragments. Taking these short fragments as templates, random hexamer-primer was used to synthesize the first-strand cDNA. The second-strand cDNA was synthesized using buffer, dNTPs, RNaseH and DNA polymerase I, respectively. Short fragments were purified with QiaQuick PCR extraction kit and resolved with EB buffer for end reparation and adding poly(A). After that, the short fragments are connected with sequencing adaptors. And, after the agarose gel electrophoresis, the suitable fragments are selected for the PCR amplification as templates. At last, the library could be sequenced using Illumina HiSeq TM 2000. Four fluorescently labeled nucleotides and a specialized polymerase were used to determine the clusters base by base in parallel. The raw reads were generated by the Illumina HiSeq TM 2000 system. Short reads after filtering dirty raw reads that only have 39 adaptor fragments should be removed before data analysis, on which all following analysis are based. Transcriptome de novo assembly is carried out with short reads assembling program-SOAPdenovo [55]. If results of different databases conflict with each other, a priority order of (Nr) nonredundant protein, Swiss-Prot, KEGG and COG should be followed when deciding sequence direction of unigenes [15]. Genes were tentatively identified according to the best hits against known sequences.

sRNA Sequencing and miRNAs Identification
To construct FP and SP sRNA libraries, total RNA from each plant was pooled, and then separated by 15% denaturing PAGE to recover the population of small RNAs (size range 18-30 nt) present. The small RNAs were ligated sequentially to 59 and 39 RNA/DNA chimeric oligonucleotide adaptors (Illumina), and the resulting ligation products were gel purified by 10% denaturing PAGE, and reverse transcribed. The cDNAs obtained in this way were sequenced by the Illumina HiSeq TM 2000 system. Known miRNAs were identified by Blastn searches against Genbank (www.ncbi.nlm.nih.gov), Rfam 10.1 (http://rfam.sanger.ac.uk/) and miRBase 19.0 (www.mirbase.org/) databases with default parameters (E-value cutoff was 10, Maximum no. of hits was 100, ,3 mismatches nucleotides were allowed). Potentially novel sequences were identified by an alignment with the R. glutinosa transcriptome sequences using SOAP (http://soap.genomics.org. cn/) software [55]. Candidate pre-miRNAs were identified by folding the flanking sequences of distinct miRNAs using MIREAP (http://sourceforge.net/projects/mireap/), followed by a prediction of secondary structure by mFold v3.1 [56]. The criteria chosen for stem-loop hairpins were as follows [20,21]: 100 nt maximum distance was allowed between miRNA and miRNA*, maximum free energy should be not less than 20 kcal mol 21 , duplex asymmetry of miRNA and miRNA* was set in 7 nt, pairing number between miRNA and miRNA* was revised to 10 nt, mature bulge was less than 4 nt.

Verification of Novel miRNAs and Analysis of miRNAs Different Expression
For reverse transcription (RT) reaction, polyA was first added to the 39 end of the miRNAs using polyA polymerase, and cDNA was then synthesized using AMV reverse transcriptase (GeneCopoeia, Inc.), and then stored at 220uC, employing a 53 nt oligodTadaptor sequence (GeneCopoeia, Inc.) as the primer.
For verification of novel miRNAs and analysis of miRNAs different expression, qRT-PCR was performed using an All-in-One TM miRNA Q-PCR detection kit (GeneCopoeia, Inc.) on a BIO-RAD iQ5 real-time PCR detection system (Bio-Rad labora-  tories, Inc.). Each 20 ml Q-PCR comprised 0.5 ml cDNA, 2 ml 2 mM miRNA forward primer (sequences given in Table G, H and I of File S1), 2 ml 2 mM reverse primer from a 53 nt oligodTadaptor sequence (Universal Adaptor PCR Primer), 10 ml 26 Allin-One TM miRNA Q-PCR buffer and 5.5 ml nuclease-free water. The reactions were incubated at 95uC for 10 min, and then were cycled 36 times through 95uC/10s, 55uC/20s and 72uC/10s. After the reactions had been completed, the threshold was manually set and the threshold cycle (CT) was automatically recorded. 3 technical replicates were used for each tested sample. A 4 ml aliquot of each reaction product was subjected to 3% agarose electrophoresis. The relative expression level of the miRNAs was calculated using the 2 2ggCT method [57], and the data were normalized on the basis of 18s rRNA CT values.

Degradome Sequencing, Target Identification and Analysis
FP and SP degradome libraries were constructed according to a published protocol [22,58]. Briefly, RNA fragments with a poly(A) tail were isolated from total RNA of each example using the Oligotex mRNA mimi kit (Qiagen), and then a 59 RNA adaptor with a MmeI restriction site at its 39 end was added to the 59 ends of the isolated poly(A) RNAs. After reverse transcription using oligo d(T) and PCR enrichment, the PCR products were purified and digested with MmeI. After ligating a double-stranded DNA adaptor to the 39 end of the digested products, the ligated products were further purified and amplified, and then sequenced using the Illumina HiSeq TM 2000 system.
Raw sequencing reads were obtained using Illumina's Pipeline v1.5 software to remove adaptor sequences and low quality sequencing reads. The extracted sequencing reads with the length of 20 and 21 nt were then used to identify potentially cleaved targets by the CleaveLand pipeline [22,23]. The degradome reads were mapped to the R. glutinosa transcriptome sequences. The target was selected categorized as I, II and III as previous study [19,22]. In addition, to easily analyze the miRNA targets and RNA degradation patterns, t-plots were built according to the distribution of signatures (and abundances) along the R. glutinosa transcriptome. All the identified targets were subjected to BlastX analysis (http://www.ncbi.nlm.nih.gov/BLAST/) to search for similarity, and then to GO analysis previously described [59].

Statistical Analysis
Each result in this study is the mean of at least three replicated treatments and each treatment contained at least five roots, leaves and plants (including roots and leaves), respectively. Statistical analysis was performed to identify differentially expressed sRNAs between the libraries using a rigorous algorithm described previously [60]. For small RNAs, the SP (replant or allelopathy autotoxicity-stress) library-derived sequence reads were normalized to the high-quality reads of the control (the second plant, FP) library. The absolute value of log 2 ratio#1 was used as the threshold to judge the significant difference of miRNA expression [32].

Supporting Information
File S1 Additional tables.