RNA Captor: A Tool for RNA Characterization

Background In the genome era, characterizing the structure and function of RNA molecules remains a major challenge. Alternative transcripts and non-protein-coding genes are poorly recognized by current genome-annotation algorithms, and efficient tools are needed to isolate the less abundant or less stable RNAs. Results A universal RNA-tagging method using the T4 RNA ligase 2 and special adapters is reported. Based on this system, protocols for RACE PCR and full-length cDNA library construction have been developed. The RNA-tagging conditions were thoroughly optimized and compared to previous methods by using a biochemical oligonucleotide tagging assay and RACE PCRs on a range of transcripts. In addition, two large-scale full-length cDNA inventories relying on this method are presented. Conclusion The RNA Captor is a straightforward and accessible protocol. The sensitivity of this approach was shown to be higher than that of previous methods, and the approach is applicable to messenger RNAs, non-protein-coding RNAs, transcription-start sites and microRNA-directed cleavage sites of transcripts. This strategy could also be used to study other classes of RNA and in deep-sequencing experiments.


Introduction
The RNA world appears increasingly complex, and deciphering the growing list of RNA species, isoforms and byproducts is a major challenge in biology [1]. Genome-annotation algorithms are still limited, and full-length cDNAs or RACE PCR remain essential for studying the structure and function of genes. These approaches are widely used for single-gene studies as well as large-scale genomic programs (e.g. [2,3]).
The main problem that full-length cDNA and 5′ RACE methods try to resolve is binding a known sequence at the cap site, so as to prime second-strand polymerization of the cDNA. In some methods, cap-dependent tagging is used as a way of selecting for complete cDNAs; in other protocols the tag is added on cDNAs previously enriched for molecules extending to the 5′ cap [4,5]. Enzymatic tagging can be performed on cDNA or RNA, with activities as diverse as terminal transferase [6], T4 RNA ligase 1 (Rnl1) (e.g. [7,8,9,10]) or reverse transcriptase [11]. More recently we proposed an RNA-tagging solution using the T4 DNA ligase (Dnl) and a special adapter producing a double-strand structure at the junction with the RNA [12]. In the framework of the grapevine genome program, 5 full-length cDNA libraries were constructed with this method and 84,000 clones were sequenced. In this collection, which was used for grapevine genome annotation, at least 81.6% of the 5′ reads matching Arabidopsis proteins are large enough to span the start codon [13].
The T4 RNA ligase 2 (Rnl2) shows optimal activity on double-strand nucleic acids [14], though it is also used for adding a 3′ adapter by single-strand ligation in small-RNA cloning, where the truncated Rnl2(1-249) enzyme produces fewer ligation side products [15,16]. Based on the full-length Rnl2 enzyme and our previous strategy [12], this report describes the development of an effective 5′-RNA-tagging system and its application to RACE PCRs and full-length cDNA library construction. This new method has been used in two large-scale transcript-inventory programs in grapevine and melon. The present report focuses on the implementation and validation of the protocol; its sensitivity as compared to previous methods is shown by RACE PCR on a range of transcripts.

Results
High-performance RNA tagging using Rnl2 for transcript analysis
An overview of the proposed cDNA-synthesis strategy is shown in Figure 1. Unlike previous methods [9,10,12], Rnl2 is used to add an oligoribonucleotide to the 5′ end or specifically to the cap site of RNAs. The procedure is based on a 5′ adapter generating a local double-strand structure with the transcript. The adapter-ligated RNA can then be reverse transcribed with an oligo-dT primer carrying a second adapter, leading to cDNAs with cloning sites integrated at both ends. This reaction product can be used for producing full-length double-stranded cDNAs or as template in deep-sequencing or RACE PCR experiments. The optimum conditions for grafting an adapter on the RNA were developed by using RACE PCR and a straightforward biochemical test. Five RACE PCR assays were designed on well-documented genes in Arabidopsis. The transcripts were chosen with increasing expression level (Table S1), ranging from 1 (eIF4E) to 88 (RBCS) EST counts among 12,273 leaf ESTs in Unigene (http://www.ncbi.nlm.nih.gov/UniGene); the abundant RBCS signal was obtained by anchoring the RACE primer (RBC402L) in a region shared by all 4 RBCS genes, as described previously [12]. The RACE PCRs were carried out in semi-quantitative conditions, on cDNAs normalized by Q-PCR based on ACT2 (Figure S1). In parallel, an oligonucleotide tagging assay, which uses denaturing-gel electrophoresis to detect the increase in length of an oligonucleotide upon ligation, was derived from previous studies [14] (Figure 2). Based on various oligonucleotide combinations, Rnl2 showed best activity on full RNA adapters (Figure 2A). By contrast, comparable nick-joining rates with a DNA or RNA template strand were shown in other studies [14,17]; the discrepancy could be due to the different, shorter substrates used in the present study. Lower adapter fragments with gradually shorter sticky ends were assayed in different ligation buffers.
While in 0% polyethylene glycol (PEG) 2 bases are sufficient for nick joining by Rnl2 (Figure 2B), in 30% PEG the ligase is also active on one-base sticky ends, blunt-end adapters and single-strand substrates (Figure 2C). This result, obtained with homogeneous oligonucleotides, is confirmed with complex adapters bearing 1-5 random-base sticky ends (Figure 2D). Although blunt-end or single-strand adapters are sealed, Rnl2 performed better with sticky-end adapters (Figure 2E). This was corroborated by RACE PCRs, showing that the SN3 adapter with a 3-random-base sticky end provided the best activity on cellular RNAs (Figure 3A). Macromolecular crowding agents are commonly used to improve Dnl or Rnl1 ligations [18]; by RACE PCR (Figure 3B) and oligonucleotide tagging assays (Figure 2F), the best Rnl2 activity is shown in 30% PEG buffer, and by ligating at 25°C then 16°C (Figure S2).
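The size of the adapter pool implied by a random-base sticky end can be made concrete with a short sketch. The Python snippet below is illustrative only: the function names and the choice of a ribonucleotide (ACGU) alphabet are assumptions made for the example, not part of the protocol. It enumerates every possible overhang of a given length and gives the share of an equimolar pool that carries any one of them.

```python
from itertools import product


def sticky_end_variants(length, alphabet="ACGU"):
    """Return every possible random-base overhang of the given length.

    The alphabet is an assumption for illustration (ribonucleotides).
    """
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


def fraction_per_variant(length, alphabet="ACGU"):
    """Fraction of an equimolar adapter pool carrying one given overhang."""
    return 1 / len(alphabet) ** length


if __name__ == "__main__":
    # Sticky ends of 1 to 5 random bases, as in the assays described above.
    for n in range(1, 6):
        variants = sticky_end_variants(n)
        print(n, len(variants), f"{fraction_per_variant(n):.4%}")
```

For a 3-base overhang this gives 64 variants, so about 1.6% of an equimolar pool carries any particular overhang. This is one way to see why a molar excess of adapter is useful when the sticky end is degenerate.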
Rnl2 outperforms Rnl1 and Dnl for tagging the RNA
RACE on the cap site of mRNAs. The proposed strategy based on Rnl2 has been compared to the Dnl [12] and the Rnl1 (Oligo Capping) methods [10,19] with the panel of RACE PCR assays directed towards the cap site of mRNAs. Complementary DNAs were prepared using the three RNA-tagging protocols, normalized by Q-PCR (Figure S1) and amplified by RACE PCRs (Figure 4A). With the same cDNA amounts and PCR conditions, all five RACE assays showed clear bands on the gel with the RNA Captor (Rnl2) method, and fewer or no visible products with the other two protocols. In particular, Rnl1 tagging produced fainter bands for RBCS, TCTP, GAPA and ACT2 and no detectable product for the low-abundance transcript (eIF4E), and only the most sensitive assay (RBCS) was positive with the Dnl method (Figure 4A). Similar results were reproduced with different cDNA preparations.
RACE on microRNA-directed cleavage site of mRNAs. RACE PCR is widely used to characterize cleaved RNA products, an application which, just as for non-degraded mRNAs, is hindered by transcript paucity. To show that the RNA Captor applies to such products, a RACE PCR was performed on the two documented miR165/166-targeted genes PHB and PHV [20] and compared to the Rnl1 RNA-tagging method. Both cleaved RNAs were successfully amplified by RACE, showing that the Rnl2 method is suitable for studying degraded transcripts (Figure 4B). Under similar conditions, the Rnl1 method produced fainter bands on the gel and therefore appears less sensitive than the RNA Captor protocol.
This range of RACE PCRs run on seven transcripts with different expression levels shows that Rnl2 outperforms Rnl1 and Dnl for adding an adapter on the 5′ end of RNAs. The oligonucleotide tagging assay led to the same conclusion (Figure 5).

Large scale transcript inventories and identification of full-length cDNA for MIR genes
The RNA Captor protocol was used to construct 11 full-length cDNA libraries from grapevine and melon. A preliminary analysis has been run on 53,215 5′ reads corresponding to 11 subsets of 3466 to 9624 clones and 880 to 3280 cDNA clusters from each library. Among the 53,215 clones, 45,359 match Arabidopsis proteins, and 92.6% of these 45,359 are large enough to span the homologous start ATG, suggesting that most of the clones are full length. We also looked at the occurrence of microRNA transcripts among these cDNAs and identified three clones corresponding to primary microRNAs (pri-miR164, pri-miR168 and pri-miR390) not yet characterized in melon. Primary microRNAs are transient molecules subjected to maturation and are generally under-represented among ESTs; based on EST counts (TAIR, http://www.arabidopsis.org/) and small-RNA sequencing data (ASRP, http://asrp.cgrb.oregonstate.edu/db/), these clones correspond to MIR genes with medium to low expression in Arabidopsis. The RNA Captor protocol is therefore efficient for obtaining full-length cDNAs corresponding to mRNAs as well as non-protein-coding RNAs.
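As a quick arithmetic check on these figures, the short Python sketch below recomputes the proportions from the counts quoted in the text. The variable names are illustrative, and because 92.6% is itself a rounded value, the derived clone count is only approximate.

```python
# Counts quoted in the preliminary analysis of the 11 libraries.
total_reads = 53_215        # 5' reads analysed
protein_matches = 45_359    # reads matching Arabidopsis proteins
spanning_fraction = 0.926   # share of matching reads spanning the start ATG

# Share of all reads that match an Arabidopsis protein.
match_rate = protein_matches / total_reads

# Approximate number of matching reads that span the homologous start ATG.
spanning_clones = round(protein_matches * spanning_fraction)

if __name__ == "__main__":
    print(f"{match_rate:.1%} of reads match an Arabidopsis protein")
    print(f"about {spanning_clones} matching reads span the start ATG")
```

Under these numbers, roughly 85% of the reads have an Arabidopsis protein match and about 42,000 of those extend to the start codon.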

Discussion
This study provides an alternative strategy for the difficult issue of transcript characterization. Grafting an oligonucleotide on the 5′-phosphate of RNAs can be carried out with Rnl2 and partially degenerate adapters. The performance of this new method was assessed by seven RACE PCR assays targeted towards the cap or cleavage sites of mRNAs and by constructing 11 full-length cDNA libraries. Preliminary analysis showed 92.6% of potentially full-ORF clones in these collections and identified three novel melon miRNA primary transcripts corresponding to MIR genes with medium to low expression in Arabidopsis. Comparing protocols is tricky, especially on this topic; the sensitivity of the new RNA-tagging strategy and related methods was therefore assessed by RACE PCRs under rigorous semi-quantitative conditions and by an oligonucleotide tagging assay. In this range of experiments, Rnl2 performed markedly better than Rnl1 and Dnl. Obtaining better rates with sticky-end adapters than with non-templated ligation is in line with expectations from previous studies [14]. In the present protocol, a possible reduction of the overall efficiency owing to the complexity of the SN3 adapter is compensated by using a molar excess of adapters in the tagging reaction (Figure S2B); replacing the short random-nucleotide tail with deoxyribo-inosines led to poor ligation rates (not shown). Dnl was first proposed to catalyze templated ligation of RNA [12]; this approach was successful in a large-scale transcript inventory of 84,000 clones used for grapevine genome annotation [13]. Rnl2 ligation notably increases the sensitivity of the strategy, in agreement with the comparative analysis by Bullard and Bowater [14], which showed that nicked substrates with a 5′-phosphate RNA strand are most effectively ligated by this enzyme.
With the Rnl1 tagging method, a wide range of 3′-OH polynucleotides can be joined to the 5′-phosphate of the RNAs, leading to unwanted side products such as concatemers or RNA circles and to loss of the less abundant transcripts, even though high-molarity adapters should limit this problem. With the present method, the selectivity offered by templated ligations keeps such background to a minimum; accordingly, no chimeras have been identified in the large-scale sequencing programs we are currently running.
The RNA Captor is a straightforward and accessible protocol; its efficiency is shown on mRNAs and npcRNAs, whether or not they bear a 5′ cap. This strategy could also apply to other RNA classes such as small RNAs or viral RNA, or be used for studying RNA-protein interactions in vivo [21]. A number of deep-sequencing programs undertaken in human and various other organisms to decipher the complete catalogue of transcripts [22,23], transcription start sites [3] or degradation products [2] could benefit from the advantages offered by the present method.

Materials and Methods
Total RNAs were extracted from aerial vegetative tissues of 4-week-old Arabidopsis thaliana Col0, as previously described [13]. Polyethylene glycol 8000 (Fluka BioUltra grade ref. 81268) stock solution was prepared by dissolving at 70% w/v in DEPC-treated water, aliquoted and stored at -20°C. To help pipetting, the aliquots were warmed for a few minutes at 37°C before use.

Adapters and oligonucleotides
The oligonucleotide sequences are shown in Table S2. RNA oligonucleotides were purchased from Eurogentec and kept as dried aliquots at -80°C. The double-stranded adapters were prepared extemporaneously: resuspend and mix the upper and lower oligonucleotides at 400 µM each in 10 mM Tris (pH 7.5) and 1 unit/µl of Rnasin Plus. Heat 2 min at 60°C and ramp down to 25°C at a rate of 3°C per min.

Oligonucleotide tagging assay
Unless otherwise specified, the ligations were performed with 2 µM of each oligonucleotide, in 10 µl of ligation buffer supplemented with 1 unit/µl of Rnasin Plus, 0.5 µl of ligase (20 U/µl for Rnl1, 10 U/µl for Rnl2 and 5 U/µl for Dnl) and 0%-50% PEG, and incubated 3 h at 20°C. The reactions were stopped with 10 µl of formamide buffer (containing 1 mg/ml of bromophenol blue and 10 mM EDTA), heated 3 min at 95°C and analysed on a 16% polyacrylamide-urea gel (8.

The RNA Captor protocol
To study microRNA-directed cleavage sites and other noncapped RNAs, go straight to the oligonucleotide ligation step.
Dephosphorylation. was performed at 50°C, in Tris-EDTA

The miR165/166-targeted cleavage sites of PHB and PHV genes were amplified using the gene-specific primers described by Mallory et al. [20]. For these less abundant transcripts, highly specific hemi-nested PCRs were performed by avoiding cooling the reaction below the hybridization temperature between the first and the second PCR. Start the first PCR with (PHB1 or PHV1, SD), as specified above, with a thermal cycling program of 94°C 20 sec and 52 cycles of [94°C 20 sec, 60°C 20 sec, 72°C 20 sec]. At the 26th cycle, transfer 1 µl of the reaction to the pre-heated second-PCR mix, containing PHB2 or PHV2 and SD primers, and carry on the cycling to a total number of 48, 50 or 52 cycles. The expected major RACE PCR fragments are: 320 bp with RBC402L; ~130 bp with TCTP142L; ~180 bp with GAPA143L; 220 bp with ACT2.256L; 250 bp with eIF4E215L; 228 bp with PHB2 and 231 bp with PHV2.
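The hemi-nested cycling scheme described above can be summarized in a few lines of Python. This is only a bookkeeping sketch: the class and function names are hypothetical, the transfer is assumed to take place once cycle 26 has completed, and temperature ramping is ignored when the hold time is estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: int
    seconds: int


# Program as stated in the text: 94 C 20 s, then cycles of 94/60/72 C, 20 s each.
INITIAL_DENATURATION = Step(94, 20)
CYCLE = (Step(94, 20), Step(60, 20), Step(72, 20))
TRANSFER_AFTER_CYCLE = 26  # assumption: transfer once cycle 26 is complete


def second_round_cycles(total_cycles, transfer_after=TRANSFER_AFTER_CYCLE):
    """Number of cycles run on the second (hemi-nested) PCR mix."""
    if total_cycles <= transfer_after:
        raise ValueError("total cycles must exceed the transfer cycle")
    return total_cycles - transfer_after


def hold_time_seconds(total_cycles):
    """Programmed hold time only; ramping between temperatures is ignored."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + total_cycles * per_cycle


if __name__ == "__main__":
    for total in (48, 50, 52):
        print(total, second_round_cycles(total), hold_time_seconds(total))
```

Under these assumptions, stopping at 48, 50 or 52 total cycles corresponds to 22, 24 or 26 cycles on the second primer pair.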
Construction of full-length cDNA library. For the grapevine and the melon transcript inventories, the libraries were constructed in suboptimal conditions, with small amounts of RNA, tissues infected by pathogens or tissues with high sugar content. Depending on available starting material, the inserts were produced from 4 µg of total RNA, to 2 µg of total RNA

Comparison of the RNA-tagging methods based on Rnl2, Rnl1 and Dnl
The RNA taggings were carried out in parallel, using the same S oligoribonucleotide aliquot with 2 µg equivalent of the same dephosphorylated and decapped RNA preparation for the RACE PCR towards the cap sites, and 2 µg of the same untreated total RNA aliquot for amplifying the cleavage sites. Rnl1 ligations on the RNA 5′ end with the S oligoribonucleotide were performed as recommended by Suzuki and Sugano [19]; other assays carried out in a 20 µl final volume (instead of 100 µl) with 1 µM of S adapter (instead of 0.42 µM), the New England Biolabs Rnl1 (instead of Takara Rnl1) and overnight incubations produced similar ligation rates. Dnl-based RNA ligations were performed as described previously [12], except for the use of 10 µM of annealed [S : N5D] adapter and 30% PEG buffer. After oligonucleotide ligation, all RNAs were processed identically.

Q-RT PCRs
The cDNAs were normalized by quantitative RT-PCR with ACT2-internal primers (see Figure S1). The linearity of amplification with (ACT2.190U, ACT2.256L) was evaluated using triplicate serial dilutions (1:1, 1:2, 1:4, 1:8, 1:16 and 1:32) of 1 µl of single-strand cDNA template (i.e. 100 ng equivalent of starting total RNA) on an ABI PRISM 7900 system (Applied Biosystems), with 0.3 µM of each primer in 10 µl of 1× MESA-Green Q-PCR mix (Eurogentec) and a thermal cycling of 50°C 2 min, 95°C 10 min and 38 cycles of [95°C 15 sec, 58°C 1 min]. The resulting standard plot (Cq versus cDNA concentration input) was characterized by a slope = -3.35079 and a correlation coefficient R² = 0.99846. A primer efficiency, E = 98.8%, was calculated for this ACT2 primer pair according to the equation E = (10^(-1/slope) - 1) × 100. The relative concentration of the various cDNA preparations was calculated with the 2^(-ΔCt) method; the Q-RT PCRs were run with 1 µl of cDNA or 10-fold dilutions under the above conditions. The specificity of the Q-RT PCR products was routinely confirmed on a 3% agarose gel; the expected size is 87 bp from cDNA and 587 bp from genomic DNA.
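The two calculations above can be reproduced with a minimal Python sketch. The function names are illustrative, ΔCt is assumed here to mean the sample Ct minus a reference Ct, and the helper that fits a standard-curve slope uses a plain least-squares regression of Cq on log10 of the relative input, which is an assumption about how such a plot is usually fitted rather than a description of the instrument software.

```python
import math


def primer_efficiency_percent(slope):
    """E = (10^(-1/slope) - 1) * 100, from the standard-curve slope."""
    return (10 ** (-1 / slope) - 1) * 100


def relative_concentration(ct_sample, ct_reference):
    """2^(-dCt), with dCt = ct_sample - ct_reference (assumed convention).

    This form implicitly assumes an amplification efficiency close to 100%.
    """
    return 2 ** -(ct_sample - ct_reference)


def standard_curve_slope(dilution_factors, cq_values):
    """Least-squares slope of Cq against log10(relative template input)."""
    xs = [math.log10(1 / d) for d in dilution_factors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cq_values) / len(cq_values)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Slope reported in the text for the ACT2 primer pair.
    print(round(primer_efficiency_percent(-3.35079), 1))
    # A sample detected two cycles later than the reference.
    print(relative_concentration(22.0, 20.0))
```

With the reported slope of -3.35079 the first function returns 98.8%, in agreement with the value given in the text. A sample detected two cycles later than the reference comes out at one quarter of its concentration.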

Sequence analyses and accession numbers
BLAST searches [25] were performed against the Arabidopsis proteome (Flagdb, [26]) or miRBase (http://www.mirbase.org/). The three new melon pri-miRNA sequences were deposited in miRBase under the following accession numbers: MI0018163 (cme-MIR164), MI0018164 (cme-MIR390) and MI0018177 (cme-MIR168).

Figure S1 Normalization of the cDNAs. QRT-PCR based on ACT2 of cDNAs produced by RNA tagging using the Rnl2, Dnl or Rnl1 method. Separated by a 500 bp intron on genomic DNA, the ACT2.190U and ACT2.256L primers favor PCR amplification of the shortest cDNA fragment. The specificity of the Q-PCR reaction is confirmed by agarose-gel analysis, showing a single band at the expected size (87 bp). As shown on the Standard Plot, the Ct values vary linearly with the logarithm of the amount of cDNA used in the Q-PCR assay. The amplification plot and accompanying histogram show the relative concentration of the cDNAs used in Figure 2a. NTC: no template control. (TIF)

Figure S2 Oligonucleotide tagging assays with Rnl2 under various conditions. Ligations were performed with Rnl2, on the indicated substrates, in 10 µl of 30% PEG ligation buffer, in the following specific conditions:
A) Ligations with 2 µM of each oligonucleotide and a range of 0.3 to 20 units of ligase, for 3 h at 25°C. 10 fmol equivalent of T oligonucleotide were loaded per well. Between 2.5 and 10 Rnl2 units produced the best ligation rates.
B) Ligations with 1 unit of ligase, 2 µM of T oligonucleotide and increasing amounts (0.02-100 µM) of the N4-adapter, as shown above the lanes. One hundred fmol equivalent of T oligonucleotide were loaded per well. The tagging efficiency reaches a plateau from 2 to 100 µM of adapter in the tagging reaction, without obvious inhibition at the highest concentrations. Any possible reduction of the overall ligation efficiency expected from the use of a complex adapter (4^3 = 64 combinations for the retained N3-adapter) can therefore be compensated by using a molar excess of adapters in the tagging reaction.
C) Ligations with 5 units of ligase and 2 µM of each oligonucleotide were incubated at 37 or 25°C for 1 h. For each reaction 200, 50 and 12 fmol equivalent of T oligonucleotide were loaded. No difference in ligation rate is seen between the two conditions, confirming that the PEG ligation can be performed at low temperature, as recommended for the other two T4 ligases.
D) Ligations with 5 units of ligase and 2 µM of each oligonucleotide were incubated for 1 h at 25°C, followed or not by a second incubation of 7 h at 16°C. Eighty fmol equivalent of T oligonucleotide were loaded per well. Slightly improved ligation rates were obtained by keeping the reactions a few additional hours at 16°C. L: 44 nt oligonucleotide control.

(TIF)
Table S1 Expression level of the genes used in the RACE PCR assays. EST counts of various Arabidopsis genes among 12,273 leaf ESTs in Unigene (http://www.ncbi.nlm.nih.gov/UniGene; 01/25/2010 release). Libraries known to be normalized, subtracted or otherwise biased were not included in this profile by Unigene. In particular, in this pool of ESTs, the RBC402L primer matches 88 ESTs (53 + 3 + 7 + 25) from all four RBCS genes. (TIF)