Four Methods of Preparing mRNA 5′ End Libraries Using the Illumina Sequencing Platform

Background The 5′ untranslated regions of mRNA play an important role in their translation. Results Here, we describe the development of four methods of profiling mRNA 5′ ends using the Illumina sequencing platform; the first method utilizes SMART (Switching Mechanism At 5′ end of RNA Transcript) technology, while the second involves replacing the 5′ cap structure with RNA oligomers via ligation. The third and fourth methods are modifications of SMART, and involve enriching mRNA molecules with (nuclear transcripts) and without (mitochondrial transcripts) 5′ end cap structures, respectively. Libraries prepared using SMART technology gave more reproducible results, but the ligation method was advantageous in that it only sequenced mRNAs with a cap structure at the 5′ end. Conclusions These methods are suitable for global mapping of mRNA 5′ ends, both with and without cap structures, at a single molecule resolution. In addition, comparison of the present results obtained using different methods revealed the presence of abundant messenger RNAs without a cap structure.


Introduction
Among the next-generation sequencing platforms, the Illumina platform offers by far the highest number of sequence reads per run. Illumina sequencing has previously been used for transcriptomic analyses (e.g., [1,2]). However, most available RNA-seq kits target full-length transcripts, rather than the 5′ end alone (e.g., NEBNext mRNA Library Prep (New England Biolabs); TruSeq (Illumina); ScriptSeq v2 RNA-Seq (epicentre) [3]). Furthermore, these kits require fragmentation of RNA templates before preparation of the sequencing library, which prohibits profiling of the 5′ end. Salimullah et al. recently described a 5′ end sequencing strategy called NanoCAGE [4]. This method is advantageous in that it requires only a small amount of total RNA as input. However, it uses random primers for the synthesis of first-strand cDNA, which results in the amplification of non-adenylated RNA. A second strategy, called CAGE [5], also uses random primers for first-strand synthesis, but its reliance on the EcoP15I restriction enzyme (which cleaves 27 bp away from the recognition site) limits the length of the resulting sequences to 27 nt. A third 5′ end sequencing strategy, called RAMPAGE, was more recently described by Batut et al. [6]. Advantages of RAMPAGE include the ability to identify capped transcription start sites and the potential for high sample number multiplexing; however, this method requires large quantities of total RNA (5 µg). Moreover, the Illumina platform requires the bases to be balanced at the beginning of each read (for cluster detection and cross-talk matrix generation during the first four cycles, and for phasing and pre-phasing rate calculations during cycles 2-12 [7]). None of the sequencing strategies described above meets this particular requirement.
In the present study, we developed four Illumina-based methods of preparing libraries for 5′ end profiling analysis. The first method is based on SMART (Switching Mechanism At 5′ end of RNA Transcript [8]), while the second involves replacing the 5′ RNA cap structure with ligated RNA oligomers [9]. Libraries generated by the SMART method were found to be highly reproducible, allowing mRNA abundance to be measured directly based on sequence counts. In contrast, the ligation-based method enabled the mapping of 5′ end boundaries with mature cap structures. The resulting 5′ end profiles provide fresh insights into 5′ untranslated regions, revealing the presence of abundant mRNAs without a cap structure. The last two methods are modifications of SMART, used to enrich mRNA molecules with (CapSMART) or without (Non-CapSMART) cap structures. All four methods allow balanced representation of bases at the beginning of each read, which is required for high-quality Illumina sequencing.
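The base-balance requirement described above can be checked on any set of reads by tabulating base composition per sequencing cycle. The following is a minimal sketch rather than part of the published workflow: the function names, the toy reads, and the 10% threshold used to call a cycle "balanced" are all illustrative assumptions.

```python
from collections import Counter

def base_composition(reads, n_cycles=12):
    """Return, for each of the first n_cycles positions, the fraction of A, C, G and T."""
    profile = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts[base] for base in "ACGT")
        profile.append({base: (counts[base] / total if total else 0.0) for base in "ACGT"})
    return profile

def is_balanced(profile, low=0.1):
    """True when every base reaches at least `low` at every cycle.

    The 0.1 threshold is an assumption chosen for illustration only.
    """
    return all(fractions[base] >= low for fractions in profile for base in "ACGT")

# Hypothetical toy reads: each of the first four cycles contains all four bases.
toy_reads = ["ACGT", "CGTA", "GTAC", "TACG"]
toy_profile = base_composition(toy_reads, n_cycles=4)
print(toy_profile[0], is_balanced(toy_profile))
```

Running the same function on a library in which every read begins with an identical adaptor-derived sequence would give a fraction of 1.0 for one base per cycle, which is the situation the balanced designs are meant to avoid.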

Methods
Adult Drosophila melanogaster poly A+ RNA was purchased from a commercial source (Clontech: Cat. 636222, Lot. 1009305A). The cDNA libraries were constructed with the SMART cDNA Library Construction Kit (Clontech, Cat. 634901) (for the SMART method) or the ExactSTART Eukaryotic mRNA 5′- & 3′-RACE Kit (epicentre, Cat. ES80910) (for the ligation method), using a modified version of the manufacturer's instructions (as described below). To determine reproducibility, six libraries were independently prepared using each of the SMART and ligation methods, and the six libraries from each method were multiplexed in a single Illumina HiSeq lane. Three libraries were also independently prepared using each of the CapSMART and Non-CapSMART methods (each of the three libraries used a different STOP oligo), and the resulting six libraries were multiplexed in a single Illumina lane. Following cDNA library construction, libraries generated by all four methods were subjected to the same workflow, which involved sonication, biotin collection of the 5′ end, and Illumina library preparation. All thermal reactions were performed using a Veriti thermal cycler (Applied Biosystems).
To further confirm the reproducibility of each method, a single SMART library and four ligation libraries (using tags TAG02, TAG04, TAG05, and TAG06) were also constructed using embryonic Drosophila melanogaster poly A+ RNA (Clontech: Cat. 636224, Lot. 1210373A). The four ligation libraries were pooled before sequencing. Illumina MiSeq was used to sequence the pooled ligation libraries and the SMART library. With the exception of the sequencing machine, all experiments were performed as described for the adult poly A+ samples (above). Sequence output from all four ligation libraries was pooled before analysis. The first 101 nucleotides of the sequence output were used for further genome mapping analyses.

Preparation of cDNA libraries using the SMART method
Libraries were constructed using the SMART cDNA Library Construction Kit (Clontech) with modified SMART oligonucleotides (Table 1, Fig. 1). First, a 5 µl reaction volume containing 375 ng of poly A+ RNA (Clontech), 1 µl of modified SMART oligonucleotide {Table 1, Modified SMART (12 µM): AGA GTG TTT GGG TAG AGC AGC GTG TTG GCA TGT ggg (lower case for RNA), synthesized and HPLC purified by Metabion, Germany}, and 1 µl of CDS III/3′ PCR primer was incubated for 2 minutes at 72 °C to denature the RNA. The tube was placed on ice for 2 minutes immediately after the incubation. First-strand reverse transcription was subsequently performed by adding 2 µl of 5× First-strand buffer, 1 µl of DTT, 1 µl of dNTP (2.5 mM each), and 1 µl of SMARTScribe MMLV Reverse Transcriptase to the tube, and incubating it for 60 minutes at 42 °C (with the exception of the modified SMART oligo, all reagents are included in the Clontech SMART kit).
After reverse transcription, the resulting cDNA was amplified by LD PCR (Clontech). Each PCR consisted of a 50 µl reaction volume containing 37 µl of nuclease-free water, 5 µl of 2 PCR buffer, 4 µl of dNTP (2.5 mM each), 1 µl of modified CDS III/3′ primer (Table 1, Modified CDS III/3′ (12 µM): ATT CTA GAG GCC GAG GCG GCC GAC ATG, synthesized and HPLC purified by Genomics BioSci & Tech, Taiwan), 1 µl of biotinylated primer {Table 1, SMART 5′ biotin (12 µM): AGA GTG TTT GGG TAG AGC AGC G T G TTG GCA TGT GGG *G (subscript indicates biotinylation, * indicates a PTO bond), synthesized and HPLC purified by SynGen, USA}, 1 µl of Advantage 2 DNA Polymerase Mix, and 1 µl of the first-strand reverse transcript product (with the exception of the modified CDS III/3′ primer, biotinylated primer, and dNTP, all reagents are included in the Clontech SMART kit). Initial denaturation was carried out at 95 °C for 10 min, followed by 22 cycles of the following thermal-cycle profile: denaturation at 95 °C for 5 seconds, and annealing and extension at 68 °C for 6 minutes. The resulting products were electrophoresed on a 1% TAE agarose gel together with Safe-Green (Applied Biological Materials Inc.), and visualized using a blue light transilluminator (Maestrogen: LB-16). PCR products were purified using Agencourt AMPure XP (Beckman Coulter) and eluted in 150 µl of Elution Buffer (Qiagen).
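When several libraries are prepared in parallel, it helps to confirm that the listed components add up to the stated reaction volume before scaling. The sketch below is illustrative only: the component volumes are copied from the LD PCR recipe above (in the same unit as the text), while the helper names and the 10% pipetting overage are assumptions, not part of the published protocol.

```python
# Per-reaction component volumes for the LD PCR, as listed in the text.
ld_pcr_mix = {
    "nuclease-free water": 37.0,
    "PCR buffer": 5.0,
    "dNTP (2.5 mM each)": 4.0,
    "modified CDS III/3' primer": 1.0,
    "biotinylated SMART 5' primer": 1.0,
    "Advantage 2 DNA Polymerase Mix": 1.0,
    "first-strand product": 1.0,
}

def total_volume(mix):
    """Sum of all component volumes in one reaction."""
    return sum(mix.values())

def scale_mix(mix, n_reactions, excess=0.1):
    """Scale every component for n_reactions plus a fractional overage.

    The 10% overage is an assumed convention; in practice the template
    would be added to each tube separately rather than to a master mix.
    """
    factor = n_reactions * (1 + excess)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}

print(total_volume(ld_pcr_mix))      # total per reaction
print(scale_mix(ld_pcr_mix, 6))      # six libraries, as in the reproducibility test
```

The same check applies to the second-strand PCR of the ligation method and to the indexing PCR, both of which are also specified as 50-unit reactions.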

Preparation of cDNA libraries using the ligation method
Libraries were constructed using the ExactSTART Eukaryotic mRNA 5′- & 3′-RACE Kit (epicentre) with modified 5′-RACE Acceptor Oligos (Table 2, Fig. 2). First, alkaline phosphatase was used to remove the 5′-phosphate group from 5′-mono-, di-, and tri-phosphorylated RNAs: a 100 µl reaction volume containing 375 ng of poly A+ RNA, 10 µl of APex Reaction Buffer, 5 µl of APex Heat-Labile Alkaline Phosphatase, and nuclease-free water was incubated for 15 minutes at 37 °C. After the reaction, the products were purified using the RNeasy MinElute Cleanup kit (Qiagen), and eluted with 10 µl of nuclease-free water. The products were then treated with Tobacco Acid Pyrophosphatase (TAP) to remove the 5′ cap structure and expose a mono-phosphate group for ligation; a reaction mixture consisting of 1 µl of TAP buffer, 0.5 µl of RiboGuard RNase Inhibitor, 1 µl of TAP enzyme, and 7.5 µl of alkaline phosphatase-treated RNA was incubated for 30 minutes at 37 °C. Next, 10 µl of TAP-treated RNA was incubated with 4 µl of nuclease-free water, 2 µl of RNA ligase buffer, 1 µl of TAP STOP buffer, 1 µl of 50 µM modified 5′-RACE Acceptor Oligo, 1 µl of 2 mM ATP solution, and 1 µl of T4 RNA ligase for 30 minutes at 37 °C to ligate modified 5′-RACE Acceptor Oligos to the RNA. This step required thorough mixing of the reaction after the addition of STOP buffer and before the addition of ATP solution. Each reaction contained one of six different modified 5′-RACE Acceptor Oligos (Table 2: TAG02, TAG04, TAG05, TAG06, TAG07, TAG12). It is important to select appropriate sets of oligomers for high sequence quality [10]. Following ligation, first-strand reverse transcription was performed by adding 14 µl of nuclease-free water, 1 µl of cDNA synthesis primer, 2 µl of dNTP PreMix (2.5 mM each), 2 µl of MMLV RT buffer, and 1 µl of MMLV Reverse Transcriptase to the RNA, and incubating the reaction for 60 minutes at 37 °C, followed by 10 minutes at 85 °C.
RNase digestion was then performed by adding 1 µl of RNase solution to the reaction mixture and incubating for 5 minutes at 55 °C (all reagents used are included in the epicentre ExactSTART Eukaryotic mRNA 5′- & 3′-RACE Kit, with the exception of the modified 5′-RACE Acceptor Oligos). After RNase digestion, second-strand cDNA synthesis was performed by PCR, by setting up a 50 µl reaction volume containing 13 µl of nuclease-free water, 5 µl of 2 PCR buffer (Clontech), 4 µl of dNTP (2.5 mM each), 2.5 µl of PCR primer 2 (epicentre, ExactSTART kit), 2.5 µl of biotinylated primer {Table 2, Ligation 5′ biotin (2 µM): AGA GTG TTT GGG TAG AGC AGC G T G TTG GCA TGT (subscript indicates biotinylation), synthesized and HPLC purified by SynGen, USA}, 2.5 µl of Advantage 2 DNA Polymerase Mix (Clontech), and 20.5 µl of first-strand reverse transcript product. PCR amplification was confirmed by electrophoresis, as described in the previous section. After purification using Agencourt AMPure XP, samples were eluted with 30 µl of Elution Buffer, and the dsDNA concentration was measured using the Qubit dsDNA HS Assay Kit (Invitrogen). Equal amounts of quantified libraries were pooled in a single tube, and the volume was adjusted to 150 µl with nuclease-free water.

Preparation of cDNA libraries using the CapSMART method
Libraries were constructed using both the ExactSTART Eukaryotic mRNA 5′- & 3′-RACE Kit (epicentre) and the SMART cDNA Library Construction Kit (Clontech), with modified SMART oligonucleotides and STOP oligos (Table 1, Fig. 3). First, alkaline phosphatase was used to remove the 5′-phosphate group from 5′-mono-, di-, and tri-phosphorylated RNAs: a 100 µl reaction volume containing 375 ng of poly A+ RNA, 10 µl of APex Reaction Buffer, 5 µl of APex Heat-Labile Alkaline Phosphatase, and nuclease-free water was incubated for 15 minutes at 37 °C.
After the reaction, the products were purified using the RNeasy MinElute Cleanup kit (Qiagen), and eluted with 18 µl of nuclease-free water. The products were then treated with T4 Polynucleotide Kinase to add a mono-phosphate group to non-capped mRNA in preparation for ligation; a reaction mixture consisting of 1 µl of T4 Polynucleotide Kinase (Fermentas, #EK0032), 2 µl of RNA Ligase Reaction Buffer (New England Biolabs), 0.5 µl of RNase-OUT (Invitrogen, #10777-019), 1 µl of 100 mM ATP solution (Fermentas, #R0441), and 15.5 µl of alkaline phosphatase-treated RNA was incubated for 30 minutes at 37 °C. Next, 20 µl of T4 Polynucleotide Kinase-treated RNA was incubated with 2.5 µl of nuclease-free water, 1 µl of RNA Ligase Reaction Buffer (New England Biolabs), 4.5 µl of PEG8000 (New England Biolabs), 1 µl of STOP oligo {Table 1, STOP1 (50 µM): iGiCiG, STOP2 (50 µM): iCiGiC, STOP Mix (50 µM): mixture of STOP1 and STOP2, synthesized by Metabion, Germany}, and 1 µl of T4 RNA Ligase (New England Biolabs, M0204S) for 16 hours at 16 °C to ligate STOP oligos to the non-capped mRNA. To test ligation bias of the STOP oligos, three reactions were performed, using STOP1, STOP2, and STOP Mix. Following STOP oligo ligation, the products were purified using the RNeasy MinElute Cleanup Kit (Qiagen), and eluted with 10 µl of nuclease-free water. Subsequently, 3 µl of these purified products was incubated together with 1 µl of modified SMART oligonucleotide {Table 1, Modified SMART (12 µM)} and 1 µl of CDS III/3′ PCR primer for 2 minutes at 72 °C to denature the RNA. The tube was placed on ice for 2 minutes immediately after incubation. First-strand reverse transcription was subsequently performed by adding 2 µl of 5× First-strand buffer, 1 µl of DTT, 1 µl of dNTP (2.5 mM each), and 1 µl of SMARTScribe MMLV Reverse Transcriptase to the tube, and incubating it for 60 minutes at 42 °C. After reverse transcription, the resulting cDNA was amplified by LD PCR (Clontech), as described above.

Preparation of cDNA libraries using the Non-CapSMART method
Libraries were constructed using both the ExactSTART Eukaryotic mRNA 5′- & 3′-RACE Kit (epicentre) and the SMART cDNA Library Construction Kit (Clontech), with modified SMART oligonucleotides and STOP oligos (Table 1, Fig. 4). First, alkaline phosphatase was used to remove the 5′-phosphate group from 5′-mono-, di-, and tri-phosphorylated RNAs: a 100 µl reaction volume containing 375 ng of poly A+ RNA, 10 µl of APex Reaction Buffer, 5 µl of APex Heat-Labile Alkaline Phosphatase, and nuclease-free water was incubated for 15 minutes at 37 °C. After the reaction, the products were purified using the RNeasy MinElute Cleanup kit (Qiagen), and eluted with 10 µl of nuclease-free water. The products were then treated with Tobacco Acid Pyrophosphatase (TAP) to remove the 5′ cap structure and expose a mono-phosphate group for ligation; a reaction mixture consisting of 1 µl of TAP buffer, 0.5 µl of RiboGuard RNase Inhibitor, 1 µl of TAP enzyme, and 7.5 µl of alkaline phosphatase-treated RNA was incubated for 30 minutes at 37 °C. Next, 10 µl of TAP-treated RNA was incubated with 4 µl of nuclease-free water, 2 µl of RNA ligase buffer, 1 µl of TAP STOP buffer, 1 µl of STOP Oligo (Table 1), 1 µl of 2 mM ATP solution, and 1 µl of T4 RNA ligase for 16 hours at 16 °C to ligate modified STOP Oligos to the RNA. This step required thorough mixing of the reaction after the addition of STOP buffer and before the addition of ATP solution. To test ligation bias of the STOP oligos, three reactions were performed, using STOP1, STOP2, and STOP Mix. Following the STOP oligo ligation, the products were purified using the RNeasy MinElute Cleanup Kit (Qiagen), and eluted with 10 µl of nuclease-free water. Next, 3 µl of the purified products was incubated together with 1 µl of modified SMART oligonucleotide {Table 1, Modified SMART (12 µM)} and 1 µl of CDS III/3′ PCR primer for 2 minutes at 72 °C to denature the RNA. The tube was placed on ice for 2 minutes immediately after the incubation. First-strand reverse transcription was subsequently performed by adding 2 µl of 5× First-strand buffer, 1 µl of DTT, 1 µl of dNTP (2.5 mM each), and 1 µl of SMARTScribe MMLV Reverse Transcriptase to the tube, and incubating it for 60 minutes at 42 °C. After reverse transcription, the resulting cDNA was amplified by LD PCR (Clontech), as described above.
Figure 3. Library preparation using the CapSMART method. A) The protocol used either poly A+ (0.50-10 µg) or total (10-200 µg) RNA. B) De-phosphorylation of mono-, di-, and tri-phosphate groups from non-capped 5′ end molecules using alkaline phosphatase. C) Phosphorylation to add mono-phosphate to the non-capped 5′ end molecules using T4 Polynucleotide Kinase. D) Ligation of STOP oligos. A total of three kinds of oligonucleotides (Table 2: STOP1: iGiCiG, STOP2: iCiGiC, STOPMix: mixture of STOP1 and STOP2) were used in the present study. E) First-strand cDNA synthesis. F) Second-strand cDNA amplification by PCR with biotinylated 5′ end primers. G) Fragmentation of cDNA using a Bioruptor and collection of biotinylated 5′ ends using beads. H) Illumina sequencing library preparation. doi:10.1371/journal.pone.0101812.g003

Sonication and biotin collection of 5′ ends
The following procedures were carried out for all libraries. Sonication was performed using a Bioruptor (Diagenode). Prepared libraries were sonicated by 20 cycles of ON/OFF for 30 seconds each at high intensity. Sonicated products were electrophoresed to confirm fragmentation.
Biotinylated 5′ ends were subsequently collected using 50 µl of Dynabeads MyOne Streptavidin T1 (Invitrogen) per reaction, following the manufacturer's protocol. Each reaction used either 50 µl of purified product from the SMART, CapSMART, or Non-CapSMART method, or 100 µl of purified product from the ligation method. In brief, samples were immobilized and washed, and then incubated with 50 µl of elution buffer (1 µl of 0.5 M EDTA, 47.5 µl of formamide, 1.5 µl of H2O) for 5 minutes at 65 °C. Further purification was performed using a 90% volume (45 µl) of Agencourt AMPure XP, and 5′ ends were eluted with 30 µl of Elution Buffer for SMART, CapSMART, and Non-CapSMART products, or 60 µl for ligation method products.
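The elution buffer recipe and the bead ratio above reduce to simple dilution arithmetic, sketched here as a worked example. Volumes are unit-agnostic (they are used only as ratios), the helper name is hypothetical, and the formamide percentage assumes an undiluted formamide stock, which the text does not state.

```python
def final_concentration(stock_concentration, added_volume, total_volume):
    """Concentration after diluting `added_volume` of stock into `total_volume`."""
    return stock_concentration * added_volume / total_volume

# Elution buffer components as listed in the text (same volume unit throughout).
elution_buffer = {"0.5 M EDTA": 1.0, "formamide": 47.5, "H2O": 1.5}
buffer_total = sum(elution_buffer.values())

edta_molar = final_concentration(0.5, elution_buffer["0.5 M EDTA"], buffer_total)
formamide_fraction = elution_buffer["formamide"] / buffer_total  # assumes pure formamide stock

# Bead volume for a 90% (0.9x) AMPure XP clean-up of the eluate.
bead_volume = 0.9 * buffer_total

print(f"EDTA: {edta_molar * 1000:.0f} mM, formamide: {formamide_fraction:.0%}, beads: {bead_volume:.0f}")
```

Under these assumptions the buffer works out to roughly 10 mM EDTA in 95% formamide, and the bead volume agrees with the 45-unit figure given in the text.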

Library preparation and Illumina sequencing
Illumina sequencing libraries were prepared following Meyer & Kircher [10]. In brief, biotin-collected 5′ ends were subjected to blunt-end repair (100 ng of collected 5′ ends was used for each preparation). Next, Illumina-compatible adaptors were ligated to the 5′ ends, and adaptor gaps were filled in. After each reaction, products were purified using Agencourt AMPure XP. After the final purification, indexing PCR was performed by combining 26.5 µl of nuclease-free water, 4 µl of dNTP (25 mM each), 5 µl of 2 PCR buffer, 1 µl of IS4_5′ primer (10 µM) {modified from IS4, Table 3: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC TAG AGT GTT TGG G*T (* indicates a PTO bond), synthesized and HPLC purified by Metabion, Germany}, 1 µl of indexing primer (10 µM), 2.5 µl of Advantage 2 DNA Polymerase Mix (Clontech), and 10 µl of Illumina adaptor-ligated product in a total reaction volume of 50 µl. The modified IS4 primer was used to enable directional sequencing. For SMART, CapSMART, and Non-CapSMART library samples, indexing PCR was performed using six different indexing primers for multiplexing (indexing primers: ID01, ID02, ID03, ID04, ID05, and ID06 [10]). Six SMART libraries were independently indexed and pooled as a single sample. Three CapSMART and three Non-CapSMART libraries were also independently indexed and pooled as a single sample. For indexing PCR, initial denaturation was carried out at 95 °C for 10 min, followed by 12 cycles of the following thermal-cycle profile: denaturation at 95 °C for 20 seconds, annealing at 60 °C for 20 seconds, and extension at 72 °C for 30 seconds. PCR amplification was confirmed by electrophoresis, as described previously. PCR products were purified using Agencourt AMPure XP and eluted using 30 µl of Elution Buffer (Qiagen). Libraries independently prepared using SMART, CapSMART, and Non-CapSMART technology were quantified using the Qubit dsDNA HS Assay Kit.
Equal amounts of the quantified SMART libraries were pooled in a single tube, and sent for sequencing (Sequencing Core, Biodiversity Research Centre, Academia Sinica). In addition, libraries equivalent to 300 µg of CapSMART and to 50 µg of Non-CapSMART were pooled separately in single tubes, and sent for sequencing. Sequencing was performed using the following Illumina instruments: HiSeq for adult poly A+ samples and MiSeq for embryo poly A+ samples, in accordance with the manufacturer's protocol (with the exception of the use of a modified read 1 sequencing primer). Different custom sequencing primers were used depending on the library preparation method (Table 3).

Sequence analysis
Reproducibility tests were performed using FASTX-Toolkit, MySQL, and Perl scripts, as described below. First, sequences obtained from pooled samples using the ligation method were separated into different sample sources based on unique TAG sequences (Table 2: TAG02, TAG04, TAG05, TAG06, TAG07, TAG12) using fastx_barcode_splitter.pl, allowing a single mismatch. Next, frequencies of identical sequences from each library were counted using fastx_collapser. After tidying the data format using a Perl script, datasets were imported into MySQL, and the frequencies of identical sequences between libraries were extracted. In this comparison, sequences that occurred fewer than 10 times were disregarded. This reproducibility test was performed for adult poly A+ RNA libraries prepared by both the SMART and ligation methods.
Table 4. Read numbers of sequences obtained from multiplexed samples using adult poly A+ RNA.
SMART: Total 164,160,619; ID01 25,882,115; ID02 26,062,243; ID03 28,514,523; ID04 28,076,062; ID05 27,965,537; ID06 27,660,139
Ligation: Total; TG02; TG04; TG05; TG06; TG07; …
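The splitting, collapsing, and filtering steps described above can be illustrated in a few lines of plain Python. This is a stand-in sketch, not a reimplementation of fastx_barcode_splitter.pl, fastx_collapser, or the authors' Perl and MySQL scripts: the tag sequences are hypothetical (the real ones are in Table 2), tags are assumed to sit at the start of each read, and the reading that a sequence must reach the threshold of 10 in both libraries is an assumption.

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def split_by_tag(reads, tags, max_mismatch=1):
    """Bin reads by the tag matching their prefix with at most max_mismatch mismatches.

    Reads that match no tag, or more than one tag, go to 'unmatched'.
    """
    bins = {name: [] for name in tags}
    bins["unmatched"] = []
    for read in reads:
        hits = [name for name, tag in tags.items()
                if len(read) >= len(tag) and hamming(read[:len(tag)], tag) <= max_mismatch]
        bins[hits[0] if len(hits) == 1 else "unmatched"].append(read)
    return bins

def collapse(reads):
    """Count how often each identical sequence occurs."""
    return Counter(reads)

def shared_frequencies(counts_a, counts_b, min_count=10):
    """Pair up counts of sequences seen at least min_count times in both libraries."""
    return {seq: (counts_a[seq], counts_b[seq])
            for seq in counts_a
            if counts_a[seq] >= min_count and counts_b.get(seq, 0) >= min_count}

# Hypothetical tags and reads, for illustration only.
example_tags = {"TAG_A": "ACGT", "TAG_B": "TTGA"}
example_bins = split_by_tag(["ACGTGGGG", "ACCTGGGG", "TTGACCCC", "GGGGGGGG"], example_tags)
print({name: len(reads) for name, reads in example_bins.items()})
```

The paired counts returned by `shared_frequencies` are the quantities that would be plotted against each other when comparing two libraries.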

Mapping and data filtration of reads were performed using Bowtie2 [11] and MySQL, as follows. First, reads from each library were mapped onto the Drosophila melanogaster genome (Release 5: [12]) using Bowtie2 with the default settings. Next, output SAM files were imported into MySQL, and positional information (counts for each position) was extracted using MySQL commands. Only those sequences that mapped onto the minus strand of chromosome 2L were analyzed in the present study.
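The position-counting step can also be expressed directly on SAM records. The sketch below is an illustrative alternative to the MySQL queries, which are not given in the text: it assumes the reference sequence is named `2L` in the SAM file, treats SAM flag bit 0x10 as "mapped onto the minus strand", and takes the 5′ end of a reverse-strand read to be the rightmost reference coordinate of its alignment. The function names and example records are hypothetical.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(cigar):
    """Number of reference bases covered (M, D, N, = and X consume the reference)."""
    return sum(int(length) for length, op in CIGAR_RE.findall(cigar) if op in "MDN=X")

def five_prime_counts(sam_lines, chrom="2L"):
    """Count read 5' ends per position for reverse-strand alignments on `chrom`."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or rname != chrom or not flag & 16:
            continue                      # unmapped, other chromosome, or plus strand
        # POS is the leftmost base (1-based); the 5' end of a minus-strand read is the rightmost.
        counts[pos + reference_span(cigar) - 1] += 1
    return counts

# Hypothetical records for illustration.
example_sam = [
    "@SQ\tSN:2L\tLN:23011544",
    "\t".join(["r1", "16", "2L", "100", "42", "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r2", "0", "2L", "100", "42", "10M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
    "\t".join(["r3", "16", "2L", "100", "42", "5M2D5M", "*", "0", "0", "ACGTACGTAC", "IIIIIIIIII"]),
]
print(five_prime_counts(example_sam))
```

Soft-clipped bases do not consume the reference, so a clipped read start would shift the reported 5′ position; whether such reads should be kept is a filtering decision the text does not address.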
Mitochondrial transcript frequency was estimated for adult poly A+ RNA libraries as follows. First, the fastx_collapser command (FASTX-Toolkit) was run on each library. Next, the output files were imported into local BLAST+ [13] and used as the reference database. BLAST searches were subsequently performed using the first 100 bp of each mitochondrial gene sequence as the query. BLAST was performed using the default settings, except that the dust option was disabled and the seeding word size was set to 50 bp.
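As a rough sketch of how this search could be set up, the code below trims each gene to its first 100 bases, formats the queries as FASTA, and assembles BLAST+ command lines without running them. The file names, database name, gene dictionary, and tabular output format (`-outfmt 6`) are assumptions added for illustration; only the disabled dust filter and the word size of 50 come from the text.

```python
def first_bases(genes, n=100):
    """Keep only the first n bases of each gene sequence."""
    return {name: sequence[:n] for name, sequence in genes.items()}

def to_fasta(records):
    """Format a {name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records.items())

def blast_commands(collapsed_fasta, query_fasta, db_name="library_db"):
    """Build (but do not run) the makeblastdb and blastn argument lists.

    The collapsed library serves as the database and the gene starts as queries.
    """
    make_db = ["makeblastdb", "-in", collapsed_fasta, "-dbtype", "nucl", "-out", db_name]
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-dust", "no", "-word_size", "50", "-outfmt", "6"]
    return make_db, search

# Hypothetical gene sequence for illustration.
queries = first_bases({"mt_gene_1": "ACGT" * 40})
make_db_cmd, blastn_cmd = blast_commands("ID01_collapsed.fasta", "mito_queries.fasta")
print(to_fasta(queries))
print(" ".join(make_db_cmd))
print(" ".join(blastn_cmd))
```

The argument lists can be passed to `subprocess.run` once BLAST+ is installed and the input files exist.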

Results and Discussion
Six adult poly A+ RNA libraries were prepared using each of the SMART and ligation methods, and the six libraries from each method were pooled and sequenced in one lane per method (two lanes in total). The CapSMART and Non-CapSMART methods were used to prepare three libraries each, and the six resulting libraries were pooled and analyzed in a single lane. Using three lanes of an Illumina HiSeq sequencer, the SMART method generated a total of 164,160,619 reads, the ligation method generated a total of 130,314,839 reads, and the CapSMART and Non-CapSMART methods together generated a total of 150,847,202 reads (Table 4). Embryonic poly A+ RNA was used to prepare one library using the SMART method, and four libraries using the ligation method. Using an Illumina MiSeq sequencer, the SMART method generated 8,461,669 reads and the ligation method generated a total of 9,688,990 reads.

Reproducibility
Reproducibility was determined by comparing the frequencies of identical sequences between libraries prepared by the same method, for both SMART- and ligation-derived libraries (Figs. 5 and 6). High reproducibility was observed for libraries prepared by the SMART method (Fig. 5). In contrast, relatively poor reproducibility was observed for libraries prepared by the ligation method (Fig. 6). The large variation observed using the ligation method is assumed to be caused by ligation bias [14,15,16].
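One way to summarize such a pairwise comparison numerically is a correlation coefficient on log-transformed counts. The text does not report a specific statistic, so the choice of Pearson correlation on log10 counts below is an assumption made purely for illustration, and the function names and example counts are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_count_correlation(pairs):
    """Correlate log10 counts; `pairs` is a list of (count_lib1, count_lib2), all > 0."""
    xs = [math.log10(a) for a, _ in pairs]
    ys = [math.log10(b) for _, b in pairs]
    return pearson(xs, ys)

# Hypothetical counts: the second library has exactly twice the depth of the first.
example_pairs = [(10, 20), (100, 200), (1000, 2000)]
print(round(log_count_correlation(example_pairs), 6))
```

A value near 1 would correspond to the tight diagonal expected of a reproducible method, while tag-dependent ligation bias would scatter the points and lower the coefficient.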

Distribution of 5′ ends
Fragments obtained using all four methods were mapped onto the Drosophila melanogaster genome sequence (Release 5: [12]). Sequences mapped onto the minus strand of chromosome 2L were analyzed (Figs. 7-9).
The first example (Fig. 7) encompasses the region surrounding Rapgap1 and CG13791 (FlyBase: [17]). In this region, large differences in frequency distribution were observed between libraries generated by different methods. No peaks were observed in libraries prepared by the ligation method. In contrast, two clear peaks, corresponding to Rapgap1 and CG13791, were observed in the SMART library. The second example (Fig. 8) is the region surrounding the jet, Jon25Bi, Jon25Bii, and Jon25Biii genes. Libraries constructed using adult poly A+ RNA as template revealed that this region contains three repeats with a right-facing "swan-shaped" distribution. Although the shapes of the swans were similar in libraries derived from all four methods, the swans' bodies were smaller (i.e., the frequency distribution was reduced) in libraries prepared by the ligation method. Although we present only a few examples here, we observed several swan-shaped distributions in our dataset. In contrast to the libraries generated using adult poly A+ RNA, very low transcript frequencies were observed in libraries prepared using embryo poly A+ RNA. This observation is consistent with a previous report that gene transcription is lower at embryonic stages than in the adult [18]. The third example is the region surrounding the RpL36A gene (Fig. 9). In contrast to the other examples, the frequency distributions in this region were very similar when adult poly A+ RNA was used as template, regardless of library preparation method. However, large differences in frequency were observed between libraries prepared by different methods using embryonic poly A+ RNA as template (frequency was reduced at the left-side peak for ligation-derived libraries, and at the right-side peak for SMART-derived libraries).
In addition, the shape of the distribution was affected by the use of embryonic poly A+ RNA, as evidenced by the sharp central peak of libraries derived from the ligation, but not SMART method (Fig. 9).
The library preparation methods used in this study differ in that the SMART method does not require the 5′ cap structure (Fig. 1), while the ligation method does require it (Fig. 2). Therefore, peaks observed only in libraries generated by the SMART method represent 5′ ends without a cap structure. As such, these results indicate that a large proportion of mRNAs lack a 5′ cap structure. Until recently, decapping was believed to be an irreversible process that committed an mRNA molecule to degradation [19]. However, recent studies have indicated that recapping of mRNA may occur in the cytoplasm [20,21]. Therefore, mRNAs without a cap structure may serve as a potential source of mRNA under certain conditions.
It is interesting to note that genes differed in the shape of their peaks. Swan-shaped distributions were observed for Jon25Bi, Jon25Bii, and Jon25Biii (Fig. 8). In contrast, sharp peaks were observed for the distributions of CG13791 (Fig. 7) and jet (Fig. 8). Shapes also differed between developmental stages; for example, the central peak of RpL36A was considerably narrower in ligation libraries derived from embryonic RNA than in those derived from adult RNA (Fig. 9). The 5′ untranslated regions play an important role in gene translation [22,23,24], but the underlying regulatory mechanisms are still largely unknown. Investigation of these mechanisms is beyond the scope of the present study. However, the methods described here will provide the means to elucidate such mechanisms.
Although lower reproducibility was observed using the ligation method (Fig. 6), mapping analyses revealed highly similar frequency distribution patterns between the libraries, irrespective of the tags used (Figs. 8 and 9: TG02 and TG04). This may be due to the wide range of transcription start sites, which would average out sequence-specific ligation bias. However, quantitative skews were sometimes observed at the transcription start sites of genes with a sharp peak distribution (such as CG13791, Fig. 7, and jet, Fig. 8). Therefore, we recommend the use of the same or a random tag sequence to facilitate comparisons, and the application of an Illumina-style "indexing" system for multiplexing [10]. An advantage of the ligation method is its high dependency on cap structure (see below). This high dependency enables us to determine the exact position of the transcription start site of mature capped mRNAs, which is not possible using the other three methods.

Frequency of mitochondrial transcripts
The numbers of mitochondrial transcripts (which have no cap structure) obtained using the four library preparation methods are summarized in Tables S1, S2, and S3. We observed that few mitochondrial transcripts were sequenced using the ligation method, confirming that this method is highly dependent on cap structure.
By modifying the SMART method, we developed two additional library preparation methods, CapSMART and Non-CapSMART. Both of these methods are based on ligation of non-natural nucleotides [25] to non-capped mRNA (CapSMART) or capped mRNA (Non-CapSMART) to suppress synthesis of non-target mRNA molecules (Figs. 3 and 4). Although the frequency of mitochondrial transcripts indicated successful enrichment of target mRNA (fewer transcripts from CapSMART than from Non-CapSMART; Table S3), this pattern was not entirely consistent (e.g., a higher number of mitochondrial transcripts was observed in CapSMART ID03; Table S3). We hypothesize that this inconsistency may arise from biased ligation efficiency of the non-natural nucleotides (iGiCiG and iCiGiC) to non-target molecules.
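Because the libraries differ in sequencing depth, mitochondrial transcript counts are easier to compare once normalized to the total number of reads, for example as parts per million (ppm). The following is a minimal sketch of that normalization; the function name and the example count are hypothetical, and only the total of 130,314,839 ligation reads is taken from the text.

```python
def parts_per_million(count, total_reads):
    """Express a read count as parts per million of the library total."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count / total_reads * 1_000_000

# Hypothetical count of 130 mitochondrial reads among the ligation-method reads.
ligation_total = 130_314_839
example_ppm = parts_per_million(130, ligation_total)
print(f"{example_ppm:.3f} ppm")
```

The same normalization allows CapSMART and Non-CapSMART libraries to be compared even though they were pooled in unequal amounts.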

Conclusions
We have developed four methods of using the Illumina platform to sequence mRNA 5′ ends. All four methods require small amounts of starting poly A+ RNA (minimum of 25 ng for the SMART method), and the entire library construction procedure can be completed in two to four days. Furthermore, all libraries were developed using commercially available kits supplemented with additional oligos, making it easy for any laboratory to repeat these procedures. The SMART method outperformed the ligation method in terms of reproducibility, and therefore, this method is suitable for the quantification of mRNA abundance. In contrast, the ligation method is able to selectively sequence mRNAs with a 5′ cap structure. This latter technique promises to increase our understanding of the distribution of the 5′ ends of genes. Finally, the resulting 5′ end profiles provide fresh insights into 5′ untranslated regions, indicating that mRNAs without a cap structure are abundant.
Figure 8 (caption excerpt). Plots from three libraries using SMART and ligation methods {adult (TG02, TG04, ID01, and ID02) and embryo RNA} and from three libraries using CapSMART and Non-CapSMART methods (ID01, ID02, ID03, ID04, ID05, and ID06) are depicted in the figure. doi:10.1371/journal.pone.0101812.g008

Supporting Information
Table S1 Counts of mitochondrial transcripts obtained from libraries prepared by the SMART method (ID01-ID06) using adult poly A+ RNA. Mitochondrial 16S rRNA genes are known to be adenylated (Neira-Oviedo et al. 2011), and therefore their occurrence has been included for comparison purposes. (XLSX)

Table S2
Counts of mitochondrial transcripts obtained from libraries prepared by the ligation method (TG02-TG12) using adult poly A+ RNA. Because of the low occurrence rates, values in this table are expressed in parts per million (ppm). (XLSX)