Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Fig 4
RNA-seq library enrichment strategies that influence interpretation and analysis.
RNA-seq library construction protocols differ widely, and these differences have significant consequences for data interpretation and analysis. The figure above illustrates representative alignment results for total RNA and for each of three commonly used enrichment strategies at a hypothetical genomic locus containing very highly expressed ribosomal RNA (pink), highly expressed protein-coding (green), lowly expressed protein-coding (brown), and lowly expressed noncoding RNA (blue) genes. (A) If total RNA is sequenced without enrichment, the vast majority of reads correspond to a small number of very highly expressed RNA species such as ribosomal RNAs (rRNAs). In humans, ~95%–98% of all RNA molecules may be rRNAs. Substantial genomic DNA (gDNA) and unprocessed heterogeneous nuclear RNA (hnRNA, also known as pre-mRNA) contamination may also remain after typical RNA isolation procedures. As a result, most reads will align to intronic, intergenic, and especially ribosomal gene regions. Because these molecules are rarely the target of RNA-seq, various enrichment strategies are commonly employed. Deoxyribonuclease (DNase) treatment reduces, but does not entirely eliminate, gDNA contamination in total RNA. Unprocessed RNA can likewise be reduced, but not entirely eliminated, by an RNA isolation method that attempts to keep nuclei intact and then removes them, enriching for the mature mRNAs present in the cytoplasmic compartment. Additional strategies are discussed in S3 Table. * When sequencing total RNA, a complete representation of the transcriptome is theoretically present, but in practice too few reads are obtained to adequately sample all transcripts of all types, so some enrichment strategy is required to reduce the extremely abundant rRNA species.
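To make the sequencing-depth problem concrete, the short Python sketch below estimates how many reads would remain for non-rRNA transcripts if rRNA were not depleted. It assumes, as an idealization, that reads are sampled in proportion to molecule abundance; the function name `non_rrna_reads` and the 30 million read library size are hypothetical choices made for illustration.

```python
def non_rrna_reads(total_reads: int, rrna_fraction: float) -> int:
    """Expected number of reads left after discarding rRNA reads.

    Idealized model (an assumption): each read is drawn in proportion to
    molecule abundance, so the rRNA share of reads equals rrna_fraction.
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


if __name__ == "__main__":
    library_size = 30_000_000  # hypothetical sequencing depth
    for frac in (0.95, 0.98):  # the ~95%-98% rRNA range quoted in the caption
        remaining = non_rrna_reads(library_size, frac)
        print(f"rRNA fraction {frac:.0%}: {remaining:,} non-rRNA reads")
```

Under this simple model, only about 600,000 to 1,500,000 of 30 million reads would be left to cover every mRNA, noncoding RNA, and intronic or intergenic species, which is why some form of rRNA reduction is usually needed.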
(B) Selective rRNA reduction kits use oligonucleotides complementary to ribosomal sequences to specifically reduce the abundance of rRNAs while maintaining a broad representation of transcript species. Because the oligonucleotide probes used in these kits are designed only to bind to and deplete rRNA sequences, substantial unprocessed RNA and gDNA contamination may remain. (C) Poly(A) selection and (D) cDNA capture methods specifically enrich for (primarily) mature polyadenylated RNA species or for specific targets (e.g., all known transcript exons), respectively. Because poly(A) selection specifically targets RNAs that have been polyadenylated, a modification that occurs at the end of the transcription process, it enriches for mature, completely processed RNAs. Poly(A) selection and cDNA capture methods sacrifice some transcriptome representation in exchange for a higher signal-to-noise ratio for the transcripts of greatest interest. Poly(A) methods will fail to represent most noncoding and other nonpolyadenylated RNAs. Capture methods, on the other hand, will under-represent any loci not specifically included in the capture design. In this example, the brown gene was not included in the design, so its expression would be underestimated. Each of the methods depicted here has advantages and disadvantages (S3 Table and S7 Table). Furthermore, the relative amounts of each class of RNA depicted in each panel are hypothetical examples meant to demonstrate the goals and principles of each enrichment strategy and should not be interpreted quantitatively. Refer to S4 Table for additional information on the effect of each enrichment strategy.
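The trade-offs in panels A to D can be sketched as a toy composition model: start from hypothetical abundances for the four gene classes in the figure, multiply each class by a per-strategy retention factor, and renormalize to get the expected share of reads. All abundances and retention factors here are invented for illustration, in the same spirit as the figure, and the model ignores gDNA and intronic reads. It also assumes the blue noncoding gene is not polyadenylated and that the brown gene is absent from the capture design. The names `TOTAL_RNA`, `RETENTION`, and `read_shares` are hypothetical.

```python
from typing import Dict

# Hypothetical relative abundances of the four RNA classes drawn in the figure.
# Like the figure itself, these values are illustrative, not measured.
TOTAL_RNA: Dict[str, float] = {
    "rRNA": 950.0,          # pink
    "high_coding": 40.0,    # green
    "low_coding": 5.0,      # brown
    "low_noncoding": 5.0,   # blue
}

# Hypothetical fraction of each class that survives each strategy (1.0 = fully kept).
RETENTION: Dict[str, Dict[str, float]] = {
    "total_rna": {"rRNA": 1.0, "high_coding": 1.0, "low_coding": 1.0, "low_noncoding": 1.0},
    # Probes target rRNA only, so all other classes are kept.
    "rrna_reduction": {"rRNA": 0.01, "high_coding": 1.0, "low_coding": 1.0, "low_noncoding": 1.0},
    # Assumes the noncoding (blue) gene is not polyadenylated.
    "polya_selection": {"rRNA": 0.001, "high_coding": 1.0, "low_coding": 1.0, "low_noncoding": 0.0},
    # Assumes the brown gene is absent from the capture design, so only a
    # small off-target amount is recovered.
    "cdna_capture": {"rRNA": 0.001, "high_coding": 1.0, "low_coding": 0.05, "low_noncoding": 1.0},
}


def read_shares(abundance: Dict[str, float], retention: Dict[str, float]) -> Dict[str, float]:
    """Expected fraction of sequenced reads per RNA class after enrichment,
    assuming reads are drawn in proportion to the retained abundance."""
    kept = {cls: amount * retention[cls] for cls, amount in abundance.items()}
    total = sum(kept.values())
    return {cls: amount / total for cls, amount in kept.items()}


if __name__ == "__main__":
    for strategy, retention in RETENTION.items():
        shares = read_shares(TOTAL_RNA, retention)
        summary = ", ".join(f"{cls}={share:.1%}" for cls, share in shares.items())
        print(f"{strategy:16s} {summary}")
```

Running the script shows the qualitative pattern described in the caption: rRNA reduction frees most reads for other species, poly(A) selection drops the nonpolyadenylated gene entirely, and capture sharply under-represents the gene left out of the design.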