The Case for Junk DNA
(A) Analysis of nascent and total poly(A)+ RNA levels from mouse liver nuclei. Nascent (i.e., polymerase-associated) RNA and poly(A)+ RNA were isolated from mouse liver nuclei and analyzed by high-throughput sequencing. Individual reads were categorized by their source. Exonic and intronic reads are from known reference genes (i.e., “RefSeq” genes), while intergenic reads originate from nonreference loci (i.e., “non-RefSeq”) in the mouse genome. Reproduced from . (B) Empirical cumulative distribution function (ECDF) of transcript expression in each cell compartment, as determined by the ENCODE Consortium. Results are shown for nuclear and cytosolic RNA that either contains (“polyA+”) or lacks (“polyA−”) a poly(A) tail. Each human cell line that was analyzed is represented by three lines, one for each pool of RNA (red for protein-coding RNAs, blue for lncRNAs [“noncoding”], and green for intergenic transcripts [“novel intergenic”]). Each line indicates the cumulative fraction of RNAs in a given pool (y-axis) that are expressed at or below the level, in reads per kilobase per million mapped reads (RPKM), shown on the x-axis. Total numbers in each pool are as follows: reference protein-coding genes, 20,679; loci producing lncRNAs, 9,277; and regions producing intergenic transcripts, 41,204. Transcripts with expression levels of 0 RPKM were assigned an artificial value of 10⁻⁶ RPKM, so that the starting point of each curve represents the fraction of nonexpressed genes or loci. Note that 1–4 RPKM is approximately equivalent to one copy per tissue culture cell. From this figure, it is clear that the vast majority of intergenic transcripts are present at less than one copy per cell. Reproduced with permission from .
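As a rough illustration of how a curve like those in panel (B) can be built, the sketch below computes an ECDF from a list of RPKM values, assigning nonexpressed loci the artificial floor of 10⁻⁶ RPKM described in the caption. The function names and the toy expression values are hypothetical and are not ENCODE data. The 1 RPKM cutoff is used only as a stand-in for the lower end of the "one copy per cell" range.

```python
import numpy as np

# Artificial RPKM value given to non-expressed loci, as described in the caption.
ZERO_FLOOR = 1e-6


def ecdf(rpkm_values, floor=ZERO_FLOOR):
    """Return sorted expression levels (x) and the cumulative fraction (y)."""
    # Raise 0 RPKM entries to the floor so they can sit on a log-scaled x-axis.
    x = np.sort(np.maximum(np.asarray(rpkm_values, dtype=float), floor))
    # y[i] is the fraction of loci expressed at or below x[i].
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def fraction_at_or_below(rpkm_values, threshold, floor=ZERO_FLOOR):
    """Fraction of loci whose (floored) expression is <= threshold."""
    x, _ = ecdf(rpkm_values, floor)
    return np.searchsorted(x, threshold, side="right") / x.size


# Hypothetical toy pool of intergenic loci (RPKM); NOT real measurements.
toy_intergenic = [0.0, 0.0, 0.0, 0.01, 0.05, 0.2, 0.5, 0.9, 2.0, 8.0]

# Where the curve starts: the fraction of loci with no detected expression.
frac_unexpressed = fraction_at_or_below(toy_intergenic, ZERO_FLOOR)

# Fraction at or below 1 RPKM, the low end of the ~1-4 RPKM per-copy range.
frac_below_one = fraction_at_or_below(toy_intergenic, 1.0)

print(f"nonexpressed: {frac_unexpressed:.2f}, <= 1 RPKM: {frac_below_one:.2f}")
```

Reading the curve at the floor value gives the nonexpressed fraction, and reading it at roughly 1–4 RPKM gives the fraction of loci present at about one copy per cell or less. That second reading is the one the caption applies to the intergenic pool.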