Figure 1.
Schematic presentation of experimental plan used in this study.
Samples from 4 platelet donors were investigated. One sample (S0) was used for isolation of polyA+ transcripts. The 3 other samples (S1, S2, and S3) were used for analysis of total RNA after depletion of ribosomal RNA (rRNA).
Figure 2.
Mapping strategies and abundance estimates.
i) Alignment of reads (short red lines) to the human reference genome hg19 (thick blue line) using the TopHat program, which aligns RNA-Seq reads to the genome while also handling reads that span splice junctions. Abundance estimates were obtained by counting the number of reads that map within the coordinates defining the corresponding gene according to RefSeq annotations; ii) Alignment of reads (short red lines) to human reference (RefSeq) mRNA (thick blue line with polyA tail) using the bwa software for abundance estimates; iii) Alignment of reads (short red lines) to a de novo assembled transcript reported by Trinity (thick red line with polyA tail and green SMARTer IIA oligonucleotide as 5′-leader sequence) using Blat for identification and RSEM for abundance estimates.
Figure 3.
Read start position density on ACTB mRNA.
The horizontal axis shows the distance in nucleotides (bp) from the 5′-end of ACTB mRNA, and the vertical axis shows the natural logarithm of the number of uniquely mapped reads. The fitted red line, calculated over the transcript body while ignoring both ends, corresponds to an exponential decay of approximately 50% per 250 bp upstream from the polyA-site in the 3′-UTR. Correlation coefficient: 0.93, Slope: 0.0027638, Std error: 0.0002751, t value: -10.05, p-value: 4.70e-08 ***. (Statistics and graph generated with R.)
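The relation between the reported slope and the "50% per 250 bp" figure can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the slope and standard error quoted in the caption; the variable names are illustrative and are not taken from the authors' R analysis.

```python
import math

# Values quoted in the caption: linear fit of ln(read count) against position (bp).
slope = 0.0027638       # change in ln(count) per bp
std_error = 0.0002751   # standard error of the slope

# Distance over which the read density changes by a factor of two.
half_distance_bp = math.log(2) / slope

# Fraction of the read density that remains 250 bp further upstream.
remaining_after_250 = math.exp(-slope * 250)

# The magnitude of the t statistic is the slope divided by its standard error.
t_magnitude = slope / std_error

print(half_distance_bp, remaining_after_250, t_magnitude)
```

The halving distance comes out close to 250 bp, which is where the approximation in the caption comes from.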
Table 1.
Distribution of mapped reads for samples S0, S1, S2 and S3.
Table 2.
TopHat alignment of polyA+ mRNA to genome.
Figure 4.
Mapping of S0 (poly(dT) selected transcripts) against RefSeq mRNA.
The horizontal axis shows the distance in nucleotides from the 5′-end of the transcript (bin length = 100 bp), and the vertical logarithmic axis shows the sum of uniquely mapped reads to each position of the bin. The slope of the dotted line corresponds to the exponential decay function derived in Fig. 3. The sudden “drops” correspond to polyA-sites. As seen in the figure, NM_002704 (PPBP) has two polyA-sites, which correspond to the known polyA-sites at positions 708 and 1307, respectively. The abundance of the longer PPBP transcript appears to be about a hundred-fold lower than that of the shorter transcript.
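The binning described here can be sketched as follows. This is an illustrative example only: the function name and the read positions are made up, positions are assumed to be 0-based distances from the 5′-end, and the authors' own pipeline may differ.

```python
from collections import Counter

def bin_read_starts(start_positions, bin_length=100):
    """Sum read starts per bin of `bin_length` bp along a transcript.

    Positions are assumed to be 0-based distances from the 5' end.
    """
    bins = Counter(pos // bin_length for pos in start_positions)
    n_bins = max(bins) + 1 if bins else 0
    # Return one value per bin, including empty bins, in 5'->3' order.
    return [bins.get(i, 0) for i in range(n_bins)]

# Hypothetical read start positions on a short transcript.
starts = [5, 42, 99, 100, 250, 251, 299]
binned = bin_read_starts(starts)
print(binned)
```

Plotting such per-bin sums on a logarithmic axis would make an exponential trend appear as a straight line and a polyA-site as an abrupt drop.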
Figure 5.
Snapshot of UCSC Browser Blat alignment of de novo assembled transcript variant comp1_c0_seq1 mapping to TMSB4X.
The 5′-leader sequence matches the SMARTer IIA oligonucleotide. The Trinity de novo assembled nucleotide sequence is identical to the GRCh37/hg19 reference. Part of the polyA tail is also included. Splice junctions are marked in turquoise.
Table 3.
De novo assembly of platelet transcripts.
Figure 6.
Biological coefficient of variation of samples S1, S2 and S3 as estimated by TopHat/HTSeq/edgeR software.
As expected, the more highly expressed genes show much lower dispersion estimates than the mean value. “CPM” represents counts per million.
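For readers unfamiliar with the unit, counts per million can be computed as in the sketch below. It shows the plain definition with made-up numbers; edgeR can additionally apply normalization factors to the library sizes, which this example does not attempt.

```python
def counts_per_million(count, library_size):
    """Scale a raw read count to counts per million mapped reads."""
    return count * 1_000_000 / library_size

# Hypothetical gene with 50 reads in a library of 1,000,000 mapped reads.
example_cpm = counts_per_million(50, 1_000_000)
print(example_cpm)
```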
Figure 7.
Plot showing the magnitude of FPKM gene expression in rRNA-depleted total RNA in pair-wise comparisons between sample S1 and sample S2.
Each dot represents a S1/S2 pair for a gene that had detectable expression in both samples. Pearson's correlation coefficient = 0.99. (TopHat/Cufflinks/Cuffdiff/CummeRbund software).
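The two quantities in this caption can be illustrated with a short sketch. It assumes the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments) and a plain Pearson correlation; Cufflinks derives FPKM from a statistical model, and whether the reported correlation was computed on raw or log-transformed values is not stated, so the numbers below are purely hypothetical.

```python
import math

def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical gene: 1000 fragments, 2000 bp transcript, 10 million mapped fragments.
example_fpkm = fpkm(1000, 2000, 10_000_000)

# Hypothetical FPKM values for three genes in two samples.
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(example_fpkm, r)
```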
Table 4.
TopHat/Cufflinks alignment of rRNA-depleted total RNA to genome (excluding ncRNA).
Figure 8.
Graphs showing the dispersion and log2 fold change, respectively, when comparing the two male samples S1 and S3 with the female sample S2 using DESeq.
The “dispersion” on the y-axis in the left-hand plot represents the square of the coefficient of biological variation, and the red “hockey-stick” line is a fitted curve through the estimates of the dispersion value for each gene. In the right-hand plot, the horizontal red line represents equal expression in male and female samples. Red dots represent differentially expressed genes at 10% FDR, and red triangles represent red dots that lie outside the graph (above or below). The identity of the differentially expressed genes and the corresponding log2 fold changes can be found in Table 5 (columns 2 and 8, respectively).
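The two axis quantities can be related to more familiar numbers with a small sketch. The function names and values are illustrative, and the direction of the fold change (which group is the numerator) is an assumption, since the caption does not state it.

```python
import math

def bcv_from_dispersion(dispersion):
    """Biological coefficient of variation is the square root of the dispersion."""
    return math.sqrt(dispersion)

def log2_fold_change(mean_a, mean_b):
    """log2 of the ratio of mean normalized counts in group A over group B."""
    return math.log2(mean_a / mean_b)

# A hypothetical dispersion of 0.04 corresponds to 20% biological variation.
example_bcv = bcv_from_dispersion(0.04)

# Hypothetical means of 8 and 2 give a four-fold difference, i.e. log2 FC = 2.
example_lfc = log2_fold_change(8.0, 2.0)
print(example_bcv, example_lfc)
```

A log2 fold change of 0 corresponds to the horizontal line of equal expression in the right-hand plot.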
Table 5.
Significantly differentially expressed genes in male and female platelets at 10% FDR as estimated by DESeq.
Figure 9.
Heatmap showing normalized levels of expression for the 30 most highly expressed gene transcripts across mRNA and rRNA-depleted total RNA samples from the 4 different donors.
Nearly all differences of intensity for a given gene are likely to represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. Sample names have a ‘C’ added to indicate that the intensities represent length- and method-adjusted counts (TopHat/bedtools/DESeq and “in-house” software).
Figure 10.
Histogram of p-values from the call to the negative binomial test with DESeq comparing the length- and method-adjusted counts of polyA+ mRNA sample S0 with the rRNA-depleted total RNA samples S1, S2 and S3.
Most of the circa 500 remaining significant differences after length- and method-adjusted normalization presumably represent preparation artefacts, i.e. due to the poly(dT) enrichment and rRNA-depletion, respectively. However, protein coding transcripts lacking a polyA-tail should also appear as differentially expressed. Note that omission of the length- and method-adjusted normalization yields a couple of thousand “differentially expressed” genes (TopHat/bedtools/DESeq and “in-house” software).
Table 6.
Significant differential expression (DE) among the most abundant transcripts in polyA+ mRNA versus rRNA-depleted total RNA.
Figure 11.
The platelet transcriptome data compared with RNA-Seq data from Illumina's Human BodyMap 2.0 project.
The integrated platelet data from samples S0, S1, S2, and S3 represent counts obtained with TopHat, Ensembl annotations, and the htseq-count program. The Illumina codes are as follows: ERS025098 adipose, ERS025092 adrenal, ERS025085 brain, ERS025088 breast, ERS025089 colon, ERS025082 heart, ERS025081 kidney, ERS025096 liver, ERS025099 lung, ERS025086 lymphnode, ERS025084 mixture, ERS025087 mixture, ERS025093 mixture, ERS025083 ovary, ERS025095 prostate, ERS025097 skeletal_muscle, ERS025094 testes, ERS025090 thyroid, ERS025091 white_blood_cell.
Figure 12.
Differential expression of mitochondrial (MT)-genes in total RNA vs mRNA preparations.
The figure shows that, apart from MT-RNR1, MT-RNR2 and MT-TF, mitochondrially encoded gene expression levels were rather similar in rRNA-depleted total RNA and polyA+ mRNA preparations (TopHat/HTSeq/edgeR software). “FC” denotes fold change whereas “CPM” represents counts per million.
Table 7.
Read count table for mitochondrially encoded genes for samples S0, S1, S2 and S3.
Figure 13.
Classification of the proteins coded by the most abundant (top 50) coding transcripts of human platelets.
Bars represent molecular function categories generated by the web-based PANTHER gene ontology classification tool. A) Sequencing was performed on polyA+ enriched RNA; B) rRNA-depleted total RNA was analyzed.
Table 8.
The functions of the proteins coded by the top 50 platelet genes, as provided by the web-based PANTHER gene ontology classification tool.