A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells

doi:10.1371/journal.pone.0197162

Fig 1.

The different analysis pipelines.

The flowchart describes the different tools and steps used for microarray (blue) and RNA-Seq analysis (green). Tasks and tools used at different steps are colored in light blue. Tools belonging to the same pipeline are grouped and colored as follows: green (P1 with STAR, HTSeq and edgeR), blue (P2 with STAR, RSEM and edgeR), red (Sailfish and edgeR) and purple (TopHat2, Cufflinks and CuffDiff).

Table 1.

Overview of total mapping rates (in %) over all samples for the different RNA-Seq aligners.

Displayed are the mean mapping rates over the complete dataset with the variance in brackets.

Fig 2.

Mapping distribution.

Mapping distribution (in %) for all three aligners and both datasets. The BL2 dataset is shown on the left and the RC dataset on the right.

Fig 3.

Correlation of all samples after analysis.

The heatmap shows the combined Pearson correlation coefficient over all pairwise correlation tests of normalized gene expression across all replicates between groups. For RNA-Seq, all expression values are normalized to TPM (transcripts per million) to make them comparable. Fig 3A and 3C show the BL2 dataset and the correlation of all samples before (A) and after (C) BAFF stimulation for each analysis tool used. Fig 3B and 3D show the correlation of patient samples for the two groups, those with a good prognosis for distant-metastasis-free survival (good) and those with a bad prognosis (bad), together with the different analysis pipelines used.
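
The two quantities named in this caption, TPM normalization and the Pearson correlation coefficient, can be illustrated with a minimal sketch. The counts and transcript lengths below are hypothetical toy values, and the function names `tpm` and `pearson` are illustrative only. The caption does not state how the pairwise coefficients were combined into one value per heatmap cell, so the sketch stops at a single pairwise coefficient.

```python
import math


def tpm(counts, lengths_kb):
    """Convert raw read counts to TPM, given transcript lengths in kilobases."""
    # Reads per kilobase: corrects for transcript length.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    # Scale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: four transcripts measured in two replicates.
lengths_kb = [1.0, 2.0, 0.5, 4.0]
sample_a = tpm([100, 400, 50, 800], lengths_kb)
sample_b = tpm([120, 380, 40, 900], lengths_kb)
r = pearson(sample_a, sample_b)
```

Because TPM values of every sample sum to the same total, expression values from different samples and tools share one scale, which is what makes the pairwise comparison meaningful.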

Table 2.

Overview of the number of genes and GO-terms that are significant by p-value (< 5%) and after FDR correction for P1-4 and microarray.
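
The difference between counting by raw p-value and counting after FDR correction can be sketched as follows. The caption does not name the correction method; the Benjamini-Hochberg procedure is assumed here purely for illustration, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (not taken from the study).
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(pvals)

n_raw = sum(p < 0.05 for p in pvals)  # significant by raw p-value
n_fdr = sum(q < 0.05 for q in adj)    # significant after FDR correction
```

In this toy example four genes pass the raw 5% cutoff but only two remain after adjustment, which mirrors the kind of drop the table reports between the two columns.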

The GO-terms for microarray are in bold, because the p-value was used as a cutoff instead of the FDR.

Fig 4.

Overlap of significant genes for the different strategies after multiple testing adjustment.

Shown are two Venn diagrams, one for each dataset (BL2 in Fig 4A and RC in Fig 4B). The pipelines used here are: TopHat2 and Cufflinks (T&C), STAR and HTSeq-Count (S&HT), Sailfish (Sa), and STAR and RSEM (S&R). The microarray data are not included because almost no genes were significant after FDR adjustment.

Table 3.

Overview of the proportion of genes and the corresponding percentage of differentially expressed genes for each pipeline after multiple testing adjustment.

‘Consensus’ is the number of genes shared with at least two other pipelines, and ‘unique’ is the number of genes not found by any other pipeline; both are given relative to the total number of genes found by the respective pipeline.
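
The two definitions can be made concrete with a small sketch based on set membership. The gene identifiers are hypothetical placeholders and the function name `consensus_and_unique` is illustrative; only the pipeline abbreviations are taken from the Fig 4 caption.

```python
def consensus_and_unique(gene_sets):
    """Count consensus and unique genes per pipeline.

    consensus: genes also found by at least two other pipelines
    unique:    genes found by no other pipeline
    """
    summary = {}
    for name, genes in gene_sets.items():
        others = [s for other, s in gene_sets.items() if other != name]
        n_consensus = sum(
            1 for g in genes if sum(g in s for s in others) >= 2
        )
        n_unique = sum(1 for g in genes if not any(g in s for s in others))
        total = len(genes)
        summary[name] = {
            "total": total,
            "consensus": n_consensus,
            "unique": n_unique,
            # Percentages relative to the genes found by this pipeline.
            "consensus_pct": 100.0 * n_consensus / total if total else 0.0,
            "unique_pct": 100.0 * n_unique / total if total else 0.0,
        }
    return summary


# Hypothetical significant-gene sets per pipeline (placeholder gene names).
gene_sets = {
    "T&C": {"g1", "g2", "g3", "g7"},
    "S&HT": {"g1", "g2", "g4"},
    "Sa": {"g1", "g3", "g5"},
    "S&R": {"g1", "g2", "g6"},
}
summary = consensus_and_unique(gene_sets)
```

Note that a gene shared with exactly one other pipeline counts as neither consensus nor unique, so the two percentages need not add up to 100.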

Fig 5.

Top 20 significantly enriched GO-categories for BL2 and RC.

The visualized GO-categories were enriched in all shown RNA-Seq and microarray strategies (p-value below 5% and more than four genes in the pathway). The enrichment of GO-terms is shown in red: the higher the intensity of red, the lower the p-value. To scale the colors better, the negative log10 of the p-value is used. The pathways with the highest agreement among all pipelines are shown at the top.
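
The enrichment filter and the color scale described here can be sketched in a few lines. The term labels, p-values and gene counts are hypothetical, and the function names `is_enriched` and `color_score` are illustrative; the ordering by agreement between pipelines is left out because the caption does not say how agreement was measured.

```python
import math


def is_enriched(p_value, n_genes, alpha=0.05, min_genes=4):
    """Apply the caption's criteria: p-value below 5% and more than four genes."""
    return p_value < alpha and n_genes > min_genes


def color_score(p_value):
    """Negative log10 of the p-value: smaller p-values give larger scores."""
    return -math.log10(p_value)


# Hypothetical (label, p-value, number of genes) records.
terms = [
    ("term_a", 0.001, 12),  # passes both criteria
    ("term_b", 0.04, 3),    # too few genes
    ("term_c", 0.2, 30),    # p-value too high
]
kept = [
    (label, color_score(p)) for label, p, n in terms if is_enriched(p, n)
]
```

The transform spreads very small p-values over a usable range, so a term with p = 0.001 receives a score of about 3, while one with p = 0.05 receives about 1.3.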
