Fig 1.
scPipe is an R/Bioconductor package that uses functionality from a number of other packages, including Rsubread to align reads to a reference genome (although in practice any aligner that produces BAM files can be used for read alignment) and SingleCellExperiment to organise the counts and sample annotation information. The major steps in the preprocessing pipeline of scPipe are shown along with the quality control (QC) statistics collected at each stage. The final output of this process is a matrix of counts and QC metrics for use in downstream analysis and an HTML report that summarises the analysis. The resulting SingleCellExperiment object can be used as input to other Bioconductor packages to perform further downstream analysis. scPipe logo created by Roberto Bonelli published under a CC0 1.0 license (https://github.com/Bioconductor/BiocStickers/blob/master/scPipe/README.md).
Fig 2.
Example QC plots that can be created using output from scPipe to assess data quality both within and between experiments.
Within-experiment displays include (A) a bar plot illustrating the overall cell barcode matching results to assess sequencing accuracy across all samples and (B) a stacked bar plot showing the mapping rate for each cell in an experiment (ordered by exon mapping rate), separated into reads that map to exons, introns and ERCCs and those that are ambiguously mapped, map elsewhere in the genome or are unaligned. Between-experiment displays include (C) a stacked bar plot showing the cell barcode matching results from panel (A) for multiple experiments and (D) a ridgeline plot presenting the distribution of proportions of non-mitochondrial read counts for cells across multiple experiments.
Fig 3.
A pairwise scatter plot of the quality control metrics collected by scPipe for the mouse blood cell dataset.
Sample-specific metrics include the total read count, the number of genes detected and the proportion of non-ERCC or non-mitochondrial reads. The good quality and outlier samples detected by scPipe’s automatic outlier detection method are indicated in each panel by a different colour.
Fig 4.
Screenshot of an HTML report created by scPipe for the mouse blood cell dataset.
This report organises the output of scPipe, including run parameters and QC metrics, and also generates basic dimension reduction plots of the data. Such reports provide a convenient format for communicating basic QC information to collaborators to help them evaluate the overall quality of an experiment.
Fig 5.
Analysis results produced with SingleCellExperiment-compatible packages from Bioconductor.
(A) Differential gene expression results from comparing B-lymphocytes and T-lymphocytes, known a priori, in the CEL-seq2 mouse blood dataset. Highlighted points are genes determined to be significantly differentially expressed by a likelihood ratio test (LRT) at an FDR cutoff of 0.05. (B) t-SNE plot coloured by the cell type predictions obtained for SC3 clusters for the first spleen sample from the Human Cell Atlas ischaemic sensitivity dataset. The 10 clusters from SC3 were assigned to 5 distinct cell types from the reference set of 12 lineages.
Table 1.
Summary of the data preprocessing software currently available for scRNA-seq analysis and the particular tasks covered by each package.
Fig 6.
Comparing scPipe and CellRanger.
(A) The workflow for the comparison. An equal mixture of cells from three cell lines was sequenced using the Chromium 10X platform (see Materials and methods). Data were processed by scPipe and CellRanger. (B) The pie chart shows the overlap of cell barcodes detected by scPipe and CellRanger. (C) The t-SNE plot generated using CellRanger output. Cell barcodes that exist only in the CellRanger results are highlighted. (D) Box plots showing the percentage of mitochondrial gene counts in cells that overlap with scPipe or exist only in the CellRanger results.