RNA velocity unraveled

doi:10.1371/journal.pcbi.1010492

Fig 1.

A summary of the user-facing steps of a typical RNA velocity workflow.

Initial processing of sequencing reads produces spliced and unspliced counts for every cell, across all genes. Inference procedures, implemented in velocyto and scVelo, fit a model of transcription, and predict cell-level velocities. The final embedding of cells and smoothed velocities is displayed in the top two principal component dimensions. Visualizations adapted from [25, 26]; dataset from [1]. The DNA and RNA illustrations are derived from the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0.


Fig 2.

An RNA velocity workflow, beginning with read processing and ending with two-dimensional projection, and the parameters that must be specified by the user.


Fig 3.

a. The continuous model of transcription, splicing, and degradation used for RNA velocity analysis. b. Plots of α(t), μ_u(t), and μ_s(t) over time t and the corresponding governing equations for the system. Dashed lines indicate time of switching event. c. Outline of the common phase portrait representation, with both steady state and dynamical models denoted. Adapted from [1]. The DNA and RNA illustrations are derived from the DNA Twemoji by Twitter, Inc., used under CC-BY 4.0.
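For orientation, the governing equations referred to in panel b can be sketched as follows. This is the standard two-species form used in RNA velocity analysis, written here under the assumption that β denotes the splicing rate and γ the degradation rate (consistent with the steady-state line u = γs/β used in Fig 5); the exact notation in the figure may differ.

```latex
\begin{align}
\frac{d\mu_u}{dt} &= \alpha(t) - \beta \mu_u(t), \\
\frac{d\mu_s}{dt} &= \beta \mu_u(t) - \gamma \mu_s(t).
\end{align}
% At steady state both derivatives vanish, so \beta \mu_u = \gamma \mu_s,
% i.e. \mu_u = (\gamma / \beta) \mu_s, the line drawn in the phase portrait.
```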


Fig 4.

A two-intron mRNA species may not have well-defined “unspliced” and “spliced” forms.


Fig 5.

Distortions in data and instabilities in the inferred γ/β values introduced by the imputation procedure on the forebrain data from [1].

Column 1: Raw data (points: spliced and unspliced counts with added jitter; color: cell type, as in Fig 1; line: best fit line u = γs/β + q, estimated from the entire dataset). Columns 2–4: Normalized and imputed data under various values of k (points: spliced and unspliced counts; color: cell type, as in Fig 1; line: best linear fit u = γs/β + q, estimated from extreme quantiles). Column 5: Inferred values of γ/β (red, left axis) and inferred fraction of upregulated cells, defined as the fraction of cells i for which u_i − (γs_i/β + q) > 0 (blue, right axis).


Fig 6.

Normalization followed by two rounds of dimensionality reduction introduces distortions in the local neighborhoods.

a.– d. Histograms of Jaccard distances between intermediate embeddings. e. Empirical cumulative distribution functions of Jaccard distances between intermediate embeddings, as well as the overall distortion (Ambient vs. PCA 2 and Ambient vs. tSNE 2). The palette used is derived from dutchmasters by EdwinTh.
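As a rough illustration of the quantity being histogrammed, the sketch below computes a per-cell Jaccard distance between neighborhoods in two embeddings. It assumes that each cell's neighborhood is its set of k nearest neighbors under Euclidean distance; the function names, the brute-force neighbor search, and the choice of k are hypothetical and are not taken from the article.

```python
import numpy as np

def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances (fine for small examples).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def neighborhood_jaccard_distances(emb_a, emb_b, k):
    """Per-cell Jaccard distance 1 - |A ∩ B| / |A ∪ B| between kNN sets."""
    sets_a = knn_sets(emb_a, k)
    sets_b = knn_sets(emb_b, k)
    return np.array([1.0 - len(a & b) / len(a | b)
                     for a, b in zip(sets_a, sets_b)])

# Example: compare a 5-dimensional point cloud with its first two coordinates.
rng = np.random.default_rng(0)
ambient = rng.normal(size=(20, 5))
projected = ambient[:, :2]
distances = neighborhood_jaccard_distances(ambient, projected, k=5)
```

A value of 0 means a cell keeps exactly the same neighbors in both embeddings, and a value of 1 means the two neighbor sets are disjoint.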


Fig 7.

Performance of cell and velocity embeddings on the forebrain data.

Top: PCA embedding with linear baseline and nonlinear aggregated velocity directions. Bottom: UMAP and t-SNE embeddings with nonlinear velocity projections. The palette used is derived from dutchmasters by EdwinTh.


Fig 8.

Markov Chain process time versus expression pseudotime.

a. Simulated gene expression for 2000 cells over 4 states (states A, B, C, and D) with a bifurcation at C/D showing spliced counts of a single gene at the sampled process times. The abbreviation a.u. denotes arbitrary units. b. Ordering of all cells by expression pseudotime coordinate, calculated as the Euclidean distance between each cell and the root cell (cell at time t = 0) and scaled to between 0 and 1. c. The sampled cells colored by the calculated pseudotime value.
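The pseudotime coordinate described in panel b can be sketched in a few lines. This is a minimal illustration of "Euclidean distance to the root cell, scaled to between 0 and 1", assuming a cells-by-genes count matrix; the function name and the toy data are hypothetical.

```python
import numpy as np

def expression_pseudotime(counts, root_index):
    """Euclidean distance of every cell to the root cell, rescaled to [0, 1]."""
    counts = np.asarray(counts, dtype=float)
    dist = np.linalg.norm(counts - counts[root_index], axis=1)
    span = dist.max() - dist.min()
    if span == 0:
        # All cells coincide with the root; no ordering is possible.
        return np.zeros_like(dist)
    return (dist - dist.min()) / span

# Toy example: three cells, two genes, with the first cell as root.
toy_counts = [[0, 0], [3, 4], [6, 8]]
pseudotime = expression_pseudotime(toy_counts, root_index=0)
```

Because this coordinate depends only on expression distance from the root, two cells at very different process times can receive similar pseudotime values if their expression profiles are equally far from the root.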


Fig 9.

The RNA velocity count processing and inference workflow, applied to data generated by stochastic simulation.

a. Schematic of the impulse model of gene modulation. b. Demonstration of the concordance between simulation and analytical solution for the occupation measure. i.: nascent mRNA counts; ii.: mature mRNA counts (gray: simulation; blue: occupation measure). c. Smoothing and imputation introduce distortions into the data. i.: raw data; ii.: data normalized to total counts; iii.: imputed data (points: raw or processed observations; lines: ground truth averages μ_s and μ_u; red: spliced; yellow: unspliced). d. Local averages obtained by imputation are not interpretable as instantaneous averages. i.: mean unspliced; ii.: mean spliced; iii.: variance unspliced; iv.: variance spliced (black points: true moment vs. pooled moment; blue line: identity; blue region: factors of ten around identity). e. Smoothing and imputation improve the inference on extrema. i.: moment-based inference from raw data; ii.: extremal inference from normalized data; iii.: extremal inference from imputed data (black points: true vs. inferred values of γ/β; blue line: identity; blue region: factors of ten around identity). The palette used is derived from dutchmasters by EdwinTh.


Fig 10.

Performance of cell and velocity embeddings on simulated data, compared to ground truth velocity directions.

a. Linear PCA embedding of ground truth velocities. b. Linear PCA embedding of inferred velocities. c. Nonlinear PCA embedding of inferred velocities. d. Nonlinear, Boolean PCA embedding of inferred velocities. e. Embedding of ground truth principal curve; trajectory directions displayed to guide the eye. f. Distribution of cell-specific angle deviations relative to ground truth velocity directions.


Fig 11.

Performance of cell and velocity embeddings on simulated data, compared to ground truth principal curve.

Top: PCA embedding with linear baseline and nonlinear aggregated velocity directions, as well as ground truth principal curve; trajectory directions displayed to guide the eye. Bottom: UMAP and t-SNE embeddings with nonlinear velocity projections.


Table 1.

The datasets used to compare performance of molecule quantification software.
