Fig 1.
A. Three models of the RNA life cycle, each considering different biological processes. Model 1 (CSP-Baseline): reaction dynamics model for new RNA l(t) that ignores the splicing process, where α is the transcription rate and γt is the total mRNA degradation rate. Model 2 (CSP-Splicing): reaction dynamics model for new unspliced and spliced mRNA (ul(t), sl(t)) that includes the splicing process, where β is the splicing rate, γs is the spliced mRNA degradation rate, and α is the same as in Model 1. Model 3 (CSP-Switching): reaction dynamics model for new RNA l(t) that includes gene state switching, where α and γt are the same as in Model 1, kon is the rate at which the gene switches from the inactive state to the active state, and koff is the rate of the reverse switch. B. Complete workflow for parameter inference and downstream analysis based on the stochastic dynamics of new mRNA, accounting for technical noise. C. Specific parameter inference strategies for one-shot and pulse experiments under steady-state and non-steady-state assumptions.
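As a minimal sketch of Model 1 (not the authors' implementation), assume that labeling starts at t = 0 with no new RNA and that α and γt are constant. The resulting birth-death process has mean α/γt · (1 − exp(−γt·t)), which the code below compares against a small Gillespie simulation. The function names and parameter values are illustrative, and technical noise is not modeled here.

```python
import math
import random


def mean_new_rna(alpha, gamma_t, t):
    """Expected new RNA count at labeling time t, starting from zero."""
    return alpha / gamma_t * (1.0 - math.exp(-gamma_t * t))


def simulate_new_rna(alpha, gamma_t, t_end, rng):
    """One Gillespie trajectory of the birth-death process; returns l(t_end)."""
    t, l = 0.0, 0
    while True:
        total_rate = alpha + gamma_t * l
        t += rng.expovariate(total_rate)   # waiting time to the next reaction
        if t > t_end:
            return l
        if rng.random() < alpha / total_rate:
            l += 1   # transcription event
        else:
            l -= 1   # degradation event


rng = random.Random(0)
alpha, gamma_t, t_label = 5.0, 0.5, 2.0   # illustrative values, not fitted
samples = [simulate_new_rna(alpha, gamma_t, t_label, rng) for _ in range(5000)]
empirical_mean = sum(samples) / len(samples)
print(round(mean_new_rna(alpha, gamma_t, t_label), 3), round(empirical_mean, 3))
```

The two printed numbers should agree up to sampling error; Models 2 and 3 would add a splicing step or an on/off promoter state to the same simulation loop.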
Fig 2.
Stochastic model combined with steady-state assumptions for one-shot data without splicing information.
Storm in this figure refers to the inference strategy that combines the CSP-Baseline model with the steady-state assumption. A. Streamline plot projected in UMAP space for the primary human HSPC datasets from scNT-seq [26]. B. Degradation rates γt estimated with the steady-state-based method in Storm compared with those of the Dynamo method for the primary human HSPC datasets. C. Streamline plot projected in UMAP space for the datasets of neuronal activity under KCl polarization from scNT-seq [18]. D. Same as B, but for the neuronal activity datasets. E. Streamline plots of the sci-fate dataset [19] reveal two orthogonal processes: GR response and cell-cycle progression. From left to right: streamline plot on the first two PCs, on the second two PCs, and on the first two UMAP components reduced from the four PCs. The first row shows the results of Storm and the second row the results of Dynamo. F. Streamline plot projected in UMAP space for the dataset from PerturbSci-Kinetics [22].
Fig 3.
Storm analyzes one-shot data with both splicing and labeling information without the steady-state assumption.
Storm in this figure refers to the inference strategy that combines the CSP-Splicing model with scVelo. A. Streamline plots projected in PCA space for the one-shot bifurcation simulation data. Left: Storm; right: Dynamo. B. Comparison of the estimated and true degradation rates for the one-shot bifurcation simulation data. For cellDancer, the average of the cell-wise degradation rates is used. C. Distribution of the difference between the estimated and true degradation rates for Storm, Storm (selected), and Dynamo. D. Heat map of the absolute error between estimated and true gene-cell-wise transcription rates α for the one-shot bifurcation simulation data. Left: Storm; right: cellDancer. E. Streamline plot in UMAP space for the murine intestinal organoid system dataset from scEU-seq [21]. F. Streamline plots projected in RFP_GFP space for the cell cycle dataset from scEU-seq [21]. Left: result using only the data labeled for 15 minutes; right: result using the data labeled for 30 minutes. G. Comparison of degradation rates (γs in Storm and γt in Dynamo) in the cell cycle datasets with labeling durations of 15 and 30 minutes.
Table 1.
Results of the proposed sample-specific hypothesis test of whether the number of new mRNA molecules in the cell cycle dataset follows the CSP and CSZIP distributions.
UTD (unable to determine) indicates that there are too few groupings, leaving zero degrees of freedom, in which case the fit is always perfect. The significance level is 0.05.
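The UTD rule can be illustrated with a generic chi-square goodness-of-fit check. This is a sketch under stated assumptions, not the sample-specific test proposed in the paper: counts are already grouped, the expected counts come from some fitted distribution, and the degrees of freedom follow the usual convention (number of groups − 1 − number of fitted parameters). The helper names are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = stat / 2.0
    if df % 2:                      # odd df: start from Q(1/2, x) = erfc(sqrt(x))
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:                           # even df: start from Q(1, x) = exp(-x)
        a, q = 1.0, math.exp(-x)
    while a < df / 2.0:             # Q(a + 1, x) = Q(a, x) + x^a e^-x / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, expected, n_fitted_params, level=0.05):
    """Chi-square goodness-of-fit on pre-grouped counts.

    Returns "UTD" when no degrees of freedom remain.
    """
    df = len(observed) - 1 - n_fitted_params
    if df <= 0:
        return "UTD"
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return "reject" if chi2_sf(stat, df) < level else "accept"


# Hypothetical grouped counts (e.g. 0, 1, >=2 molecules) with one fitted parameter.
print(goodness_of_fit([60, 30, 10], [58.0, 31.5, 10.5], n_fitted_params=1))
# Two groups and one fitted parameter leave zero degrees of freedom.
print(goodness_of_fit([70, 30], [70.0, 30.0], n_fitted_params=1))
```

With zero degrees of freedom the fitted distribution can match the grouped counts exactly, so the test carries no information, which is why such cases are reported as UTD rather than as accepted.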
Fig 4.
Statistical analysis of the cell cycle dataset.
A. Observed counts and expected counts under the CSP and CSZIP distributions for new mRNA molecules of two example genes, RPL41 and IL22RA1. First row: fitting results for the RPL41 gene, which has a small number of mRNA molecules. Second row: fitting results for the IL22RA1 gene, which has a higher number of new mRNA molecules (truncated at 11 for better visualization). PCSP and PCSZIP are the p-values of the cell-specific chi-square tests against the corresponding distributions. B. Comparison of total mRNA counts across labeling durations for four example genes: TSPOAP1, GPRC5A, ADAMTS6 and APEX1. P is the p-value of the chi-square contingency table independence test. C. Results of the chi-square independence test for total RNA counts (significance level 0.05). “Same” means that the null hypothesis of the chi-square independence test, namely that total RNA counts at different labeling durations follow the same distribution, is not rejected. “Different” means that it is rejected.
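For readers unfamiliar with the contingency table independence test used in panels B and C, the following sketch computes the Pearson chi-square statistic and its degrees of freedom for a table whose rows are labeling durations and whose columns are binned total RNA counts. The counts and the binning are hypothetical; the p-value is then the upper tail of a chi-square distribution with the returned degrees of freedom.

```python
def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = labeling durations, columns = binned total RNA counts.
table = [[40, 35, 25],
         [38, 36, 26]]
stat, df = chi2_independence(table)
print(round(stat, 4), df)
```

A small statistic relative to the degrees of freedom, as in this toy table, corresponds to the “Same” outcome; a large one corresponds to “Different”.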
Fig 5.
Parameter inference and enrichment analysis for the cell cycle dataset.
The inference strategy involved in this figure is for kinetics/pulse data. A. Comparison of parameter inference results of our three stochastic models. From left to right: γt of CSP-Baseline versus CSP-Switching, γt of CSP-Baseline versus CSP-Splicing, and γt versus γs in CSP-Splicing. The overlapping well-fitted genes are the intersection of the genes in the top 40% by goodness-of-fit under both methods. B. Comparison of inferred parameters between our stochastic models and Dynamo’s method. Left: γt from CSP-Baseline versus Dynamo. Right: β from CSP-Splicing versus Dynamo. C. Comparison of the goodness-of-fit of the three stochastic models. Left: all highly variable genes. Right: highly variable genes in the top 10% by average new mRNA expression. Here Base refers to the CSP-Baseline model, Splic to the CSP-Splicing model and Switch to the CSP-Switching model. D. Robustness analysis. Left: landscape of the CSP-Baseline-based loss function for a typical gene, WWTR1. Right: scatter plot of the robustness measure against goodness-of-fit for parameter inference. E. Enrichment analysis results for genes with high gene-wise γt and β (top 50%) among well-fitted genes (top 40% by goodness-of-fit). F. Heat maps of cell-wise parameters for well-fitted genes. From left to right: cell-wise α based on CSP-Baseline, cell-wise αpon based on CSP-Switching, and cell-wise β based on CSP-Splicing. In all three heat maps, the x-axis is the relative cell cycle position, and genes on the y-axis are ordered so that the peak time of each gene increases from top left to bottom right.
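The selection of overlapping well-fitted genes in panel A can be sketched as a set intersection. This is an illustration only: the gene names and scores are hypothetical, and the code assumes that a larger score means a better fit, which the caption does not specify.

```python
def top_fraction(scores, fraction):
    """Genes ranked in the top `fraction` by score (higher = better fit, by assumption)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = int(len(ranked) * fraction)
    return set(ranked[:keep])


def overlapping_well_fitted(scores_a, scores_b, fraction=0.4):
    """Genes in the top `fraction` of goodness-of-fit under both methods."""
    return top_fraction(scores_a, fraction) & top_fraction(scores_b, fraction)


# Hypothetical goodness-of-fit scores for five genes under two models.
baseline = {"g1": 0.9, "g2": 0.8, "g3": 0.4, "g4": 0.3, "g5": 0.1}
splicing = {"g1": 0.7, "g2": 0.2, "g3": 0.9, "g4": 0.5, "g5": 0.3}
print(sorted(overlapping_well_fitted(baseline, splicing)))
```

Only genes that rank in the top 40% under both models survive the intersection, so the compared parameters come from genes that both models describe well.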
Fig 6.
RNA velocity analysis of the cell cycle dataset.
The inference strategy involved in this figure is for kinetics/pulse data. A. Comparison of total RNA velocity streamline visualizations between the three stochastic methods and Dynamo in the cell cycle dataset. B. Comparison of the average correctness of total velocity in gene expression space and RFP_GFP space. The p-values are given by the one-sided Wilcoxon test. Here Base refers to the CSP-Baseline model, Splic to the CSP-Splicing model and Switch to the CSP-Switching model. C. Same as B, but for velocity consistency. D. Duration (in hours) of each cell cycle phase of the human RPE1-FUCCI system based on Storm’s CSP-Baseline and Dynamo. E. Total RNA velocity streamlines calculated using Storm’s CSP-Baseline with gene-wise parameters (rather than gene-cell-wise parameters for all but the degradation rate). F. Smoothed expression of DCBLD2 in different cells. G. Comparison of the total RNA velocity of DCBLD2 between CSP-Baseline and Dynamo. H. Phase portraits in the new-total RNA plane of DCBLD2 for CSP-Baseline and Dynamo. Quivers correspond to the total (x-component) or new (y-component) RNA velocity calculated by the different methods.