Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data

Qi Jiang; Shuo Zhang; Lin Wan

doi:10.1371/journal.pcbi.1009821

Abstract

Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, dynamic inference of an evolving cell population from time series scRNA-seq data is challenging owing to the stochasticity and nonlinearity of the underlying biological processes. This calls for mathematical models and methods capable of reconstructing cellular dynamic transition processes and uncovering nonlinear cell-cell interactions. In this study, we present GraphFP, a model and dynamic inference framework based on the nonlinear Fokker-Planck equation on graphs, which aims to reconstruct the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The free energy of our model explicitly takes cell-cell interactions into account through a nonlinear quadratic term. We then recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the adjoint method of optimal control. We evaluated GraphFP on the time series scRNA-seq data set of embryonic murine cerebral cortex development. We illustrated that it 1) reconstructs cell state potential energy, which is a measure of cellular differentiation potency, 2) faithfully charts the probability flows between paired cell states over the dynamic processes of cell differentiation, and 3) accurately quantifies the stochastic dynamics of cell type frequencies on the probability simplex in continuous time. We also illustrated that GraphFP is robust to cluster labelling at different resolutions, as well as to parameter choices. Meanwhile, GraphFP provides a model-based approach to delineate the cell-cell interactions that drive cell differentiation. GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.

Author summary

Dynamic inference of cell development processes from time series scRNA-seq data is a major challenge. Here, we present GraphFP, a coherent computational framework that simultaneously reconstructs the cell state-transition complex potential energy landscape and infers cell-cell interactions from time series single-cell transcriptomic data. Based on the mathematical framework of nonlinear Fokker-Planck equation on graph, GraphFP models the stochastic dynamics of the cell state/type frequencies on probability simplex in continuous time, where the free energy with a nonlinear quadratic interaction term is employed to characterize cell-cell interactions. We formulate the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the celebrated adjoint method. GraphFP allows for 1) reconstructing cell state potential energy, which is a measure of cellular differentiation potency, 2) charting the probability flows between paired cell states over dynamic processes, 3) quantifying the stochastic dynamics of cell type frequencies on probability simplex in continuous time, and 4) delineating cell-cell interactions that drive cell differentiation. We show how GraphFP can be used to faithfully reveal and accurately quantify the cell development processes using the embryonic murine cerebral cortex development time series scRNA-seq dataset.

Citation: Jiang Q, Zhang S, Wan L (2022) Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data. PLoS Comput Biol 18(1): e1009821. https://doi.org/10.1371/journal.pcbi.1009821

Editor: Qing Nie, University of California Irvine, UNITED STATES

Received: May 22, 2021; Accepted: January 10, 2022; Published: January 24, 2022

Copyright: © 2022 Jiang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.

Funding: This work was supported by the Fund to LW from the National Key Research and Development Program of China under Grant 2019YFA0709501. LW and SZ were also supported by the National Natural Science Foundation of China (No. 12071466 to LW and No.11871465 to SZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The dynamics of cell developmental processes (e.g., cell differentiation and tumorigenesis) are highly complex. Advances in single-cell RNA sequencing (scRNA-seq) technologies enable cell-resolved investigation of heterogeneous cell populations, offering a systematic approach to reveal underlying developmental dynamics, cell communication, and gene regulation [1]. Dynamic inference of cell development from scRNA-seq transcriptomic profiles draws heavily on advances in computational and systems biology. Many methods have been proposed to reconstruct cell developmental trajectories and pseudo-time from a single-cell snapshot profile sampled from an evolving cell population [2]. Methods have also been developed to quantify the cell developmental landscape [3–6]. However, state-of-the-art dynamic inference methods based on a static single-cell snapshot profile alone may lack identifiability of complex dynamic processes [7].

Recently, time series scRNA-seq data profiled from cells sampled at multiple physical time stages have been accumulating, adding a temporal dimension to the data. The wider dynamic range provided by this temporal dimension shows great promise in overcoming the difficulties that arise when inferring dynamics from static single-cell snapshot profiles. Computational methods that explicitly incorporate temporal information have been developed. TASIC determined temporal trajectories with a probabilistic graphical model that integrates expression and time information [8], while CSHMM developed a continuous state hidden Markov model to infer the trajectory structure and assign cells to paths [9]. TSEE incorporated temporal information into the nonlinear dimensionality reduction algorithm of elastic embedding to visualize dynamic gene expression patterns, offering enhanced temporal resolution [10]. ScPADGRN reconstructed the dynamic gene regulatory network with a preconditioned ADMM optimization method [11]. Tempora incorporated biological pathway information to accurately identify cell types and then incorporated the time information to infer evolving cell-type trajectories [12].

A growing number of methods are being developed to reconstruct the cell developmental energy landscape from time series single-cell data using the mathematical framework of optimal transport. Optimal transport has received considerable attention in recent years in disciplines such as machine learning and statistical data analysis, as it has proven to be a powerful tool for the analysis of complex data [13]. The core concept of optimal transport, the Wasserstein distance between two probability distributions, quantifies the optimal cost of transporting one data distribution to the other. As a remarkably rich and fruitful concept, the Wasserstein distance “enables a mechanism transforming the probability space into a Riemannian manifold (known as a Wasserstein manifold), so that geometric structures and partial differential equation (PDE) techniques can be established and analyzed” [14]. Amongst existing methods, Waddington-OT is a landmark work that developed an unbalanced optimal transport framework to reconstruct the cell developmental landscape by inferring cell-cell probabilistic couplings based on the distributions between adjacent time points [15]; TrajectoryNet set up a dynamic optimal transport neural network framework to reconstruct the continuous normalizing flows of evolving cell populations on the continuous state space [16]; PRESCIENT modelled cell differentiation as a diffusion process over a potential energy landscape learned by a neural network framework [17]. However, the computation of optimal transport is still a bottleneck when processing large-scale data [13].

In this study, we present GraphFP, a model and dynamic inference framework based on the nonlinear Fokker-Planck equation on graphs, which aims to reconstruct the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The Fokker-Planck equation (FPE) is ubiquitously applied in the modelling of statistical physics and biological systems, including single-cell data analysis [5, 7, 18]. GraphFP is built on the mathematical framework established by the FPE on finite graphs originally introduced by Chow et al. [19, 20] and Li [21] (see [14] for a recent survey). Building upon the fundamental form of optimal transport, GraphFP learns the complex geometry of the data and provides a novel way to quantify cell-cell interactions during cell development. It models the cell developmental process as stochastic dynamics of the cell state/type frequencies on the probability simplex in continuous time. The discrete Wasserstein distance is introduced to transform the probability simplex into a Riemannian manifold, called the discrete Wasserstein manifold. The FPE is proven to be the gradient flow of the free energy on the discrete Wasserstein manifold. The free energy of our model consists of a static linear potential energy term and a nonlinear quadratic interaction energy term that characterizes cell-cell interactions [22, 23]. To estimate the parameters that represent the linear potential energy and the cell-cell interaction strengths, we recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the celebrated adjoint method of optimal control [24]. The adjoint method also played a central role in the development of the well-known neural network algorithm NeuralODE [25].

We evaluated GraphFP on the time series scRNA-seq dataset of embryonic murine cerebral cortex development [26]. GraphFP reconstructed the cell state potential energy, which is a measure of cellular differentiation potency from both static and dynamic points of view. It faithfully charted the probability flows of cell state-transitions, consistent with the gold standard benchmarks. It also accurately quantified the stochastic dynamics of cell type frequencies on probability simplex in continuous time. GraphFP delineated cell-cell interactions that drive cell development in a model-based fashion. We tested the cell-cell interaction term of GraphFP by illustrating its ability to fit the nonlinear curves of experimental data and recover held-out time points. We illustrated that GraphFP is robust in terms of cluster labelling with different resolutions, as well as parameter choices. We compared GraphFP with state-of-the-art cell developmental energy landscape reconstruction methods in Discussion.

Methods

GraphFP is a coherent computational framework that simultaneously reconstructs the cell state-transition complex potential energy landscape and infers cell-cell interactions from time series single-cell transcriptomic data (Fig 1). It models cell state-transition dynamics with cell-cell interactions based on the mathematical framework introduced by Chow et al. [19, 20] and Li [21].

Fig 1. Overview of GraphFP algorithm.

GraphFP takes as input time series single-cell transcriptomic data together with the experimental temporal information. It identifies cell states/types, estimates the cell type frequencies at each time point, and estimates the linear potential energy Φ and the cell-cell interaction matrix W based on the adjoint method. GraphFP outputs the stochastic dynamics of cell type frequencies p(t) on the probability simplex in continuous time, the cell state-transition potential energy, and the probability flows of cell state-transitions underlying the evolving cell population.

https://doi.org/10.1371/journal.pcbi.1009821.g001

GraphFP allows for reconstructing cell state potential energy, charting the probability flows between paired cell states over dynamic process, quantifying the stochastic dynamics of cell type frequencies on probability simplex in continuous time, and delineating cell-cell interactions that drive cell differentiation [22]. The main steps for GraphFP are organized in the following subsections.

Identifying cell states/types

Given the time series scRNA-seq data, the single-cell samples are collected at f time stages {t₁, t₂, …, t_f}, and for each time stage t_l (1 ≤ l ≤ f), m_l single cells are sequenced with the corresponding gene expression vectors $x_1^{(l)}, \ldots, x_{m_l}^{(l)} \in \mathbb{R}^{D}$, where D is the number of genes. Thus, the single-cell gene expression profile of a total of $M = \sum_{l=1}^{f} m_l$ cells is contained in the data matrix $X \in \mathbb{R}^{M \times D}$.

Suppose that the total number of M cells forms n cell states corresponding to n cell types. When the cells are already annotated or clustered, GraphFP directly utilizes the prior information of cell types. When cell type information is not available, GraphFP will apply state-of-the-art single-cell clustering and annotation methods (e.g., Louvain-Jaccard algorithm [27], the single-cell data analysis platform Seurat [28], and the single cell deep generative method scDEC [29]) to cluster cells into n clusters as the cell states/types.

Constructing the cell state-transition graph

The cell state-transition graph G = (V, E) is an undirected graph, where each vertex i in V represents a cell state/type and each edge {i, j} in E represents that cell state i can directly transit to cell state j or vice versa. In this study, with no inherent reliance on any prior information, we construct the cell state-transition graph G as a complete graph supported on all cell types. However, we can also incorporate prior information of cell state-transition into graph G when available.
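The graph construction described above can be sketched in a few lines. This is an illustrative sketch only, not the GraphFP implementation: the function name `build_state_graph`, the `allowed_edges` argument, and the four-state linear chain used as an example of prior information are all hypothetical.

```python
from itertools import combinations

def build_state_graph(cell_types, allowed_edges=None):
    """Return the adjacency sets N(i) of an undirected cell state-transition graph.

    With allowed_edges=None the graph is complete (no prior information).
    Otherwise only the listed undirected pairs are kept (hypothetical prior).
    """
    edges = combinations(cell_types, 2) if allowed_edges is None else allowed_edges
    neighbors = {c: set() for c in cell_types}
    for a, b in edges:
        if a == b:
            continue  # no self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)  # undirected: store both directions
    return neighbors

states = ["AP/RP", "IP", "Young Neuron", "Neuron"]

# Default used in the text: complete graph on all cell types.
complete = build_state_graph(states)

# Illustrative prior: a linear chain of allowed transitions (an assumption).
chain = build_state_graph(
    states,
    allowed_edges=[("AP/RP", "IP"), ("IP", "Young Neuron"), ("Young Neuron", "Neuron")],
)
```

With the complete graph every state has n − 1 neighbours, so no transition is ruled out a priori; a sparser prior graph simply removes terms from the sums over N(i).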

Modelling cell state-transition dynamics with the nonlinear FPE on graph equipped with discrete L₂-Wasserstein distance

GraphFP models the stochastic dynamics of cell state/type frequencies in the evolving cell population with the nonlinear FPE on graph. The underlying assumption of this model is that cell state-transitions follow the minimum total kinetic energy path during cell differentiation, which can be measured by the discrete L₂-Wasserstein distance on graph. To estimate the parameters in the nonlinear FPE of GraphFP, we propose a gradient method to find the parameters that minimize the discrete L₂-Wasserstein distance.

We use discrete probability distributions supported on the vertices of the cell state-transition graph G to represent the state of the system at time t. If G has n vertices, the set of system states is the probability simplex supported on all vertices of G:

$$\mathcal{P}(G) = \Big\{ p = (p_i)_{i=1}^{n} \;\Big|\; \sum_{i=1}^{n} p_i = 1,\; p_i \ge 0 \text{ for all } i \in V \Big\}.$$

The cell-state frequencies or probabilities are estimated for each time point separately, resulting in the empirical distributions p¹, p², …, p^f ∈ 𝒫(G).
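Estimating the cell-state frequencies per time point amounts to counting labels. The sketch below assumes each cell carries a time-stage label and a cell-type label; the function name `estimate_frequencies` and the toy labels are made up for illustration.

```python
from collections import Counter

def estimate_frequencies(time_labels, type_labels, time_points, cell_types):
    """Return one frequency vector per time point; each vector sums to 1."""
    freqs = []
    for t in time_points:
        counts = Counter(c for tl, c in zip(time_labels, type_labels) if tl == t)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"no cells observed at time point {t!r}")
        # Cell types absent at this time point get frequency 0.
        freqs.append([counts.get(c, 0) / total for c in cell_types])
    return freqs

# Toy labels (hypothetical, not taken from the cortex dataset).
time_labels = ["E11.5", "E11.5", "E11.5", "E13.5", "E13.5", "E13.5", "E13.5"]
type_labels = ["AP", "AP", "IP", "AP", "IP", "IP", "Neuron"]

p_hat = estimate_frequencies(
    time_labels, type_labels, ["E11.5", "E13.5"], ["AP", "IP", "Neuron"]
)
```

Each row of `p_hat` is a point on the probability simplex and plays the role of the observed distribution at one time stage.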

Here, we mainly follow the setups in Chow et al. [20] and Li [21], and define the discrete nonlinear free energy of the cell-state system as follows:

$$\mathcal{F}(p) = \mathcal{V}(p) + \mathcal{W}(p) + \beta \mathcal{H}(p) = \sum_{i=1}^{n} \Phi_i p_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, p_i p_j + \beta \sum_{i=1}^{n} p_i \log p_i, \tag{1}$$

where $\mathcal{V}(p)$, $\mathcal{W}(p)$, and $\mathcal{H}(p)$ represent the static linear potential energy parametrized by the vector Φ = (Φ_i)_{1≤i≤n}, the quadratic interaction energy of paired cell states parametrized by the matrix W = (w_ij)_{1≤i, j≤n}, and the Boltzmann entropy with a hyper-parameter β ≥ 0, respectively. In general, the interaction matrix W is asymmetric, with its element w_ij representing the interaction strength from cell state j (as a signalling sender) to cell state i (as a signalling receiver) (see Eq (9) and section “Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions” for further discussion). We hereinafter denote the parameters of the free energy as θ ≡ {Φ, W}.
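As a small numerical illustration of the free energy, the sketch below evaluates the three terms separately. It assumes the standard forms Σ_i Φ_i p_i, ½ Σ_ij w_ij p_i p_j and β Σ_i p_i log p_i for the linear, interaction and entropy terms; the function name and the toy values are hypothetical.

```python
import math

def free_energy(p, phi, W, beta):
    """Free energy = linear potential + quadratic interaction + beta * entropy term."""
    n = len(p)
    linear = sum(phi[i] * p[i] for i in range(n))
    interaction = 0.5 * sum(W[i][j] * p[i] * p[j] for i in range(n) for j in range(n))
    # Convention 0 * log 0 = 0 for states with zero frequency.
    entropy = sum(pi * math.log(pi) for pi in p if pi > 0)
    return linear + interaction + beta * entropy

# Toy two-state system (made-up numbers).
p = [0.5, 0.5]
no_interaction = free_energy(p, phi=[0.0, 0.0], W=[[0.0, 0.0], [0.0, 0.0]], beta=1.0)
with_interaction = free_energy(p, phi=[0.0, 0.0], W=[[0.0, 0.4], [0.0, 0.0]], beta=1.0)
```

In this toy case the only difference between the two calls is the single entry w₁₂ = 0.4, which adds ½ · 0.4 · 0.5 · 0.5 = 0.05 to the free energy.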

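Based on the specific form of the free energy $\mathcal{F}(p)$, the corresponding FPE on G is defined as follows: for any i ∈ V,

$$\frac{dp_i(t)}{dt} = \sum_{j \in N(i)} \big( F_j(p(t)) - F_i(p(t)) \big)\, g_{ij}(p(t)), \tag{2}$$

where $F_i(p) = \partial \mathcal{F}(p) / \partial p_i$, N(i) = {j ∈ V | {i, j} ∈ E} is the adjacency set of vertex i ∈ V, and g_ij(p(t)) is defined as

$$g_{ij}(p) = \begin{cases} p_j, & \text{if } F_j(p) > F_i(p), \\ p_i, & \text{if } F_j(p) < F_i(p), \\ \dfrac{p_i + p_j}{2}, & \text{if } F_j(p) = F_i(p). \end{cases} \tag{3}$$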
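To make the dynamics concrete, the following sketch integrates a graph FPE of this type with forward Euler on a complete graph. It rests on two assumptions that should be read as such: the node potential is taken to be Φ_i + Σ_k w_ik p_k + β log p_i, and the edge weight g_ij is the upwind choice in the style of Chow et al. (mass of the higher-potential endpoint). The step size, the toy parameters and all function names are illustrative, and this is not the solver used by GraphFP.

```python
import math

def potentials(p, phi, W, beta):
    """Assumed node potential: Phi_i + sum_k w_ik p_k + beta * log p_i."""
    n = len(p)
    return [
        phi[i] + sum(W[i][k] * p[k] for k in range(n))
        + beta * math.log(max(p[i], 1e-300))  # clamp avoids log(0)
        for i in range(n)
    ]

def fpe_rhs(p, phi, W, beta):
    """dp/dt on a complete graph with upwind edge weights."""
    n = len(p)
    psi = potentials(p, phi, W, beta)
    dp = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            diff = psi[j] - psi[i]
            if diff > 0:
                g = p[j]            # mass leaves the higher-potential state j
            elif diff < 0:
                g = p[i]            # mass leaves the higher-potential state i
            else:
                g = 0.5 * (p[i] + p[j])
            dp[i] += diff * g       # gain of i equals loss of j
            dp[j] -= diff * g
    return dp

def simulate(p0, phi, W, beta, t_end, dt=0.05):
    """Forward-Euler integration from time 0 to t_end."""
    p = list(p0)
    for _ in range(round(t_end / dt)):
        dp = fpe_rhs(p, phi, W, beta)
        p = [pi + dt * di for pi, di in zip(p, dp)]
    return p

# Toy example without interactions: the flow should relax to the Gibbs distribution.
phi = [0.1, 0.0, -0.1]
W = [[0.0] * 3 for _ in range(3)]
beta = 0.1
p_end = simulate([1 / 3, 1 / 3, 1 / 3], phi, W, beta, t_end=400.0)

z = sum(math.exp(-v / beta) for v in phi)
gibbs = [math.exp(-v / beta) / z for v in phi]
```

With W = 0 all potentials are equal only when p_i ∝ exp(−Φ_i/β), so comparing `p_end` with `gibbs` is a simple sanity check on the integrator; total probability is conserved because every edge term is added to one vertex and subtracted from the other.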

Chow et al. [20] and Li [21] proved that p(t) follows the gradient flow of the free energy (Eq (1)) when the probability simplex 𝒫(G) is equipped with the discrete L₂-Wasserstein distance on graph G: for any p¹, p^f ∈ 𝒫(G),

$$W_2(p^1, p^f)^2 = \inf \int_{t_1}^{t_f} \sum_{\{i,j\} \in E} \big( F_i(p(t)) - F_j(p(t)) \big)^2\, g_{ij}(p(t))\, dt,$$

with F_i and g_ij as in Eqs (2) and (3), where the infimum is taken over {p(t): [t₁, t_f] → 𝒫(G) | p(t) is a piecewise C¹ curve that satisfies the FPE of Eq (2), p(t₁) = p¹ and p(t_f) = p^f}. Intuitively, the discrete L₂-Wasserstein distance can be understood as the total work for transporting p¹ to p^f on 𝒫(G), which is the summation of the kinetic energies of the flows (mass × squared velocity) on the edges of graph G during the time period [t₁, t_f].

Parameter estimation and model optimization

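To estimate the parameters θ = {Φ, W} of the free energy using all time series scRNA-seq data collected at time points {t₁, t₂, ⋯, t_f} (assuming that θ is constant over the entire time period), we formulate the estimation problem as a minimization problem of the discrete L₂-Wasserstein distance:

$$\theta^* = \arg\min_{\theta} \int_{t_1}^{t_f} \sum_{\{i,j\} \in E} \big( F_i(p(t)) - F_j(p(t)) \big)^2\, g_{ij}(p(t))\, dt$$

subject to the constraints

$$\frac{dp_i(t)}{dt} = \sum_{j \in N(i)} \big( F_j(p(t)) - F_i(p(t)) \big)\, g_{ij}(p(t)), \quad i = 1, \ldots, n; \qquad p(t_l) = p^l, \quad l = 1, \ldots, f, \tag{4}$$

where n is the size of vertex set V. However, this dynamic optimization problem, whose constraints in Eq (4) fix p(t_l) at multiple time points, is extremely difficult and may even be unsolvable.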

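In this study, to estimate the parameters θ*, we follow TrajectoryNet [16] and relax the constraints on p(t_l) at time points {t₂, ⋯, t_f} by moving them into the objective function as follows:

$$\theta^* = \arg\min_{\theta} \Big\{ \int_{t_1}^{t_f} \sum_{\{i,j\} \in E} \big( F_i(p(t)) - F_j(p(t)) \big)^2\, g_{ij}(p(t))\, dt + \sum_{l=2}^{f} \lambda_l\, D_{KL}\big( p^l \,\|\, p(t_l) \big) \Big\} \tag{5}$$

subject to the constraints

$$\frac{dp_i(t)}{dt} = \sum_{j \in N(i)} \big( F_j(p(t)) - F_i(p(t)) \big)\, g_{ij}(p(t)), \quad i = 1, \ldots, n, \tag{6}$$

$$p(t_1) = p^1, \tag{7}$$

where λ_l ≥ 0 is a constant regularization parameter, n is the size of vertex set V, and $D_{KL}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log (p_i / q_i)$ is the Kullback-Leibler divergence between the probability distributions p and q.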

The optimization problem of Eq (5) can be interpreted as an optimal control problem with fixed initial point and the parameters θ can be regarded as the control. We therefore propose a gradient method to estimate θ* based on the Pontryagin’s Maximum Principle (also known as the adjoint method) [24] to solve the optimal control problem of Eq (5). In our implementation, we treat the integral part and the KL divergence part in Eq (5) separately, solve each of them using the adjoint method, and then combine them together through the tradeoff parameters λ_ls. See the details and the pseudocode of GraphFP algorithm in S1 Text.
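The structure of the relaxed objective can be illustrated by simply evaluating it for a given θ. The sketch below does only that: it integrates an assumed upwind graph FPE with forward Euler, accumulates a kinetic transport cost along the path, and adds a KL penalty at each later time point. It does not implement the adjoint gradient (see S1 Text for the actual algorithm), the direction of the KL term (observed versus model) is an assumption, and all names and toy values are hypothetical.

```python
import math

def euler_step(p, phi, W, beta, dt):
    """One forward-Euler step; also returns the kinetic energy rate before the step."""
    n = len(p)
    psi = [phi[i] + sum(W[i][k] * p[k] for k in range(n))
           + beta * math.log(max(p[i], 1e-300)) for i in range(n)]
    dp, kinetic = [0.0] * n, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = psi[j] - psi[i]
            g = p[j] if diff > 0 else p[i] if diff < 0 else 0.5 * (p[i] + p[j])
            dp[i] += diff * g
            dp[j] -= diff * g
            kinetic += diff * diff * g   # mass x squared velocity on edge {i, j}
    return [pi + dt * di for pi, di in zip(p, dp)], kinetic

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), with 0 * log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def relaxed_objective(phi, W, beta, times, observed, lam, dt=0.01):
    """Return (transport cost, weighted KL penalty) for parameters (phi, W)."""
    p, transport, penalty = list(observed[0]), 0.0, 0.0
    for l in range(1, len(times)):
        for _ in range(round((times[l] - times[l - 1]) / dt)):
            p, kinetic = euler_step(p, phi, W, beta, dt)
            transport += kinetic * dt
        penalty += lam * kl(observed[l], p)   # assumed direction: KL(observed || model)
    return transport, penalty

def simulate(phi, W, beta, p1, times, dt=0.01):
    """Generate synthetic 'observations' from known parameters."""
    out, p = [list(p1)], list(p1)
    for l in range(1, len(times)):
        for _ in range(round((times[l] - times[l - 1]) / dt)):
            p, _ = euler_step(p, phi, W, beta, dt)
        out.append(list(p))
    return out

# Synthetic toy data generated from made-up "true" parameters.
times = [0.0, 1.0, 2.0]
phi_true = [0.5, 0.0, -0.5]
W_true = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.3, 0.0, 0.0]]
observed = simulate(phi_true, W_true, 0.1, [0.6, 0.3, 0.1], times)

cost_true = relaxed_objective(phi_true, W_true, 0.1, times, observed, lam=10.0)
cost_flat = relaxed_objective([0.0, 0.0, 0.0], W_true, 0.1, times, observed, lam=10.0)
```

At the generating parameters the KL penalty vanishes, while a mis-specified Φ leaves a positive penalty; a gradient-based optimizer, such as the adjoint method described in the text, would trade this penalty off against the transport cost through the λ_l.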

In our implementation, we center both Φ and W such that

$$\sum_{i=1}^{n} \Phi_i = 0, \qquad \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} = 0.$$

Therefore, in our study, w_ij with a high absolute value indicates strong interaction from cell state j to cell state i (see the following section for further discussion).

Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions

Following Chow et al. [20] and Li [21], we define the dynamic potential energy of cell type i as follows:

$$\Psi_i(t) = \Phi_i + \sum_{k=1}^{n} w_{ik}\, p_k(t) + \beta \log p_i(t), \tag{8}$$

which consists of three components: (1) a static linear potential energy Φ_i that is time-independent and measures the cell differentiation potency of cell type i; (2) an interaction potential energy $\sum_{k=1}^{n} w_{ik}\, p_k(t)$ that coordinates cell development through intercellular communication; and (3) an entropy energy β log p_i(t), which is an analog of the potential energy induced by white noise in a diffusion process [14].

Overall, Ψ_i(t) depicts the dynamic potential energy landscape of cell type i at time t: a cell state i with a higher potential energy Ψ_i(t) tends to transit to more stable states with lower potential energies. In contrast, Φ_i quantifies the cell differentiation potency of cell type i and also provides a way to represent cell developmental time (see “The linear potential energy Φ by GraphFP quantifies cell differentiation potency” in Results).

Cell development relies on the coordination of cellular activities based on temporal and local cell-cell communication through molecular signalling events [22]. As a key component, the interaction potential energy $\sum_{k=1}^{n} w_{ik}\, p_k(t)$ in Eq (8) models and quantifies cell-cell interactions that drive cell development, where w_ik is the interaction strength from cell type k (as a signalling sender) to cell type i (as a signalling receiver):

$$\begin{cases} w_{ik} > 0: & \text{cell type } k \text{ sends signals that raise the potential energy of cell type } i \text{, thus inhibiting its decrease;} \\ w_{ik} < 0: & \text{cell type } k \text{ sends signals that lower the potential energy of cell type } i \text{, thus stimulating its decrease;} \\ w_{ik} = 0 \text{ or } w_{ik} \approx 0: & \text{cell type } k \text{ sends no or weak signals to cell type } i \text{ and cannot alter its potential energy level.} \end{cases} \tag{9}$$

We say that cell types i and k have no mutual interactions when both w_ik and w_ki are zero or close to zero. Such modelling of the interaction matrix W is inspired and evidenced by our increasing understanding that both positive and negative feedback circuits composed of stimulatory and inhibitory factors are involved in the regulation of precise coordination of cell fate decisions through intercellular communication [22, 30].

Dynamic inference of cell developmental process

Once we have estimated the parameters θ* = {Φ*, W*}, we can quantify the stochastic dynamics of the cell type frequencies p(t) on the probability simplex in continuous time t (> t₁) according to Eq (2), given the initial probability p¹ on the probability simplex.

The potential energy difference between cell states i and j is

$$\Delta_{ij}(t) = \Psi_i(t) - \Psi_j(t).$$

We can draw the curves of the potential energy of all cell states over time to illustrate the cell state-transition potential energy landscape from a dynamic point of view.

Based on Eq (2), which is the gradient flow of the free energy [20, 21], we define the probability flow from vertex j into vertex i through edge {i, j} between time stages [t_l, t_{l+1}] as

$$T_{j \to i}(t_l, t_{l+1}) = \int_{t_l}^{t_{l+1}} \big( F_j(p(t)) - F_i(p(t)) \big)\, g_{ij}(p(t))\, dt, \tag{10}$$

which measures the total probability mass transported from vertex j into vertex i through edge {i, j} between adjacent time stages [t_l, t_{l+1}]: a positive value indicates a probability mass gain of vertex i resulting from the flow of cells transiting from state j into state i, while a negative value indicates a probability mass loss of vertex i resulting from the flow of cells transiting from state i into state j. If the total probability mass is conserved (e.g., no cell proliferation is considered), we have

$$T_{j \to i}(t_l, t_{l+1}) = -\, T_{i \to j}(t_l, t_{l+1}).$$

The intuition of the probability flow definition is that, when potential energy difference Δ_ji(t) = F_j(p(t)) − F_i(p(t)) > 0, cells tend to transit from a higher potential energy state j to a lower potential energy state i, resulting in a probability mass gain of vertex i.
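The probability flow between two time stages can be accumulated edge by edge while integrating the dynamics. The sketch below assumes an upwind graph FPE on a complete graph with node potential Φ_i + Σ_k w_ik p_k + β log p_i, integrated with forward Euler; the function name, step size and toy parameters are hypothetical, and this is not the GraphFP code.

```python
import math

def probability_flows(p_start, phi, W, beta, t_start, t_end, dt=0.01):
    """Integrate over [t_start, t_end]; flow[i][j] is the net mass moved from j into i."""
    n = len(p_start)
    p = list(p_start)
    flow = [[0.0] * n for _ in range(n)]
    for _ in range(round((t_end - t_start) / dt)):
        psi = [phi[i] + sum(W[i][k] * p[k] for k in range(n))
               + beta * math.log(max(p[i], 1e-300)) for i in range(n)]
        dp = [0.0] * n
        for i in range(n):
            for j in range(i + 1, n):
                diff = psi[j] - psi[i]
                g = p[j] if diff > 0 else p[i] if diff < 0 else 0.5 * (p[i] + p[j])
                rate = diff * g          # positive: mass moves from j into i
                flow[i][j] += rate * dt
                flow[j][i] -= rate * dt  # conservation: gain of i is loss of j
                dp[i] += rate
                dp[j] -= rate
        p = [pi + dt * di for pi, di in zip(p, dp)]
    return p, flow

# Toy three-state example with made-up parameters (state 0 high, state 2 low potential).
p0 = [0.6, 0.3, 0.1]
p1, flow = probability_flows(
    p0, phi=[0.5, 0.0, -0.5], W=[[0.0] * 3 for _ in range(3)],
    beta=0.1, t_start=0.0, t_end=1.0,
)
```

In this toy run `flow[2][0]` is positive, reflecting cells moving from the high-potential state 0 into the low-potential state 2, and the row sums of `flow` reproduce the change in frequencies `p1[i] - p0[i]`.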

Results

GraphFP models and infers cell differentiation as a cell state-transition process described by the nonlinear FPE on cell state-transition graph in continuous time (see Fig 1 and Methods for details).

In this study, we evaluated the performance of GraphFP using the time series scRNA-seq dataset of embryonic murine cerebral cortex development [26]. This dataset was analyzed by Tempora [12]. We downloaded the processed data, including the cell type annotations provided by Tempora [12]. The time series transcriptomic profile contains 6,316 cells collected at embryonic day 11.5 (E11.5), E13.5, E15.5 and E17.5. Overall, these cells represent neuronal development states from the early precursors (apical precursors (APs) and radial precursors (RPs)) to intermediate progenitors (IPs) and differentiated cortical neurons. Fig 2a illustrates the gold standard trajectory of the 4 major cell states curated by Tempora. Tempora identified 7 cell types by clustering and annotation methods: two AP/RP clusters denoted as “3-APs/RPs” and “5-APs/RPs”, two IP clusters denoted as “4-IPs” and “7-IPs”, two young neuron clusters denoted as “2-Young Neurons” and “6-Young Neurons”, and one neuron cluster denoted as “1-Neurons” (Fig 2b).

Fig 2. GraphFP accurately reconstructs the cell state-transition energy landscape of the murine cerebral cortex dataset.

(a) The gold standard trajectory of embryonic murine cerebral cortex development. (b) The t-SNE plot of cells from the murine cerebral cortex dataset, colored by their cell-type labels. (c) GraphFP estimated the linear potential energy Φ. (d) GraphFP estimated the cell-cell interaction matrix W. (e) Static linear potential energy landscape of cells on the t-SNE plot: cells are color-coded according to the linear potential energies Φs of their corresponding cell types. (f) The free energy (Eq (1)) of the system decreased over time. (g) The reconstructed potential energy landscape Ψ(t) of cell types (colored curves) over time. (h) The potential energies of the cell state pairs with the top 3 highest positive values of cell-cell interaction strengths w_ijs: “2-Young Neurons ← 1-Neurons” (left panel), “6-Young Neurons ← 3-APs/RPs” (middle panel), and “4-IPs ← 1-Neurons” (right panel). (i) The potential energies of the cell state pairs with the top 3 lowest negative values of cell-cell interaction strengths w_ijs: “6-Young Neurons ← 1-Neurons” (left panel), “4-IPs ← 3-APs/RPs” (middle panel), and “2-Young Neurons ← 4-IPs” (right panel).

https://doi.org/10.1371/journal.pcbi.1009821.g002

GraphFP reconstructs the cell state-transition energy landscape

We applied GraphFP to the embryonic murine cerebral cortex development scRNA-seq dataset based on the cell state/type labels of 7 clusters provided by Tempora. We estimated the cell state frequencies of the 7 cell states for each of the 4 time points separately. GraphFP first estimated parameters θ = {Φ, W} of the free energy based on the adjoint method (Fig 2c and 2d).

In general, the static landscape of the estimated linear potential energies Φ is consistent with our understanding of the differentiation potencies of the 7 cell states. The early precursor states “5-APs/RPs” and “3-APs/RPs”, which mostly comprise cells at E11.5, have the two highest Φ values of 0.059 and 0.054, respectively. The two IP clusters, “7-IPs” and “4-IPs”, and one young neuron cluster, “6-Young Neurons”, have the three intermediate Φ values of 0.052, -0.051, and 0.035, respectively. The differentiated cortical neuron clusters “1-Neurons” and “2-Young Neurons” have the two lowest Φ values of -0.073 and -0.075, respectively (Fig 2c and 2e).

We modelled the cell state-transition energy landscape from a dynamical geometric point of view. The dynamic potential energy Ψ(t) consists of not only the static linear part Φ, but also an interaction energy part, as shown in Eq (8). It provides a global and holistic view of the cell development process. For example, when looking only at the static landscape of Φ, we found that the linear potential energy of “1-Neurons” (Φ₁ = −0.073) is slightly higher than that of “2-Young Neurons” (Φ₂ = −0.075). This conflicts with our understanding that “1-Neurons” should have the lowest potential, since it is located at the terminal node of the cell lineage (Fig 2a). However, when looking at the dynamic potential energy Ψ(t), we found that “1-Neurons” has a strong inhibitory interaction over “2-Young Neurons” (w₂₁ = 0.26 versus w₁₂ = 0.04), which raises the potential energy of “2-Young Neurons” during the process. As a result, the dynamic potential energy of “2-Young Neurons” (Ψ₂(t)) exceeds that of “1-Neurons” (Ψ₁(t)) after time point E13 (Fig 2g and left panel of Fig 2h). The potential energy difference (Δ₂₁) between “2-Young Neurons” and “1-Neurons” widens as time evolves, especially in the later time stages after time point E15.5 (left panel of Fig 2h). This is consistent with our understanding that: (1) “2-Young Neurons” tends to transit to “1-Neurons” during cell development (Fig 2a), and (2) the transition from “2-Young Neurons” to “1-Neurons” mainly occurs during late neurogenesis between E15.5 and E17.5 [26].
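The reordering described above can be reproduced with simple arithmetic on the reported values Φ₁ = −0.073, Φ₂ = −0.075, w₂₁ = 0.26 and w₁₂ = 0.04. The sketch below is deliberately reduced: it assumes that all other interaction entries (including self-interactions) contribute nothing, it drops the entropy term (β = 0), and the frequencies passed in are hypothetical rather than the estimated ones.

```python
def psi_pair(p1, p2, phi1=-0.073, phi2=-0.075, w12=0.04, w21=0.26):
    """Potentials of "1-Neurons" and "2-Young Neurons" from the pairwise terms only.

    p1, p2 are hypothetical frequencies of the two cell types; other cell types,
    self-interactions and the entropy term are ignored (simplifying assumptions).
    """
    psi1 = phi1 + w12 * p2   # "2-Young Neurons" acting on "1-Neurons"
    psi2 = phi2 + w21 * p1   # "1-Neurons" acting on "2-Young Neurons"
    return psi1, psi2

early = psi_pair(p1=0.0, p2=0.05)   # hardly any neurons yet
late = psi_pair(p1=0.3, p2=0.2)     # neurons have accumulated
```

With no neurons present the ordering follows Φ (Ψ₂ < Ψ₁), whereas once the neuron frequency grows the term w₂₁ p₁ lifts Ψ₂ above Ψ₁, which is the qualitative behaviour described in the text.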

Further results on the linear potential energy, the cell-cell interactions and the dynamic potential energy will be provided in the following two sections.

The linear potential energy Φ by GraphFP quantifies cell differentiation potency

Computational quantification of cell differentiation potency (also known as cell stemness) is a challenging issue [18]. The pioneering work by Shi et al. [5] established a rigorous mathematical theory for quantifying cell stemness from scRNA-seq data based on a continuous birth-death process.

Here, we demonstrated that the linear potential Φ estimated by GraphFP can be used to quantify the cell differentiation potency. In this study, each cell is assigned the same linear potential value as that of its corresponding cell type/state. Shi et al. [5] also quantified the cell differentiation potency at the cluster level (e.g., cell state and cell type), which makes the results more accurate and robust. Quantifying the cell differentiation potency at the single-cell level is still difficult, as single-cell gene expression profiles are known to be error-prone owing to various technical issues [31].

Following the study in Shi et al. [5], we tested whether our linear potential energies for pluripotent stem cells (at early time points) are higher than those for differentiated cells (at later time points). Cells from samples collected at earlier time stages clearly tend to have higher potential Φ, and vice versa (Fig 3a). Using the one-sided Wilcoxon rank-sum statistic as applied by Shi et al. [5], we confirmed with high statistical significance that the linear potential values of cells sampled at the earliest time stage E11.5 are higher than those of cells sampled at the subsequent time stages E13.5 (p < 1.554e−07), E15.5 (p < 2.2e−16), and E17.5 (p < 2.2e−16), respectively.
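A one-sided rank-sum comparison of this kind can be sketched without external libraries. The implementation below uses the Mann-Whitney U form of the Wilcoxon rank-sum test with a normal approximation and tie correction (ties are unavoidable here because every cell inherits the Φ of its cluster). It is an illustration rather than the exact procedure used in the study, and the cell counts in the toy data are hypothetical; only the Φ values are taken from the text.

```python
import math
from collections import Counter

def ranksum_greater(x, y):
    """One-sided rank-sum test of the alternative "x tends to be larger than y".

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic, one-sided p-value).
    """
    m, n = len(x), len(y)
    # U counts pairs with x > y, ties counting one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    total = m + n
    ties = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var = m * n / 12.0 * ((total + 1) - ties / (total * (total - 1)))
    if var <= 0:
        return u, 1.0  # all values identical: no evidence either way
    z = (u - m * n / 2.0) / math.sqrt(var)
    return u, 0.5 * math.erfc(z / math.sqrt(2.0))

# Toy samples: per-cell Phi values with hypothetical cell counts per cluster.
early_cells = [0.059] * 6 + [0.054] * 4
late_cells = [0.035] * 3 + [-0.073] * 5 + [-0.075] * 2

u_stat, p_value = ranksum_greater(early_cells, late_cells)
```

A small `p_value` supports the claim that cells from the earlier stage carry higher linear potential; with real data the per-cell Φ values at E11.5 would be compared against each later stage in turn.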

Fig 3. The linear potential energy Φ quantifies cell differentiation potency.

(a) Boxplot of the linear potential energies of cells sampled at different time stages of embryonic murine cerebral cortex development. (b) Trend in the additive inverse of the linear potential (circular points connected by black lines, with y axis on the left-hand side) and the temporal score (triangular points connected by red lines, with y axis on the right-hand side) across cell types. (c) The linear potential energy Φ estimated by GraphFP. (d) The stationary probability distribution p_ss of the cell types.

https://doi.org/10.1371/journal.pcbi.1009821.g003

Next, we compared the linear potential energies of the cell types with their pseudo-time during the cell development process. Tempora [12] provided each cell type with a temporal score by adjusting its cell composition from each time point such that a cell type containing more cells from an early time point has a lower score, and vice versa. We therefore used the temporal scores as the pseudo-time for the 7 cell types. The additive inverse values of the linear potential energy are strongly correlated with the temporal scores (Fig 3b, Pearson correlation coefficient = 0.91), further confirming that the linear potential energy quantifies cell stemness well.

It is worth noting that the linear potential energy Φ (Fig 3c) is different from the stationary distribution p_ss of cell types (Fig 3d). The stationary distribution p_ss, which is the cell type frequencies or cell densities calculated using the merged data across all time points, is often used to construct the stationary energy landscape U_ss ≡ −log p_ss in scRNA-seq data analysis [6]. However, as pointed out by Shi et al. [5], the stationary energy landscape U_ss is the equilibrium potential induced by diffusion without birth and death. To some extent, the linear potential energy Φ by GraphFP is an analog of the cell potential V(x) proposed by Shi et al. [5], which was taken as their quantification of cell differentiation potency.
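For contrast with Φ, the stationary landscape U_ss ≡ −log p_ss depends only on pooled cell-type frequencies and can be computed directly from labels. The sketch below uses made-up labels and a hypothetical function name.

```python
import math
from collections import Counter

def stationary_landscape(type_labels):
    """U_ss = -log p_ss, with p_ss the cell-type frequencies pooled over all time points."""
    counts = Counter(type_labels)
    total = sum(counts.values())
    return {c: -math.log(k / total) for c, k in counts.items()}

# Toy pooled labels (hypothetical).
u_ss = stationary_landscape(["A", "A", "B", "C"])
```

Abundant cell types receive a low U_ss regardless of when they appear, which is why U_ss need not track differentiation potency in the way Φ is intended to.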

GraphFP delineates cell-cell interactions

We quantified the cell-cell interactions and intercellular communication during embryonic murine cerebral cortex development using the cell-cell interaction matrix W estimated by GraphFP. The W measures the cell-cell interaction strength between each pair of two cell types/states, one as the signalling sender and the other as the signalling receiver (see Eq (9) and section “Reconstruction of cell developmental energy landscape and modelling of cell-cell interactions” in Methods for its biological interpretations).

The estimated W is a sparse matrix with most elements equal or close to zero (Fig 2d), indicating that the majority of pairs have no or only weak interactions. For example, the cell types “5-APs/RPs” and “7-IPs” have essentially no mutual interaction, with w₅₇ = 0 and w₇₅ = −0.02 (Fig 2d). Furthermore, the estimated cell-cell interaction strengths in the first row of W that corresponds to “5-APs/RPs” are zero or close to zero with |w_ij| ≤ 0.04, making the contribution of its interaction term in Eq (8) negligible. As such, the dynamic potential energy Ψ₅(t) is dominated by its linear potential energy (Φ₅), resulting in a flat potential energy curve (Fig 2g).

We also observed a number of strong cell-cell interactions, with w_ij values deviating substantially from zero. Each cell state other than “5-APs/RPs” and “7-IPs” has at least one strong interaction (e.g., |w_ij| > 0.1). As a result, their potential energies Ψ(t) deviate from their linear potential energies Φ, driven largely by their interaction energies, and their potential energy curves are sharper (Fig 2g).
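The contrast between weak and strong rows of W can be illustrated with a small numerical sketch. It assumes that the dynamic potential takes the additive form Ψ_i(t) = Φ_i + Σ_j w_ij p_j(t); this form is an assumption made here for illustration and should be checked against Eq (8). All values and names below are hypothetical.

```python
def dynamic_potential(phi, W, p):
    """Return Psi_i = Phi_i + sum_j W[i][j] * p[j] for every state i.

    This additive form is an assumption about Eq (8), not a quotation of it.
    """
    return [phi_i + sum(w_ij * p_j for w_ij, p_j in zip(row, p))
            for phi_i, row in zip(phi, W)]

# Hypothetical 3-state example (NOT the paper's estimates).
phi = [0.30, -0.10, -0.20]
W = [
    [0.00, 0.02, -0.04],    # weak row: every |w_ij| <= 0.04
    [0.26, 0.00, -0.15],    # strong row
    [-0.21, 0.14, 0.00],    # strong row
]
p = [0.6, 0.3, 0.1]         # cell type frequencies at some time t (sum to 1)

psi = dynamic_potential(phi, W, p)
# Absolute deviation of the dynamic potential from the linear potential.
shift = [abs(a - b) for a, b in zip(psi, phi)]
print(shift)
```

Since the frequencies are non-negative and sum to one, the interaction term of a row is a weighted average of its entries, so its magnitude cannot exceed the largest |w_ij| in that row. Under the assumed form, a row with |w_ij| ≤ 0.04 can therefore shift Ψ away from Φ by at most 0.04.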

We further examined cell state pairs with strong interactions. The pairs “2-Young Neurons ← 1-Neurons” (w₂₁ = 0.26), “6-Young Neurons ← 3-APs/RPs” (w₆₃ = 0.14) and “4-IPs ← 1-Neurons” (w₄₁ = 0.12) have the three largest positive w_ij values (Fig 2d), indicating that the sender cell types (“1-Neurons”, “3-APs/RPs” and “1-Neurons”) pass strong inhibitory signalling to their corresponding receiver cell types (“2-Young Neurons”, “6-Young Neurons” and “4-IPs”, respectively). Their potential energy differences (|Δ_ij|) widen as time evolves (Fig 2h), so that cells tend to transit in one direction, from the cell state with higher potential energy to the cell state with lower potential energy, and only rarely in the reverse direction. In particular, “2-Young Neurons” tends to transit to “1-Neurons”, “3-APs/RPs” tends to transit to “6-Young Neurons”, and “4-IPs” tends to transit to “1-Neurons”. These results are consistent with our understanding of the cell development process depicted in Fig 2a.

On the other hand, the pairs “6-Young Neurons ← 1-Neurons” (w₆₁ = −0.21), “4-IPs ← 3-APs/RPs” (w₄₃ = −0.15) and “2-Young Neurons ← 4-IPs” (w₂₄ = −0.14) have the three most negative w_ij values, indicating that the sender cell types (“1-Neurons”, “3-APs/RPs” and “4-IPs”) pass strong stimulatory signalling to their receiver cell types (“6-Young Neurons”, “4-IPs” and “2-Young Neurons”, respectively). In particular, the potential energy differences (|Δ_ij|) of the pair “6-Young Neurons: 1-Neurons” and the pair “3-APs/RPs: 4-IPs” shrink as time evolves (Fig 2i), so that the transitions between the paired cell states approach equilibrium in both directions. This result is consistent with the probability flows (Fig 4), where the net probability flows from “6-Young Neurons” to “1-Neurons” and from “3-APs/RPs” to “4-IPs” gradually decrease over time. The potential energy difference between “4-IPs” and “2-Young Neurons” starts from a small value close to zero at time point E11.5, gradually increases to its largest gap at E14, and then gradually declines to zero again at E17.5. This result indicates that the transition from “4-IPs” to “2-Young Neurons” mainly occurs in the intermediate time region from E13.5 to E15.5, which is consistent with our understanding that the “4-IPs” cells are the intermediate progenitors of cell development (Fig 2a).

Fig 4. GraphFP charts the probability flows of cell state-transitions.

Each circle represents a cell type (its size is proportional to the cell-type frequency at each time point); each line between cell types represents the probability flow from the source cell type to the target cell type (its width is proportional to the value of the probability flow).

https://doi.org/10.1371/journal.pcbi.1009821.g004

Based on our calculation using GraphFP, we also confirmed that free energy (Eq (1)) of the system decreased over time (Fig 2f), which is consistent with accepted mathematical theory [20, 21]. However, according to our calculation, free energy of the system did not converge to its minimum free energy state at time point E17.5 when the experiment ended (see the vertical dashed red line in Fig 2f). We predicted from Fig 2f that the system would reach its minimum free energy state after time point E30.

GraphFP faithfully charts the probability flows of cell state-transitions during cell development

We next examined the ability of GraphFP to quantify the dynamics of cell state-transitions during embryonic murine cortical development by calculating the probability flows (Eq (10)) over each interval between adjacent time stages (Fig 4). In the early stage from E11.5 to E13.5, cells mainly transit from the early precursors “3-APs/RPs” and “5-APs/RPs” to the intermediate progenitor “4-IPs” and the two neuron clusters “2-Young Neurons” and “1-Neurons”. In the middle stage from E13.5 to E15.5, the intermediate progenitor “4-IPs” joins “3-APs/RPs” and “5-APs/RPs” as one of the major source clusters that transit to the two neuron clusters “2-Young Neurons” and “1-Neurons”. Meanwhile, “2-Young Neurons” starts to act as a source cluster that transits to “1-Neurons”. In the late stage from E15.5 to E17.5, “4-IPs” takes a leading role in transiting to the neuron cluster “1-Neurons” and the young neuron clusters “2-Young Neurons” and “6-Young Neurons”, while “2-Young Neurons” continues as one of the major source clusters transiting to “1-Neurons” (Fig 4). Compared with the gold standard trajectory shown in Fig 2a by Tempora [12], we identified a new path whereby the IP cells of cluster “4-IPs” transit to neuron cells of cluster “1-Neurons”. This path is supported by Yuzwa et al. [26], who reported that cortical RPs divide asymmetrically from E11.5 to E17.5 to generate neurons directly or indirectly via transit-amplifying IPs.

Cell-cell interactions drive the stochastic and nonlinear dynamics of cell development

GraphFP explicitly models cell-cell interactions with a nonlinear quadratic interaction term in the free energy (Eq (1)). To assess the contribution of cell-cell interactions, we evaluated the ability of GraphFP to fit the experimental data (Fig 5a) and recover held-out time points (Fig 5b–5d) with the cell-cell interaction term (W ≠ 0; solid lines in Fig 5) and without it (W = 0; dashed lines in Fig 5). To evaluate estimation accuracy, we used the Kullback-Leibler divergence (KLD) to measure the difference between the probability distribution estimated with or without the interaction term and the true probability distribution at each time point (Table 1). A lower KLD value indicates better performance.
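As an illustration of this accuracy measure, the KLD between an observed and an estimated cell type frequency vector can be computed as follows. This is a sketch under assumptions: the direction KL(true ‖ estimated) is chosen here and may differ from the convention used for Table 1, and the frequencies are hypothetical stand-ins rather than GraphFP outputs.

```python
import math

def kl_divergence(p_true, p_est, eps=1e-12):
    """KL(p_true || p_est) for two discrete distributions over the same cell types.

    Terms with p_true == 0 contribute zero; p_est is floored at eps to avoid log(0).
    """
    return sum(p * math.log(p / max(q, eps))
               for p, q in zip(p_true, p_est) if p > 0)

# Hypothetical frequencies for 4 cell types at one time point.
observed = [0.40, 0.30, 0.20, 0.10]
with_interaction = [0.38, 0.31, 0.21, 0.10]      # stand-in for a W != 0 fit
without_interaction = [0.30, 0.25, 0.30, 0.15]   # stand-in for a W = 0 fit

kld_w = kl_divergence(observed, with_interaction)
kld_0 = kl_divergence(observed, without_interaction)
print(f"KLD with interaction:    {kld_w:.4f}")
print(f"KLD without interaction: {kld_0:.4f}")
```

The KLD is zero only when the two distributions coincide, so the model whose estimate lies closer to the observed frequencies receives the lower value.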

Fig 5. GraphFP accurately quantifies the stochastic dynamics of the cell type frequencies by modelling cell-cell interactions.

GraphFP calculated the stochastic dynamics of the cell type frequencies p(t) with cell-cell interaction term (W ≠ 0; solid lines) and without cell-cell interaction term (W = 0; dashed lines). Triangle points are the estimated cell type frequencies at each time point where red represents the input data point to GraphFP, while blue represents the held-out data point to GraphFP. (a) Using all 4 time points as input. (b) Held-out E13.5. (c) Held-out E15.5. (d) Held-out E13.5 and E15.5.

https://doi.org/10.1371/journal.pcbi.1009821.g005

Table 1. Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell-type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the murine cerebral cortex dataset.

https://doi.org/10.1371/journal.pcbi.1009821.t001

First, we applied GraphFP to the embryonic murine cerebral cortex development scRNA-seq dataset using all 4 time points. Based on the estimated parameters θ*, we calculated the stochastic dynamics of the cell type frequencies p(t) on the probability simplex in continuous time t (> t₁) according to Eq (2), starting from a given initial point. Overall, GraphFP with the cell-cell interaction term outperforms GraphFP without it in fitting the nonlinear curves for the 7 clusters (Table 1), especially for “2-Young Neurons”, “4-IPs” and “6-Young Neurons” (Fig 5a).
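How a frequency vector p(t) can be propagated forward in time is sketched below with a forward Euler integration of an upwind Fokker-Planck equation on a complete graph, a form that appears in the graph Fokker-Planck literature. Whether this right-hand side coincides with Eq (2) is an assumption, as is the generalized potential F_i = Φ_i + Σ_j w_ij p_j + β log p_i. The parameter values and the names fp_rhs and simulate are hypothetical and are not taken from the GraphFP software.

```python
import math

def fp_rhs(p, phi, W, beta):
    """Upwind graph Fokker-Planck right-hand side on a complete graph.

    F_i = Phi_i + sum_j W[i][j] * p_j + beta * log(p_i) is the assumed
    generalized potential; mass moves along each edge from higher F to lower F.
    """
    n = len(p)
    F = [phi[i] + sum(W[i][j] * p[j] for j in range(n)) + beta * math.log(p[i])
         for i in range(n)]
    dp = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Net flow from j to i: positive when F_j > F_i, weighted by the
            # frequency of the donor state (upwind choice).
            flow = (F[j] - F[i]) * (p[j] if F[j] > F[i] else p[i])
            dp[i] += flow
            dp[j] -= flow
    return dp

def simulate(p0, phi, W, beta, dt, n_steps):
    """Forward Euler integration of the frequencies starting from p0."""
    p = list(p0)
    for _ in range(n_steps):
        dp = fp_rhs(p, phi, W, beta)
        p = [pi + dt * d for pi, d in zip(p, dp)]
    return p

# Hypothetical 3-state example with no interactions (W = 0).
phi = [0.5, 0.0, -0.5]
W = [[0.0] * 3 for _ in range(3)]
p_end = simulate([0.8, 0.15, 0.05], phi, W, beta=0.1, dt=0.01, n_steps=500)
print(p_end)
```

Every edge flow is added to one state and subtracted from the other, so the total probability mass is conserved at each step, and mass accumulates in the state with the lowest potential. A small step size is used here to keep the explicit scheme stable and the frequencies strictly positive.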

Next, we applied GraphFP to the scRNA-seq datasets with (i) one held-out time point E13.5 (Fig 5b) and (ii) one held-out time point E15.5 (Fig 5c), separately. GraphFP with the cell-cell interaction term outperforms GraphFP without it on both nonlinear curve fitting and held-out time point recovery, except for one comparison on the held-out E13.5 dataset at time stage E13.5 (Table 1).

Finally, we applied GraphFP to the scRNA-seq dataset with two held-out time points, E13.5 and E15.5 (Fig 5d). Unsurprisingly, the accuracy of both models drops markedly when recovering the held-out time points. Nevertheless, GraphFP with the cell-cell interaction term still outperforms GraphFP without it (Table 1 and Fig 5d).

As shown in Fig 5, our results illustrate that the stochastic and nonlinear dynamics of cell development are not merely determined by the linear potential energies Φs, but also driven by nonlinear cell-cell interactions. Specifically, the evolving probability frequencies of cell types can be nonmonotonic (e.g., “2-Young Neurons”, “4-IPs”, “6-Young Neurons” in Fig 5). Meanwhile, time series scRNA-seq data with cells profiled at more time points will provide more temporal information to recover the biologically complex dynamic processes.

GraphFP is robust to input data

As also shown in Fig 5a–5c, our results illustrate that GraphFP robustly recovers the stochastic and nonlinear dynamics of cell development when using either all time points or datasets with one held-out time point.

Since GraphFP works on cells with cluster labels or cell type annotations, we then examined whether GraphFP is sufficiently robust to the uncertainty present in the clustering or annotation methods. In the above sections, we have illustrated the outputs of GraphFP based on the labelling of 7 clusters with a fine resolution provided by Tempora [12]. Here, we further grouped the cells into 4 clusters with a coarse resolution as follows: “A-Neurons” constituted by cells from “1-Neurons”; “B-Young Neurons” constituted by cells from “2-Young Neurons” and “6-Young Neurons”; “C-APs/RPs” constituted by cells from “3-APs/RPs” and “5-APs/RPs”; and “D-IPs” constituted by cells from “4-IPs” and “7-IPs”. We compared the results of GraphFP based on the labelling of 7 cell types and the labelling of 4 cell types (Fig 6). To make the results comparable, we aggregated the results based on the labelling of 7 cell types by averaging the results from i) “3-APs/RPs” and “5-APs/RPs”, ii) “2-Young Neurons” and “6-Young Neurons” and iii) “4-IPs” and “7-IPs”, separately, resulting in the same dimensions as those based on the labelling of 4 cell types. The results based on 4 cell types and the aggregated results based on 7 cell types are consistent, showing similar patterns of linear energies Φ (Fig 6a and 6d), interaction matrices W (Fig 6b and 6e), and probability flows (Fig 6c and 6f). In addition, GraphFP is robust to hyper-parameter choices over wide ranges.
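The aggregation step described above can be sketched as a group-wise average. The sketch assumes that the matrix W is aggregated by averaging each receiver-by-sender block, which is one plausible reading of “averaging the results”. The fine-resolution values are hypothetical, and the helper names are not part of GraphFP.

```python
# Coarse clusters and the fine-resolution cluster numbers they contain.
GROUPS = {
    "A-Neurons": [1],
    "B-Young Neurons": [2, 6],
    "C-APs/RPs": [3, 5],
    "D-IPs": [4, 7],
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def aggregate_phi(phi, groups=GROUPS):
    """Average the fine-resolution Phi over the members of each coarse cluster."""
    return {g: mean(phi[i] for i in ids) for g, ids in groups.items()}

def aggregate_w(W, groups=GROUPS):
    """Average each block of the fine-resolution W (receiver i, sender j)."""
    return {(a, b): mean(W[i][j] for i in ids_a for j in ids_b)
            for a, ids_a in groups.items() for b, ids_b in groups.items()}

# Hypothetical fine-resolution estimates keyed by cluster number 1..7.
phi7 = {1: -1.0, 2: -0.4, 3: 0.8, 4: 0.1, 5: 1.0, 6: -0.6, 7: 0.3}
W7 = {i: {j: 0.0 for j in range(1, 8)} for i in range(1, 8)}
W7[2][1], W7[6][1] = 0.2, 0.1

phi4 = aggregate_phi(phi7)   # 4 values, one per coarse cluster
W4 = aggregate_w(W7)         # 4 x 4 = 16 entries keyed by (receiver, sender)
print(phi4)
```

After aggregation, the 7-cluster results have the same dimensions as the 4-cluster results and can be compared entry by entry.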

Fig 6. GraphFP is robust to uncertainty presented in cell type labels.

GraphFP was applied to the murine cerebral cortex dataset based on the labelling of 4 cell types with a coarse resolution (a-c) and the labelling of 7 cell types with a fine resolution (d-f), separately. (a-c) The estimated Φ (a), W (b) and charted probability flow (c) by GraphFP based on the labelling of 4 clusters (“A-Neurons”, “B-Young Neurons”, “C-APs/RPs”, “D-IPs”). (d-f) Aggregated results of the estimated Φ (d), W (e) and charted probability flow (f) by GraphFP based on the labelling of 7 clusters, obtained by averaging the results from i) “3-APs/RPs” and “5-APs/RPs”, ii) “2-Young Neurons” and “6-Young Neurons” and iii) “4-IPs” and “7-IPs”, separately, resulting in the same dimensions as those based on the labelling of 4 cell types.

https://doi.org/10.1371/journal.pcbi.1009821.g006

The computational cost of GraphFP

We examined the impact of the number of cell types (n) on the computational cost of GraphFP. When working on the murine cerebral cortex dataset with 7 cell types (n = 7) and 4 time stages, the runtime of GraphFP was around 3 minutes for each task on a personal laptop (MacBook Pro with a 2.4 GHz Intel Core i5 CPU and 8 GB of 2133 MHz LPDDR3 memory) (S1 Table). In our implementation, we set λ_l = 1000 (l ∈ {2, 3, 4}), β = 0.001, the learning rate to α = 0.01/λ_l, and Integral_step to 0.1.

We next examined GraphFP on another time series scRNA-seq dataset, the mouse spinal cord injury healing process provided by Milich et al. [32] (see the detailed results in S2 Text). This dataset contains 13 clusters (cell types) and 4 time points. We applied GraphFP to this dataset on the same computer with the same hyper-parameter settings as those for the murine cerebral cortex dataset. GraphFP still achieved accurate and robust reconstruction of the cell state-transition energy landscape on this dataset (S2 Text), with a reasonable computational speed: the runtime for each task was around 9 minutes (S2 Table).

Based on the above experiments, we found that the computational speed of GraphFP is acceptable for tasks with a moderate number of cell types. However, the computation of GraphFP might be problematic for large n (e.g., n > 100) with the current settings. As we applied the complete cell state-transition graph, the number of free parameters (e.g., the entries of the cell-cell interaction matrix W) grows in the order of O(n²), making the computation difficult. This problem is solvable. One way to solve it is to take advantage of the sparse structure of the cell-cell interaction matrix W. As we have already noted, the estimated W matrices of both the murine cerebral cortex dataset (Fig 2d) and the mouse spinal cord injury dataset (Fig A(b) in S2 Text) are sparse. Therefore, we can add an L1 regularization term on the matrix W to the loss function to enforce sparsity of W. We plan to pursue this topic in our future work.
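The scaling argument and the proposed L1 term can be made concrete with a short sketch. It counts all n² entries of W as free parameters, which is an upper bound if some entries (for example the diagonal) are fixed, and the weight lam of the penalty is a hypothetical hyper-parameter that does not exist in the current GraphFP implementation.

```python
def n_parameters(n, with_interaction=True):
    """Free parameters on n cell types: n for Phi, plus n * n for W if present."""
    return n + n * n if with_interaction else n

def l1_penalty(W, lam):
    """lam * sum_ij |w_ij|, a term that could be added to the fitting loss."""
    return lam * sum(abs(w) for row in W for w in row)

for n in (7, 13, 100):
    print(n, n_parameters(n), n_parameters(n, with_interaction=False))

# Hypothetical 3 x 3 interaction matrix with a few non-zero entries.
W = [
    [0.00, 0.26, 0.00],
    [-0.21, 0.00, 0.00],
    [0.00, 0.14, 0.00],
]
penalty = l1_penalty(W, lam=0.1)
print(f"L1 penalty: {penalty:.3f}")
```

Because the penalty grows with the absolute value of every entry, minimizing the penalized loss pushes weak interactions towards exactly zero while leaving room for a small number of strong ones.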

On the other hand, in practice, for a large number of cell types, we can trade off estimation accuracy and information on cell-cell interactions against runtime. GraphFP without the cell-cell interaction term is efficient for a large number of cell types, since the number of free parameters (e.g., the entries of the linear potential energy Φ) grows only in the order of O(n). For example, the runtimes of GraphFP without the cell-cell interaction term (W = 0) for both the murine cerebral cortex dataset (S1 Table) and the mouse spinal cord injury dataset (S2 Table) are all less than 20 seconds.

Discussion

Modelling of cell development has long been a key goal of systems biology. The Waddington landscape is a classic metaphor for describing cell development. Mathematical frameworks of the cell developmental energy landscape have been developed to study the dynamics of cell state-transitions from a gene regulatory network (GRN) based perspective (e.g., [33–36]) and a state manifold based perspective (e.g., [5, 37, 38]) (see [18] for a recent review of the two approaches). Traditional GRN-based landscapes can be hindered by the computational issues raised by high-dimensional GRNs. Recently, a model-based dimension reduction approach of the landscape (DRL) was proposed to construct a low-dimensional energy landscape of high-dimensional GRNs [36], which overcomes the limitations of traditional methods. Although great success has been achieved, the GRN-based landscape depends on prior biological knowledge of the underlying GRN. When information on the GRN is unavailable or incomplete, a state manifold based landscape can be constructed instead, especially for scRNA-seq data analysis. The state manifold based methods model cell development with a stochastic Markov process and/or a drift-diffusion PDE, where cell states (e.g., cell types and cell clusters) represent the local attractors of the underlying dynamic system [18].

In this study, we propose GraphFP, a state manifold based computational framework, to reconstruct the complex potential energy landscape and infer the stochastic dynamics of cell state-transitions during cell development. GraphFP models cell development based on a diffusion process on a discrete set of states [19–21]. It can be viewed through the lens of dynamic optimal transport on networks as solving an optimal control problem that minimizes the kinetic energy of the flow between adjacent time points [14, 39]. The FPE of GraphFP can be characterized as a gradient flow of the free energy when the probability simplex of discrete states is equipped with the discrete L₂-Wasserstein metric defined on the graph [19–21]. Beyond its clear theoretical importance, GraphFP has enabled critical insight into nonlinear dynamic cell state-transitions, as well as cell-cell interactions during cell development. We demonstrated that the cell-cell interaction part of GraphFP plays a key role in capturing the stochastic dynamics of the cell-type frequencies on both the murine cerebral cortex dataset (Table 1 and Fig 5) and the mouse spinal cord injury dataset (S2 Text).

GraphFP has the following strengths over existing methods. First, GraphFP models the dynamics of cell clusters (e.g., cell states and cell types) on a discrete state space. In contrast, methods such as Waddington-OT [15], TrajectoryNet [16] and PRESCIENT [17] model the dynamics of individual cells with drift-diffusion equations on a continuous state space. With the dramatic increase in the amount and size of scRNA-seq data, cluster-based approaches, which work on a relatively small number of clusters that usually represent annotated cell types, offer both scalability to large-scale scRNA-seq data and ease of biological interpretation [12].

Second, GraphFP is built on a nonlinear model that explicitly takes into account cell-cell interactions in the free energy. The current computational methods for inferring cell-cell interactions from single-cell data are mainly based on machine learning or statistics, relying heavily on domain knowledge as learning material [22, 23]. GraphFP, by comparison, provides an alternative, model-based approach to decipher the cell-cell interactions that drive cell development. In contrast, the underlying models of both Waddington-OT [15] and PRESCIENT [17] are only able to characterize cell state-transitions on a static potential energy landscape driven by random noise, failing to account for cell-cell interactions. Although TrajectoryNet is able to reconstruct a nonlinear developmental landscape, it is based on a neural network framework without an explicit system model and thus lacks biological interpretability.

Nonetheless, some aspects still need to be improved. Firstly, the current GraphFP does not account for cell proliferation during cell development, which means that probability mass may not be conserved over time. We can solve this problem by adopting the unbalanced optimal transport framework that has been used by Waddington-OT [15] and TrajectoryNet [16] to quantify cell proliferation. Secondly, like existing time series scRNA-seq methods such as Waddington-OT [15] and Tempora [12], the current GraphFP works in an off-line fashion, such that cell clustering and annotation are performed on the entire data by merging cells from all time points together. This approach offers an unbiased, comprehensive and quantitative definition of discrete cell types. However, with the emerging large-scale scRNA-seq data, it may be computationally cumbersome to cluster massive and continually arriving scRNA-seq datasets as a whole. Therefore, developing an on-line framework of GraphFP that can cluster and annotate time series scRNA-seq data in different batches in a serial fashion is an interesting topic for future work. Newly developed single-cell data analysis tools, such as the on-line integration method online iNMF [40] and the transfer learning based cell type annotation method scArches [41], could be adopted.

Supporting information

S1 Text. Details for the parameter estimation of GraphFP.

This document provides a detailed description of the parameter estimation and pseudocode for the GraphFP algorithm.

https://doi.org/10.1371/journal.pcbi.1009821.s001

(PDF)

S2 Text. Application of GraphFP to the mouse spinal cord injury dataset.

Fig A. GraphFP reconstructs the cell state-transition energy landscape on the mouse spinal cord injury scRNA-seq dataset. Fig B. The linear potential energy Φ quantifies cell differentiation potency. Table A. Evaluation of GraphFP’s performance on quantifying the stochastic dynamics of cell type frequencies with cell-cell interaction term (W ≠ 0) and without cell-cell interaction term (W = 0) on the mouse spinal cord injury dataset.

https://doi.org/10.1371/journal.pcbi.1009821.s002

(PDF)

S1 Table. Runtimes of GraphFP on the murine cerebral cortex dataset.

https://doi.org/10.1371/journal.pcbi.1009821.s003

(DOCX)

S2 Table. Runtimes of GraphFP on the mouse spinal cord injury dataset.

https://doi.org/10.1371/journal.pcbi.1009821.s004

(DOCX)

References

1. Tanay A, Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature. 2017;541(7637):331–338. pmid:28102262
2. Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nature Biotechnology. 2019;37(5):547–554. pmid:30936559
3. Jin S, MacLean AL, Peng T, Nie Q. scEpath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data. Bioinformatics. 2018;34(12):2077–2086. pmid:29415263
4. Shi J, Teschendorff AE, Chen W, Chen L, Li T. Quantifying Waddington’s epigenetic landscape: a comparison of single-cell potency measures. Briefings in Bioinformatics. 2018;21(1):248–261. pmid:30289442
5. Shi J, Li T, Chen L, Aihara K. Quantifying pluripotency landscape of cell differentiation from scRNA-seq data by continuous birth-death process. PLOS Computational Biology. 2019;15(11):1–17. pmid:31721764
6. Chen Z, An S, Bai X, Gong F, Ma L, Wan L. DensityPath: an algorithm to visualize and reconstruct cell state-transition path on density landscape for single-cell RNA sequencing data. Bioinformatics. 2019;35(15):2593–2601. pmid:30535348
7. Weinreb C, Wolock S, Tusi BK, Socolovsky M, Klein AM. Fundamental limits on dynamic inference from single-cell snapshots. Proceedings of the National Academy of Sciences. 2018;115(10):E2467–E2476. pmid:29463712
8. Rashid S, Kotton DN, Bar-Joseph Z. TASIC: determining branching models from time series single cell data. Bioinformatics. 2017;33(16):2504–2512. pmid:28379537
9. Lin C, Bar-Joseph Z. Continuous-state HMMs for modeling time-series single-cell RNA-Seq data. Bioinformatics. 2019;35(22):4707–4715. pmid:31038684
10. An S, Ma L, Wan L. TSEE: an elastic embedding method to visualize the dynamic gene expression patterns of time series single-cell RNA sequencing data. BMC Genomics. 2019;20(2):224. pmid:30967106
11. Zheng X, Huang Y, Zou X. scPADGRN: A preconditioned ADMM approach for reconstructing dynamic gene regulatory network using single-cell RNA sequencing data. PLOS Computational Biology. 2020;16(7):e1007471. pmid:32716923
12. Tran TN, Bader GD. Tempora: Cell trajectory inference using time-series single-cell RNA sequencing data. PLOS Computational Biology. 2020;16(9):e1008205. pmid:32903255
13. Peyré G, Cuturi M. Computational Optimal Transport: With Applications to Data Science. Foundations and Trends in Machine Learning. 2019;11(5-6):355–607.
14. Zhou H. Optimal Transport on Networks. IEEE Control Systems Magazine. 2021;41(4):70–81.
15. Schiebinger G, Shu J, Tabaka M, Cleary B, Subramanian V, Solomon A, et al. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell. 2019;176(4):928–943.e22. pmid:30712874
16. Tong A, Huang J, Wolf G, Van Dijk D, Krishnaswamy S. TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics. In: III HD, Singh A, editors. Proceedings of the 37th International Conference on Machine Learning. vol. 119 of Proceedings of Machine Learning Research. Virtual: PMLR; 2020. p. 9526–9536.
17. Yeo GHT, Saksena SD, Gifford DK. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nature Communications. 2021;12(1):3222. pmid:34050150
18. Teschendorff AE, Feinberg AP. Statistical mechanics meets single-cell biology. Nature Reviews Genetics. 2021;22(7):459–476. pmid:33875884
19. Chow SN, Huang W, Li Y, Zhou H. Fokker-Planck Equations for a Free Energy Functional or Markov Process on a Graph. Archive for Rational Mechanics and Analysis. 2012;203(3):969–1008.
20. Chow SN, Li W, Zhou H. Entropy dissipation of Fokker-Planck equations on graphs. Discrete & Continuous Dynamical Systems—A. 2018;38(10):4929–4950.
21. Li W. A study of stochastic differential equations and Fokker-Planck equations with applications. Georgia Institute of Technology; 2016. Available from: http://hdl.handle.net/1853/54999.
22. Armingol E, Officer A, Harismendy O, Lewis NE. Deciphering cell–cell interactions and communication from gene expression. Nature Reviews Genetics. 2021;22(2):71–88. pmid:33168968
23. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, et al. Inference and analysis of cell-cell communication using CellChat. Nature Communications. 2021;12(1):1088. pmid:33597522
24. Bryson AE, Ho Y. Applied Optimal Control: Optimization, Estimation and Control. CRC Press; 1975.
25. Chen RTQ, Rubanova Y, Bettencourt J, Duvenaud DK. Neural Ordinary Differential Equations. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc.; 2018. p. 6571–6583.
26. Yuzwa SA, Borrett MJ, Innes BT, Voronova A, Ketela T, Kaplan DR, et al. Developmental Emergence of Adult Neural Stem Cells as Revealed by Single-Cell Transcriptional Profiling. Cell Reports. 2017;21(13):3970–3986. pmid:29281841
27. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment. 2008;2008(10):P10008.
28. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM III, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177(7):1888–1902.e21. pmid:31178118
29. Liu Q, Chen S, Jiang R, Wong WH. Simultaneous deep generative modelling and clustering of single-cell genomic data. Nature Machine Intelligence. 2021;3(6):536–544. pmid:34179690
30. Kirouac DC, Ito C, Csaszar E, Roch A, Yu M, Sykes EA, et al. Dynamic interaction networks in a hierarchically organized tissue. Molecular Systems Biology. 2010;6(1):417. pmid:20924352
31. Kharchenko PV. The triumphs and limitations of computational methods for scRNA-seq. Nature Methods. 2021;18(7):723–732. pmid:34155396
32. Milich LM, Choi JS, Ryan C, Cerqueira SR, Benavides S, Yahn SL, et al. Single-cell analysis of the cellular heterogeneity and interactions in the injured mouse spinal cord. Journal of Experimental Medicine. 2021;218(8). pmid:34132743
33. Wang J, Zhang K, Xu L, Wang E. Quantifying the Waddington landscape and biological paths for development and differentiation. Proceedings of the National Academy of Sciences. 2011;108(20):8257–8262. pmid:21536909
34. Li C, Wang J. Quantifying Cell Fate Decisions for Differentiation and Reprogramming of a Human Stem Cell Network: Landscape and Biological Paths. PLOS Computational Biology. 2013;9(8):e1003165. pmid:23935477
35. Lang J, Nie Q, Li C. Landscape and kinetic path quantify critical transitions in epithelial-mesenchymal transition. Biophysical Journal. 2021;120(20):4484–4500. pmid:34480928
36. Kang X, Li C. A Dimension Reduction Approach for Energy Landscape: Identifying Intermediate States in Metabolism-EMT Network. Advanced Science. 2021;8(10):2003133. pmid:34026435
37. Zhou P, Li T. Construction of the landscape for multi-stable systems: Potential landscape, quasi-potential, A-type integral and beyond. The Journal of Chemical Physics. 2016;144(9):094109. pmid:26957159
38. Zhou P, Wang S, Li T, Nie Q. Dissecting transition cells from single-cell transcriptome data through multiscale stochastic dynamics. Nature Communications. 2021;12(1):5609. pmid:34556644
39. Jordan R, Kinderlehrer D, Otto F. The Variational Formulation of the Fokker–Planck Equation. SIAM Journal on Mathematical Analysis. 1998;29(1):1–17.
40. Gao C, Liu J, Kriebel AR, Preissl S, Luo C, Castanon R, et al. Iterative single-cell multi-omic integration using online learning. Nature Biotechnology. 2021;39(8):1000–1007. pmid:33875866
41. Lotfollahi M, Naghipourfar M, Luecken MD, Khajavi M, Büttner M, Wagenstetter M, et al. Mapping single-cell data to reference atlases by transfer learning. Nature Biotechnology. 2021. pmid:34462589


Modelling cell state-transition dynamics with the nonlinear FPE on graph equipped with discrete L₂-Wasserstein distance