Fig 1.
Colours in the scatter plots represent time, with red the earliest and blue the latest time point. In the forward model, the population is modelled from left to right; in the backward model, it evolves from right to left. The blue rectangle on top represents the TrajectoryNet framework by Tong et al. [18], which is equivalent to half of one iteration of our FBSDE model. The FBSDE model alternates between the forward and backward models: traversing the forward model generates new simulated data points, which are then used as the training set for the backward model, and vice versa.
Fig 2.
Fig 2a and 2b) Comparison between precision and recall. Red circles represent observations and green circles represent simulations. In Fig 2a, a high-precision model means that simulations are close to observations; however, there can be some observations without any simulations nearby. In Fig 2b, a high-recall model means that most observations have some simulations nearby; however, there can be some simulations that are far away from any observations. Fig 2c and 2d) Comparison between OT and SDE. Filled circles represent cells from t = 0. Unfilled circles represent cells from t = 1. The red colour represents observations. The green colour represents simulations. Triangles indicate interpolated cells at time t = h. Fig 2c) OT infers the endpoints of a path, and the trajectory is simply a straight line connecting the endpoints. OT can only infer paths whose endpoints come from observations. Fig 2d) SDE infers both linear and non-linear trajectories using differential equations. In addition, SDE can infer paths originating from points that are not observations (i.e., given an arbitrary starting point).
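The precision/recall distinction in Fig 2a and 2b can be illustrated with a minimal sketch. It assumes a simple radius-based notion of "nearby"; the `radius` threshold and the function names `coverage` and `precision_recall` are hypothetical choices made for illustration, not the metric defined in the paper.

```python
import math


def coverage(points, references, radius):
    """Fraction of `points` lying within `radius` of at least one reference point."""
    if not points:
        return 0.0
    hits = sum(
        1 for p in points
        if any(math.dist(p, r) <= radius for r in references)
    )
    return hits / len(points)


def precision_recall(observations, simulations, radius):
    """Radius-based precision and recall (an illustrative assumption).

    precision: share of simulations that have an observation nearby.
    recall:    share of observations that have a simulation nearby.
    """
    precision = coverage(simulations, observations, radius)
    recall = coverage(observations, simulations, radius)
    return precision, recall


observations = [(0.0, 0.0), (5.0, 5.0)]

# Fig 2a-like case: every simulation is near an observation,
# but the observation at (5, 5) has no simulation nearby.
simulations_a = [(0.1, 0.0), (0.0, 0.1)]
high_precision = precision_recall(observations, simulations_a, radius=0.5)

# Fig 2b-like case: every observation is covered,
# but one simulation is far from all observations.
simulations_b = [(0.1, 0.0), (5.0, 5.1), (20.0, 20.0)]
high_recall = precision_recall(observations, simulations_b, radius=0.5)
```

In the first case precision is perfect while recall is not; in the second the roles are reversed.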
Fig 3.
Difference between FBSDE and Waddington-OT in modelling a simulated dataset.
The time is coded in the vertical bars next to each panel; the time points are normalized, with 0 and 1 representing the start and end points, respectively. At t = 0, the population follows the standard normal distribution. At t = 1, the population is uniformly distributed on the ring centred at (0, 0) with a radius of 10. At 0 < t < 1, the distribution is interpolated by either the FBSDE or the Waddington-OT model. In Fig 3a to Fig 3d, the population evolves according to a stochastic differential equation in which a neural network parametrizes the drift term. From Fig 3a to Fig 3d, the population exhibits more angular rotation as it favours a higher level of entropy (i.e. density). In Fig 3e, the endpoint for each point at t = 0 is sampled based on the optimal transport map, and the interpolation is performed linearly by connecting each pair of points.
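The two interpolation styles in this figure can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: endpoints are paired by index rather than by an optimal transport map, and `toy_drift` is a hypothetical hand-written function standing in for the neural-network drift. The rotation rate `omega`, the noise level `sigma` and the step count are likewise arbitrary.

```python
import math
import random


def sample_start(n, rng):
    """Draw n points from a 2-D standard normal (the t = 0 population)."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def sample_ring(n, rng, radius=10.0):
    """Draw n points uniformly on the circle of `radius` centred at (0, 0)."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(radius * math.cos(a), radius * math.sin(a)) for a in angles]


def interpolate(start, end, t):
    """Straight-line interpolation between paired endpoints (Fig 3e style)."""
    return [
        ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
        for (x0, y0), (x1, y1) in zip(start, end)
    ]


def toy_drift(x, y, t, omega=1.0):
    """Hypothetical drift: pull towards radius 10 and rotate about the origin.

    A placeholder for the learned drift; `t` is unused here.
    """
    r = math.hypot(x, y) or 1e-12  # avoid division by zero at the origin
    radial = 10.0 - r
    return radial * x / r - omega * y, radial * y / r + omega * x


def euler_maruyama(points, drift, sigma, n_steps, rng):
    """Simulate dX = drift(X, t) dt + sigma dW from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    scale = sigma * math.sqrt(dt)
    for k in range(n_steps):
        t = k * dt
        stepped = []
        for x, y in points:
            bx, by = drift(x, y, t)
            stepped.append((
                x + bx * dt + scale * rng.gauss(0.0, 1.0),
                y + by * dt + scale * rng.gauss(0.0, 1.0),
            ))
        points = stepped
    return points


rng = random.Random(0)
start = sample_start(50, rng)
end = sample_ring(50, rng)

midpoint = interpolate(start, end, 0.5)  # linear, index-paired interpolation
simulated = euler_maruyama(start, toy_drift, sigma=0.5, n_steps=100, rng=rng)
```

The linear interpolation can only move each point along the chord to its paired endpoint, whereas the simulated SDE paths can curve, here because the toy drift includes a rotational component.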
Fig 4.
Performance on a human embryonic stem cells dataset.
Fig 4a) Gene expression profiles of single cells are reduced to two dimensions using the PHATE method. The data correspond to a total of 10 time points over 27 days at 3-day intervals. The time points are color coded on a spectrum from red to deep blue. Fig 4b-d) Trajectory inference by three different methods (Waddington-OT, TrajectoryNet and FBSDE). In each experiment, a total of 200 equally spaced time points are added between the start and end time points. As can be seen, FBSDE provides the trajectory that most closely resembles the ground truth in Fig 4a.
Fig 5.
Performance on a mouse embryonic fibroblasts dataset.
Fig 5a) Gene expression profiles of single cells are reduced to two dimensions using the force-directed layout embedding proposed in the study by Weinreb and colleagues [29]. The data correspond to a total of 37 time points over 18 days at 12-hour intervals. The time points are color coded on a spectrum from red to deep blue. Two visible sharp turns are indicated by arrows. Fig 5b-d) Trajectory inference by three different methods (Waddington-OT, TrajectoryNet and FBSDE). In each experiment, a total of 200 equally spaced time points are added between the start and end time points. The same color code is used. As can be seen, all three methods provide trajectories similar to the ground truth in Fig 5a, although FBSDE modestly outperforms the others.
Fig 6.
Performance on an Arabidopsis thaliana stem cells dataset.
Fig 6a) Gene expression profiles of single cells are reduced to two dimensions using the UMAP embedding method. The data correspond to a total of 51 time points on a pseudotime scale from 0 to 50. Fig 6b-d) Trajectory inference by three different methods (Waddington-OT, TrajectoryNet and FBSDE). In each experiment, a total of 200 equally spaced time points are added between the start and end time points. The same color code is used. As can be seen, Waddington-OT produces some unrealistic paths between different branches in Fig 6a. TrajectoryNet fails to capture the geometric features of the true trajectories. FBSDE produces some trajectories that resemble the ground truth to some degree.
Table 1.
Performance comparison.