Dynamic distribution decomposition for single-cell snapshot time series identifies subpopulations and trajectories during iPSC reprogramming

Recent high-dimensional single-cell technologies such as mass cytometry are enabling time series experiments to monitor the temporal evolution of cell state distributions and to identify dynamically important cell states, such as fate decision states in differentiation. However, these technologies are destructive, and require analysis approaches that temporally map between cell state distributions across time points. Current approaches to approximate the single-cell time series as a dynamical system suffer from overly restrictive assumptions about the type of kinetics, or link together pairs of sequential measurements in a discontinuous fashion. We propose Dynamic Distribution Decomposition (DDD), an operator approximation approach to infer a continuous distribution map between time points. On the basis of single-cell snapshot time series data, DDD approximates the continuous-time Perron-Frobenius operator by means of a finite set of basis functions. This procedure can be interpreted as a continuous-time Markov chain over a continuum of states. By assuming only a memoryless Markov (autonomous) process, the types of dynamics represented are more general than those represented by other common models, e.g., chemical reaction networks or stochastic differential equations. Furthermore, we can a posteriori check whether the autonomy assumption is valid by calculating the prediction error, which we show gives a measure of autonomy within the studied system. The continuity and autonomy assumptions ensure that the same dynamical system maps between all time points, rather than changing arbitrarily at each time point. We demonstrate the ability of DDD to reconstruct dynamically important cell states and their transitions both on synthetic data and on mass cytometry time series of iPSC reprogramming of a fibroblast system. We use DDD to find previously identified subpopulations of cells and to visualise differentiation trajectories.
Dynamic Distribution Decomposition allows interpretation of high-dimensional snapshot time series data as a low-dimensional Markov process, thereby enabling an interpretable dynamics analysis for a variety of biological processes by means of identifying their dynamically important cell states.
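The core idea can be sketched in a few lines: once snapshot distributions are projected onto a finite set of basis functions, their coefficient vectors should evolve as c(t) = exp(tQ) c(0) for a single generator matrix Q shared by all time points. The toy generator, basis size, and fitting procedure below are invented for illustration only, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Toy setting: 3 basis functions, snapshots at 3 time points.
n_basis = 3
times = np.array([0.0, 1.0, 2.0])

# Hypothetical "true" generator; columns sum to zero so probability is conserved.
Q_true = np.array([[-0.5,  0.2,  0.0],
                   [ 0.5, -0.4,  0.1],
                   [ 0.0,  0.2, -0.1]])
c0 = np.array([1.0, 0.0, 0.0])

# Synthetic snapshot coefficients generated by the true dynamics.
data = np.stack([expm(t * Q_true) @ c0 for t in times])

def loss(q_flat):
    # One autonomous generator must explain all time points simultaneously.
    Q = q_flat.reshape(n_basis, n_basis)
    pred = np.stack([expm(t * Q) @ c0 for t in times])
    return np.sum((pred - data) ** 2)

res = minimize(loss, np.zeros(n_basis * n_basis), method="L-BFGS-B")
Q_hat = res.x.reshape(n_basis, n_basis)
```

The residual `loss(res.x)` plays the role of the prediction error mentioned above: if no single generator fits all snapshots, the process is unlikely to be autonomous.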

Both Wishbone [42] and Monocle2 [43] are pseudotime ordering methods: provided with a gene-by-cell matrix, these algorithms search for structures in the data and infer an order from some starting point. In our case, the starting point was chosen to be cells with an Oct-4+, KLF4+ expression profile. To apply these methods, the time label for each cell was omitted.
To assess the quality of the pseudotime ordering, we plot each cell's harvest time label against its pseudotime value; for both Wishbone and Monocle2 this led to uncorrelated plots, see Figures S5 and S6. Were these pseudotime ordering methods to work well on time series data, we would see a monotonic relationship between the two variables. Clearly this is not the case, and we therefore question the benefit of using such methods on time series data. The same conclusion was reached by Schiebinger et al. [6].
Wishbone was not able to elucidate any structure within the data, and we therefore focused on Monocle2. After applying Monocle2, three branching points were identified, see Figure S6. However, all three appear biologically unrealistic, as they have CD73+, CD140 expression profiles that are inconsistent with Zunder et al. [28].
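The quality check described above can be quantified with a rank correlation: a pseudotime ordering that recapitulates real time should be monotonically related to harvest time, i.e., have a Spearman rho near 1. The synthetic data below (variable names and noise levels are invented for illustration) contrasts a well-behaved ordering with an uncorrelated one.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)

# 200 cells harvested at each of five time points (days).
harvest_time = np.repeat([0, 3, 6, 9, 12], 200)

# A pseudotime that tracks real time (monotonic up to noise)
# versus one that is unrelated to harvest time.
good_pseudotime = harvest_time + rng.normal(0.0, 1.0, harvest_time.size)
bad_pseudotime = rng.uniform(0.0, 12.0, harvest_time.size)

rho_good, _ = spearmanr(harvest_time, good_pseudotime)
rho_bad, _ = spearmanr(harvest_time, bad_pseudotime)
```

A plot like Figures S5 and S6 corresponds to the second case: near-zero rank correlation between harvest time and pseudotime.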

Waddington-OT
The method developed by Schiebinger et al. [6], known as Waddington-OT, is the most similar to DDD in that it is specifically designed for time series data. Waddington-OT uses optimal transport to map between pairs of subsequent time series measurements by minimising the "Earthmover's distance", i.e., how far each unit of probability mass needs to be displaced. However, instead of encoding data points into distributions, it maps individual data points from one time point to the next. Combining the R − 1 pairwise maps leads to a weighted directed graph estimating how the data points at one time point map to data points at the preceding and following times. Downstream analysis can then be carried out; their suggestions included clustering cells together and examining the maps between these clusters, and fitting models using these maps to infer gene regulatory networks.
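As a minimal illustration of this pairwise-coupling idea (not Waddington-OT itself): when two consecutive snapshots contain equally many cells carrying uniform mass, optimal transport reduces to an assignment problem minimising total displacement, i.e., the Earthmover's distance. The toy snapshots below are invented.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)

# Two consecutive snapshots of 50 cells each in a 2D marker space;
# the population drifts from around (0, 0) to around (1, 1).
cells_t0 = rng.normal(0.0, 0.1, size=(50, 2))
cells_t1 = rng.normal(1.0, 0.1, size=(50, 2))

# Pairwise displacement costs, then the optimal one-to-one coupling.
cost = cdist(cells_t0, cells_t1)
rows, cols = linear_sum_assignment(cost)

# Mean displaced distance under the optimal coupling (Earthmover's distance
# for uniform, equal-sized point clouds).
emd = cost[rows, cols].mean()
```

Chaining such couplings over each consecutive pair of time points yields the weighted directed graph described above; note that each pair is fitted independently, in contrast to DDD's single autonomous map.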
While their work suggested how to calculate "descendants" and "ancestors" for a particular cell in question, one point left unaddressed was how to calculate branching points from their maps. This is the point of comparison we would like to use between methodologies. To overcome this difficulty, we calculated the betweenness node centrality, a measure indicating nodes which frequently lie on paths between other pairs of nodes. To test that this method worked, we applied it to the simulated data and found a single clear branching point at location (0.66, 0.37), which is a reasonable estimate and consistent with the DDD estimate of (0.65, 0.64), as annotated in Figure 4.
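The branching-point heuristic can be illustrated on a toy Y-shaped graph (the graph and node labels are invented): the junction node attains the highest betweenness centrality because most shortest paths between branch tips pass through it.

```python
import networkx as nx

# Y-shaped graph: a trunk (0-1-2-3) splitting at node 3 into
# branch A (3-4-5) and branch B (3-6-7).
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 3),   # trunk
                  (3, 4), (4, 5),           # branch A
                  (3, 6), (6, 7)])          # branch B

# Fraction of all shortest paths passing through each node.
bc = nx.betweenness_centrality(G)
branch_node = max(bc, key=bc.get)  # the junction, node 3
```

On a transport graph over real cells, the analogous high-betweenness data points are the candidate fate-decision states.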
We then used this same approach on the data generated by Zunder et al. [28], plotted using the force-directed layout from Waddington-OT [6] in Figure S7. From this procedure, three data points had exceedingly high betweenness centrality, see Figure S7(b): the first and second were associated with times t = 4 days and t = 6 days and have expression profiles similar to basis function 16 as found from our earlier analysis; the third, with high betweenness centrality at t = 14 days, did not resemble the expression profile described by basis function 29 and reported previously in the original paper by Zunder et al. [28].
We hypothesise that our method will perform better than Waddington-OT for datasets where: i.) there is a large number of measurements (Waddington-OT can generate large files); ii.) there are irregular or large gaps in time between measurements (Waddington-OT assumes small gaps between measurements); and/or iii.) the system is believed to be autonomous.