Table 1.
Characteristics of some cell lineage patterns.
Organisms are listed in order of increasing complexity. Lineages are characterized in terms of whether cell fate is exclusive to a subclone, the degree of phenotypic variability, and whether the lineage tree measurement is ordered. Lineages from higher organisms are generally unordered, have high variability, and may or may not be clonal.
Fig 1.
Illustration of pedigree data.
Shown are 5 sample pedigrees from each of the 3 lineage types. Each pedigree originates from a different founder cell and is, for compactness, drawn as a radial tree. Each node on the tree represents a cell, where node color reflects the strength of the cell phenotype under analysis (lifetime-averaged cell size for T cells, PHA-4 expression intensity for C. elegans). For T cells the root node is the naive cell, while for the worm lineage the root node is the zygote (labelled P0). The absence of a node on the tree represents a missing data point. Labels for each worm cell position are given in Section S1.6 in S1 Appendix. Data shown here, and throughout the paper, are provided in the supporting information. Note how the worm pedigrees display clear, invariant patterns whereas the T-cell pedigrees (and the branching process) have no obvious repeatable structure.
Fig 2.
Expression of each phenotype as a function of generation.
The vertical axis represents the strength of expression for each measured phenotype. For T cells this is the lifetime-averaged cell area in μm²; for C. elegans it is the lifetime-averaged intensity of green fluorescent protein used to tag PHA-4 expression.
Fig 3.
Labeling convention for a lineage tree.
(a) Each lineal position is labeled with a binary number. The founder of the tree is located at generation g = 1. (b) Each subtree is labeled with two indices (ℓ, τ) representing the longitudinal (ℓ) and transverse (τ) coordinates of its root. Because, as we discuss later, roots of subtrees are associated with sources of variation, we need to create a ‘subtree’ located outside the lineage, called (0, 0) (in red), to represent variation among pedigrees. Note that, in an unordered tree, τ values are unidentifiable and will often be ignored.
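The binary labeling in panel (a) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes the founder is labeled `1` and that each daughter's label is her mother's label with one extra bit appended. That reading is consistent with the sisters 100 and 101 mentioned in the Fig 4 caption, but the helper names below are hypothetical.

```python
def generation(label: str) -> int:
    """Generation of a lineal position; the founder '1' sits at g = 1."""
    return len(label)


def mother(label: str):
    """Label of the mother, or None for the founder."""
    return label[:-1] if len(label) > 1 else None


def sister(label: str):
    """Label of the sister (same mother, last bit flipped), or None for the founder."""
    if len(label) == 1:
        return None
    return label[:-1] + ("0" if label[-1] == "1" else "1")


def daughters(label: str):
    """Labels of the two daughters of a lineal position."""
    return (label + "0", label + "1")
```

Under this assumed convention, the generation is simply the number of bits in the label, so no separate generation index needs to be stored.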
Fig 4.
Permutation symmetry of a tree.
Here the 8 allowed permutations of a tree with 3 generations are shown. These permutations, which involve the swapping of labels starting from the original arrangement in the top left corner, are allowed because they do not change the relationships in the tree. For example, consider the swapping of labels between sisters 101 and 100 (second from left in top row). Despite the swap, those lineal positions still have the same sister and mother. After any of the 8 permutations shown here, every lineal position still has the same mother, sister, cousins, granddaughters etc. In other words, the lineage tree relationships are invariant to this set of permutations.
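The count of 8 can be checked by enumeration. The sketch below assumes the binary labeling with founder `1` and generates each allowed permutation by choosing, independently for every mother, whether her two daughter subtrees exchange places. The function names are hypothetical and the construction is one convenient way to list the permutations, not necessarily the one used in the paper.

```python
from itertools import product


def positions(G):
    """All binary labels of a complete tree with G generations (founder '1')."""
    return [format(n, "b") for n in range(1, 2 ** G)]


def relabel(label, swaps):
    """Image of `label` when the daughters of every mother in `swaps` are exchanged."""
    out = label[0]
    for i in range(1, len(label)):
        bit = label[i]
        if label[:i] in swaps:  # this ancestor's daughters are swapped
            bit = "1" if bit == "0" else "0"
        out += bit
    return out


def allowed_permutations(G):
    """One label-to-label map for every choice of swapped mothers."""
    internal = [p for p in positions(G) if len(p) < G]
    perms = []
    for flags in product([False, True], repeat=len(internal)):
        swaps = {m for m, flag in zip(internal, flags) if flag}
        perms.append({p: relabel(p, swaps) for p in positions(G)})
    return perms


def preserves_mothers(perm):
    """True if the image of every mother is the mother of the image."""
    return all(perm[p][:-1] == perm[p[:-1]] for p in perm if len(p) > 1)
```

A tree with 3 generations has 3 mothers (1, 10 and 11), giving 2³ = 8 permutations, each of which leaves mother, sister and cousin relationships unchanged.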
Fig 5.
Cyclic and tree-structured symmetries.
(a) A cyclic symmetry structure is one that remains invariant under a shift of all the variables (around the circle in the figure shown) that preserves their relative ordering. This cyclic symmetry defines the discrete Fourier transform. (b) A tree symmetry structure is one that remains invariant under permutations within groups and permutations of groups. This symmetry gives rise to ANOVA for nested pairs and also defines the Haar wavelet transform. It is applicable when it is just the leaf nodes that are of interest. (c) When all the nodes of a tree are of interest, the underlying symmetry is still that for the tree. The associated transformation is derived in this paper and discussed in the next section.
Fig 6.
Construction of the natural variables for a tree with 4 generations.
Each natural variable is identified by a source of variation (ℓ, τ), corresponding to the root of a subtree, and a generation g. The + and − at each lineal position illustrate how the original variables are combined to form a natural variable. The 15 natural variables thus defined by the 3-tuple (ℓ, τ, g) are listed in the bottom row. Since the τ coordinates are indistinguishable, only 10 of the natural variables (those with τ = 0, say) are unique.
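The counts of 15 and 10 can be reproduced with a short enumeration. The sketch below rests on assumptions not stated in the caption: the source (0, 0) contributes one variable per generation, a subtree rooted at ℓ contributes one variable for each later generation g > ℓ, τ runs from 0 to 2^(ℓ−1) − 1, and the + sign goes to the descendants of the daughter whose label ends in 0. The weights are unnormalized and the function names are hypothetical.

```python
def natural_variable_indices(G):
    """Assumed index set (l, tau, g) for a complete tree with G generations."""
    idx = [(0, 0, g) for g in range(1, G + 1)]  # variation among pedigrees
    for l in range(1, G):
        for tau in range(2 ** (l - 1)):
            for g in range(l + 1, G + 1):
                idx.append((l, tau, g))
    return idx


def sign_pattern(l, tau, g):
    """Unnormalized +/- weights on generation-g positions for variable (l, tau, g)."""
    generation_g = [format(n, "b") for n in range(2 ** (g - 1), 2 ** g)]
    if l == 0:
        return {p: 1 for p in generation_g}  # plain sum over the generation
    root = format(2 ** (l - 1) + tau, "b")  # assumed label of the subtree root
    return {
        p: (1 if p[l] == "0" else -1)  # which daughter branch p descends from
        for p in generation_g
        if p.startswith(root)
    }
```

For G = 4 this yields 15 index triples, matching the 15 lineal positions of the tree, and 10 of them have τ = 0.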
Fig 7.
Patterns on a tree can be described in terms of natural variables, or elemental components, examples of which are shown here. Each component is a bifurcated pattern centered on a subtree (ℓ, τ) and expressed in a generation g (where τ is ignored in an unordered tree). For example, the blue/non-blue bifurcated pattern occupies a subtree rooted at ℓ = 3 and observed at generations 5, 6, and 7. Note that ℓ = 1 variation (on the right) is a bifurcation across the whole pedigree. Variation among different pedigrees would be labeled with ℓ = 0.
Fig 8.
Heat maps of and ΣΩ for a complete tree.
This example was taken from the first 4 generations of the branching process. Natural variables along the axes of ΣΩ are given in the format (ℓ, τ, g). Isotypic blocks are bounded by dashed squares and correspond to a given ℓ. Irreducible blocks correspond to a source of variation (ℓ, τ) and are bounded by a dotted square. For ℓ = 0 and 1 the isotypic and irreducible blocks coincide since there is only one τ index value.
Table 2.
Generalized spectral analysis.
Well-known quantities in Fourier analysis have their direct analogs in the spectral analysis of a tree.
Fig 9.
These are undirected graphs in the original variables (shown as binary numbers). Each generation is arranged in an arc centered on the root node. The color of edges in each graph corresponds to the correlation (top row) or partial correlation (bottom row) between pairs of lineal positions. To avoid clutter, only the first 4 generations are shown. Note how the graph (f) of partial correlations for the simulated branching process, where daughters are conditionally uncorrelated, is a binary tree. This is not the case for the real lineages.
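The distinction between correlation and partial correlation can be illustrated with the standard three-variable formula. The sketch below is a simplification: the graphs condition on all other lineal positions, whereas this example conditions two daughters on their mother only, and the numerical values are made up for illustration.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on a single variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values: each daughter correlates 0.5 with the mother.
# If the sisters are related only through the mother, their correlation is
# 0.5 * 0.5 = 0.25 and the partial correlation vanishes.
only_via_mother = partial_correlation(0.25, 0.5, 0.5)

# A sister correlation above 0.25 leaves a nonzero partial correlation,
# which would appear as an extra sister-sister edge in the graph.
extra_sister_link = partial_correlation(0.5, 0.5, 0.5)
```

In the first case only the mother-daughter edges survive, which is the tree-shaped graph described for the simulated branching process.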
Fig 10.
These directed graphs in the natural variables show the dynamics of the bifurcated expression pattern in each subtree ℓ. The color (and thickness) of an edge between node j and j′ corresponds to the transmission strength, βℓ jj′. The size of the node corresponds to the innovation strength, .
Fig 11.
Fate profiles for different lineages.
Explained variance (top row) and the cumulative explained variance (bottom row). η²(ℓ|G) (blue) measures how much the fate of a cell at generation G is restricted by each subtree ℓ. R²(g|G) (orange) measures how much a generation-G cell’s phenotype is correlated with its direct ancestor in generation g. Note that because the Markov process is assumed to be first order (see Section ‘Sparsity.’), . For the case of the simulated branching process the exact result is also shown. This illustrates the accuracy of the inference procedure.