Figure 1.
(A) Multicellular organism development can be represented by a rooted labeled binary tree called the organism cumulative cell lineage tree. Nodes (circles) represent cells (dead cells are crossed), and each edge (line) connects a parent with a daughter. The uncrossed leaves, marked blue, represent extant cells.
(B) Any cell sample (A–E) induces a subtree, which can be condensed by removing nonbranching internal nodes and labeling the edges with the number of cell divisions between the remaining nodes. The resulting tree is called the cell sample lineage tree.
(C) A small fraction of a genome accumulating substitution mutations (colored) is shown. Lineage analysis utilizes a representation of this small fraction, called the cell identifier. Phylogenetic analysis reconstructs the tree from the cell identifiers of the samples. If the topology of the cell sample lineage tree is known, reconstruction can be scored.
(D) Coincident mutations, namely two or more identical mutations that occur independently in different cell divisions (blue mutation in A and B), and silent cell divisions, namely cell divisions in which no mutation occurs (D–F), may result in incorrect (red edge) or incomplete (unresolved ternary red node) lineage trees. Excessive mutation rates might result in successive mutations (not shown), which cause the lineage information to be lost.
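The condensation step described in (B) can be illustrated with a minimal Python sketch. It assumes the cumulative tree is stored as a hypothetical child-to-parent dictionary, that each edge corresponds to one cell division, and that the root is always retained even when it does not branch. The function name `condense` and the data layout are illustrative and are not taken from the paper.

```python
def condense(parent, samples, root):
    """Return {node: (nearest kept ancestor, number of divisions)}.

    parent  -- dict mapping each cell to its parent cell (assumed layout)
    samples -- iterable of sampled extant cells (leaves)
    root    -- the root cell of the cumulative tree
    """
    # 1. Induced subtree: every node on a path from a sample to the root.
    induced_children = {}
    seen = set()
    for leaf in samples:
        node = leaf
        while node != root and node not in seen:
            seen.add(node)
            induced_children.setdefault(parent[node], set()).add(node)
            node = parent[node]

    # 2. Keep the root, the samples, and every branching node.
    keep = {root} | set(samples)
    keep |= {n for n, ch in induced_children.items() if len(ch) > 1}

    # 3. Link each kept node to its nearest kept ancestor, counting the
    #    edges (cell divisions) that were collapsed along the way.
    edges = {}
    for node in keep - {root}:
        ancestor, divisions = parent[node], 1
        while ancestor not in keep:
            ancestor, divisions = parent[ancestor], divisions + 1
        edges[node] = (ancestor, divisions)
    return edges
```

For a sample of three cells, the result maps each remaining node to its parent in the cell sample lineage tree together with the edge label, so nonbranching intermediates appear only as counts.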
Figure 2.
Simulation of MS Mutations and Reconstruction Score on Random Trees
Two types of random trees with 32 leaves were generated, and stepwise mutations of microsatellite (MS) loci were simulated. Results of simulations for wild-type human with different numbers of MS loci are shown. The white line marks the perfect score limit (according to the Penny and Hendy tree comparison algorithm [29]). The results show that the correct tree can be reconstructed accurately for trees of depth equivalent to human newborn and mouse newborn (marked by blue and green dots, respectively) using the entire set of MS loci. A mathematical analysis proves that any tree of depth 40 (equivalent to mouse newborn) can be reconstructed with no errors. Simulations with the MS mutation rates of mismatch repair (MMR)-deficient organisms demonstrate that cell lineage reconstruction is possible with as few as 800 MS loci (here the white line indicates the 0.95 score). The quality of reconstruction depends on the topology of the tree and its maximal depth, which together influence the signal-to-noise ratio.
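A simulation of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes a symmetric stepwise model in which each locus gains or loses one repeat unit with probability `mu` per cell division, and it stores the tree as a hypothetical parent-to-children dictionary. The value of `mu`, the function name, and the data layout are assumptions, not parameters taken from the paper.

```python
import random


def simulate_identifiers(children, root, n_loci, mu, rng):
    """Return {cell: list of repeat-number offsets relative to the root}.

    children -- dict mapping a cell to its daughter cells (assumed layout)
    n_loci   -- number of MS loci in the cell identifier
    mu       -- assumed per-locus, per-division mutation probability
    rng      -- a random.Random instance, so runs are reproducible
    """
    identifiers = {root: [0] * n_loci}
    stack = [root]
    while stack:
        cell = stack.pop()
        for daughter in children.get(cell, ()):
            # Each daughter inherits the parental identifier, then mutates.
            vec = list(identifiers[cell])
            for i in range(n_loci):
                if rng.random() < mu:
                    vec[i] += rng.choice((-1, 1))  # one-step change
            identifiers[daughter] = vec
            stack.append(daughter)
    return identifiers


# Usage: a tiny tree with placeholder parameters.
tree = {"R": ["a", "b"], "a": ["c", "d"]}
ids = simulate_identifiers(tree, "R", n_loci=10, mu=0.01, rng=random.Random(1))
```

The identifiers of the leaves would then be passed to a phylogenetic reconstruction method, and the reconstructed tree compared with the known topology.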
Figure 3.
(A) Photograph and scheme of the R. pseudoacacia tree used for the lineage experiment. All three identically mutated samples (red) come from the same small branch.
(B) A. thaliana plant used for the experiment. The location of each sample is indicated.
(C) Transverse scheme of the A. thaliana plant showing all sampled stem (rectangles) and cauline leaf (ovals) tissues. Mutations that occurred in two or more samples are depicted by colored circles.
Figure 4.
Automated Procedure for Lineage Tree Reconstruction
The procedure accepts biological samples and PCR primers as input, and outputs a reconstructed lineage tree. It consists of a series of seven consecutive steps (numbered), during which the physical biological samples are “transformed” into digital data, which are then analyzed algorithmically. We built a hybrid in vitro/in silico automated system that performs steps 2–7 of the procedure (outlined), and used it to process DNA from tissue samples and single-cell clones. Incorporation of whole genome amplification techniques in the future may enable processing of single cells as well. For a detailed specification of the procedure, see Protocol S1.
Figure 5.
(A–C) A cell sample lineage tree with a predesigned topology is created by performing single-cell bottlenecks on all the nodes of the tree. Lineage analysis is performed on clones of the root and leaf cells. Three CCTs (A–C) were created using LS174T cells that display MS instability. All topologies were reconstructed precisely. Edge lengths are drawn in proportion to the output of the algorithm. Gray edges represent correct partitions according to the Penny and Hendy tree comparison algorithm [29], and their width represents the bootstrap value [29] (n = 1,000) of the edge. A minimal set of loci yielding perfect reconstruction was found for each CCT (each colored contour represents a different mutation shared by the encircled nodes; see also Figure S2).
(D) There is a linear correlation (R² = 0.955) between reconstructed and actual node depths.
(E) Reconstruction scores of CCTs A–C using random subsets of MS loci of increasing sizes (average of 500).
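The partition-based scoring referred to in (A–C) and (E) can be illustrated as follows. This is a simplified sketch, not the authors' implementation: it assumes rooted trees written as nested tuples of leaf names, compares the leaf sets (clades) below each internal node, and reports the fraction of the true tree's nontrivial clades that also occur in the reconstructed tree. The normalization and the names `clades` and `partition_score` are assumptions made for illustration.

```python
def clades(tree):
    """Return (leaf set of this subtree, set of clades found inside it)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()  # a single leaf has no clades
    leaves, found = frozenset(), set()
    for subtree in tree:
        sub_leaves, sub_found = clades(subtree)
        leaves |= sub_leaves
        found |= sub_found
    found.add(leaves)
    return leaves, found


def partition_score(true_tree, reconstructed):
    """Fraction of nontrivial clades of true_tree present in reconstructed."""
    all_leaves, true_clades = clades(true_tree)
    _, rec_clades = clades(reconstructed)
    # The clade containing every leaf is shared by all trees; ignore it.
    true_clades.discard(all_leaves)
    rec_clades.discard(all_leaves)
    if not true_clades:
        return 1.0
    return len(true_clades & rec_clades) / len(true_clades)


# Usage: an unresolved node lowers the score without adding a wrong clade.
true_tree = (("A", "B"), ("C", ("D", "E")))
unresolved = (("A", "B"), ("C", "D", "E"))
score = partition_score(true_tree, unresolved)
```

Under this convention a score of 1 means every partition of the known topology was recovered, and both incorrect and unresolved nodes reduce the score.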
Table 1.
Mutation Rates in MMR-Deficient Human Simulations