Fig 1.
Pictorial representation of tumor evolution.
(A-B) A pictorial representation of the evolution of a tumor from the initiating mutation to the heterogeneous tissue at the time of sampling, which consists of four different clones and normal tissue. (C) A phylogenetic tree with single cells as the tips. (D) A clonal lineage tree inferred from sampled cells where each node represents a subclone (cluster of cells). (E) A mutation tree inferred from sampled cells where each star represents the occurrence of one mutation. The box underneath each tip shows which mutations are present in the cell represented by the tip.
Fig 2.
True binary data, observed binary data and binary mutation process example.
(A) True binary mutation matrix of the sequenced tumor cells in the mutation tree in Fig 1E. Each row represents the true genotypes at one genomic site across all cells, and each column represents the true genotypes at multiple genomic sites for a single cell. (B) Observed mutation matrix with missing and ambiguous values (red), as well as mutation states that are misrecorded with respect to the true mutation matrix (red numbers; these are either false positives or false negatives). The red dash indicates a missing value, where the sequencing process returned no signal at that site in that cell, and the red question mark indicates an ambiguous value. Each row represents the observed states at one genomic site across all cells, and each column represents the observed states at multiple genomic sites for a single cell. (C) Binary mutation process example. A mutation is acquired on branch e1 (highlighted in red). The cell descending from branch e8 (highlighted in black) does not carry the mutation, while the cells descending from the blue branches carry the mutation.
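The relationship between the true and observed matrices in panels A and B can be illustrated with a minimal simulation sketch. It is not the authors' procedure. The function name `observe`, the parameter `missing_rate`, and the example matrix are hypothetical; `alpha` is taken to be the false positive probability and `beta` is assumed to be the false negative probability. Ambiguous values (the question mark in panel B) are not modeled here.

```python
import random

MISSING = "-"  # the caption uses a dash for a missing value


def observe(true_matrix, alpha, beta, missing_rate, seed=0):
    """Return a noisy copy of a binary mutation matrix.

    Rows are genomic sites and columns are cells, as in Fig 2.
    alpha: assumed probability that a true 0 is recorded as 1 (false positive).
    beta: assumed probability that a true 1 is recorded as 0 (false negative).
    missing_rate: hypothetical probability that an entry returns no signal.
    """
    rng = random.Random(seed)
    observed = []
    for row in true_matrix:
        new_row = []
        for state in row:
            if rng.random() < missing_rate:
                new_row.append(MISSING)  # no signal at this site in this cell
            elif state == 0:
                new_row.append(1 if rng.random() < alpha else 0)
            else:
                new_row.append(0 if rng.random() < beta else 1)
        observed.append(new_row)
    return observed


# Hypothetical 3-site by 4-cell matrix; not the matrix drawn in Fig 2A.
true_matrix = [
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
observed = observe(true_matrix, alpha=0.1, beta=0.2, missing_rate=0.1)
for row in observed:
    print(" ".join(str(value) for value in row))
```

Setting `alpha`, `beta` and `missing_rate` to zero returns the true matrix unchanged, which is a quick sanity check on the error model.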
Fig 3.
Adjacent order accuracy in scenarios 1 and 2 for MO, SCITE, SiFit and SPhyR when there are 20 mutations.
Each panel shows the results for one combination of genotype type and missing data percentage. In each panel, red, gray, blue, green and yellow colors correspond to MO with the true tree, MO with the estimated tree, SCITE, SiFit and SPhyR, respectively. Each plotting symbol on the line represents a different β. The x-axis is the probability of a false positive error, α, and the y-axis is adjacent order accuracy.
Fig 4.
Order accuracy in scenarios 1 and 2 for MO, SCITE, SiFit and SPhyR when there are 20 mutations.
Each panel shows the results for one combination of genotype type and missing data percentage. In each panel, red, gray, blue, green and yellow colors correspond to MO with the true tree, MO with the estimated tree, SCITE, SiFit and SPhyR, respectively. Each plotting symbol on the line represents a different β. The x-axis is the probability of a false positive error, α, and the y-axis is order accuracy.
Fig 5.
Adjacent order accuracy in scenarios 3 and 4 for MO, SCITE, SiFit and SPhyR when there are 20 mutations.
Each panel shows the results for one combination of genotype type and lost-mutation setting. In each panel, red, gray, blue, green and yellow colors correspond to MO with the true tree, MO with the estimated tree, SCITE, SiFit and SPhyR, respectively. Each plotting symbol on the line represents a different β. The x-axis is the probability of a false positive error, α, and the y-axis is adjacent order accuracy.
Fig 6.
Order accuracy in scenarios 3 and 4 for MO, SCITE, SiFit and SPhyR when there are 20 mutations.
Each panel shows the results for one combination of genotype type and lost-mutation setting. In each panel, red, gray, blue, green and yellow colors correspond to MO with the true tree, MO with the estimated tree, SCITE, SiFit and SPhyR, respectively. Each plotting symbol on the line represents a different β. The x-axis is the probability of a false positive error, α, and the y-axis is order accuracy.
Fig 7.
P1 tumor phylogenetic tree and inferred temporal order of the mutations.
The normal cell is set as the outgroup. There are 18 branches in this tree. We do not assume a molecular clock when estimating the branch lengths. Branch lengths in this figure are not drawn to scale. The color and tip shape represent the spatial locations of the samples (normal tissue, location X3 or location X4; see [31]). The temporal order of the mutations is annotated on the branches of the tree. Mutations with very strong signals (the probability of occurring on a single branch is greater than 0.7) are highlighted in red, while mutations with moderate signals (probabilities that sum to more than 0.7 on two or three branches) are highlighted in blue. Mutation data for 30 genes, corresponding to the first 30 rows in Figs R and S in the S1 Text, are shown for each tip in the heatmap matrix at the bottom.
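The red and blue highlighting rule described above can be written as a short classification sketch. It assumes each mutation comes with a probability for each branch; the function name `signal_strength`, the label `"weak"` for everything else, and the example probabilities are hypothetical and are not taken from the P1 data.

```python
def signal_strength(branch_probs, threshold=0.7):
    """Classify a mutation from its per-branch probabilities.

    branch_probs: dict mapping a branch label to the probability that
    the mutation occurred on that branch (hypothetical input format).
    """
    top = sorted(branch_probs.values(), reverse=True)
    if top and top[0] > threshold:
        return "strong"    # one branch alone exceeds the threshold (red)
    if sum(top[:3]) > threshold:
        return "moderate"  # the top two or three branches exceed it together (blue)
    return "weak"          # hypothetical label for all remaining mutations


# Hypothetical per-branch probabilities for three made-up mutations.
examples = {
    "gene_a": {"e1": 0.85, "e2": 0.15},
    "gene_b": {"e3": 0.40, "e4": 0.35, "e5": 0.25},
    "gene_c": {"e1": 0.2, "e2": 0.2, "e3": 0.2, "e4": 0.2, "e5": 0.2},
}
for gene, probs in examples.items():
    print(gene, signal_strength(probs))
# gene_a strong
# gene_b moderate
# gene_c weak
```

Because the probabilities are non-negative, checking the sum of the top three values also covers the case where two branches already exceed the threshold.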
Fig 8.
Example measures of accuracy. (A) True mapping of mutations along a fixed phylogenetic tree. (B) Inferred mapping of mutations along the same phylogeny.
The adjacent order accuracy is 2/3, while the order accuracy is 5/6.