Figure 1.
Schematic diagram of the human circulatory system showing circulating tumor cells (CTCs) detaching from the primary tumor and becoming trapped in capillary beds and other potential future metastatic locations, as outlined by the ‘seed-and-soil’ framework.
Table 1.
Metastatic site numbering system.
Figure 2.
Metastatic distributions from autopsy data set extracted from 3827 patients [6].
The y-axis in each graph represents a proportion between 0 and 1. The sum of all the heights is 1. These are the two key probability distributions used to ‘train’ our lung cancer progression model. (a) Overall metastatic distribution including all primaries. We call this distribution the ‘generic’ distribution as it includes all primary cancer types. (b) Distribution of metastases associated with primary lung cancer. We call this distribution the ‘target’ distribution that we label
Figure 3.
The converged lung cancer network shown as a circular, bi-directional, weighted graph.
We use sample mean values for all edges connecting sites in the target distribution. The disease progresses from site 23 (lung) as a ‘random walker’ on this network. Arrow heads placed on the end or ends of the edges denote the direction of the connections. Edge weightings are not shown. There are 50 sites (defined in Table 1) obtained from the full data set of [6], with ‘Lung’ corresponding to site 23 placed on top. The 27 sites that are connected by edges are those from the target vector for lung cancer defined in Table 1.
Figure 4.
Weight of outgoing edges from the lung (using sample mean values from ensemble) as compared with the ‘target’ distribution.
Figure 5.
Histogram of edge values from lung to lymph nodes (reg) for 1000 trained matrices, showing that edge values (transition probabilities) are best thought of as random variables which are (approximately) normally distributed.
Dashed vertical line shows the initial edge value associated with the initial matrix. A normal distribution with sample mean (0.15115) and variance (0.01821) is shown as an overlay.
Figure 6.
Histogram of edge values from lung to adrenal for 1000 trained matrices, showing that edge values (transition probabilities) are best thought of as random variables which are (approximately) normally distributed.
Dashed vertical line shows the initial edge value associated with the initial matrix. A normal distribution with sample mean (0.13165) and variance (0.01953) is shown as an overlay.
Figure 7.
Panel showing progression of state vector for lung cancer primary using the ensemble averaged lung cancer matrix.
Filled rectangles show the long-time metastatic distribution from the autopsy data in Figure 2(b), unfilled rectangles show the distribution at step k using the Markov chain model. (a) k = 0; (b) k = 2.
Figure 8.
Panel showing progression of state vector for lung cancer primary using the ensemble averaged lung cancer matrix.
Filled rectangles show the long-time metastatic distribution from the autopsy data in Figure 2(b), unfilled rectangles show the distribution at step k using the Markov chain model. (a) k = 5; (b) k = ∞.
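The progression in Figures 7 and 8 amounts to repeatedly multiplying a state vector by a transition matrix. The following is a minimal sketch of that idea, not the paper's implementation: the 3-site matrix `P`, the helper names `step` and `evolve`, and the use of k = 200 as a stand-in for k = ∞ are all assumptions made for illustration, and none of the numbers come from the autopsy data.

```python
def step(state, P):
    """Advance a row state vector one step: new[j] = sum_i state[i] * P[i][j]."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]


def evolve(state, P, k):
    """Apply k Markov chain steps to the state vector."""
    for _ in range(k):
        state = step(state, P)
    return state


# Hypothetical 3-site transition matrix (each row sums to 1).
# Site 0 plays the role of the primary; the values are illustrative only.
P = [
    [0.0, 0.6, 0.4],
    [0.5, 0.2, 0.3],
    [0.3, 0.3, 0.4],
]

# All probability mass starts at the primary site (k = 0).
v0 = [1.0, 0.0, 0.0]

# Snapshots at the step counts used in the panels; 200 approximates k = infinity.
snapshots = {k: evolve(v0, P, k) for k in (0, 2, 5, 200)}

for k, v in snapshots.items():
    print(k, [round(x, 4) for x in v])
```

Each snapshot remains a probability distribution, and for this toy matrix the k = 200 vector is unchanged by a further step, which is the sense in which it stands in for the long-time distribution.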
Table 2.
One- and two-step transition probabilities.
Figure 9.
Probabilistic decomposition of pathways from lung to liver.
First transition probability is directly from lung to liver (0.08028±0.00946). Paths from the first-order sites to liver are shown as solid arrows. Paths from second-order sites to liver are shown as dashed arrows.
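The decomposition in Figure 9 can be illustrated by summing, over every intermediate site, the product of the two edge weights along the path. The sketch below assumes a hypothetical 4-site chain; the site names are borrowed from the captions, but the transition values and the variable names are invented for illustration and are not the trained values from the model.

```python
# Hypothetical transition probabilities (each row sums to 1); illustrative only.
P = {
    "lung":    {"lung": 0.1, "lymph": 0.4, "adrenal": 0.3, "liver": 0.2},
    "lymph":   {"lung": 0.0, "lymph": 0.5, "adrenal": 0.2, "liver": 0.3},
    "adrenal": {"lung": 0.0, "lymph": 0.1, "adrenal": 0.5, "liver": 0.4},
    "liver":   {"lung": 0.0, "lymph": 0.2, "adrenal": 0.1, "liver": 0.7},
}

source, target = "lung", "liver"

# One-step probability: the direct edge from source to target.
one_step = P[source][target]

# Two-step probability, split by intermediate site:
# P(source -> mid) * P(mid -> target) for every possible mid.
# Paths through source or target themselves use their self-edges.
two_step_paths = {mid: P[source][mid] * P[mid][target] for mid in P}
two_step = sum(two_step_paths.values())

print("one-step:", one_step)
for mid, prob in two_step_paths.items():
    print(f"  via {mid}: {prob:.4f}")
print("two-step total:", round(two_step, 4))
```

The two-step total equals the (lung, liver) entry of the squared transition matrix, so the per-site terms show how much each intermediate site contributes to that entry.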
Table 3.
Self-edge weightings for each site.
Table 4.
Mean first-passage times from lung.
Figure 10.
Mean first-passage time histogram for Monte Carlo computed random walks all starting from lung.
Error bars show one standard deviation. Values are normalized so that lymph node (reg) has value 1, and all others are in these relative time units.
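A Monte Carlo estimate of mean first-passage times, normalized to a reference site as in Figure 10, might look like the sketch below. It assumes a hypothetical 3-site chain with made-up transition values; the function names, the number of walks, and the random seed are illustrative choices rather than details taken from the paper.

```python
import random
import statistics


def first_passage_steps(P, start, target, rng):
    """Run one random walk from start and count steps until target is first hit."""
    state, steps = start, 0
    while True:
        # Choose the next site using the current row as transition weights.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        steps += 1
        if state == target:
            return steps


def mean_first_passage(P, start, target, walks, rng):
    """Return the sample mean and sample standard deviation of first-passage steps."""
    samples = [first_passage_steps(P, start, target, rng) for _ in range(walks)]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical transition matrix; site 0 is the primary, site 1 is the reference site.
P = [
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.3, 0.3, 0.4],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
results = {t: mean_first_passage(P, 0, t, 20000, rng) for t in (1, 2)}

# Normalize so the reference site (site 1) has value 1.
reference_mean = results[1][0]
relative = {t: mean / reference_mean for t, (mean, _) in results.items()}

for t in results:
    print(t, round(results[t][0], 3), round(results[t][1], 3), round(relative[t], 3))
```

For this toy chain the exact mean first-passage times from site 0 can be solved by hand (40/21 steps to site 1 and 4 steps to site 2), which gives a quick sanity check on the sampled means.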
Figure 11.
Ensemble convergence to the trained matrix, starting from the initial matrix. The y-axis is z; the x-axis is the step j.
We use an ensemble of 1000 trained matrices, each conditioned on the same initial matrix.
The average convergence curve is shown, along with standard deviations marked along each decade showing the spread associated with the convergence rates.
Figure 12.
Average distribution of the 27 non-zero singular values associated with the ensemble of 1000 matrices, all obtained using the same initial matrix. The x-axis is the index n; the y-axis shows the singular values.
Data points (open circles) indicate the sample average, with error bars showing the sample standard deviations. Line is a least squares curve fit through through
showing linear decrease with exponent
The 27 non-zero singular values reflect the fact that there are 27 entries in the steady-state target distribution for primary lung cancer. The two diamond-shaped data points are the two singular values associated with the initial matrix.
The 27 asterisk data points are those obtained from a trained matrix using a perturbed initial matrix with a rank-2 perturbation. See text for details.