Fig 1.
Two-type branching process model of colorectal cancer progression.
Cells immigrate from a static population of colonic crypt stem cells (green cells) into the adenoma compartment A with rate μ1. Compartment A grows with rate b1 and shrinks with rate d1. Adenoma cells generate malignant cells, M, with rate μ2. Cancer compartment M grows with rate b2 and shrinks with rate d2. The total numbers of cells in compartments A and M at time t are denoted A(t) and M(t), respectively.
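The event structure described in this legend can be sketched as a Gillespie-style simulation. This is a minimal illustration, not the authors' code: the function name `simulate_two_type` is hypothetical, and it assumes that a mutation event adds one cell to M without removing a cell from A, that immigration occurs at the constant rate μ1, and that all other rates are per cell.

```python
import random

def simulate_two_type(mu1, b1, d1, mu2, b2, d2, t_end, seed=None):
    """Simulate the two-type process up to time t_end and return (A, M).

    Assumptions (not stated in the source): immigration into A is a
    constant-rate event, and a mutation adds one M cell while leaving A unchanged.
    """
    rng = random.Random(seed)
    t, A, M = 0.0, 0, 0
    while True:
        # Event rates: immigration, A birth, A death, A->M mutation, M birth, M death
        rates = [mu1, b1 * A, d1 * A, mu2 * A, b2 * M, d2 * M]
        total = sum(rates)
        if total <= 0.0:
            break  # no event can occur
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        event = rng.choices(range(6), weights=rates)[0]
        if event in (0, 1):
            A += 1
        elif event == 2:
            A -= 1
        elif event in (3, 4):
            M += 1
        else:
            M -= 1
    return A, M

# One realisation over a short horizon; long horizons can require many events.
A, M = simulate_two_type(3.1, 9.0, 8.8, 1e-5, 9.2, 8.8, t_end=1.0, seed=1)
```

Because the number of events grows with compartment size, simulating to large ages with this direct method can be slow; that is a property of the sketch, not a statement about how the published simulations were run.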
Fig 2.
Illustration of simulated data for size of compartments A and M.
We performed 100,000 simulations of the two-type branching process with biologically motivated parameters (μ1 = 3.1, b1 = 9, d1 = 8.8, μ2 = 10⁻⁵, b2 = 9.2, and d2 = 8.8). Presented are the empirical densities of the sizes of compartments A and M, given non-extinction. Heights indicate the density of the size distribution at each given age.
Fig 3.
Agreement of simulation and mathematical model for compartment A.
(A) Comparison of 1 − CDF(number of cells), i.e. the percentage of simulations with more than N cells at a given age, for the 100,000 simulations and the model prediction for the same parameters. Dashed dark blue line: model prediction using the simulated parameters. Light blue area: empirical probability of observing more than N cells at a given age for the simulated parameters (μ1 = 3.1, b1 = 9, d1 = 8.8, μ2 = 10⁻⁵, b2 = 9.2, and d2 = 8.8). (B-D) Empirical MCMC-derived 2D density of the posterior distribution of the parameters. Warmer colors indicate parameter values that are more likely to have produced the data. Black dots indicate the simulated parameter values: μ1 = 3.1, b1 = 9, and γ1 = 0.2.
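The empirical quantity in panel A, the fraction of simulations with more than N cells, can be computed as below. This is a small sketch; the function name `empirical_survival` and the toy sizes are illustrative assumptions, not values from the study.

```python
from bisect import bisect_right

def empirical_survival(sizes, thresholds):
    """Return, for each N in thresholds, the fraction of sizes strictly greater than N."""
    ordered = sorted(sizes)
    n = len(ordered)
    # bisect_right counts the entries <= N, so n minus that count is the number > N
    return [(n - bisect_right(ordered, N)) / n for N in thresholds]

# Toy compartment sizes from six hypothetical simulation runs at one age
sizes = [0, 1, 5, 10, 10, 200]
fractions = empirical_survival(sizes, [0, 10, 500])
```

Multiplying each fraction by 100 gives the percentage of simulations exceeding N cells at that age.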
Fig 4.
Agreement of simulation and mathematical model for compartment M.
(A) Prevalence-only likelihood landscape, which takes into account extinction of compartment M. Warm colors indicate parameter values that are more likely to have produced the data; variation in the warm ridgeline band is an artifact of grid choice. (B) Comparison of the empirical 1 − CDF(number of cells) (percentage of simulations with more than N cells) at age 50 for the simulated data and our new approximation. Separation at low sizes demonstrates that our approximation is most accurate at large ages. Black dots are exact probabilities calculated by differentiating the probability generating function. (C) Likelihood landscape around the parameters used, computed with our new approximation and the complete empirical size distribution of the simulated data. Warmer regions indicate parameter values that are more likely to have produced the data. The black dots indicate the biologically motivated parameters used in the simulation: μ2 = 10⁻⁵ and b2 = 9.2.
Fig 5.
Agreement of simulation and mathematical model for compartments A and M.
We performed 10,000 steps of adaptive MCMC on the simulated data and present the posterior distributions of the chain. Parameters used in the simulated data are: μ1 = 3.1, b1 = 9, γ1 = 0.2, μ2 = 10⁻⁵, γ2 = 0.4, and λ = 0.4. All parameters except λ have units of per cell per year; λ is a population proportion. Upper right triangle: Pairwise parameter correlation. Diagonal: Univariate density of the posterior parameter distribution. Lower left triangle: 2D posterior density distributions for pairs of parameters. Warmer colors indicate parameter values that are more likely to have produced the data.
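To illustrate what an adaptive MCMC chain of this length involves, the following is a generic one-dimensional adaptive random-walk Metropolis sketch. It is not the sampler used for this figure: the function name `adaptive_metropolis`, the Robbins-Monro step-size rule, the target acceptance rate of 0.44, and the standard normal toy target are all assumptions made for illustration.

```python
import math
import random

def adaptive_metropolis(log_target, x0, n_steps=10_000, seed=None, target_accept=0.44):
    """Random-walk Metropolis whose proposal scale adapts toward target_accept.

    log_target must be finite at x0. Returns the list of sampled states.
    """
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    log_scale = 0.0  # log of the proposal standard deviation
    chain = []
    for i in range(n_steps):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_new = log_target(proposal)
        accept_prob = math.exp(min(0.0, lp_new - lp))
        accepted = rng.random() < accept_prob
        if accepted:
            x, lp = proposal, lp_new
        # Diminishing adaptation: widen the proposal after acceptances, shrink otherwise
        log_scale += (float(accepted) - target_accept) / (i + 1) ** 0.6
        chain.append(x)
    return chain

# Toy target: standard normal log-density (up to a constant)
chain = adaptive_metropolis(lambda x: -0.5 * x * x, x0=3.0, n_steps=10_000, seed=0)
```

In a multi-parameter setting such as the one in this figure, the proposal covariance would typically be adapted as well, which is what allows the chain to follow correlated parameters.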
Fig 6.
Agreement of simulation and mathematical derivations regarding the conditional probability of cancer given number of adenoma cells.
We compare the empirical probability of cancer for three adenoma size ranges, as derived from the simulations (points), with the model-derived values (lines). (A) Probability of cancer given that compartment A has 100–1,000, 2,500–25,000, or 50,000–500,000 cells. (B) Probability of cancer given that compartment A has more than 1,000, 25,000, or 500,000 cells. (C) Probability of cancer given that compartment A has fewer than 5,000, 25,000, or 100,000 cells.
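The empirical points in these panels correspond to a conditional frequency over simulation runs, which can be sketched as follows. The function name `prob_cancer_given_size` and the toy data are hypothetical, and the sketch assumes that "cancer" means at least one cell in compartment M (M > 0); the source may use a different criterion.

```python
def prob_cancer_given_size(pairs, lo=None, hi=None):
    """Estimate P(M > 0 | lo <= A <= hi) from (A, M) pairs.

    A bound of None leaves that side of the range open.
    Returns None when no simulation falls in the range.
    """
    selected = [
        m for a, m in pairs
        if (lo is None or a >= lo) and (hi is None or a <= hi)
    ]
    if not selected:
        return None
    # Assumption: cancer is present when compartment M is non-empty
    return sum(1 for m in selected if m > 0) / len(selected)

# Toy (A, M) outcomes from five hypothetical simulation runs
pairs = [(150, 0), (800, 0), (900, 2), (3000, 1), (60000, 5)]
p_range = prob_cancer_given_size(pairs, lo=100, hi=1000)
```

Open-ended conditions such as "more than 1,000 cells" or "fewer than 5,000 cells" follow by passing only `lo` or only `hi` with the appropriate integer bound.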
Fig 7.
Model prediction of real-world adenoma size and rates of carcinoma.
(A) Average adenoma size in mm. Black line: Model-predicted average adenoma size, layered on top of binned count data from the CORI data set. Colored bins: Number of individuals in the CORI data set with a reported adenoma of a given size. Dashed line: Beyond age 50 we do not see an age-dependent increase in average adenoma size, and these data were excluded from our calculations. (B) Cancer prevalence and incidence rates. Red lines: Model-predicted cancer incidence and prevalence rates for given ages. Violin plots: Density of estimated rates for 5-year age bins as derived from the SEER data.
Fig 8.
Parameter distributions and correlations for the model fit to CORI and SEER data.
We performed 10,000 steps of adaptive MCMC on the parameter space, evaluated on the CORI and SEER data, and present the posterior distributions of the chain. All parameters have units of per cell per year. Upper right triangle: Pairwise parameter correlation. Diagonal: Univariate density of the posterior parameter distribution. Lower left triangle: 2D posterior density distributions for pairs of parameters. Warmer colors indicate parameter values that are more likely to have produced the data.
Fig 9.
Conditional probability of cancer given adenoma size for CORI and SEER data-derived parameters.
Predicted probability of cancer given an adenoma of a particular size. The parameters used are inferred from the two-type branching process model fitted to the CORI and SEER data. Labels indicate adenoma size. The y-axis denotes the probability that cancer is present given adenoma size.