Shape-specific characterization of colorectal adenoma growth and transition to cancer with stochastic cell-based models

Colorectal adenoma are precursor lesions on the pathway to cancer. Their removal in screening colonoscopies has markedly reduced rates of cancer incidence and death. Generic models of adenoma growth and transition to cancer can guide the implementation of screening strategies, but adenoma shape has rarely featured as a relevant risk factor. Against this backdrop we aim to demonstrate that shape influences growth dynamics and cancer risk. Stochastic cell-based models are applied to a data set of 197,347 Bavarian outpatients who had colonoscopies from 2006 to 2009; 50,649 patients were reported with adenoma and 296 patients had cancer. For multi-stage clonal expansion (MSCE) models with up to three initiating stages, parameters were estimated by fits to data sets of all shapes combined, and of sessile (70% of all adenoma), peduncular (17%) and flat (13%) adenoma separately, for both sexes. Pertinent features of adenoma growth present themselves in contrast to previous assumptions. Stem cells with initial molecular changes residing in early adenoma predominantly multiply within two-dimensional structures such as crypts. For these cells mutation and division rates decrease with age. The absolute number of initiated cells in an adenoma of size 1 cm is small, around 10^3; relative to all bulk cells they constitute a share of about 10^−5. The notion of very few proliferating stem cells with age-decreasing division rates is supported by cell marker experiments. The probability for adenoma transiting to cancer increases with squared linear size and shows a shape dependence. Compared to peduncular and flat adenoma, it is twice as high for sessile adenoma of the same size. We present a simple mathematical expression for the hazard ratio of interval cancers which provides a mechanistic understanding of this important quality indicator. We conclude that adenoma shape deserves closer consideration in screening strategies and as a risk factor for transition to cancer.


S1 Mathematical implementation

Number and size distribution for K = 0
For K = 0 the unconditioned size distribution of pre-malignant clones created at age s with cell number Y(s, t) at age t is given in Jeon et al. [1]. Dewanji et al. [2] present an equivalent formulation in their Eq. (9). Adenoma i may be detectable at age t (N(s_i, t) = 1 for Y(s_i, t) > y_0) or not (N(s_i, t) = 0 for Y(s_i, t) ≤ y_0). Consequently, the expected number E[N(t)] of detectable adenoma, Eq. (S6), arises from a filtered Poisson process for the binary variable N(s_i, t). The same process yields the probability of finding no detectable adenoma at age t.
As a special case the probability of finding no cell at age t is represented here in a compact expression.
The normalized size distribution of detectable adenoma at age t is obtained by conditioning on detection, P[Y(t) = y | Y(t) > y_0]. Adenoma are measured in size intervals with lower and upper bounds which pertain to cell numbers y_l < y_h above the detection limit y_0. The probability of finding an adenoma within these bounds can be derived. The cell-based expectation value of Eq. (S12) involves the Lerch function lerch(αζ, 1, 1 + y_0) and the sum
\[ \sum_{y=y_0+1}^{\infty} y\, p_y(s,t) = \frac{1}{G(s,t)} \left( \frac{G(s,t)}{g(s,t) + G(s,t)} \right)^{y_0+1} \left( \frac{G(s,t)}{g(s,t)} + y_0 + 1 \right). \]
To compare with the measured mean adenoma size for d-dimensional growth, the expectation value E[Y^{1/d}(t)] is needed. The size conversion factor transforms the number of initiated cells in an adenoma into physical adenoma size measured in cm. For d = 2, 3, y_r denotes the reference number of cells at reference size S_r = 1 cm. The mean adenoma size then follows by applying this factor to the expectation value.

Number and size distribution for K = 1 and µ_1/α_1 ≪ 1
By setting µ_1/α_1 ≪ 1 we assume that only a single sub-clone arises from P_0-mutated cells. Now we can draw on the equations for K = 0 and adjust them accordingly.
Starting from Eqs. (S4) and (S6), the expectation value for the number N(t) of detectable adenoma is given by an integral over s.
The probability of detecting an adenoma of size Y(t) > y_0 at age t, and the normalized size distribution of detectable adenoma at age t, can be analogously calculated for K = 1 and µ_1/α_1 ≪ 1 from Eqs. (S4) and (S15).
Conveniently, we define the probability of finding no cell at age t, Eq. (S18), as a special case.
The probability of finding an adenoma within cell numbers y_l, y_h follows analogously. Based on the size distribution in Eq. (S17), cell-based expectation values such as the expected number of cells per detectable adenoma can be calculated. To compare with the measured mean adenoma size for d-dimensional growth, the expectation value E[Y^{1/d}(t)] is needed. By applying the size conversion factor from Eq. (S14), the mean adenoma size is then obtained.

Number and size distribution for K = 1
After arrival of a P_1 cell with mutation rate µ_0, the unconditioned size distribution of a pre-malignant clone follows a Generalized Luria-Delbrück Distribution (GLD) [2]. For constant coefficients the GLD becomes a Negative Binomial Distribution [3] for p_y(s, t). The probability p_y(s, t) has the same meaning as in Eqs. (S1), (S2) under the condition that a mutation leading to a P_1 cell has already occurred.
Below we repeat the formulation of the quantities for K = 1 related to Eqs. (S8)-(S12) for K = 0 as given in Dewanji et al. [3], using Eq. (S22) with the hypergeometric function 2F1.
Analogously, the mathematical formulation for the expected number of detectable adenoma, the probability of detecting an adenoma of size Y(t) > y_0 at age t, and the normalized size distribution of detectable adenoma at age t can be constructed for K = 1 from the expressions (S22), (S23) and (S24).
Additionally, we present the probability of finding no cell at age t.
From Eq. (S26) follows the probability of finding an adenoma in the interval (y_l, y_h] for K = 1. If the upper cell number y_h → ∞, the second term in Θ(y_l, y_h, t) pertaining to y_h disappears.
Based on the size distribution for K = 1 in Eq. (S26), cell-based expectation values can be calculated. Applying the size conversion factor from Eq. (S14) to E[Y^{1/d}(t)] yields the mean adenoma size in cm from growth in d = 2, 3 dimensions, respectively.
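The conversion from cell numbers to physical size can be illustrated with a minimal Python sketch. It assumes that the size conversion factor has the power-law form S = S_r (y / y_r)^{1/d}, which is suggested by the use of E[Y^{1/d}(t)] but is not spelled out in the recovered text. The function names, the toy size distribution and the reference cell number y_r = 10^3 (borrowed from the order of magnitude quoted in the abstract) are illustrative assumptions, not fitted values.

```python
from typing import Mapping


def adenoma_size_cm(cells: float, d: int, y_ref: float = 1.0e3,
                    s_ref_cm: float = 1.0) -> float:
    """Convert a cell number into a linear size in cm for d-dimensional growth.

    Assumed form: S = S_r * (y / y_r)**(1/d); y_ref and s_ref_cm are
    illustrative defaults, not estimates from the study.
    """
    if d not in (2, 3):
        raise ValueError("growth dimension d must be 2 or 3")
    return s_ref_cm * (cells / y_ref) ** (1.0 / d)


def mean_adenoma_size_cm(size_pmf: Mapping[int, float], d: int,
                         y_ref: float = 1.0e3, s_ref_cm: float = 1.0) -> float:
    """Mean size from a normalized size distribution {cell number: probability}.

    The average is taken over Y**(1/d), not over Y itself, which is why
    the expectation value E[Y^{1/d}] is the quantity needed.
    """
    return sum(p * adenoma_size_cm(y, d, y_ref, s_ref_cm)
               for y, p in size_pmf.items())


# Hypothetical two-point size distribution of detectable adenoma.
toy_pmf = {250: 0.5, 1000: 0.5}
mean_2d = mean_adenoma_size_cm(toy_pmf, d=2)
```

Because the average is taken over Y^{1/d}, the mean size is in general smaller than the size that corresponds to the mean cell number.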
The detection probability [3] (their Eq. (23)) and the normalized size distribution of detectable adenoma at age t [3] (their Eq. (24)) can be constructed for K = 2 in the same way as for K = 1. As a special case of Eq. (S31) it is convenient to present the probability of finding no cell at age t. From Eq. (S32) follows the probability of finding an adenoma in the interval (y_l, y_h]. If the upper cell number y_h → ∞, the second term in Θ(y_l, y_h, t) pertaining to y_h disappears.
For constructing the likelihood function for K = 2 we assume that from each 'special' P_1-mutation just one (and not several) adenoma arises. This is the case if µ_1 ∫_s^t p^(1)(s′, t) ds′ < µ_1 t ≪ 1. Hence, the probability of harboring a 'special' P_1-mutation during lifetime must be small.
In this case the joint probability of having n detectable adenoma [3] (their Eq. (26)) reduces to Eq. (S32). This approximation leads to a substantial simplification of the likelihood function [3] (their Eq. (27)) which for K = 2 can now be couched in the same general form developed below.

Likelihood function for adenoma with discrete cell numbers
The contribution L_j of patient j to the total likelihood L = ∏_{j=1}^{N_pat} L_j for N_pat patients is given by
\[ L_j = P[N(t_j) = n] \prod_{i=1}^{n} P[Y(t_j) = y_i \mid Y(t_j) > y_0] \qquad \text{(S36)} \]
if n adenoma of size y_i, i = 1 . . . n, have been detected at age t_j with a detection limit of y_0 cells [3].
In Eq. (S36) P[N(t_j) = n] denotes the probability of detecting n adenoma above the detection limit and P[Y(t_j) = y_i | Y(t_j) > y_0] denotes the probability that each of these adenoma consists of y_i > y_0 cells.
The likelihood is determined by the choice of the growth model. It involves the model-specific expectation value E[N(t_j)] for the number of detectable adenoma from Eqs. (S6), (S24) and (S30). The model-specific function Θ(y_i, t_j) from Eqs. (S10), (S26) and (S32) is determined by the number of cells y_i in adenoma i at age t_j of patient j.
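The structure of the patient-specific likelihood (a count probability multiplied by one conditional size probability per detected adenoma) can be sketched in Python. The sketch assumes that the number of detectable adenoma is Poisson distributed with mean E[N(t_j)], as suggested by the filtered Poisson process. The conditional size distribution is passed in as a function; the shifted geometric distribution used here is a hypothetical stand-in for the model-specific expressions, and all names and numbers are illustrative.

```python
import math
from typing import Callable, Sequence


def log_likelihood_patient(sizes: Sequence[int], expected_count: float,
                           cond_size_pmf: Callable[[int], float]) -> float:
    """Log-likelihood contribution of one patient.

    sizes          : cell numbers y_i of the n detected adenoma (may be empty)
    expected_count : assumed Poisson mean E[N(t_j)] of detectable adenoma
    cond_size_pmf  : P[Y = y | Y > y_0], supplied by the growth model
    """
    n = len(sizes)
    # Poisson log-probability of detecting exactly n adenoma.
    log_count = n * math.log(expected_count) - expected_count - math.lgamma(n + 1)
    # One conditional size factor per detected adenoma.
    log_sizes = sum(math.log(cond_size_pmf(y)) for y in sizes)
    return log_count + log_sizes


def shifted_geometric_pmf(y0: int, q: float) -> Callable[[int], float]:
    """Hypothetical conditional size distribution above the detection limit y0."""
    def pmf(y: int) -> float:
        return (1.0 - q) * q ** (y - y0 - 1) if y > y0 else 0.0
    return pmf


pmf = shifted_geometric_pmf(y0=100, q=0.99)
ll_no_adenoma = log_likelihood_patient([], expected_count=0.3, cond_size_pmf=pmf)
ll_two_adenoma = log_likelihood_patient([150, 400], expected_count=0.3,
                                        cond_size_pmf=pmf)
```

Summing such contributions over all patients gives the logarithm of the total likelihood L.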

Likelihood function for adenoma in categories of size and number
To meet the layout of the data set for Bavarian outpatients, the likelihood function of Eq. (S36) [3] needs to be modified. The Bavarian data set reports the prevalence of adenoma in patient j in size categories and categories for the number of adenoma. If several adenoma were detected, only the size category of the most advanced adenoma is indicated. Here we assume that the most advanced status implies the largest adenoma size. For this size category, y_l and y_h denote the cell numbers of the lower and upper boundary, respectively. For the remaining adenoma we must assume a size between the detection limit at y_0 and the upper bound y_h of the largest adenoma.
The definitions for models K = 0 in Eq. (S11), K = 1 in Eq. (S28) and K = 2 in Eq. (S34) are helpful to present the likelihood function.
The resulting probability for the size distribution of n adenoma is based on n!/(k!(n−k)!) possible permutations for k indistinguishable adenoma in the size category of the largest adenoma. The condition n ≥ k ≥ 1 ensures that at least one adenoma populates this category. Hence, with Eq. (S37) the contribution L_j of patient j to the total likelihood function becomes analogous to Eq. (S38).
In the Bavarian data set four categories for the number n of detected adenoma are provided: n ∈ {0} with detection limit y_0, n ∈ {1}, n ∈ {2, 3, 4} and n ∈ {5, . . . , ∞}. Following Eq. (S39) we introduce the cumulative probability Θ = Θ(y_0, y_h, t) for cell numbers between y_0 and the upper boundary y_h, and the corresponding cumulative probability ϑ = Θ(y_0, y_l, t) for the lower boundary y_l. These enter the patient-specific likelihood for the count categories. The model-specific Θ-functions are defined in Eqs. (S11), (S28) and (S34).
Note that Θ = λ for y_h → ∞ in the largest size category and ϑ = 0 for y_l = y_0 in the smallest size category.
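The counting argument for the size category of the largest adenoma can be checked numerically. The Python sketch below assumes that each of the k adenoma in the largest category contributes a factor Θ − ϑ and each of the remaining n − k adenoma a factor ϑ; this weighting is an interpretation of the text, not a transcription of the published equation. Under that assumption the sum over k = 1 . . . n collapses to Θ^n − ϑ^n by the binomial theorem. The function name and the numerical values are illustrative.

```python
from math import comb, isclose


def largest_category_weight(n: int, theta_upper: float, theta_lower: float) -> float:
    """Sum over k = 1..n adenoma falling into the largest size category.

    theta_upper : cumulative weight up to the upper bound y_h (Theta)
    theta_lower : cumulative weight up to the lower bound y_l (vartheta)
    """
    in_category = theta_upper - theta_lower
    return sum(comb(n, k) * in_category ** k * theta_lower ** (n - k)
               for k in range(1, n + 1))


# Illustrative values only.
n, theta_upper, theta_lower = 3, 0.8, 0.5
explicit_sum = largest_category_weight(n, theta_upper, theta_lower)
closed_form = theta_upper ** n - theta_lower ** n
```

For the smallest size category (ϑ = 0) only the term k = n survives, so the weight reduces to Θ^n.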
To simplify Eq. (S41) we set the number of adenoma to 2 for n ∈ {2, 3, 4} and to 5 for n ∈ {5, . . . , ∞}. These numbers pertain to maximal Poisson probabilities per count category. Estimates of identifiable model parameters are obtained from the deviance, defined with the maximized likelihood function L = ∏_j L_j.
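The choice of 2 and 5 as representative counts can be illustrated with a short Python sketch. For a Poisson distribution the ratio of successive probabilities is P[n + 1]/P[n] = mean/(n + 1), so the probabilities decrease from n = 2 onwards whenever the mean is below 3, and from n = 5 onwards whenever the mean is below 6. The Poisson means used here are illustrative assumptions, not estimates from the study.

```python
import math
from typing import Iterable


def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability of observing n events, computed in log space."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def most_probable_count(mean: float, counts: Iterable[int]) -> int:
    """Count with the largest Poisson probability within one count category."""
    return max(counts, key=lambda n: poisson_pmf(n, mean))


# Illustrative mean numbers of detectable adenoma per patient.
for mean in (0.2, 0.5, 1.5):
    rep_mid = most_probable_count(mean, range(2, 5))    # category {2, 3, 4}
    rep_top = most_probable_count(mean, range(5, 60))   # category {5, ...}, truncated
    print(mean, rep_mid, rep_top)
```

The upper category is truncated at 60 counts in this sketch, which is harmless because the probabilities beyond that point are negligible for the means considered.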

Conditional size distribution
If more than one adenoma has been detected in a patient, only properties (i.e. size or histology) of the most advanced adenoma have been reported.
This lack of information in the data set does not allow a direct comparison of recorded and model-expected size distributions for all adenoma. The size distribution can only be estimated conditioned on the count category for which the adenoma size has been reported. A breakdown of the probabilities P(size = sc, counts = cc) is given in Table A, together with the sum over count categories.

Extinction probability
The conditional probability for a clone with y initiated cells at age t_b to become extinct at age t_e can be given in closed form. For constant parameters the extinction probability has been evaluated (Table L). Since the contribution from newly initiated cells via P_K-mutations is also quite small, the extinction probability is mainly determined by the relation between α and β. Note that the extinction probability pertains to clones of initiated cells but not to whole adenoma, which comprise a number of cells that is higher by orders of magnitude.
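The dependence of extinction on the relation between α and β can be illustrated with the textbook result for a linear birth-death process with constant rates. The Python sketch below is not the expression of this supplement: it ignores newly initiated cells from P_K-mutations and treats the y cells as independent lineages. The function name and the rate values are illustrative assumptions.

```python
import math


def extinction_probability(alpha: float, beta: float, tau: float, y: int = 1) -> float:
    """Probability that a clone of y cells is extinct after time tau.

    alpha : constant division rate per cell
    beta  : constant death (or differentiation) rate per cell
    tau   : elapsed time t_e - t_b
    Textbook linear birth-death result; new initiations are ignored.
    """
    if tau <= 0.0:
        return 0.0
    if math.isclose(alpha, beta):
        single = alpha * tau / (1.0 + alpha * tau)
    else:
        growth = math.exp((alpha - beta) * tau)
        single = beta * (growth - 1.0) / (alpha * growth - beta)
    # Independent lineages: all y of them must die out.
    return single ** y


# Illustrative rates in 1/yr: a long-time limit of beta/alpha for alpha > beta.
p_single = extinction_probability(alpha=2.0, beta=1.0, tau=100.0)
p_clone = extinction_probability(alpha=2.0, beta=1.0, tau=100.0, y=10)
```

For α > β the single-cell extinction probability approaches β/α at long times, so a clone of many cells almost never dies out; for α < β extinction becomes certain.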

Hazard functions
In the MSCE framework the hazard h(t) at age t for incidence of colorectal cancer increases linearly with the transition rate ν, which transforms a single cell from an initiated adenoma into full-blown cancer. The hazard is expressed through E[C(t − t_lag) | y > 0], which denotes the mean number of initiated cells per individual which are susceptible to transformation into cancer. Tumor growth can be described in the same way as adenoma growth with a net expansion rate γ_C (see Fig. 1). As a simplification, γ_C is replaced with a lag time t_lag which must elapse before a tumor becomes clinically relevant. For colorectal cancer Luebeck et al. [4] give estimates of about 2 yr^−1 for γ_C and 5 yr for t_lag. For K = 0 Heidenreich [5] has shown how the hazard h(t) propagates.

Transformation probability
With Eq. (S46) the hazard function at age t + ∆t for an adenoma at age t with Y(t) cells is defined analogously to h_c(∆t) in Eq. (S47).
In our simplified cancer induction model the adenoma is transformed into a clinically relevant tumor with age-dependent rate ν(t) of Eq. (2). Age dependence must be taken into account since ν increases substantially with older age.
The probability of having an adenoma transformed at time t + ∆t is then given by an integral whose integrand contains the survival function S_Y(s). We omit competing risks, which can easily be accounted for by replacing S_Y(s) → S_Y(s) S_c(s) in the integrand. By approximating integration with summation over M_t time intervals of width ∆s = ∆t/M_t we obtain a discrete sum. We can discard S_Y(t) when we assume that an adenoma is always present at age t.
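The replacement of the integral by a sum over M_t intervals of width ∆s = ∆t/M_t can be sketched in Python. The sketch assumes that the integrand is the product of an adenoma-specific hazard and the survival function S_Y(s) = exp(−cumulative hazard), which is the usual form of such a probability but is not recoverable from the text above. Competing risks are omitted, and the function name and the hazard values are illustrative.

```python
import math
from typing import Callable


def transformation_probability(hazard: Callable[[float], float], dt: float,
                               m_t: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of hazard(s) * S_Y(s) over [0, dt].

    hazard : assumed adenoma-specific hazard as a function of time since examination
    dt     : length of the follow-up interval
    m_t    : number of summation intervals, each of width ds = dt / m_t
    """
    ds = dt / m_t
    cumulative_hazard = 0.0
    probability = 0.0
    for m in range(m_t):
        h_mid = hazard((m + 0.5) * ds)
        # Survival up to the midpoint of the current interval.
        survival = math.exp(-(cumulative_hazard + 0.5 * h_mid * ds))
        probability += h_mid * survival * ds
        cumulative_hazard += h_mid * ds
    return probability


# Illustrative constant hazard of 0.01 per year over 10 years of follow-up.
p_sum = transformation_probability(lambda s: 0.01, dt=10.0)
p_exact = 1.0 - math.exp(-0.01 * 10.0)
```

For a constant hazard the sum can be compared with the exact value 1 − exp(−h ∆t), which gives a quick check on the choice of M_t.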

Quantifying screening efficiency
If optimal screening at age t removes all adenoma above the detection limit y_0, cells remain only in adenoma below the detection limit, and cell-weighted probabilities can be given for these remaining cells. From them follow the expectation value for the number of remaining cells per adenoma at age t, the corresponding expectation value for the number of remaining adenoma, and the expected number of remaining cells per individual. In reality the optimal screening efficiency for a given detection limit y_0 is never achieved. Adenoma removal depends on external factors such as the experience of the colonoscopist. These observations suggest a simple model assumption for screening success. It is assumed that screening reduces the number of adenoma and also changes their size distribution.
To quantify screening efficiency we define two efficiency factors 0 ≤ r_eff^N, r_eff^Y ≤ 1 pertaining to the removal of adenoma (r_eff^N) and to the reduction of initiated cells per adenoma (r_eff^Y). Optimal screening efficiency means the complete removal of all detectable adenoma. Finally, the expected effective number of remaining cells per patient is obtained.

Hazard ratio for risk decrease
With Eq. (S55) the hazard ratio for risk decrease in a screening cohort is given by Eq. (S56), which also constitutes an analytical expression for the hazard ratio of interval cancer, as measured by Corley et al. [6]. Interval cancers are diagnosed in patients after they had a colonoscopy.
Since ν cancels out, for ∆t = 0 the hazard ratio pertains to the share of remaining cells at age of examination t as a fraction of all initiated cells in a patient. The abbreviation E_y0 applies to the remaining expectation values, correspondingly.
For ∆t < 10 yr after a screening examination the hazard h(∆t) for de novo created tumors can be neglected in Eq. (S56). We use this observation to simplify Eq. (S56), which enables a direct comparison of the reported hazard ratios of Corley et al. [6] with estimates from the present study.

S3 Supplementary Tables
Breakdown of screening data in age groups
Goodness-of-fit and parameter estimates for models of level III