Investigating the Relation between Stochastic Differentiation, Homeostasis and Clonal Expansion in Intestinal Crypts via Multiscale Modeling

Colorectal tumors originate and develop within intestinal crypts. Even though some of the essential phenomena that characterize crypt structure and dynamics have been effectively described in the past, the relation between the differentiation process and the overall crypt homeostasis is still only partially understood. We here investigate this relation and other important biological phenomena by introducing a novel multiscale model that combines a morphological description of the crypt with a gene regulation model: the emergent dynamical behavior of the underlying gene regulatory network drives cell growth and differentiation processes, linking the two distinct spatio-temporal levels. The model relies on a few a priori assumptions, yet accounting for several key processes related to crypt functioning, such as: dynamic gene activation patterns, stochastic differentiation, signaling pathways ruling cell adhesion properties, cell displacement, cell growth, mitosis, apoptosis and the presence of biological noise. We show that this modeling approach captures the major dynamical phenomena that characterize the regular physiology of crypts, such as cell sorting, coordinate migration, dynamic turnover, stem cell niche correct positioning and clonal expansion. All in all, the model suggests that the process of stochastic differentiation might be sufficient to drive the crypt to homeostasis, under certain crypt configurations. Besides, our approach allows to make precise quantitative inferences that, when possible, were matched to the current biological knowledge and it permits to investigate the role of gene-level perturbations, with reference to cancer development. We also remark the theoretical framework is general and may be applied to different tissues, organs or organisms.

The lattice-based model of cellular tissue Within a n ⇥ m 2D lattice of positions L 2 {1, . . . , k} n⇥m a population of k cells of any type is placed. We denote with l i,j the single-site variable (spin in CPM terminology) associated with position (i, j), we write l i,j = (c, ⌧ ) if the position is occupied by a cell c 2 C of type ⌧ 2 T , i.e. T = {St, TA1, TA2-A, TA2-B, Pa, Go, Ec, Ee} [ {Ecm} being the finite set of cell types we consider plus a special type for the ECM. A cell C(c, ⌧ ) with current area a(c, ⌧ ) is defined by where a 0 is a basic area (in unit of pixels p) assigned to a single lattice position (a pixel). The mutual interaction energy is defined via the Hamiltonian H : Let N (i, j) denote the Von Neumann neighborhood of (i, j), and let ⌧ i,j denote the type of cell in l i,j , the contribution to the energy according to the DAH is where is the Kronecker delta function (which ensures that only the surface sites between di↵erent cells contribute to the adhesion energy), and J is the surface energy discussed in the main text. Let ⌧ c denote the type of cell c, the contribution to the energy of the area constraint is where the ECM has unconstrained area, i.e. A(Ecm, ·) is constant and negative, and its constraint is suppressed by the Heaviside function Heav(·). Notice that here we made explicit the time-dependency of the area constraint by using t c , the relative time since the beginning of the cell cycle for c. Given a lattice L, we define the flip of a position (x, y) to a cell c to be the new lattice L[c (x, y)] where The CPM simulation is as follows: uniformly pick a position (i, j) of L and pick a neighbor position occupied by a cell c (uniformly distributed on the set N (i, j)), i.e. the candidate flip. The CPM probabilistically accepts or rejects the flip, i.e. keeps L or update it to L[c (i, j)], according to the energy of both lattices and the temperature-dependent probability distribution discussed in the main text, when A computation starting at time t s and ending at time t e performs t e t s Monte Carlo steps each one attempting nmk random flips, as discussed in the main text. Once all the attempts of flips are finished, the new lattice is determined as a result of all the accepted flips. A stochastic process {L(t) | t = 0, 1, . . .} whose states are the lattice configurations, underlies the CPM. Technically, such a process is a Discrete Time Markov Chain.

The boolean model of Gene Regulatory Network
Let x(t) be the RBN binary vector-state at time t, its i-th component in x(t + 1) is An execution of an RBN is a series of steps from an initial state x(0) . . which, since the state-space is finite (i.e. for w genes there exist at most 2 w vectors in {0, 1} w ) and the dynamics is fully deterministic, will enter a limit cycle x(k 1 ) ! . . . ! x(k n ) from any initial condition. Such a cycle is termed RBN attractor, the sub-sequence from x(0) to x(k 1 ) (excluded) is its transient; the set of initial conditions ending up to the same attractor is its basin of attraction.
Our approach first selects, either exhaustively or randomly, a set I = {x 2 {0, 1} w } of initial conditions, and determines numerically its associated set of attractors A I . Iteratively, in each attractor in A I random flips are performed, i.e. some x i is complemented in {0, 1} yielding a new state x 0 . The attractor ↵ reachable from x 0 is then computed and included in A I , if ↵ 6 2 A I . By an arbitrary iteration of this process the ATN A of the subsumed NRBN is drawn as follows: for each perturbation yielding a jump among attractors ↵ and the entry a ↵, 2 A is incremented, the matrix is finally normalized. As running example, consider the following ATN 0 The di↵erentiation tree T induced by A is generated according to the strategy discussed in [30]. With no aim of exhaustivity we briefly recall it here. Firstly, A is interpreted as the adjacency matrix of the graph of the attractors G A . Secondly, the set of strongly connected component in G A is evaluated by standard algorithmic techniques; this is the TES at threshold 0, T 0 , which is supposed to contain all the nodes of G A . If T 0 does not contain all the required nodes, the RBN is rejected and the process restarts with a new network. Otherwise, T 0 is assigned to the root of the di↵erentiation tree. In the above ATN, This process now iterates proportionally to the size of the tree we want to generate, which has a maximum theoretical depth proportional to the number of nodes when T reduces to a path. The idea is that a threshold 0 < < 1 is picked, A is pruned of any a ↵, < , and the new number of TESs at threshold , T , is evaluated. The set of direct descendants of the root of T is defined by T via an injective map, i.e. each TES in T defines a unique descendant. In the example above, when = 0.053 the pruned ATN is (in bold the pruned entries) 0 The process stops when no more TESs emerge, and the last descendants determined become the leafs of T . For the example above further candidates thresholds are 0.06 and 0.068.

The multiscale link
Each A, re-normalized once a threshold is fixed, is actually a Discrete-Time Markov Chain (DTMC) and the sub-matrix A ✓ associated to the emerging TESs ✓ is, by definition, a ergodic DTMC (i.e. it is possible to go from each state, in any number of steps, to every other state). In DTMCs terminology the transition probability matrix A ✓ is irreducible and thus the stationary probability distribution of the chain is unique and can be evaluated as the fixed point of the recursive transformation where p 0 is an arbitrary initial distribution over the states of the chain. By fixing an arbitrary small ✏, this fixed point can be evaluated yielding an approximation ⇡ ⇤ ⇡ ⇡, which can be used as explained in the main text.

Algorithmic implementation of the model
We describe the pseudocode of the model implementation. For clarity, we separate the general simulation driver for the model (Algorithm 1), the search of the suitable NRBN (Algorithm 2), the set up of the multiscale link (Algorithm 3), the external CPM dynamics (Algorithm 4) and the internal (Algorithm 5) dynamics.

14:
determine the thresholds, the Threshold Ergodic Sets (TESs) and the NRBN-induced di↵erentiation tree T NRBN with the algorithm discussed in [30]; 15: until T matches T NRBN {use a standard tree-matching algorithm} 16: Output: the matched NRBN, the ATN, the TESs and the thresholds; Algorithm 3 Multiscale link set up. 1: Input: the ATN and the TESs; 2: given the TESs, determine the steady state probability ⇡; {use numerical algorithms for eq. (17)}; 3: given ⇡, determine for each cell type ⌧ the cycle length`⌧ as of equation (5); 4: determine the average cell cycle lengthˆ⌧ ; 5: evaluate the time unit conversion as of equation (6); 6: Output: each value`⌧ andˆ⌧ ; Algorithm 4 Cellular Potts Model dynamics, single MonteCarlo step.
1: Input: the current lattice L; 2: for h · w · k times (see Table 2 if c has completed its cell cycle then 9: evaluate the next di↵erentiation type ⌧ according to the attractor the cell is assigned to, and T ; increase linearly the area target A(c); {this is as of equation (7)

Networks analysis
In our model GRNs drive the spatial dynamics, thus we analyzed di↵erent properties of the networks matching the crypt lineage tree. In Table 1 one can find some statistics for the 7 suitable network used in the simulations: (i) the length of any single attractor of the network (i.e. the number of states of all the gene activation patterns) |↵|; (ii) the robustness of any attractor, in terms of probability of remaining in the same attractor after one single flip perturbation, which is the first entry of ATN matrix, A(↵, ↵); (iii) the stationary distribution of the ATN for each attractor ⇡(↵), which accounts for the average portion of time (in percentage) spent in each attractor and which is used to weigh the length of attractors when computing the cell cycle length.
Please refer to the Results section for the comments on these measures.