Supporting information for: A mesoscopic simulator to uncover heterogeneity and evolutionary dynamics in tumors

1 Deparment of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Avda. Camilo José Cela, 3, 13071 Ciudad Real, Spain. 2 Department of Mathematics, Universidad de Cádiz, Avda. República Saharaui s/n, 11510 Puerto Real, Cádiz, Spain. 3 Biomedical Research and Innovation Institute of Cádiz (INiBICA), Avda. Ana de Viya 21, 11009 Cádiz, Spain 4 Department of Physics, University of Tehran, Tehran, 14395-547, Iran.


S2 Appendix. Bayesian optimization of mutation weights
We performed a parameter optimization in order to estimate a set of mutation weights that provide an alteration frequency resembling that of glioblastoma. There is a mutation weight for each alteration (3 considered in the model) affecting each basic process (4 processes), so we need to explore a 12-dimensional parameter space in an efficient way. In this work we resorted to Approximate Bayesian Computation (ABC) rejection algorithm, due to the lack of direct experimental measures of mutational weights. In this sense, our ab initio approach is not intended to provide the best 12-dimensional set of mutational weights for glioblastoma, but a reasonable guess (in the form of a probability distribution), considering that parameter exploration in 12-dimensional space may be prone to getting stuck in local minima. That is the reason why much more thorough and computationally intensive procedures were discarded for this purpose.
The ABC rejection algorithm is based on Bayes' theorem: where the conditional probability of a specific parameter value θ given a dataset D is related to the probability of D given the parameter value θ. Using the nomenclature of Bayesian statistics, p(θ|D) denotes the posterior, p(D|θ) the likelihood, p(θ) the prior, and p(D) the evidence. The prior is understood as the probability distribution that encompasses our beliefs regarding parameter value θ before knowing D. Applied to the case of the model, our prior will be the knowledge-based guess about how alterations in glioblastoma affect basic cell processes. According to Bayes, from a reasonable prior it is possible to estimate a posterior given that we know the evidence p(D) (the data that we have) and the likelihood p(D|θ) (the probability of obtaining D given the parameter value θ). The goal of ABC rejection algorithm is precisely to approximate the posterior by iteratively refining the prior.
To do so, ABC rejection algorithm first step consist on sampling N parameter pointŝ θ from the prior, and then running one simulation per sampled parameter point with model M . This will lead to a simulated datasetD, that will differ to some extent from observed dataset D. Depending on a tolerance , we accept simulations fromD that keep true this criterium: is a measure of the discrepance between observed and simulated datasets. From accepted simulations we can build a posterior distribution p(θ|D), and use it as a prior for the next iteration of the ABC algorithm. After several steps, it is expected that: so that we inferred a plausible posterior solely from our prior and simulation data, using observed data as rejection criteria.
In our approach, we run a cohort of N = 10 4 simulations, fixing characteristic times of basic processes, and letting mutation weights be randomly chosen from a uniform distribution between 0 and 1 (p(θ) = U(0, 1)). The only restraint applied at this point was that the sum of mutation weights for a single process could not be greater than one (see Methods in the main text). Once all simulations had finished, we selected a diagnostic volume for all of them, randomly sampled from an empirical distribution built with data from The Cancer Genome Atlas (TCGA) (Fig 1). The alteration frequencies at diagnostic (D) were compared with their equivalents coming from TCGA data (D): 75 % of tumors sampled from TCGA have alterations in at least one gene from RTK/PI3K/RAS pathway; 66.67 % of tumors have alterations in at least one gene of Retinoblastoma pathway, and finally 15.48 % of tumors have P53 altered. Following this comparison, we rejected simulations whose alteration frequencies were not similar to those from real patients ( = 0.3, with d(D, D) being the euclidean distance). The mutation weights from accepted simulations were retrieved and used to build a posterior distribution, thus providing a first coarse estimate of mutation weights for glioblastoma. The repetition of this process of random selection of weights, simulation, and rejection of non plausible tumors is intended to refine the posterior distribution of mutation weights, given that random sampling is performed over the posterior distribution obtained in previous step. alterations in P53 pathway. Blue corresponds to cell division, red to cell death, green to mutation, and pink to migration. (B) Mutation weights for each process associated to alterations in Retinoblastoma pathway. (C) Mutation weights for each process associated to alterations in RTK/PI3K/RAS pathway. Note that weights associated to death are all negative, as alterations increase characteristic time associated to this process (instead of reducing it, as in the other processes).
In our case, we accepted 80 of 10 4 simulations in the first iteration, what makes a rejection percentage of 99.2%. With accepted simulations, we built a posterior distribution for each mutation weight. To do so, we used ksdensity function from Matlab. It is noteworthy that our initial guess was close to the most likely mutation weight in some cases, such as RTK weights for division and death. Moreover, attending to all posteriors, our initial guess seems plausible.
We explored the relationship between mutation weights from accepted simulations in order to uncover any hidden correlations (see Fig 4). Apart from the obvious positive self-correlation, we observed that mutation weights from different alterations affecting the same process were negatively correlated. This is a result of the implemented restraint that ensures G i=1 w i < 1. Aside from that, no other strong correlations were observed. (K) Mutation weight associated to mutation of RTK. (L) Mutation weight associated to migration of RTK. Red dots denote our prior choice of weights for initial simulation tests. Note that some of our prior speculations seem consistent for the model: i.e., division and migration weights are likely small for P53 and RB, while for RTK they can take high values.
This already gives an idea about reasonable parameter values. However, in order to see the algorithm convergence, we performed a second iteration, random sampling mutation weights from previous posterior distribution. This iteration yielded 230 accepted simulations from 10 4 , making a rejection percentage of 97.7%. In Fig 2 there can be seen the mutation weights of accepted simulations from second iteration, while in