Epigenetic regulation of cell fate reprogramming in aging and disease: A predictive computational model

Understanding the control of epigenetic regulation is key to explain and modify the aging process. Because histone-modifying enzymes are sensitive to shifts in availability of cofactors (e.g. metabolites), cellular epigenetic states may be tied to changing conditions associated with cofactor variability. The aim of this study is to analyse the relationships between cofactor fluctuations, epigenetic landscapes, and cell state transitions. Using Approximate Bayesian Computation, we generate an ensemble of epigenetic regulation (ER) systems whose heterogeneity reflects variability in cofactor pools used by histone modifiers. The heterogeneity of epigenetic metabolites, which operates as regulator of the kinetic parameters promoting/preventing histone modifications, stochastically drives phenotypic variability. The ensemble of ER configurations reveals the occurrence of distinct epi-states within the ensemble. Whereas resilient states maintain large epigenetic barriers refractory to reprogramming cellular identity, plastic states lower these barriers, and increase the sensitivity to reprogramming. Moreover, fine-tuning of cofactor levels redirects plastic epigenetic states to re-enter epigenetic resilience, and vice versa. Our ensemble model agrees with a model of metabolism-responsive loss of epigenetic resilience as a cellular aging mechanism. Our findings support the notion that cellular aging, and its reversal, might result from stochastic translation of metabolic inputs into resilient/plastic cell states via ER systems.

In this appendix, we present supplementary materials regarding the methods of our manuscript Epigenetic regulation of cell fate reprogramming in aging and disease: A predictive computational model. The materials presented below further contains a summary of the Approximate Bayesian Computation method, as well as further simulation results, statistical analysis, and materials (such as parameter values). We also include Supplementary Figures which complement the results discussed in the main text.

I. FURTHER SIMULATION RESULTS REGARDING THE BIFURCATION ANALYSIS
We present simulation results verifying the bifurcation analysis of the equations discussed in Section Variation in the abundance of HDM and HDAC drives epigenetic switch, of the main text. In particular, we explicitly show the existence of the hysteresis cycles predicted by our bifurcation analysis. For concreteness, since the behaviour of both differentiation-and pluripotency-promoting genes is qualitatively similar, we specifically focus on simulations of the differentiation-promoting gene. Differences with the pluripotency case only concern the quantitave value of the critical (bifurcation) points, not the behaviour of the system. Fig. A shows simulation results where we have set the number of HDAC enzymes, v 0 , to v 0 = E (see Table A). Note that according to the description in the main text, this is equivalent to fixing the HDAC concentration to e HDAC = 1, since e HDAC = v0 E . We then vary z 0 which, according to the main text, where we define e HDM = z0 E , is the same as varying HDM concentration. Fig. A shows results regarding the empirical distribution of x 3 , P (x 3 ), where x 3 ≡ X 3 (t inf )/S with t inf the duration of the simulation which is taken long enough so that the system settles onto its quasi-steady state starting from prescribed initial conditions X i (t = 0). In order to ascertain whether the system exhibits the hysteresis cycle predicted by our bifurcation analysis as z 0 changes (see Fig. 3(c), main text), we first set an initial condition with X 1 (t = 0) = 0, X 2 (t = 0) = 0.9S, X 3 (t = 0) = 0.1S. This allows us to explore the behaviour of the system along the lower stable branch (corresponding to closed chromatin) of the diagram Fig. 3(c) of the main text. Similarly, by setting initials conditions to X 1 (t = 0) = 0, X 2 (t = 0) = 0.01S, X 3 (t = 0) = 0.99S, in Fig. A(b) we trace the behaviour of the system along the upper stable branch (associated with open chromatin). Fig. A(a) shows that, for the prescribed initial condition, the system exhibits a unimodal distribution around the closed chromatin state for small values of z 0 . As z 0 increases and approaches the critical value where the closed chromatin state ceases to exist, fluctuations increase (as shown by the bimodal behaviour of P (x 3 )) thus heralding the onset of a phase transition. Beyond this point, P (x 3 ) exhibits unimodal behaviour around the open chromatin state. These results, including the value of the critical point (which in the simulations is formally characterised by a divergence of the variance of x 3 ), are in agreement with those obtained from the bifurcation analysis (see Fig. 3(c), main text).
In Fig. A(b) we trace the other half of the hysteresis cycle. By setting initial conditions to X 1 (t = 0) = 0, X 2 (t = 0) = 0.01S, X 3 (t = 0) = 0.99S, P (x 3 ) exhibits unimodal behaviour around the open chromatin state for larger values of z 0 . As z 0 is reduced and approaches the critical value for which the open chromatin state ceases to exist, fluctuations are again observed to increase, i.e. P (x 3 ) becomes bimodal around the critical point. Beyond this point, P (x 3 ) recovers unimodal behaviour but peaked around the closed chromatin state. Results, including the value of the critical point, are again in excellent agreement with the bifurcation analysis (see   Fig. 3(e), main text. Here, we have set the number of HDM enzymes, z 0 , to z 0 = 0.2E (see Table A). We then simulate the behaviour of the system as the number of HDAC molecules, v 0 , is changed. We observe the same excellent agreement between simulations and bifurcation analysis as we do between the numerical and analytical results shown in Figs. A and 3(c) (main text), respectively.  Table 1, main text. Parameter values are obtained from Tables A and 2, main text. See main text for more details.

II. REFERENCE PARAMETER VALUES: REFRACTORY & PLASTIC SCENARIOS
Here we give the sets of parameter values used to produce the phase diagrams   genetic state for the differentiation (pluripotency) gene. Besides this, we further require that, for the reprogrammingresilient phenotype there is no overlap of the bistability regions, whereas for the plastic phenotype we require the area between the solid red line and the dashed blue line (see

III. SENSITIVITY ANALYSIS AND ROBUSTNESS OF THE PLASTIC SCENARIO
As a first step towards the analysis of the robustness of the plastic scenario associated with the situation depicted in the bifurcation diagram shown in Fig. 3(f) of the main text, we have carried out an exhaustive parameter sensitivity analysis. In particular, we are interested in which of the reaction rates k j are critical to obtain the epigenetic regulation associated with the differentiation gene (i.e. the steady state behaviour corresponding to open chromatin) and which ones are critical for the epigenetic regulation corresponding to the pluripotency gene (i.e. the steady state behaviour corresponding to closed chromatin). We have used an Approximate Bayesian Computation (ABC) method [1] to perform our exploration of parameter space.
ABC methods have been devised to tackle those inference problems for which the estimation of the likelihood function is computationally too demanding. Let θ = (k 1 , . . . , k 16 ) be the vector whose components are the parameters to be estimated and x be the data. The general aim is to approximate the so-called posterior distribution, π(θ|x), i.e. the conditional probability of θ given the data, from a prior distribution of the parameters, π(θ). In general, the posterior is given by π(θ|x) ∼ f (x|θ)π(θ), where f (x|θ) is the likelihood function. All ABC methods follow the same generic procedure: • Sample a candidate sequence of parameters, θ, from the proposed prior distribution, π(θ).
• Sample or simulate a data set x from the model represented by the conditional probability density f (x|θ).
• Compare the simulated data set, x, to the experimental data, x 0 , according to some distance function, d(x, x 0 ).  If d(x, x 0 ) ≤ , where is an a priori prescribed tolerance, then θ is accepted.

Rescaled parameter Scaling parameter Units
The result of this algorithm is a sample of parameters from a distribution π(θ|d(x, x 0 ) ≤ ).

Rescaled parameter Scaling parameter Units
2. Simulate data set, x * , from the Master Equation with transition rates as per Table 1, main text, using Gillespie's stochastic simulation algorithm. We generate 10 realisations and collect data at times t i , i = 1, . . . , 25, corresponding to the raw data time points.
3. For each time point, t i , we compute two summary statistics: the mean over the 10 realisations, x * (t i ), and the associated standard deviation, σ * (t i ).

If
where x 0 denotes the experimental data.

Go back to
Step 1.
We run this algorithm until the number of accepted parameter sets reaches 10000. This method has been used to generate an ensemble of differentiation epigenetic regulation systems (see Fig. C) and an ensemble of pluripotency epigenetic regulation systems (shown in Fig. D).

IV. KOLMOGOROV-SMIRNOV ANALYSIS
In order to analyse the statistical significance of our results of Section Heterogeneity and robustness of the refractory and plastic scenarios (main text) regarding the shapes of the distributions of the kinetic parameters, k j , associated with the different scenarios we consider (refractory vs plastic), we resort to a well-known method, namely, the Kolmogorov-Smirnov (KS) test [2,3]. The KS test is a non-parametric test which allows to evaluate the equality of continuous probability distributions. This test can be used both to compare an empirically obtained sample with a reference probability distribution, or to compare two empiricial samples. In other words, the KS test allows us to tell whether two distributions are the same within the level of confidence we desire. Since this is a wellknown technique, we will not go into its details and we will report our results. Interested readers are referred to the specialised literature for details [2,3]. Throughout this section, we impose a level of confidence of 95 %.

A. Comparing the viable subset with the uniform distribution
The first test we are interested in carrying out consists on checking which kinetic constants, k j , exhibit a non-uniform distribution within the viable subset. These parameters are the ones deemed to play a substantial role in the associated behaviour (i.e. viable (base-line) conditions of the differentiation/pluripotency ER system) [4]. The null hypothesis for the test is therefore whether the empirical cumulative frequency distribution (CFD) of each k j is equal to the uniform. Since we are using a confidence interval of 95%, whenever the p-value is larger than 0.05 the null hypothesis cannot be rejected, i.e. the parameter is deemed to be uniformly distributed. As we mention in Section Heterogeneity and robustness of the refractory and plastic scenarios (main text), the null hypothesis is rejected only for k 1 , k 3 , k 6 , k 7 , k 12 , k 14 , and k 16 (differentiation-promoting gene) and for k 3 , k 8 , k 12 , k 14 , k 15 and k 16 (pluripotency-promoting gene). The reported p-values are given in Table E and Table F, respectively.

B. Comparing the plastic sets with the viable subset
We continue our analysis by testing the CFDs of the kinetic constants when we consider those parameter sets that exhibit plastic behaviour. We analyse which parameters have different distributions when compared to their distributions within   the viable set. These parameters are the ones deemed essential for the associated behaviour (i.e. plastic behaviour). The null hypothesis for the test is therefore whether the empirical CFD of each k j for the plastic sets is equal to their empirical distributions within the whole viable set. Since we are using a confidence interval of 95%, whenever the p-value is larger than 0.05 the null hypothesis cannot be rejected, i.e. the parameter is deemed to be uniformly distributed. As reported in Section Heterogeneity and robustness of the refractory and plastic scenarios (main text), the null hypothesis is rejected only for k 1 , k 9 and k 14 (differentiation-promoting gene) and for k 2 and k 6 (pluripotency-promoting gene). The reported p-values are given in Table G and Table H