Stochastic modeling reveals kinetic heterogeneity in post-replication DNA methylation.

DNA methylation is a heritable epigenetic modification that plays an essential role in mammalian development. Genomic methylation patterns are dynamically maintained, with DNA methyltransferases mediating inheritance of methyl marks onto nascent DNA over cycles of replication. A recently developed experimental technique employing immunoprecipitation of bromodeoxyuridine labeled nascent DNA followed by bisulfite sequencing (Repli-BS) measures post-replication temporal evolution of cytosine methylation, thus enabling genome-wide monitoring of methylation maintenance. In this work, we combine statistical analysis and stochastic mathematical modeling to analyze Repli-BS data from human embryonic stem cells. We estimate site-specific kinetic rate constants for the restoration of methyl marks on >10 million uniquely mapped cytosines within the CpG (cytosine-phosphate-guanine) dinucleotide context across the genome using Maximum Likelihood Estimation. We find that post-replication remethylation rate constants span approximately two orders of magnitude, with half-lives of per-site recovery of steady-state methylation levels ranging from shorter than ten minutes to five hours and longer. Furthermore, we find that kinetic constants of maintenance methylation are correlated among neighboring CpG sites. Stochastic mathematical modeling provides insight to the biological mechanisms underlying the inference results, suggesting that enzyme processivity and/or collaboration can produce the observed kinetic correlations. Our combined statistical/mathematical modeling approach expands the utility of genomic datasets and disentangles heterogeneity in methylation patterns arising from replication-associated temporal dynamics versus stable cell-to-cell differences.


Monotonic vs. Reversible Methylation Models
We analyzed CpG sites using both the monotonic, 2-parameter model (Eq. 2 in the main text) and the reversible, 3-parameter model (Eq. 5 in the main text). We compared the output of these two models and performed model selection using the Bayesian Information Criterion (BIC), using the formula BIC i,m = −2l(θ i,m ) + p m ln(n i ), (1) where i is an individual CpG site index, l(θ i,m ) is the log-likelihood which is maximized for model m by parametersθ at that given site, p m is the number of parameters for the model m, (i.e., m = 2 or m = 3), and n i is the number of datapoints for site i. We use a threshold ∆BIC of 2 to identify a site as "reversible". That is, we select the reversible, 3-parameter model for site i when: For Chromosome 1, 14220 out of 876410 sites were identified as reversible with this procedure, or 1.6%. Examples of sites selected for either the monotonic or reversible models are shown in Fig. A.

Choice of Sites to Retain in Analysis
In the experimental Repli-BS dataset, read-depth measured at each CpG site and each timepoint is highly variable. In general, higher read-depth (more samples) leads to higher confidence in estimated parameters. We compared two methods of filtering sites: using a read-depth-based cutoff, and using a confidence-interval-based cutoff.
In the first method (results presented in Main Text and all supplementary figures, unless otherwise noted), all CpG sites are subjected to a cumulative read-depth cutoff of N = 15, where at least 10 reads must be acquired at time 0, and at least 5 over subsequent three timepoints. This leads to retaining approximately 40% of CpGs genome wide. A drawback of this method is that statistically it leads to different stringency in different kinetic regimes, since the information gain from reads at different timepoints depends on the kinetics at that site.
Alternatively, we developed a confidence-interval-based cutoff to retain sites only when the width of the Confidence Interval (CI), as estimated by the Profile Likelihood method, is narrower than some threshold. For this analysis, we chose to use CIs on k. For many CpGs, it is not possible to estimate the full width of the 95% CI, since the fast kinetics are not constrained by the experimental timepoints, as described in the Main Text. This poses a challenge to determining a uniform CI-cutoff that can be used across all CpGs. Thus, we estimate the CI half-width (CI HW ). For fast sites where the upper CI limit is not identifiable, we use CI HW = logk − logCI − 95 , or the difference between the ML estimated k value and the lower 95 CI limit. For sites where the full 95 CI is identifiable, we use CI HW = (logCI + 95 − logCI − 95 )/2, or the average width of the upper and lower sides of 95 CI interval.
Choosing variable CI HW thresholds, we find that the number of CpGs retained varies, but that qualitative fitted parameter distributions and correlation functions remain largely unchanged. Results are shown in Fig. B.

MLE Validation
Statistical inference of f and k for individual CpG sites required an assessment of the accuracy of such estimation. To do so, 1.5 · 10 4 sites were each assigned known values of fraction methylation (f ai ) and remethylation rate (k ai ), and remethylation kinetics were simulated in silico according to our two-parameter model. Then, MLE was used to infer for each site a fraction methylation (f ini ) and remethylation rate (k ini ), which could be compared to f ai and k ai , respectively, so to determine the accuracy of the inference.
To perform the simulation, each site i was assessed the probability of being remethylated at each time-point, j = [0.5, 1.5, 4.5 and 16.5 hr], according to f ai and k ai (p ij , see Equation 2 in the Main Text). Also, the number of reads each site would display at each timepoint (n ij ) was sampled from the experimental values of Chr1. Then, for a given site i at the time point j, n ij random numbers from 0 to 1 were generated, and compared to p ij . All random numbers below p ij were considered as methylated, while the rest unmethylated. This was repeated for the other 3 time-points, and all sites, resulting in 1.5 · 10 4 sites, with methylated and unmethylated reads at the 4 time-points according to their assigned kinetic parameters. Hence, this in silico data could be analyzed using MLE, and infer for each site f ini and k ini , which could subsequently be compared to the "ground-truth" values, that is the assigned, f ai and k ai .
In general, we observe how MLE is able to qualitatively recover assigned k ai values (Fig. C, A Top). However, when a significant fraction of sites are assigned k a values beyond the established lower bound (approx. 10 hr −1 ) k in -distributions appear to be abruptly trimmed (Fig. C, A Middle), for limited time resolution do not allow us to infer rates faster than this limit. Finally, when using more complex distributions, such as the combination of two lognormal functions, MLE allows the recovery of those shapes with relative accuracy (Fig. C, A Bottom).
Also, using a uniformly populated discrete distribution comprising 15 values from 0.03 to 100 hr −1 to assign k a , it is observed that the accuracy of the MLE inference of the remethylation rate constant depend on the magnitude of the assigned k. k-values lower than 0.32 hr −1 are inferred with increasing uncertainty (Fig. C B Top), and values faster than 10 hr −1 are again assigned to our upper limit. However, our method can accurately estimate a wide range of values, from 0.5 to 5 hr −1 . In the case of f , a discrete distribution comprising 10 values from 0 to 1 was used to assign f a values. It is observed that values of 0 or 1 are assigned with greater accuracy than intermediate values, especially in the (0, 0.5) interval ( Fig. C B Bottom).  (Top) and f in (Bottom) when assigning discrete distributions of f a and k a . Assigned values lie onto the red-stripped line, while for the distributions of inferred parameters, the median (blue circles), the 50th percentile (black lines), the 75th percentile (orange lines), and 95th percentile (green lines) are represented. For f -analysis, assigned values of f a ranged from 0 to 1, while k a values were sampled from fitted k values in Chr1. For k-analysis, assigned k a values spanned from 0.03 to 100 hr −1 , and assigned f a were sampled from Chr1 inferred values of f .
Effect of the number of reads on the accuracy of the estimation MLE validation allowed us to analyze the effect of the number of reads, or read-depth (RD), on the inference of remethylation rates. RD shows great variability along a chromosome, with some sites displaying more than 100 reads in total, while others hardly contained more than 5. On average, however, most sites displayed 5 reads at t=0.5, and 5 more at other time-points. Intuitively, statistical inference of poorly covered sites was expected to be more inaccurate than those with more reads. For this reason, we included into the MLE method a restriction regarding the RD at t=0.5 (RD t0 ), and the total RD for the rest of timepoints (RD later ). Any site with less reads would be disregarded, for it was considered that it did not contain enough reads to be analyzed. However, the more restrictive the method was, the larger the fraction of sites that were neglected. Therefore, we wanted to assess to what extent the RD could affect the accuracy of the remethylation rate inference, so as to reach a compromise between the quality of sites in terms of their sampling, and the quantity of sites that were perserved.
To that end, a set of 6000 in silico sites were assigned remethylation rates from a discrete and uniform distribution of 6 values from 0.56 to 10 hr −1 , and fitted using increasingly restrictive conditions in terms of RD t0 and RD later . For the less restrictive conditions (RD t0 =0 and RD later =0), the average relative error was around 40%, while being less than 32% for the most restrictive method, RD t0 =10 and RD later =10, (Fig D  A Left). Accordingly, the fraction of sites in Chr1 which were disregarded by MLE with increasing restrictiveness amounted to 75% for the most restrictive conditions (Fig D A  Right). Therefore, MLE is proven to be suitable to estimate the order of magnitude the remethylation rate constant of a given site lies in, rather than estimating the exact value with accuracy.
In that sense, when observing the mean square error (MSE) of f and k values obtained when applying MLE on a subset of sites in Chr1, using increasing number of reads, we observed how more reads contributed to a linear reduction in the MSE in terms of k (Fig D B Top). However, when representing log10(k), which provides information regarding the order of magnitude of k, the MSE reached a plateau around 15 reads (combining RD t0 and RD later ), and that more reads did not contribute to a significant improvement (Fig D B Bottom). After this analysis, RD t0 of 10 and a RD later of 5 were chosen as suitable conditions that allowed that sites with remethylation rates ranging from 0.56 to 10 h −1 were inferred, on average, with less than a 32% of relative error, and preserving, on average, 40 % of the CpG sites of each chromosome.
Regarding f −estimation, f a values were sampled from WGBS measurements from Chr1 in arrested HUES64 cells [1], and k a were sampled from fitted k values in Chr1. It was determined that fraction methylation could be on average inferred with a ±0.1 in terms of absolute error. Effect of the Read-depth on the average relative error of k-inference of those sites assigned k ai values from 0.56 to 10 hr −1 (left) and fraction of Chr1 sites that fulfill different read-depth restrictions in terms of RD t0 and RD later (right). B: Effect of the Read-depth on the mean-squared error for k (top), log10(k) (middle), and f (bottom). The analysis was performed selecting 121,000 sites from Chr1 that had at least 30 reads (>= 20 at time 0, >= 10 later). From this subset, increasing number of reads were sampled [2,4,...,N-2,N,N+2,...,26) for each site, without replacement, and MLE was performed to fit f and k. The mean squared error of estimates for k, log10(k), and f was determined comparing f and k values of a given subset of fitted sites sampling N reads with f and k inferred after sampling N-2 reads.

Effect of the Bayesian Prior
In order to test the robustness of the two-parameter exponential model in terms of the value of inferred k and f values, they were compared to the results of using a Bayesian Prior method that could incorporate previous information regarding f . The major difference in the estimation method between these results and those of the main text was that, while in the main text no specific assumptions were made on the values of the parameters, here WGBS experiments in arrested cells were assumed to be informative on f parameters. These independent experimental estimates were included in the calculation of the likelihood function as priors according to Bayes formula. However, note that, for the main text results, although a prior distribution on the parameters was not defined explicitly, priors were implicitly included by construction of the parameter space in calculation of the likelihood surface. That is, f technically has a uniform prior from [0,1], and k has uniform (in log10 space) prior from 10 2 to 10.) Overall, we find that results were qualitatively consistent between the two inference approaches. Some quantitative differences were seen, but the general results were the same. Specifically, the general shapes and variances of inferred parameter distributions were insensitive to the design of the prior distribution on f . The correlation of k with genomic distance was qualitatively similar in both cases (though the correlation appears quantitatively reduced overall in the experimental-prior case). Also, the correlation of f with genomic distance was increased by the experimental prior, as expected. While k-estimates on individual sites were affected by the design of the f-prior, there was a high degree of correlation ( .84) between individual estimates from both methods.

Effect of Including Experimental Error Estimates
The probability of a methylated read (denoted '1') to be present on the nascent strand at a time t post-replication can be extended to account for the experimental errors. Namely, given a false-positive rate E p (the probability of a false methylation count) and a false-negative rate E n (the probability of a false non-methylation count), the probability of observing a methylated read is described in Eq. 6 (Main text).
We compared the results of MLE of the parameters from Repli-BS data using the original formula (no explicit accounting of experimental error) to results using the extended formula with error, above. In the absence of quantified error values for the specific experimental system, we tested a range of error estimates, and found no significant effect on the distributions of inferred parameters chromosome-wide, though some individual site-estimates are impacted (Fig. F). Here, the false-positive rate E p was assumed to be 1%, and the false-negative rate E n was assumed to be 0.1%. The largest source of error is assumed to occur in bisulfite conversion efficiency. Since this is estimated to be arround 99% or higher, E p is estimated at 1%. Smaller sources of error in sequencing, base-calling can affect both E p and E n . B: Correlation of fraction methylation f (Right) and remethylation rates k (Left) with GD including the experimental error. C: Correlation of inferred f and k-values with and without experimental error. Overall correlation between the two sets of estimates is 0.9862 for f and 0.9967 for k.

Effect of Including Time as a Random Variable
An alternative method to treat the 0-hour-post-pulse as t = 0.5 hours post-replication (and so on for the other experimental timepoints), is to treat post-replication time as a uniformly distributed random variable over the interval of one hour (the duration of the BrdU pulse). Hence, if in our previous model t was defined as: now each timepoint t j will be a random variable, uniformly distributed following: hence assuming that replication initiated anytime within the 1-hr BrdU pulse window.
We compared the results of MLE of the parameters from Repli-BS data using the original method (no random time-variable) to results using time as a random variable. Overall, we found no significant effect on the distributions of inferred parameters chromosome-wide, though some individual site-estimates are impacted (Fig. G).

Single-basepair-level stochastic enzyme-kinetic models
Distributive mechanism The distributive model, which serves as a common backbone for the Processive and the Collaborative models, is based on a Compulsory-Order Ternary-Complex Mechanism (COTCM), by which DNMT1 (E) first binds the hemimethylated CpG (h) to form the Eh complex. Then, SAM (S) can form a ternary complex named ESh. Species m stands for the methylated CpG, and Q for a SAM molecule which, after methylation, has lost its methyl group. In order to assess the value of the forward and reverse rate constants for the first two binding reactions 1 and 2, being the first the binding and unbinding of E with h, whose rate constants are presented as k 1f and k 1r respectively, and the second the incorporation of S to the Eh complex to form ESh and its dissociation, presented as k 2f and k 2r respectively, four mathematical relationship can be derived from the classical model COTCM [2]. When doing so, it has been assumed that after ESh formation, methylation and enzyme turnover can take place in one irreversible and limiting step, whose rate constant is k 3 [3] (See Fig. 2 in the Main Text).
Therefore, at t=0, the rate of the reaction can be defined as: For the purpose of this analysis, we assume that that all intermediates are in the steady state, thus being: is the initial concentation of SAM. On the other hand, the total concentration of enzyme, [E] 0 , can be defined as: From Eq. 7, it can be derived that: For the sake of simplicity, we will group some terms of Eq. 10 into p: From Eq. 8, it can be derived that: Again, for the sake of simplicity we will group some terms of Eq. 12 into a term named q: Thus, given Eq. 9: Therefore: We can now define the initial velocity of the reaction v 0 as: 13 Rearranging some terms in Eq. 16: If we now retrieve the definitions of p and q (Eq. 11 and 13, respectively),we obtain: Simplifying Eq 18: From Eq 19, we can define 4 mathematical expressions as: leaving Eq 19 as: The value of these 4 parameters (K mh , K mS , K ih , and k cat ) have been extracted from [4]. They reported the Michaelis constants K mh and K mS for the case of recombinant human DNMT1 for both hemimethylated small nuclear riboprotein-associated peptide N (SNRPN) exon-1 and SAM, as well as DNMT1 catalytic turnover k cat and the constant K ih . In the case of K mh and K mS , authors presented values for two different hemimethylated SNRPN exon-1 substrate, one methylated on the upper strand and another on the lower. Since our experiments were based on the remethylation of the whole genome after replication, in which two hemimethylated molecules are present, one with the upper and one with the lower methylated strand, an average value was taken for both. The same strategy was used to determine the value of k 3 , that can be directly associated with k cat by Eq 23. The assumed value of K ih , on the other hand, was extracted from the double reciprocal plot obtained by Pradhan et al. when representing different rates of reaction as a function of the concentration of SNRPN exon-1, while keeping SAM initial concentration constant.
Since the value of 5 kinetic constants (k 1f , k 1r , k 2f , k 2r and k 3 ) had to be determined out of 4 mathematical relationships (equations 20 to 23), based on 4 experimental parameters (K mh , K mS , K ih and k cat ), the value of either k 2f or k 2r had to be arbitrarily assessed. Eventually, the value of k 2r was determined to be 100 hr −1 . Interestingly enough, since the value of the forward process is proportional to the reverse (Eq 21), the effect of varying this arbitrary parameter on the model did not have any significant effect in terms of the kinetics of the model (Data not shown) or k-correlation with genomic distance in the case of the Processive and the Distributive mechanisms ( Fig. X and Y). Other parameters that had to be assessed were the number of CpG-sites in the whole genome (h 0 ), the number of DNMT1 copies in a cell (100-fold lower than h 0 ), or the nuclear volume of a mammalian cell (V). The concentration of SAM was set to be 200 times larger than DNMT1, so it was always present in sufficient amounts. Identical values for all these parameters were also used in both the Processive and the Collaborative model. The value of all parameters is displayed in Table A.
Since reaction rate constants k 1f and k 2f presented (µM · hr) −1 units, they had to be converted to copy-number units and scaled to the number of sites we were simulating (N sites ) following: Where k ifcopy corresponds to k 1f or k 2f in (copy · hr) −1 units, k if molar corresponds to k 1f or k 2f in (µM · hr) −1 units, h 0 corresponds to the number of CpG-sites in a cell nucleus, V corresponds to the volume of a cell nucleus, and N A corresponds to the Avogadro's constant. Similarly, the number of DNMT1 and SAM copies at the beginning of a simulation (E i and SAM i respectively) was determined scaling nuclear values (E 0 and SAM 0 respectively) according to the number of sites that were simulated (N sites ):

Processive mechanism
In the Processive mechanism, DNMT1 can diffuse linearly along DNA towards neighbor CpG-sites after methylation, traveling either upstream or downstream. In order to incorporate diffusion efficiently into the stochastic simulations, we applied a First Passage Time Kinetic Monte Carlo algorithm based on ref [?]. The diffusion model uses a 1D lattice with inter-lattice spacing r = 1 basepair. Consider an enzyme bound to a CpG at position x start along DNA where it has just catalyzed methylation, with potential target sites (neighbor hemimethylated sites) at distance d U and d D upstream and downstream, respectively, on DNA. The enzyme moves with diffusion coefficient D and can unbind from any site with rate k of f . The value of the diffusion coefficient of DNMT1 along DNA was assumed to be in the order of other 1D sliding coefficients of well-known transcription factors such as LacI and p53, reported in [8,9] (See Table A The Master Equation is expressed as an N × N matrix. The Eignevalues and Eigenvectors of the matrix can be utilized to numerically compute the First Passage Time Densities, given that the enzyme starts at position start , according to Gillespie's Eigenvalue Approach. Furthermore, the exit probabilities (the relative probability of exiting to each of the three exit states at time τ , given that the enzyme has not yet left the region before τ , and given that it started at x start ) was obtained by numerical integration of the Master Equation to obtain the relative relative flux of probability into the absorbing states at waiting time τ .

Collaborative mechanism
Collaborative recruitment reactions were assigned propensities k RecU and k RecD , which result from weighting k 1f with a distance-dependant function (See Eq 16 in the main text). Note that in this case the two sites involved in the recruitment do not have to be contiguous, and no distance restrictions were imposed. Hence, the second DNMT1 copy could virtually be recruited onto any neighboring site. However, this would imply including a large number of recruitment reactions to the model, significantly increasing the computation time. Extensive analysis of different simulations showed that recruitments onto sites further than the 200 th neighbor were practically nonexistent, albeit having non-zero propensities. Reactions corresponding to these further recruitments constituted less than 10% of all recruiting reactions, even when performing simulations with the highest CpG d substrates. In those substrates, CpG sites are all separated by a distance of 2 bp (the minimum distance these dinucleotides can be apart), so the propensities of furthest recruitments are maximized. Therefore, with the aim of reducing the computation times, the finite number of recruitments upstream or downstream was set to be nN col = 200.

Stochastic simulations
Both the Distributive, the Processive, and the Collaborative mechanisms were used to stochastically simulate DNA maintenance methylation by DNMT1 kinetics in the context of replication, using the Stochastic Simulation Algorithm [11]. Simulated DNA substrates contained N sites CpG sites that could be either hemymethylated (h), and thus undergo remethylation, or unmethylated (u), according to the methylation landscape of arrested HUES64 cells.
In the case of the Distributive mechanism (See Fig 2A in the main text), each site that was h at t=0 could present 4 different states along the simulation: h, Eh, EhS and m. For each time-step, 5 possible reactions could occur (1 f , 1 r , 2 f , 2 r , and 3) on one of the N sites sites. Then, the current state of the system changed according to the stoichiometry of the chosen reaction. The propensities of every reaction for every site were subsequently recalculated, and the process was repeated. To choose the reaction, the site it would occur on and the duration of every interval, two random numbers where generated in every step, following the Gillespie algorithm.
Immediately after time exceeded one of the experimental time-points (0.5, 1.5, 4.5 and 16.5 hr), the whole state of the system was recorded, saving as 'methylated' any site in m, and as 'unmethylated' any site in u, h, Eh or EhS. Eventually, the simulation stopped right after time would exceed 16.5 hr.
Regarding the Processive mechanism, the same procedure was followed, but the number and type of reactions, as well as the possible states a CpG site could adopt were different. A total of 8 reactions could take place, including reactions 1 f , 1 r , 2 f , 2 r , 3, and 6, in addition to the two diffusion reactions, upstream or downstream (See Fig 2B  in the main text). The propensities of the two processive reactions (k Dif U and k Dif D were calculated for every site, according to the distance between it and its two contiguous neighbors. The propensities of hops towards neighbors placed at a distance larger than 3600bp where directly set to 0. Moreover, the propensity of impossible hops, such as the one from the first CpG site on the 5' top of the DNA strand towards an hypothetical neighbor upstream, were also assigned a value of zero. The Processive mechanism contemplated 6 possible states for any CpG site, u, h, Eh, EhS, Em and m, being the first four included in the category of unmethylated, while the last two were considered as a methylated read when recording the state of the system at times 0.5, 1.5, 4.5 and 16.5 hr. Aside from these differences, the procedure in terms of simulation was identical to the Distributive mechanism. For the Collaborative mechanism, a total of 5+2nN col reactions could take place, where 5 includes reactions 1 f , 1 r , 2 f , 2 r , and 3 (See Fig. 2 C in Main Text), and nN col stands for the range of neighbors, upstream or downstream the recruiting site, onto which other DNMT1 copies can be incorporated in a collaborative fashion. This way, nN col = 4 would mean that a second DNMT1 could be recruited onto the first, second, third or fourth neighbor upstream the site where the first enzyme copy was bound, or onto the first, second, third, or fourth neighbor downstream, generating 8 additional reactions for each site. The propensity of each recruitment reaction onto each neighbor within the nN col range was determined according to the distance between the recruiting and the neighbor hemimethylated CpG sites, and a non-dimensional distance-dependent function (See Eq. 13 in the Main Text). Just like with the Processive mechanism, the propensity of impossible reactions, like DNMT1 recruitment downstream another copy bound at the top 3' of the DNA strand, was set to 0. In the Collaborative model, 5 possible states were contemplated for any site : u, h, Eh, EhS, and m, and the state of the system was again recorded at times 0.5, 1.5, 4.5, and 16.5 hr. While those sites displaying m where considered as methylated reads, any site in u, h, Eh, or EhS was again considered to be unmethylated.
Either using the Distributive, the Processive or the Collaborative mechanism, by Dataset N  Table B. Table of computed correlation coefficients (Pearson and Spearman) between ML inferred k and f values. Ground truth simulations show that the MLE procedure introduces correlation (-0.0475 and -0.0989, Pearson and Spearman, respectively, average of 3 trials.) The parameters inferred from data show a more pronounced negative correlation.
repeating the simulation on the same substrate a certain number of times (N Reads), at the end of the whole process N sites CpG sites were simulated, each with a certain number of methylated reads (m) and unmethylated reads (u) at every time-point. Therefore, each repetition out of N Reads can be compared to every experimental read. In that sense, and to replicate experimental conditions, in which sites show a larger number of reads at t=0.5 than the other three time-points, the results of 5 short simulations from 0 to 0.5 hr were added to 5 simulations from 0 to 16.5 hr, thus yielding 10 reads at t=0.5, and 5 reads at 1.5, 4.5, and 16.5 hr.
Eventually, this procedure gave rise to a simulation of the remethylation kinetics of N sites CpG sites. Each site displayed a certain number of methylated and unmethylated reads at times 0.5, 1.5, 4.5 and 16.5 hr, just like the experimental data, but according to any of the 3 possible mechanisms. This way, MLE fitting could be used to infer f and k for each of these simulated sites (f model and k model respectively), and compare their distributions and correlation with GD and CpG d , to elucidate if mechanistic differences can account for experimental observations.

Correlation between inferred k and f values
We find that the inferred k and f values are weakly negatively correlated. However, it is possible for some spurious correlation to be introduced by the MLE fitting procedure, due to the limited read depth. To assess this, we carried out "ground truth" simulations, similar to those described in Fig. C. Ground-truth values of k and f were generated in an uncorrelated manner. Simulated Repli-BS data was then produced with the timepoints and per-site-per-time read-depths chosen by sampling from the true data read-depths for Chr1. k and f were inferred from the simulated data and correlation was assessed (see Table B).
Remethylation rates for Chr 13 to 22  Remethylation rates distributions for low, medium, and high-density CpG sites with fraction methylation f < 0.9 and f > 0.9 for Chr1   Table A. 32 While a deep understanding of the effect of each parameter on the obtained correlation functions is beyond the scope of this work, some of the observed changes can be explained in the light of the mechanistic aspects of the Processive model. Namely, when setting D (the 1D-diffusion coefficient of DNMT1 along DNA) to 0, the correlation function drastically tends to 0 (Fig. X), because the propensity of DNMT1 to diffuse towards a neighboring site, defined as k Dif , is set to 0. This is also observed for nDist (i.e the maximum distance DNMT1 can travel in a processive way). When set to 0 bp, no processive reactions can take place, and correlation indeed disappears. These observations reveal that k model correlation with GD observed for the Processive model can be attributed to the set of processive reactions, rather than other reactions the model shares with the Distributive mechanism.
In the case of the Collaborative mechanism (Fig. Y), it can be observed how the correlation function of k model and GD is in general insensitive to changes on many of the model's parameters, such as nN col or k 2r , while being sensitive to parameters such as K mS or a. Interestingly enough, correlation is absent when a is set to 0, for the propensity of any recruiting reaction (defined as k rec ) is null (See Eq. 13 in the main text). This is also observed for nN col (i.e the furthest neighbor onto which recruitment can occur). When set to 0, no collaborative reactions are considered, and correlation again disappears. Thus, in a similar way to the Processive mechanism with the processive reactions, k model correlation with GD for the Collaborative model can be attributed to the set of recruitment reactions the mechanism incorporates.