A Statistical Framework to Identify Deviation from Time Linearity in Epigenetic Aging

In multiple studies DNA methylation has proven to be an accurate biomarker of age. To develop these biomarkers, the methylation of multiple CpG sites is typically linearly combined to predict chronological age. By contrast, in this study we apply the Universal PaceMaker (UPM) model to investigate changes in DNA methylation during aging. The UPM was initially developed to study rate acceleration/deceleration in sequence evolution. Rather than identifying which linear combinations of sites predicts age, the UPM models the rates of change of multiple CpG sites, as well as their starting methylation levels, and estimates the age of each individual to optimize the model fit. We refer to the estimated age as the “epigenetic age”, which is in contrast to the known chronological age of each individual. We construct a statistical framework and devise an algorithm to determine whether a genomic pacemaker is in effect (i.e rates of change vary with age). The decision is made by comparing two competing likelihood based models, the molecular clock (MC) and UPM. For the molecular clock model, we use the known chronological age of each individual and fit the methylation rates at multiple sites, and express the problem as a linear least squares and solve it in polynomial time. For the UPM case, the search space is larger as we are fitting both the epigenetic age of each individual as well as the rates for each site, yet we succeed to reduce the problem to the space of individuals and polynomial in the more significant space—the methylated sites. We first tested our algorithm on simulated data to elucidate the factors affecting the identification of the pacemaker model. We find that, provided with enough data, our algorithm is capable of identifying a pacemaker even when a weak signal is present in the data. Based on these results, we applied our method to DNA methylation data from human blood from individuals of various ages. Although the improvement in variance across sites between the UPM and MC was small, the results suggest that the existence of a pacemaker is highly significant. The PaceMaker results also suggest a decay in the rate of change in DNA methylation with age.


Introduction
DNA methylation is an important component of the epigenetic code that defines and maintains the state of cells [1][2][3]. Mammalian cells contain three DNA methyltransferases that preferentially methylate CpG dinucleotides. These enzymes faithfully maintain cytosine methylation patterns during cell division. However, as cells undergo differentiation, from stem cells to mature cells, the patterns of DNA methylation change substantially, and help define the changing cellular states [4]. The genomic profiles of DNA methylation across multiple cell types have been defined during the past few years using techniques such as bisulfite sequencing and DNA methylation arrays, that allow one to measure the methylation state of many cytosines in the genome [5]. Consequently, it has been shown that DNA methylation also changes as organisms age [6][7][8][9][10][11][12].
The seminal work of Steve Horvath [13] has identified three hundred CpG dinucleotides, whose methylation state can be used to accurately predict the age of an individual. The epigenetic clock is now widely used in aging research and is far more accurate than alternative approaches that rely on the measurement of telomere lengths or gene expression. The Horvath epigenetic clock model uses a linear combination of the methylation status of several hundred sites to predict the age of an individual. It also uses a nonlinear transformation to modify the ages of young individuals (less than 20 years), while leaving the ages of adults untransformed.
Here we try to develop a more general formalism for modeling changes in DNA methylation during aging. To this end, we use the universal pacemaker (UPM or simply pacemaker-PM) of genome evolution [14,15], which was devised in the setting of molecular evolution in order to relax the evolution rate constancy imposed by the molecular clock (MC) hypothesis [16]. Under UPM, the relative evolutionary rates of all genes remain nearly constant (i.e constant pairwise ratio) whereas the absolute rates can change arbitrarily (See Fig 1 for illustration). It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the traditional molecular clock model [14,17,18]. The UPM model relies on a statistical framework encompassing simultaneously all evolving genes in genomes, and across the entire tree of life, therefore making it doubly universal.
Here we propose to adapt the UPM to model changes in DNA methylation during aging, making no a priori assumption about the relationship between chronological and epigenetic time, i.e. linearity in time as asserted by the MC model. The UPM is one degree of freedom more relaxed than MC in the sense that it still requires rate uniformity of a site among all individuals, yet it allows the individual's aging rate to play a role. By relaxing the constraint that epigenetic age is linear with chronological age, we can explore a rich parameter landscape, and identify complex nonlinearities using the UPM formalism. Our goal is not only to develop site specific models of changes in DNA methylation as a population ages, but also to discover the nonlinearities in the rates of change. This richness has its cost in terms of computational intensity. In general, statistical analysis and in particular the approches we pursue here-maximum likelihood (ML) solutions-are computationally intensive [19]. However, although the current setting, methylation modeling, is more complex than the evolutionary model considered in [14] due to an additional array of variables to be optimized, under the MC model we were able to formalize it as a linear least squares, allowing us to obtain a closed form solution in polynomial time. Under the PM model, we show that no closed form solution is achievable. However, through a series of observations, we could reduce the search space significantly to the degree that the heuristic search, done by a fast optimization method, is performed only in the, relatively small, space of individuals. The rest of the search is polynomial is the space of methylation sites, hence enabling us to analyze problems of non-negligible size. Although the focus of this work is on the description of the algorithm, such as the model formulation and the statistics involved, we also demonstrate its performance in a real dataset. We first applied this formalism in a simulation study to discover the effect of the parameters involved and their interplay. Among other things, we show that the scheme is capable of identifying a pacemaker, i.e. a deviation from linearity in time, even when the pacemaker signal is relatively faint, if enough data is provided. Next we analyze a dataset of DNA methylation collected from the blood of humans of different ages. The signal in these data is indeed fairly small, however, the size of the data allows us to confidently infer coordinated, nonlinear changes in methylation. Further analysis shows that the changes in the rates resemble the empirical transformations used in the Horvath model.

Results
Our Results Section contains three parts: A likelihood based scheme to identify an effective PM affecting the methylation sites, a simulation study to demonstrate the performance of this scheme, and results on two human methylation datasets.

The Evolutionary Models
Our basic objects are a set of m individuals and n methylation sites in a genome (or simply sites). Each individual has an age, forming the set T of time periods {t j } corresponding to each individual j's age. Henceforth we will interchangeably refer to individuals with their age. Each individual has a set of sites s i undergoing methylation changes at some characteristic rate r i . Each site s i starts at some methylation start level s 0 i . All individuals have all the sites s i . As r i and s 0 i are characteristic of the site s i , by the model, they are the same in all individuals. The latter fact, links between same sites but across different individuals, but also between different sites within and across individuals by the fact that sites generally maintain the same characteristic rates across the whole population. Henceforth, we will index sites with i and individuals with j. Now, let s i,j measure the methylation level at site s i in individual j after time t j . Hence, under the molecular clock model, we expect: However, in reality we have a noise effect ε i, j that is added and therefore the observed valueŝ ij iŝ Our goal is to find, given the input matrixŜ ¼ ½ŝ i;j , the maximum likelihood (ML) values for the variables r i and s 0 i for 1 i n. For this purpose, we assume a statistical model for ε i,j by assuming that it is normally distributed, ε i,j * N(0, σ 2 ).
In contrast to the MC, in the UPM model we do not just use the given chronological age but estimate the age of each individual. Therefore under the UPM we must find the optimal values of s 0 i , r i , and t j . The solution to this optimization is described in detail below. We note that the deviation between the chronological age and the estimate epigenetic age under the UPM results is an age difference which, when positive, we denote as age acceleration, and when negative as age deceleration.

Identifying Methylation Rate Acceleration/Deceleration
Our first result is a maximum likelihood (ML) scheme to detect a coordinated, or rather genome wide change in methylation rate under UPM. We note that such a change is distinct from a single, uncoordinated, site change. We start with an overview of the approach.
Two competitive explanations (i.e. likelihood functions) are developed, in which one (MC) is restricted to linearity with time by estimating a constant rate of methylation at each site, and using the given chronological age of each individual. The competing, relaxed, model (UPM) has no such restriction, and we estimate an "epigenetic" age for each individual. By definition, the ML solution under the relaxed model cannot be worse than the constrained model. Therefore, in order to compare the approaches, we use the likelihood ratio test that penalizes the UPM model proportionally to the loss of parameters in the MC model. In the Methods section we prove that under our model, the ML solution is equivalent to minimizing a quantity denoted as the residual sum of squares, RSS. The computational question of how we solve the problem, i.e. minimizing the RSS, under the two models is unique to this framework and hence we describe it here in the Results section below.
Minimizing RSS. In the statistical framework defined in the Methods section, we showed that minimizing RSS is equivalent to maximizing the likelihood function L. In particular the ML RSS, d RSS, is used for computing χ 2 . We now show how we minimize RSS. RSS is a polynomial over the variables r i and s 0 i where every monomial in the RSS stands for an entry in our input matrixŜ, that isŝ i;j , and is of the form: where in our case the inputs are theŝ i;j and t j and the variables sought are r i and s 0 i , for every i n (our set of sites).
In order to find the critical points of the RSS, we find the gradient of the RSS, that is the partial derivative of the RSS with respect to every such variable. The critical points are the points in the 2n spaces where all these partial derivatives simultaneously vanish [20]. Finding these points is normally carried out using some numerical method.
In our case however, the special structure of the problem allows us a more efficient solution. A least squares (LS) solution is called linear if the residuals are linear in all unknowns. In this case LS can be formalized in a matrix format which has a closed form solution (given that the column of the matrix are linearly independent). Under this formalization the optimal (ML) solution is given by the vectorb as follows: where X is a matrix over the variable's coefficients in the problem, y is a vector holding the observed values-in our case the entries ofŜ, and the RSS equation can be written such that for every row i in X, Recall that for m subjects, our RSS contains mn components each of which corresponds to an entry inŜ in the formŝ i;j À t j r i À s 0 i whereŝ i;j and t j are input parameters. This leads to the following observation: Observation 0.1 Let X be a mn × 2n matrix whose kth row corresponds to the (i, j) entry in S, the first n variables of β are the r i 's and the second n variables are the s 0 i 's, and the im + j entry in y contains s i,j (see Fig 2). Then, if we set the k row in X all to zero except for t j in the i'th entry of the first half and 1 in i'th entry of the second half, we obtain the desired system of linear equations (see again illustration for row setting in Fig 2).
Proof: The proof follows trivially. The k component in the RSS that corresponds to the (i, j) entry inŜ (and also to row k in X), is of the form Therefore, by the definition of X, β, and y, the observation follows.
To complement the task, we assign the values obtained inb in Eq 3 for all r i and s Þ i in the RSS and obtain the ML value.
Solving the RSS under the Pacemaker Model. Recall that under the UPM model, we allow sites in an individual to accelerate or decelerate their methylation rate arbitrarily. Sites start at their characteristic rate r i , but in the UPM we no longer have a constant rate for the site s i in all individuals, and at all times. Instead, we have the instantaneous rate r t i;j for site s i in individual j at time τ, where τ is less then t j -the age of individual j. We also use r i,j to denote the average rate of site i at individual j: and we note that this average rate r i,j can be measured (as opposed to the instantaneous rate at the site and individual).
In particular, relaxation of the constant rate property invalidates use of the closed form solution for our problem as in Eq (3), partially since the ordering of rates, or precisely the ratios between them, imposed by the closed form solution is not necessarily the ML solution. The latter implies that we will have to search a very large space of all possible parameter values in order to arrive to the ML solution. The following, seemingly counterintuitive, theorem shows that we can do much better.  The proof of Theorem 0.2 relies on the fundamental property of the UPM model that asserts rate correlation among sites. Hence, while under this model we relax the constant rate requirement, we still require that if a site at an individual changes its rate, then all sites at that individual change their rate by the same proportion. We prove the theorem.
Proof: We first show the following simple observation whose proof is given in S1 Text: Observation 0.3 For two methylation sites s i and s i 0 with characteristic rates r i and r i 0 , let ρ i,i 0 = r i /r i 0 . Then for any individual j and time τ t j holds Observation 0.3 is important as it shows that ρ i,i 0 is independent of any time or individual. We now use the following definition. For a site s i and individual j, let r Ã i;j be the ML value for r i,j , that is, the value r i,j takes under the ML solution to the RSS. We note that since r i,j changes many times through the life of individual j and hence there is no real such r Ã i;j rather r Ã i;j represents the weighted average, or the integral over possible trajectory of r i,j . Also recall that the corresponding (i, j) component in our RSS looks and since r i,j appears only in that component, we could set and then (after derivation) all RSS components vanish. However the following observation (whose proof is deferred to the S1 Text) shows that this may violate the UPM model. Observation 0.4 Setting at every component of the RSS, may violate the constant ratio between rates assumption.
The following lemma is instrumental to our procedure of finding the ML solution. Lemma 0.5 Let r Ã i;j the ML value for r i,j . Also let d Ã i;j ¼ r Ã i;j =r i be the change in proportion from r i to r Ã i;j . Then the ML solution is obtained if r i,j is intact (i.e. remains at its initial value r i ) but the time t j is stretched or shrunk by d Ã i;j . The proof to the lemma is given in the S1 Text. We now clarify two points. First, t j appears in several components while d Ã i;j may be different in every such component (pertaining to different i's). Nevertheless we show in the proof that all these d Ã i;j are the same. Second, from the lemma it may appear as if we know r i , so we can set r i;j ¼ r Ã i . This is incorrect as we explain in the proof. The importance of Lemma 0.5 is that it reduces the search space of the ML solution substantially as we only need to search in the (m dimensional) space of times (T) that is typically smaller than n.
Another important feature of the PM solution, is that once we relax the times t j the optimization is not linear anymore. Here we simply pursued the following straightforward strategy. Assume we restrict a subset of the variables in the problem to their ML values under the global, unrestricted solution. Next, we look for the local, restricted, ML solution, by optimizing the rest of the variables. Then we obtain the same global ML solution as the unrestricted problem.
This completes the proof of Theorem 0.2. In practice, the algorithm proceeded as follows. We performed a heuristic search in the (restricted) space of T. For every value T 0 in that space that was offered by the optimization procedure, we performed the fast analytic LS solution of Eq (3) to obtain the ML values of the rest of the variables, but constrained to the value T 0 . We proceeded this way until the ML point is obtained, i.e. the point under which RSS was minimised. Let T Ã be the point in the m dimensional space corresponding to the ML values of T. By the above, the closed form algebraic solution, as is done for the MC case, will find the ML values for the rest of the variables.
To perform the LS optimization, we used the function fmin_slsqp implemented by the LAPACK software [21], which is found at the scipy.optimize package of Python that minimizes a function using sequential least squares programming.

Simulation Study
In order to test our method we first conducted a simulation study as we now describe. The goal was to examine the effect of the various parameters on the performance of the method, i.e. its capability to distinguish between a PM and the MC. Performance was measured by means of the p-value of the likelihood ratio test (LRT). We now describe the study's parameters. Our model is comprised of an m-dimensional vector times T where t j corresponds to the jth individual's age that we draw randomly to obtain variation in individuals' ages. Next we have two ndimensional vectors, rates r and methylation starting position s 0 , where r i and s 0 i correspond to the ith site's methylation rate and methylation starting position respectively. Both vectors were drawn randomly. These are the base parameters used to generate the input matrixŜ. However recall that our goal was to test the sensitivity of our algorithm to distinguish between a PM and a MC. Also recall that by Lemma 0.5, a PM is simply another linear correlation to time periods t 0 j only that these correspond to the PM ticks and each such PM ticks at an arbitrary rate. Therefore, to simulate the PM perturbation of the astronomical clock, we perturbed each t j by some ε j (i.e. multiplied by 1 + ε j ) where ε j $ Nð0; s 2 t Þ. Hence, the constant parameters of the PM model are the (perturbed) times t 0 j and the original r i and s 0 i values. So by our model we have s i;j ¼ s 0 i þ r i t 0 j . Finally, to simulate biological noise, we sampledŝ i;j $ Nð0; s 2 s Þ. Given the matrixŜ and the time vector T, we ran both algorithms on that input and compared the results. The MC model fit the site rates and methylation start levels while adhering to the times in T while the PM model considered only the matrixŜ and disregarded the times in T. Both models returned their RSS's. Since under PM the times T 0 are also inferred, we used LRT to compare between the models with m degrees of freedom which is the size of the vector T 0 . The score of a single run is the p-value of the χ 2 test.
Since that setting is non trivial, we now discuss the parameters and their interpretation. Obviously, the signal to the method comes only if there is any variation in the pacemaker ticks with respect to the chronological clock, since otherwise both the PM procedure and the MC procedure will converge to the same values and will produce the same error (RSS). Therefore our first parameter, the PM variance s 2 t , that determines the size of the deviation of the PM from chronological time, is distinct from other parameters. Indeed we divided the study into two parts in which different values were used and the differences are significant. The second parameter is the variance at each site, or simply the amount of pure noise in the signal. Our experiments show that this is a major factor inhibiting the identification of the PM. The last two parameters are the number of sites that are included and the number of individuals. The results of our simulations are presented in Figs 3 and 4. In all figures, the y axis represents the success rate in terms of the p-value returned from the LRT. The x axis represents the noise s 2 s , the site variance.
We now explain the results. The graphs in Fig 3 correspond to experiments with weaker PM signals, s 2 t ¼ 0:1 . Fig 3(a) corresponds to 50 individuals. The graph contains three curves that correspond to individuals with {50, 70, 100} sites (colors blue, red, and green respectively). That is, each experiment is done over a population of 50 individuals, each with 50 (alternatively 70 or 100) methylation sites. Additionally, each individual is associated with a PM that modifies the methylation rate of that individual. That PM rate distributes, IID at each individual, normally with variance s 2 t ¼ 0:1. The x-value of a point represents the background noise we apply to each site, that also distributes normally and IId at each individual and site, with variance s 2 s . The y-value of a point represents the relative number of times (or success frequency) our scheme described in the Results section, was able to identify the PM (a PM always exists but its signal may disappear due to confounding signals).
Let us focus on the curve in Fig 3(a) that corresponds to 50 sites (blue curve). It is shown that for a small amount of noise, s 2 s 2, reconstruction quality is high but then it starts to diminish with success rate less than 1/2 for s 2 s ! 7. We can also see that this trend is generally true for each curve in the experimental study. We also see that there is an obvious benefit for the inclusion of additional sites (red and green curves in Fig 3(a)) or individuals (Fig 3(b)). Here we can observe that the clear trend of a weak PM and small number of individuals, as depicted in Fig 3(a), is not always maintained due to the high success rate and the stochastic nature of the process. However, that general behavior is still maintained. As can be seen, under this PM signal, the PM is identified with a high rate, (! 85%), even with only 50 individuals (Fig 4(a)) and 50 sites for all levels of noise. With 100 individuals (Fig  4(b)), 100 sites suffice for almost perfect identification.
We conclude this part by noting that for a fairly weak signal of PM and even under quite high levels of noise, our procedure is capable of identifying the deviation of methylation rate from linearity in time. This observation is critical when analyzing real data where we expect that the signal is stronger and noise is weaker. We remark that due to the fairly involved setting with many confounding parameters such as the amount of information (sites, individuals), stochastic processes (PMs, sites), the same behavior as we observed in Figs 3 and 4, can be observed for many other combinations of parameters.

Results on Human Methylation Data
Based on our simulation results, we next tested our approach on DNA methylation data previously reported in [22]. The data was collected using the Illumina 450K DNA methylation array platform.
The resulting data matrix contains about 450,000 CpG sites measured across 657 human individuals. In order to limit ourselves to a manageable size for parameter estimation of our model we had to apply a selection criterion over the sites. We took the 300 sites with the maximum variance where the highest variance was 0.105 and the lowest around 0.0079. These sites are more likely to be relevant for our model, as they have methylation levels that vary across the population. We ran both algorithms on this reduced data. The following results were obtained. The average error per entry inŜ under MC was 0.138. The UPM search algorithm started from 10 random stating points all of them converged to the same ML point-0.135. This is a mild improvement of about 2% indicating that sites are correlated and also there are shifts from linear correlations to chronological time. The χ 2 for these values under LRT is 3517.468. Since we had measurements across 300 individuals and under PM their values were optimized, we had an additional 300 free variables (the "epigenetic" age) in the PM model with respect to MC. Under the χ 2 distribution with degree of freedom 300, in order to achieve a pvalue 0.01, a χ 2 of 360 is required. Therefore the null hypothesis (MC) is rejected outright.
As illustrated, the PM model guarantees an optimal ranking between the rates of sites such that the model likelihood is optimized. However there is one degree of freedom here, allowing us to assign an arbitrary value to one of the rates. This value in turn determines the values of the rest of the variables. By picking one of our ML points we obtain an ML assignment to rates. In order to compare how MC and PM rates behave under the different sites, we did the following. For each of the sites, we calculated the ratio between its MC and PM rates. We sorted the sites according to that value. After removing a few Eq (8) outliers at each side, we plotted this result. Fig 5(a) depicts this result. We note a few facts about this ratio. The majority of the sites (5/6) maintain the same sign (i.e. increasing or decreasing methylation), about half (55%) of these sites decelerate (i.e. ratio 1). Fig 5(b) shows an even more interesting phenomenon that corroborates certain conjectures. The figure depicts the ratio between the chronological times (ages), taken as parameters (i.e. fixed, unoptimized) under the MC model, versus ML times inferred under PM. The x axis is the chronological time of the individual, meaning that ratios are presented from the youngest individual at the left to the oldest at the right. The y axis is the MC/PM age ratio. A conspicuous phenomenon emerging from this figure is the diminishing ratios between times (or equivalently aging) as individual becomes older. Another property arising from that comparison, is that the variance of this measure (MC/PM age ratio) in young ages is substantially larger than in more advanced ages. We comment that this data set of [22] does not contain individuals of very young ages. Therefore we expect even more extreme contrasts in data that does include young individuals, however this is beyond the scope of the current work and is left for further research.

Discussion
In this work we developed an approach to model changes in DNA methylation with age and measure acceleration/deceleration of methylation rates with age. This approach is based on a novel, probabilistic framework where two competing explanations are compared, where one of the explanations is a special, restricted case of the other, and the comparison is made by the likelihood ratio test.
The underlying mechanism in the novel framework is the universal pacemaker that was devised to find correlations among evolving genes in a genome, while relaxing the rate constancy imposed by the traditional molecular clock model. The methylation setting is typically more complex than the genomic evolution setting as it involves more variables, making the procedure and the analysis more computationally demanding. Therefore, we believe we have made here only the first step in this direction. Nevertheless, the results we present, first in the simulation analysis, but especially in the analysis of a human blood dataset with individuals of different ages, mark this approach as promising These results on the human methylation data, although based only on a sample of CpG sites, indicate that the rate of methylation changes tend to diminish with age, suggesting that the use of the PM framework is appropriate in this setting.
We remark that the emphasis in this work is on the mathematical and computational aspects of this approach. These properties, as illustrated also in our simulation study, but also in the algorithmic part of the Method section, are far from being trivial and we believe further investigation will follow. The same also holds for the biological findings we indicate in our real data study. These result are significant, but should be verified on larger data sets. In particular, the finding of diminishing ratios PM/MC should be tested in a population that contains young individuals. Finally, we expect that the model may also be of use when investigating epigenetic aging in other species, and in the future intend to apply this formalism to datasets across species.

Minimizing RSS as Maximum Likelihood Solution
We now show that, under our formulation, the RSS is minimized at the Maximum Likelihood (ML) solution.
Let the residual sum of squares, RSS be defined as follows: The formulation in Eq (10) is called least squares (LS) and is a very common criterion in optimization [23]. Although the fact that RSS is minimised under least squares under a normal distribution, since our formulation is somehow unique, we now show the the following lemma (see detailed proof in S1 Text): Lemma 0.6 Minimizing RSS is equivalent to finding the maximum likelihood solution to our formulation.

Likelihood Ratio Test
The likelihood ratio test (LRT) is a statistical test used to compare the goodness of fit of two competing models, one of which (the null model) is a special case of the other, more general, one. The log of the ratio of the two likelihood scores distributes as a χ 2 statistic and therefore can be used to calculate a p-value. This p-value is used to reject the null model in the conventional manner. Specifically, let Λ = L 0 /L 1 where L 0 and L 1 are the ML values under the restricted and the more general models respectively. Then asymptotically, −2log(Λ) will distribute as χ 2 with degrees of freedom equal the number of parameters that are lost (or fixed) under the restricted model.
In our case, (see Eq (6) in the S1 Text for a detailed explanation), it is easy to see that log LÞ Supporting Information S1 Text. Proof of claims in the paper body. (PDF)

Author Contributions
Conceived and designed the experiments: MP SS.
Performed the experiments: SS.