TimeTeller: A tool to probe the circadian clock as a multigene dynamical system

Recent studies have established that the circadian clock influences onset, progression and therapeutic outcomes in a number of diseases including cancer and heart diseases. Therefore, there is a need for tools to measure the functional state of the molecular circadian clock and its downstream targets in patients. Moreover, the clock is a multi-dimensional stochastic oscillator and there are few tools for analysing it as a noisy multigene dynamical system. In this paper we consider the methodology behind TimeTeller, a machine learning tool that analyses the clock as a noisy multigene dynamical system and aims to estimate circadian clock function from a single transcriptome by modelling the multi-dimensional state of the clock. We demonstrate its potential for clock systems assessment by applying it to mouse, baboon and human microarray and RNA-seq data and show how to visualise and quantify the global structure of the clock, quantitatively stratify individual transcriptomic samples by clock dysfunction and globally compare clocks across individuals, conditions and tissues thus highlighting its potential relevance for advancing circadian medicine.


Preparatory RNAseq methods.
All RNA-seq datasets were downloaded as .srafiles from NCBIU's SRA database (accessed via the GEO database).SRA files were converted to fastq files using "fasterq-dump" from NCBIU's SRA Toolkit.FASTQ files were aligned to the mouse genome (GRC release m38.84) and converted to SAM files using HISAT2 v2.2.0 (Kim et al., 2019).SAM files were compressed to BAM using Samtools v1. 10 (Heng Li et al., 2009).Transcript read counts were determined from the BAM files and the mouse transcriptome (GRCm38.84.gtffile) using LiBiNorm v2.4, an in-house software package, in HTSeq-count mode (Anders et al., 2015;Dyer et al., 2019).Raw read counts were concatenated for all samples and exported as one text file for all subsequent analysis on a Mac OS.For dataset specific arguments see Table 4.2.Raw count normalisation was carried out in R Studio.The edgeR package was used to normalise the raw counts to log2 counts per million (logCPM) or log2 trimmed mean of M-values (logTMM).Data was also inspected for quality by checking library sizes, replicate plots and PCA plots.
Preparatory microarray methods.
The raw data was downloaded from NCBI GEO in the form of 288 .CEL files.The bioconductor package in R was used to perform fRMA normalisation of protein coding genes, and annotate them with gene names.After fRMA processing, the gene expression values are expressed in log2, and values are in the range 2-14.The distribution of gene expression values for all 288 sets, summarising 35,556 probes was analysed to verify similar distribution amongst samples and to check that the fRMA normalisation was successful.

Batch Effects
In high-throughput studies batch effects occur because measurements are affected by variations in experimental conditions such as laboratory conditions, reagent lots, and personnel differences.We used the R packages BatchQC, ComBat and sva to analyse batch effects.
For the training data we are only concerned with the G = 9 − 16 selected clock-associated genes.We inspected the embedding of the data into G-dimensional space and the subsequent projections into 3-dimensional space using SVD for any sign of batch effects especially with respect to timing of the samples (see e.g.SI Fig S4).Following analysis, the timing, Θ and ML values, and the likelihood ratio curves are inspected for signs of batch effects where this is relevant.A similar analysis is carried out after TimeTeller normalisation, a process that also reduces batch effects.We want to avoid batch correction as this would arbitrarily destroy the statistical distribution of the data (e.g., the covariance information) which is a crucial ingredient of our algorithm.
For the test data there is no question of a batch effect affecting the determination of Θ in a test data set as each sample is treated individually i.e. it is treated as an independent single sample in a way that does not depend upon any other test samples under both TimeTeller normalisation and the initial fRMA normalisation and then the probability model (which only depends upon the training data) is applied to the data from this sample independently.Our analysis also found no evidence of batch effects in this data when projected into d = 3-dimensional space using PCA and when analysed as above.
Using algorithms such as ComBat to remove batch effects between different tissue types has several disadvantages.Firstly, batches will typically not be evenly balanced as the training data will only contain WT/healthy data, whilst test data batches will contain both WT/healthy and perturbed data (e.g., genetically manipulated knock-out (KO) samples).The inappropriate application of batch correction methods to datasets with unbalanced batches is reviewed elsewhere (14).A second disadvantage to the use of batch correction is that it necessitates that the model be re-trained for every new training and test dataset combination post-batch correction, therefore it is difficult to directly compare findings from different test datasets.Ultimately, one of the primary aims is to apply TimeTeller to human biopsies -both healthy and unhealthy.Batch correction of human data would require a priori knowledge of covariates, which may not even be known.In other words, 'real world' data contains many variables which would confound batch correction such as patient age, sex, time of sampling, health status etc.Therefore, a method for time prediction should ideally be applicable without the need for batch correction.

Data availability
All data is currently publically available except for the Bjarnason et al. data used in the paper.This will be made available on publication via the University of Warwick servers and as part of the R package that will be available via GitHub.

Note B. Does the rhythmic expression profile (REP) provide a faithful representation of clock dynamics?
It is necessary to try and ensure that the choice of the the REP and rhythmic properties of the genes included provide a faithful representation of the clock state i.e. so that the mapping from clock state to NREV (g 1 , . . ., g n ) is an embedding.This means that the NREV is a smooth function of the clock state and that each state of the NREV arises from at most one clock state.Conditions for this to be the case are discussed in the area of applied dynamical systems known as embedology (15)(16)(17).This addresses the situation where one has a dynamical system whose state depends upon many variables y 1 , . . ., y N and one seeks a faithful representation of the dynamics in terms of a much reduced number of variables x 1 , . . ., x n and shows that for oscillating non-chaotic systems like ours it is reasonable to find such a representation by the sort of projections we use.In our case the y i represent the expression levels of all the molecular components of the GRN underlying the clock and the x i are the expression levels of the genes in the REP.There is no way to prove that such a REP does fully represent the dynamics but for a tightly coupled dynamical system like the circadian clock, one can validate this to a great extent by checking that increasing the size of the REP does not alter results and, more specifically, checking the singular values of the local projections introduced in Methods as in SI Fig S3 to ensure that the dynamics of the n-dimensional REP can be reduced to an even smaller dimension d as explained in Methods.We deal with it by the use of a REP with significantly more genes than are likely to be needed.Note that we are not suggesting that the genes in the REP are the only genes regulating the clock but are hypothesising that they are enough to provide a full representation of the clock dynamics for the data sets considered.
Note C. Choice of l thresh .
We discuss how the cut-off threshold l thresh is chosen using the example of the two Kinouchi The setting l thresh = −10 is seemingly an acceptable value because then only one of the FED control data samples has a flat region that intersects C(t|T ).However, rather a lot of the test samples have their ML below e −10 so a smaller l thresh of -12 or -14 is preferable.With l thresh = −12 none of the FED samples intersects the curve C(t|T ) and this setting is what we have used in the main paper.D, E and F in SI Fig S15 show the corresponding maximum likelihood values.With this choice of l thresh there are still some test data samples with their ML below e l thresh so that they will have Θ = 1.They are just identified as all having extreme lowM L dysfunction.
The MLs hardly change with changing l thresh the key difference just concerns the change in the number of samples with their ML value at the value of the threshold exp(l thresh ).This change occurs because there are FAST samples where ML is below the cut-off threshold and the number of these below-threshold samples decreases as we decrease l thresh .
The Θ values that are obtained with these thresholds are shown in SI This choice is for the skeletal data.However, when we analyse the Kinouchi et al. liver data using l thresh = −12 we obtain the results shown in SI Fig S8 .These suggest that the substantially worse timing for the FAST data is due to type lowM L dysfunction.However, all the structure in the FAST LRFs except for the central peak has been removed as discussed above and consequently the observed Θ values do not pick this lowM L dysfunction up.A quick inspection of the MLs suggests that l thresh = −12 gives too low a threshold because only two sample have a ML below exp(−5.5)and these are much lower at approximately exp(−10).This suggests using a threshold with l thresh = −6 or −7.In SI Fig S15 we used l thresh = −6 and we see that then the Θ spectrum nicely picks up the lowM L dysfunction.One might ask why we did not just use the ML to stratify things in this case and, indeed, this would give essentially the same answer for this data.However, this is not true when our samples have significant structures like second peaks as, for example seen in the Fang

Note D. Role of gene-to-gene correlations in the likelihood function Θ contains an estimate of a direct measure of clock precision
In what follows g = (g 1 , . . ., g G ) denotes a vector consisting of the possibly normalised levels of G clock-associated genes.We call these normalised rhythmic expression vectors (NREVs).
We consider the probability P (t, g) that g is observed at time t in WT conditions and the associated conditional probability distributions P (g|t) and P (t|g).Then if we want to assess how well the clock is working in an independent sample with expression vector g * ∈ R G we would want to estimate the conditional distribution P (t|g * ).The time T would be estimated to be the value of t maximising P (t|g * ) i.e. the maximum likelihood estimate (MLE).
The corresponding likelihood ratio is given by As is well-known (e.g., Chaps.8 & 9 of ( 18)) the likelihood ratio confidence interval gives a good estimate of the confidence intervals for the MLE when the log likelihood function is approximately quadratic on some scale as is the case with our likelihood functions.If α is the required sensitivity (i.e. 1 minus the confidence level) then the confidence interval is given by the set of times t that satisfy We can relate this to Θ (see Note S2) when the likelihood curve only has a single peak (or, more generally, only twice meets the curve C in Note S2).Since the likelihood ratio curve is quadratic near its maximum we deduce that the contribution to Θ from the region around the maximum is where C is the length of the confidence interval and η is the parameter in Methods.Since the confidence intervals define the precision of the clock we see that, in this case, Θ is a direct measure of clock precision.
P (t|g) is estimated using P (g|t) To measure P (t|g) we will use the fact that, by Bayes theorem, so far as dependence upon t is concerned, provided we assume that P (t) is reasonably independent of t.Therefore, we can use P (g|t)/P (g|T ) to estimate Λ(t) = P (t|g)/P (T |g).Consequently, TimeTeller tries to estimate P (g|t) directly from WT data.For the estimation it is assumed that P (g|t) is multivariate normal.

Θ depends crucially on the covariance structure of P (g|t)
To see what Θ depends upon consider the case where P (t|g * ) is relatively sharply peaked around T .Expanding around this estimate, the distribution is approximately Gaussian with and the variance σ t is given by where Σ is the covariance matrix of P (g|t) and δg is the derivative with respect to t of the mean of P (g|t) at g = g * , t = T .Indeed, if we drop the tightness assumption we can use the Cramer-Rao theorem to deduce that the term on the righthand side is a lower bound for the variance because this term is the dominant part of the Fisher Information matrix of P (g|t) with respect to t.Therefore, we see that it is crucial in estimating our clock dysfunction metric that we take account of the covariance structure of P (g|t).

Note E. Singular Value Decomposition and projections
where U [d] is the transpose of the matrix whose d columns are U 1 , . . ., U d .This is optimal in the sense that it minimises the mean values of the L 2 norms of the differences v i − v i,d .

Note F. Precision assessment without time stamps: Feng et al. and Cadanas et al.
In this section we explain how we calculate an upper bound on the variance of P (T |g) from data where T is the TimeTeller estimated time and g is the gene expression vector or REV.This approach can also be used to get MAEs or MdAEs for this distribution.P (T |g) gives the distribution of the TimeTeller estimate conditional on the gene expression state g.The methods currently used to quantify the precision of phase estimation algorithms focus on P (T |t) where t is the sample time.This restricts such assessments to data that is time stamped.Moreover, for an individual sample the value T − t is strongly biased by any chronotype for the individual, tissue or condition associated with the sample.
For our human data we saw a range from -3.5h to 2.5h.Therefore, one can argue that it is more natural to instead try to assess the variance of P (T |g) instead.We now describe in more detail how we do this.
We assume that we have a dataset and a determination of T for each data point.This might be the straightforward assessment of T by TimeTeller or it might be some adjusted value of T using TimeTeller such as the use of second peaks as done for the Feng et al. data.In addition, we will have the value of the REV g for each datapoint.We use the Feng et al.We carry out PCA analysis as described in SI Note S6 where we take for the vectors v j the REVs g in our dataset.We then project the REVs onto the first principal component b 1 to get projected values g = g • b 1 .From this we can produce a scatter plot as in Figs.3I &  We calculate the standard deviation (std) of these (overall std) and also check that it does not deviate much from the std found of we restrict to a subinterval of g values with a reasonable number of data values in it (local std).This gives an estimate of the std of the distribution P (T |g) which is an upper bound of the std of P (T |g).
For the Cadenas et al. data the analysis has been restricted to the data between T = 10 and T = 20.
For some datasets it will be necessary to generalise the approach and use more than one principal component in order to more accurately bound the variance.If we have q timepoints, for each gene g and each of the m organs or individuals we take the m q-dimensional timeseries vectors v i and calculate their principal components U g,i and the associated singular values σ g,i as in SI Note S6.To characterise the relative strength of the first principal component we use S 2 g = σ 2 g,1 / k σ 2 g,k and this is our synchronicity score.These time-course normalised genes were ranked for goodness of 24 hr period cosine fit using the p-values from Cosinor analysis.One small change was made to the code, such that the p-value of the zero-amplitude f-test was calculated correctly using the fcdf function rather than the fpdf function.The p-values for the null hypothesis that the gene was not rhythmic were used to as our periodicity score.This was compared for consistency with analysis by JTK using the package at https://github.com/mfcovington/jtk-cycle.A-C.Scatter plot of the periodicity score against the synchronicity score.A.

Time 4am
Ind   Ranking when lthresh = -5 Ranking when lthresh = -12 Proportion with smaller ML For each time the points over the time are for uncorrected timing and the points moved slightly to the right show the corrected timings (i.e. the predicted timing when corrected by the timing displacement for the individual).I.An analysis of how the Θ stratification changes as we lower l thresh from -5 to -12.The scatter plot shows the change in the Θ stratification of samples when l thresh is changed from -5 to -12.While 14 points change their ranking, for almost all the change preserves whether they are high or low.Only two samples change from high to low.The reason we generally choose l thresh = −5 here is because (i) higher values have too many samples in the training data with flat regions significantly intersecting C(t|T ) and (ii) lower values are too far below the minimum ML of the test data.Visualisation and quantitative analysis of Arntl and Cr1/Cry2 knockouts.Timing displacement for condition (h)   PCPs for a subset of the geing genes from (7).The Zhang et al. data was used for training, timecoursematched normalisation was used for test data and l thresh = -8.This compares the mean absolute timing errors for the eight tissues used in the microarray training data from Zhang et al. .The top row are for ZeiZeiger and are extracted from Fig. S3 of (21).The 2nd and 3rd row are for TimeTeller using timecourse normalisation and the timing errors in the 3rd row have been corrected using the timing deviations for the tissues.
et al. datasets, one for skeletal muscle and the other for and liver.It is also useful to consider SI Fig S6I which looks at how Θ changes as l thresh is reduced from -5 to -12 for the Bjarnason et al. human data.SI Fig S15 concerns the Kinouchi et al. skeletal muscle data.The first step is to get an estimate of the the values of ML.From Fig S4 Note we see the median of these for FAST samples from this data is a bit below 10 −5 which is approximately exp(−11.5) which suggests a value of -10 or -12 for l thresh .The three columns in SI Fig S15 Fig O. correspond to respectively l thresh = −10, −12 and −14.
Fig S15 The way that the values change is shown in SI Fig S15 Fig. Fig O.J,K.We see that when we change from -12 to -14 there is hardly any change in the ordering of the Θ values and hence the stratification, and this further justifies a choice of l thresh = −12.
data vectors and M is the N × n matrix whose columns are these vectors v i .Let M = U DV * be the SVD of M ( * denoting transpose).Singular Value Decomposition (SVD) gives a decomposition of any N × n matrix such as M into a product of the form M = U DV * where U is a N × n column-orthonormal matrix (U U * = I N and U * U = I n ), V is a n × n orthonormal matrix and D = diag(σ 1 , . . ., σ n ) is a diagonal matrix.The columns U i of U are the principal components (PCs) and the ordered elements σ 1 ≥ • • • ≥ σ n are the singular values.The projection of a vector v i onto the first d PCs is given by (12) data and that discussed in Cadenas et al. (13) as examples (SI Fig S16A).
SI Fig S16B where the projected value g of g is plotted against the corresponding value of T .We then use kernel based smoothing or related methods to obtain a smooth curve T = m(g) giving the mean of the T values as a function of g (Figs.3I & SI Fig S16B) For the Feng et al. data the kernels used are normal with variance 0.5.Then for each data point (with timing T i and REV g i ) we calculate the quantity T i − m( gi )).These are the timing deviations.The resulting distributions are shown in (Figs.3J & part C of SI Fig S16).

Fig A.
Fig A. Effect of time course normalisation Fig A. Effect of time course normalisation

Fig
Fig B. Analysis of synchronicity and rhythmicity Analysis of Hughes et al. liver data (19) when l thresh = −7.Timecourse normalisation is used for both training and test data.A. Visualisation of the Hughes et al. liver data plotted against the curve given by the means of P (g|t) for the Zhang et al.RNA-seq training data.B. Legend.Colors are consistent across all plots.C. Plot of the Θ values against the estimated time T for each test sample.The vertical lines show the true time with colours indicating the sampling time.D. Predicted T times plotted against actual times.E. Boxplot of the maximum likelihoods.F. Boxplot of the Θ values.G. Boxplot of the signed errors.H. Centred likelihood curves for each sample.

Fig D .
Fig D. Crossing platforms on Zhang et al. data Fig E. Local projections for human data.

Fig
Fig F. Variance explained by each PC.

Fig 4 A.
Fig G. Zhang et al. microarray data

Fig H .
Fig H. Analysis of Kinouchi et al. liver data.

Fig
Fig I. Bjarnason et al. data The likelihood curves for the leave-one-out analysis of the Bjarnason et al. data.The sample time is shown above each subplot.B-I.Leave-one-individual-out analysis of Bjarnason et al. training data.Intergene normalisation is used with l thresh = −5.B. Boxplots showing the distribution of the log maximum likelihood values log ML by individual.C. CDF of the log ML values.D. Centred likelihood ratio functions for all individuals at all timepoints.E. Boxplots showing the distribution of Θ values by individual.F. CDF of the Θ values.G. Boxplots showing the distribution of the signed errors for each individual.The means increase from right to left.H. Analysis of the timing for the Bjarnason et al. human microarray data with each data point assigned the color corresponding to the individual.

Fig J. Θ
Fig J. Θ values for Mure et al. data

Row 1 :
Weger et al.ArntlKO (3); row 2: Yeung et al.ArntlKO (20); row 3: Weger et al.Cr1/Cry2KO (3).All shown p-values are from the Wilcoxon Rank-Sum test.A, B & C. The visualisation of the data for each dataset compared to the training data.In each case the clustering of the KO data about a particular timepoint is evident.D, E & F. Plots of the Θ value against the estimated time T .The vertical lines show the true time with colours indicating the sampling time as shown in the various legends.These plots show the clustering of the KO clocks around a specific time of day.

Fig L.F
Fig L. Boyle et al. data & Feng et al. data.

Fig M .
Fig M. PCP plots for Bjarnason et al. data

Fig N .
Fig N. Ageing genes identified in Acosta-Rodríguez et al.

Gnmt
Timing displacement for condition (h) Timing displacement for condition (h)

Fig O. Choice of l thresh .
Fig O. Choice of l thresh .

Fig P .
Fig P. Precision assessment without time stamps: Cadanas et al.
Swarm chart showing the TimeTeller timings T of the Cadenas et al. data.B. A scatter plot of the projection g of the data REVs g onto the first PC (vertical axis) against the TimeTeller timing T .The smooth red curve approximates the means of P (T |g).C. The distribution of the deviations i.e. the horizontal difference between each data point (T, g) and the mean of P (T |g).The standard deviation of this distribution is shown.
et al. , Boyle et al. and Feng et al. datasets and in other cancer datasets that we have studied.for these Θ integrates the different types of dysfunction.
Zhang et al. (2014) microarray dataset.B.Zhang et al. (2014) RNA-seq dataset C. Bjarnason et al. microarray dataset.Fig C. Hughes et al. data (19) Cumulative percentage of variance explained by each of the principal components found by SVD for the Zhang et al. data.Data for day 1 and day 2 (e.g., CT22 and CT46) is combined to build a 24 hr model.3 dimensions are sufficient to explain more than 90% of the variation in the dataset for each local projection.Such a situation follows when the eigenvalues of the covariance matrices of the P (g|t i ) decay rapidly.

Table A .
Use of normalisations.This shows where the different normalisations are used in the data presented.Where test data normalisation is given for a training dataset this refers to how the test data comes from a leave one out analysis.

Table B .
Table Timing comparison for Zhang et al. microarray data

Table C .
Mure et al. data Mure et al. data The p-value and r 2 values for the linear regression of the gene phase of each gene shown in the plots against the timing displacements of the the 33 tissues for the Mure et al. data.TimeTeller has been trained on the central tissues only and l thresh = -12.