LimoRhyde2: Genomic analysis of biological rhythms based on effect sizes

  • Dora Obodo ,

    Contributed equally to this work with: Dora Obodo, Elliot H. Outland

    Roles Conceptualization, Data curation, Formal analysis, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America, Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America

  • Elliot H. Outland ,

    Contributed equally to this work with: Dora Obodo, Elliot H. Outland

    Roles Conceptualization, Data curation, Formal analysis, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America

  • Jacob J. Hughey

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    jakejhughey@gmail.com

    Affiliations Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America, Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America, Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America

Abstract

Genome-scale data have revealed daily rhythms in various species and tissues. However, current methods to assess rhythmicity largely restrict their focus to quantifying statistical significance, which may not reflect biological relevance. To address this limitation, we developed a method called LimoRhyde2 (the successor to our method LimoRhyde), which focuses instead on rhythm-related effect sizes and their uncertainty. For each genomic feature, LimoRhyde2 fits a curve using a series of linear models based on periodic splines, moderates the fits using an Empirical Bayes approach called multivariate adaptive shrinkage (Mash), then uses the moderated fits to calculate rhythm statistics such as peak-to-trough amplitude. The periodic splines capture non-sinusoidal rhythmicity, while Mash uses patterns in the data to account for different fits having different levels of noise. To demonstrate LimoRhyde2’s utility, we applied it to multiple circadian transcriptome datasets. Overall, LimoRhyde2 prioritized genes having high-amplitude rhythms in expression, whereas a prior method (BooteJTK) prioritized “statistically significant” genes whose amplitudes could be relatively small. Thus, quantifying effect sizes using approaches such as LimoRhyde2 has the potential to transform interpretation of genomic data related to biological rhythms.

Introduction

Much of life on Earth shows rhythms that follow the ~24-hour cycle of day and night. To produce these daily rhythms, each organism has a system of cell-autonomous oscillators, or circadian clocks, that senses environmental cues and drives cellular, physiological, and behavioral outputs [1]. In mammals, these clocks “tick” in nearly every tissue [2], although their tissue-specific mechanisms and inter-tissue interactions are only partially understood [3–6]. To study circadian systems and their relevance to human health, genome-scale approaches are invaluable, e.g., revealing mechanisms of intercellular communication in the circadian response to feeding [7] and highlighting drug targets for circadian medicine [8].

Nonetheless, current methods for analyzing rhythms in genome-scale data have multiple limitations. Some methods require that timepoints be equally spaced, or assume that rhythms are sinusoidal. Most importantly, almost all current methods—including our method LimoRhyde [9]—focus on hypothesis testing (i.e., p-values and statistical significance), raising at least two issues [10, 11]. First, the null hypothesis, e.g., that a gene has a log fold-change of exactly zero, is seldom true. Rather, observed biological effects typically vary from small to large. Second, statistical significance does not necessarily imply that a result is biologically relevant, since the p-value depends on both the estimated effect size and its uncertainty.

An alternative to calculating p-values is to estimate effect sizes for rhythmic properties (e.g., amplitude and phase) directly. The broader field of genomics has developed multiple methods for estimating effects (e.g., log fold-change) [12–14], which has become an important part of differential expression analysis. Previous work has applied effect size estimation to biological rhythms [15], but not to genome-scale data, while other work has incorporated confidence intervals, but only using cosinor regression on a small set of clock genes [16].

Therefore, we developed LimoRhyde2, a new approach to quantify rhythmicity in genomic data. LimoRhyde2 integrates and builds on state-of-the-art tools and practices to rigorously analyze data from genomic experiments [9, 17, 18], capture non-sinusoidal rhythms [19], and accurately estimate effect sizes [14, 20]. Whereas prior methods to analyze rhythmic data seek to answer the question “Is there an effect?”, LimoRhyde2 seeks to answer the often more relevant question “How strong is the effect?”. To illustrate LimoRhyde2’s utility, we applied the method to multiple circadian transcriptome datasets, comparing its output to that of a prior method. Our findings suggest that LimoRhyde2 can enable new insights into biological rhythms and circadian systems.

Methods

The LimoRhyde2 R package is available at https://limorhyde2.hugheylab.org. Code, data, and results for this study are available on Figshare (https://doi.org/10.6084/m9.figshare.22001519).

LimoRhyde2 algorithm

Fit linear models.

Similarly to LimoRhyde, LimoRhyde2 starts by fitting a linear model to the measurements of each genomic feature in the dataset (e.g., expression of each gene). By default in LimoRhyde2, the model terms for time (e.g., zeitgeber or circadian time) are based on a periodic cubic spline with three internal knots. Alternatively, the terms for time can be based on sine and cosine components (i.e., cosinor, the default in LimoRhyde). The model can include additional terms for covariates, e.g., to account for batch effects, but assumes all samples come from the same condition. Thus, the model could be

$$E(y_{g,i}) = \beta_{g,0} + \sum_{j=1}^{n} \beta_{g,j} B_j(t_i)$$

where $E(y_{g,i})$ is the expected (log-transformed) measurement for feature g in sample i, $\beta_{g,j}$ are the unknown coefficients for feature g, n is the number of spline knots, $B_j$ are the periodic spline basis functions with period τ, and $t_i$ is the time for sample i. LimoRhyde2 fits the models, i.e., estimates the coefficients, using limma-trend, limma-voom [25, 26], or DESeq2 [12], all state-of-the-art approaches for analyzing genomic data.
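As a minimal illustration of the cosinor alternative mentioned above, the following Python sketch fits an intercept plus one cosine and one sine term by ordinary least squares. It is not LimoRhyde2's implementation: the package fits models through limma or DESeq2 and uses periodic splines by default. The function names `solve_linear` and `fit_cosinor` and the toy time series are assumptions made for this example.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_cosinor(times, values, period=24.0):
    """Least-squares fit of y = b0 + b1*cos(w*t) + b2*sin(w*t), with w = 2*pi/period."""
    w = 2 * math.pi / period
    design = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(3)]
    return solve_linear(xtx, xty)

# Hypothetical noise-free series sampled every 4 h for 48 h:
# mean level 5, cosinor amplitude 2, peak at 6 h.
times = [4.0 * i for i in range(12)]
values = [5 + 2 * math.cos(2 * math.pi * (t - 6) / 24) for t in times]

b0, b1, b2 = fit_cosinor(times, values)
amplitude = math.hypot(b1, b2)  # cosinor amplitude, i.e., half the peak-to-trough range
peak_phase = (math.atan2(b2, b1) * 24 / (2 * math.pi)) % 24
```

In this sketch the intercept plays the role of the mean level, and the two time coefficients jointly determine amplitude and peak phase. A spline-based model replaces the cosine and sine columns of the design matrix with periodic spline basis functions.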

In initial testing we observed that the fitted curves of the periodic spline sometimes varied noticeably depending on the locations of the spline knots, particularly for more rhythmic genes. To avoid this behavior and make the fits more robust if the linear model is based on periodic splines (not cosinor), LimoRhyde2 repeats the above procedure multiple times, fitting a series of models for each feature such that the locations of the knots in each model are shifted by a different amount. For m shifted models (default 3), the value of the shift $d_k$ for model k is set to

$$d_k = \frac{(k - 1)\,\tau}{n\,m}, \quad k = 1, \ldots, m$$

The overall raw fit $f_g(t, \ldots)$ for feature g is calculated as

$$f_g(t, \ldots) = \frac{1}{m} \sum_{k=1}^{m} f_{g,k}(t, \ldots)$$

where $f_{g,k}(t, \ldots)$ indicates the expected measurement of feature g according to model k, as a function of time t and any covariates.

Moderate model coefficients.

LimoRhyde2 then moderates the model coefficients to obtain posterior fits using multivariate adaptive shrinkage (Mash) [14], which uses Empirical Bayes to learn patterns of similarity between coefficients and to improve estimates of effect sizes. LimoRhyde2 runs Mash on the coefficients for time for all shifted models. LimoRhyde2 does not moderate the intercept coefficients, as the relatively large number of samples in typical circadian transcriptome experiments makes these coefficients’ standard errors (and the effect of Mash) quite small. By default, LimoRhyde2 runs Mash with data-driven covariance matrices, computed based on principal component analysis of strong signals in the data, with the number of principal components set to the number of spline knots. Given the raw estimates and standard errors for each coefficient for each feature, Mash computes corresponding posterior distributions, including posterior means and standard deviations. Mash’s approach to estimating posteriors takes the place of the usual multiple testing adjustment, e.g., estimation of false discovery rates.
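The idea of moderation can be illustrated with a much simpler model than Mash. The Python sketch below applies univariate normal-normal Empirical Bayes shrinkage toward zero, with the prior variance estimated by the method of moments. This is a stand-in chosen for brevity, not Mash itself, which fits a mixture of multivariate normal priors with learned covariance matrices. The function name `shrink_normal` and the input values are assumptions made for this example.

```python
import math

def shrink_normal(estimates, std_errors):
    """Shrink raw estimates toward zero under a shared N(0, prior_var) prior.

    Assumes every standard error is strictly positive.
    """
    n = len(estimates)
    # Method-of-moments prior variance: observed spread minus average noise.
    prior_var = max(
        sum(b * b for b in estimates) / n - sum(s * s for s in std_errors) / n,
        0.0,
    )
    posterior = []
    for b, s in zip(estimates, std_errors):
        # Weight in [0, 1); it is smaller when the standard error is larger.
        weight = prior_var / (prior_var + s * s)
        post_mean = weight * b
        post_sd = math.sqrt(weight) * s
        posterior.append((post_mean, post_sd))
    return posterior

# Hypothetical raw coefficients: the first two are equal but differ in standard error.
raw_estimates = [2.0, 2.0, -1.5, 0.3, -0.2, 1.0]
raw_std_errors = [0.2, 1.0, 0.3, 0.3, 0.2, 0.5]
posterior = shrink_normal(raw_estimates, raw_std_errors)
precise_mean, precise_sd = posterior[0]
noisy_mean, noisy_sd = posterior[1]
```

Although the first two raw estimates are identical, the one with the larger standard error is pulled further toward zero. This is the same qualitative behavior that moderation is meant to provide.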

Calculate rhythm statistics.

LimoRhyde2 then uses the moderated coefficients (or optionally, the raw coefficients) to calculate the following rhythm statistics, i.e., properties of the fitted curve with respect to time between 0 and τ of each feature:

  • mesor (mean value)
  • peak or maximum value
  • peak phase (time at which the peak value occurs)
  • trough or minimum value
  • trough phase (time at which the trough value occurs)
  • peak-to-trough amplitude (peak value minus trough value)
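  • root mean square (RMS) amplitude, calculated as $\sqrt{\frac{1}{\tau} \int_0^{\tau} \left(f_g(t) - \bar{f}_g\right)^2 dt}$, where $\bar{f}_g$ is the mesor of the fitted curve $f_g(t)$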
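The statistics listed above can be approximated numerically from any fitted curve. The following Python sketch evaluates a curve on an even grid over one period and computes each quantity, with the RMS amplitude taken as the root mean square deviation of the curve from its mesor. The function name `rhythm_stats`, the grid size, and the example curve are assumptions made for this illustration and are not taken from the LimoRhyde2 package.

```python
import math

def rhythm_stats(curve, period=24.0, n_grid=240):
    """Summarize a periodic curve evaluated on an even grid over [0, period)."""
    times = [i * period / n_grid for i in range(n_grid)]
    values = [curve(t) for t in times]
    mesor = sum(values) / n_grid
    i_peak = max(range(n_grid), key=values.__getitem__)
    i_trough = min(range(n_grid), key=values.__getitem__)
    # RMS amplitude: root mean square deviation of the curve from its mesor.
    rms_amp = math.sqrt(sum((v - mesor) ** 2 for v in values) / n_grid)
    return {
        "mesor": mesor,
        "peak_value": values[i_peak],
        "peak_phase": times[i_peak],
        "trough_value": values[i_trough],
        "trough_phase": times[i_trough],
        "peak_trough_amp": values[i_peak] - values[i_trough],
        "rms_amp": rms_amp,
    }

# Hypothetical fitted curve: mesor 5, peak of 7 at 6 h, trough of 3 at 18 h.
stats = rhythm_stats(lambda t: 5 + 2 * math.cos(2 * math.pi * (t - 6) / 24))
```

For a pure sinusoid, the peak-to-trough amplitude is twice the cosinor amplitude and the RMS amplitude is the cosinor amplitude divided by the square root of 2. For non-sinusoidal curves the two amplitudes carry different information about the shape of the rhythm.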

Quantify uncertainty.

To quantify uncertainty in the fits, LimoRhyde2 can draw samples from the posterior distributions computed by Mash. For each posterior sample, which corresponds to a set of possible model coefficients for each feature, LimoRhyde2 can calculate the expected measurement of each feature as a function of time (and possibly any covariates), as well as the corresponding rhythm statistics. LimoRhyde2 then uses the resulting posterior distributions to calculate quantities such as the 90% (equal-tailed or highest-density) credible interval, the Bayesian analogue of a confidence interval.

Because amplitude estimates are non-negative, a credible interval based only on the posterior samples’ amplitudes would nearly always be strictly positive (implying that the feature is rhythmic). This would preclude the credible interval from crossing zero, even if the rhythm were quite weak and the posterior samples’ phases were highly variable. Thus, LimoRhyde2 constructs credible intervals for peak-to-trough amplitude and RMS amplitude by first changing the sign (from positive to negative) of amplitudes for posterior samples whose peak phase is greater than $\tau/4$ away in either direction from the circular mean peak phase (weighted by amplitude). In this way, the credible interval for a rhythm whose amplitude is highly uncertain can span zero.
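The sign-flipping procedure can be sketched as follows in Python. The sketch assumes a flip threshold of one quarter of the period and uses a linear-interpolation quantile to form an equal-tailed interval. Both choices, along with the function names `quantile` and `signed_amplitude_interval` and the toy posterior samples, are assumptions of this illustration and not a description of the package's code.

```python
import math

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def signed_amplitude_interval(amplitudes, peak_phases, period=24.0, level=0.9):
    """Equal-tailed credible interval for amplitude after phase-based sign flipping."""
    w = 2 * math.pi / period
    # Amplitude-weighted circular mean of the peak phases.
    c = sum(a * math.cos(w * p) for a, p in zip(amplitudes, peak_phases))
    s = sum(a * math.sin(w * p) for a, p in zip(amplitudes, peak_phases))
    mean_phase = (math.atan2(s, c) / w) % period
    signed = []
    for a, p in zip(amplitudes, peak_phases):
        # Shortest distance around the circle between this phase and the mean phase.
        dist = abs((p - mean_phase + period / 2) % period - period / 2)
        # Assumed threshold: flip the sign if more than a quarter period away.
        signed.append(-a if dist > period / 4 else a)
    signed.sort()
    tail = (1 - level) / 2
    return quantile(signed, tail), quantile(signed, 1 - tail)

# Hypothetical strong rhythm: the posterior samples agree on the peak phase.
strong = signed_amplitude_interval(
    [1.0 + 0.01 * i for i in range(20)], [5.0 + 0.1 * i for i in range(20)]
)
# Hypothetical weak rhythm: samples are split between peaks at 2 h and 14 h.
weak = signed_amplitude_interval([0.2] * 20, [2.0] * 12 + [14.0] * 8)
```

In the first case every sample keeps its positive sign, so the interval lies above zero. In the second case the samples with an opposing phase become negative, so the interval spans zero.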

Processing circadian transcriptome data from mice

For microarray data from mouse lung (GSE59396), we used the seeker R package [27] to download the sample metadata and processed (Illumina) expression data from NCBI GEO, map probes to Entrez Gene IDs [28], and return log2-transformed expression values. For RNA-seq data from mouse liver (GSE67305) and suprachiasmatic nucleus (SCN) (GSE72095), we used seeker to download the sample metadata and to download and process the raw reads. We processed the data using Trim Galore for adapter and quality trimming [29], FastQC [30] and MultiQC [31] for quality control, and salmon [32] and tximport [33] for quantifying gene-level counts and abundances based on Ensembl Gene IDs. We obtained the transcriptome index for salmon using refgenie [34]. Based on the plotMDS function from the limma package, we removed from analysis one extreme outlier sample from GSE59396. For GSE67305 and GSE72095, we kept only those genes having counts per million (CPM) ≥ 0.5 in at least 75% of samples (irrespective of timepoint). To avoid unrealistically low log-transformed CPM values and artificially inflated effect size estimates, for each sample-gene combination that had zero counts, we imputed the counts as the minimum of the non-zero counts across all samples for that gene.
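The gene filter and zero-count imputation described above can be sketched in Python as follows. The sketch assumes that CPM is computed from each sample's total count across all genes before filtering. The function name `filter_and_impute`, the dictionary layout, and the toy counts are assumptions made for this example.

```python
def filter_and_impute(counts, min_cpm=0.5, min_fraction=0.75):
    """Filter genes by CPM, then impute zero counts per gene.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    Assumes min_cpm > 0, so every retained gene has at least one non-zero count.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size per sample: total counts across all genes (an assumption here).
    lib_sizes = [sum(g[j] for g in counts.values()) for j in range(n_samples)]
    kept = {}
    for gene, gene_counts in counts.items():
        cpm = [1e6 * c / lib for c, lib in zip(gene_counts, lib_sizes)]
        n_pass = sum(v >= min_cpm for v in cpm)
        if n_pass < min_fraction * n_samples:
            continue  # expressed in too few samples
        # Replace zeros with the smallest non-zero count observed for this gene.
        floor = min(c for c in gene_counts if c > 0)
        kept[gene] = [c if c > 0 else floor for c in gene_counts]
    return kept

# Hypothetical counts for three genes in four samples (library size 1,000,000 each).
toy_counts = {
    "geneA": [100, 0, 80, 120],
    "geneB": [0, 0, 0, 5],
    "geneC": [999900, 1000000, 999920, 999875],
}
filtered = filter_and_impute(toy_counts)
```

In this toy example, geneA passes the filter in three of four samples and has its single zero replaced by 80, whereas geneB is removed because it reaches the CPM threshold in only one sample.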

Quantifying rhythmicity using LimoRhyde2

We ran LimoRhyde2 using three knots (or using a cosinor model where noted), a period of 24 h, and either limma-trend (for GSE59396) or limma-voom (for GSE67305 and GSE72095). Where applicable, we calculated 90% equal-tailed credible intervals based on 200 posterior samples.

Detecting rhythmicity using BooteJTK

To compare LimoRhyde2 with a prior method for detecting rhythmicity in genomic data, we used BooteJTK (source code from https://github.com/alanlhutchison/BooteJTK). BooteJTK performs hypothesis testing for rank-order correlation between a feature’s time series and a set of reference waveforms, using parametric bootstrapping to account for measurement uncertainty. We ran BooteJTK using the default settings. In particular, we generated 25 bootstrap resamples of each gene’s time series, used a period of 24 h, treated samples collected 24 h apart as replicates, and searched for phases and asymmetries at 2-h intervals from 0 to 22 h. For each dataset, we passed BooteJTK the log-transformed microarray expression values or the log2(cpm + 1) values for the same set of genes that we passed to LimoRhyde2. We adjusted the resulting p-values using the Benjamini-Hochberg (BH) method to control the false discovery rate [35]. To quantify agreement between LimoRhyde2 and BooteJTK in ranking rhythmic genes, we calculated Cohen’s kappa using the irr R package. To quantify the circular correlation of the two methods’ phase estimates in each dataset, we used the circ.cor2 function of the Directional R package, and included only those genes having a BooteJTK adjusted p-value ≤ 0.2 and a LimoRhyde2 amplitude in the upper 10 percent.
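Two of the calculations above are simple enough to sketch directly. The Python code below implements the standard Benjamini-Hochberg adjustment and computes Cohen's kappa for agreement between two rankings on membership in the top k. The study itself used R packages for these steps. The function names `bh_adjust` and `cohen_kappa_top_k`, the treatment of top-k membership as a binary rating, and the toy inputs are assumptions made for this illustration.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def cohen_kappa_top_k(scores_a, scores_b, k):
    """Cohen's kappa for agreement on top-k membership (higher score ranks higher).

    Assumes 0 < k < number of features, so that chance agreement is below 1.
    """
    n = len(scores_a)
    top_a = set(sorted(range(n), key=lambda i: -scores_a[i])[:k])
    top_b = set(sorted(range(n), key=lambda i: -scores_b[i])[:k])
    both = len(top_a & top_b)
    neither = n - len(top_a | top_b)
    observed = (both + neither) / n
    # Chance agreement when each method labels exactly k of n features as "top".
    expected = (k / n) ** 2 + ((n - k) / n) ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical inputs.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
kappa_same = cohen_kappa_top_k([5, 4, 3, 2, 1, 0], [5, 4, 3, 2, 1, 0], k=2)
kappa_opposite = cohen_kappa_top_k([5, 4, 3, 2, 1, 0], [0, 1, 2, 3, 4, 5], k=2)
```

To compare the two methods in this way, one score vector could hold posterior amplitudes and the other the -log10 adjusted p-values, so that a higher score means a higher rank in both.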

Results

To demonstrate how LimoRhyde2 quantifies rhythmicity, we applied it to transcriptome datasets from a range of tissues, experimental designs, and measurement techniques [36–38] (Table 1). We used LimoRhyde2 to fit periodic spline-based linear models for each gene (raw fits), moderate the raw fits using Mash (producing posterior fits), and calculate each gene’s rhythm statistics based on the raw and posterior fits (Fig 1). As expected, the posterior fits tended to have lower rhythm amplitudes compared to the raw fits (Fig 2), and higher standard errors in the raw fits led to greater amplitude reduction in the posterior fits for many genes (S1 Fig). In addition, the periodic splines allowed LimoRhyde2 to fit rhythms that appeared non-sinusoidal, avoiding the occasional poor fit of the cosinor model (S2 Fig).

Fig 1. Schematic overview of LimoRhyde2’s approach to quantifying rhythmicity in genomic data.

Given a genomic feature (row) by sample (column) matrix of measurements, LimoRhyde2 fits a curve (dashed orange line) based on periodic splines describing how each feature’s measurements change as a function of time. To adjust for noise and uncertainty in the fits, LimoRhyde2 uses Mash to moderate the model coefficients, yielding a posterior fit (solid green line) for each feature. Using the posterior fits and their distributions, LimoRhyde2 calculates rhythm statistics and credible intervals.

https://doi.org/10.1371/journal.pone.0292089.g001

Fig 2. LimoRhyde2 uses raw fits and their standard errors to obtain posterior fits.

(A) Scatterplots of posterior peak-to-trough amplitude vs. raw peak-to-trough amplitude for transcriptome data from mouse liver (GSE67305), lung (GSE59396), and SCN (GSE72095). Points represent genes, color represents log2 mean standard error of the gene’s raw fit. Dashed lines indicate y = x. (B) Time-courses of expression of genes (sorted by posterior amplitude) labeled in (A) in the respective tissue. Curves represent fits calculated by LimoRhyde2. Points represent samples.

https://doi.org/10.1371/journal.pone.0292089.g002

Table 1. Characteristics of the three circadian transcriptome datasets used for validation.

https://doi.org/10.1371/journal.pone.0292089.t001

To compare LimoRhyde2 to a prior method for detecting rhythmicity, we analyzed the same data using BooteJTK [21], a refinement of the popular JTK_CYCLE [39] that accounts for measurement uncertainty. Like other prior methods, BooteJTK calculates a p-value for each genomic feature (the probability, if the feature were not rhythmic, of observing a test statistic at least as extreme as the one obtained), which can then be used for ranking. As LimoRhyde2 does not calculate p-values, but instead estimates each gene’s rhythm, a convenient way to use its output to rank genomic features is by posterior amplitude. Importantly, the raw fits do not account for standard error, so raw amplitudes are unreliable indicators of true effect size.

Across the three datasets, we found the adjusted p-values from BooteJTK were only modestly correlated with the amplitudes from LimoRhyde2 (Fig 3A; mean Spearman correlation 0.72). Whereas BooteJTK prioritized monotonic rhythms with high signal-to-noise but perhaps relatively low amplitude, LimoRhyde2 prioritized high-amplitude rhythms of various shapes (Fig 3B). Accordingly, the two methods showed relatively weak agreement in the top-ranked genes in each tissue (Fig 3C). In contrast, the two methods’ phase estimates were highly correlated (mean circular correlation 0.86). The two methods differed markedly in runtime: 2 minutes for LimoRhyde2 and 73 minutes for BooteJTK (mean per study, LimoRhyde2 run in parallel on 6 cores, BooteJTK does not run in parallel).

Fig 3. LimoRhyde2 prioritizes high-amplitude rhythms, compared to BooteJTK.

(A) Scatterplots of -log10 adjusted p-value calculated by BooteJTK vs. posterior peak-to-trough amplitude calculated by LimoRhyde2 for each tissue (indicated at top). Points represent genes. Genes towards the top are prioritized by BooteJTK, genes to the right are prioritized by LimoRhyde2. (B) Time-courses of expression of genes (indicated at top) labeled in (A) in the respective tissue (indicated at right). Points represent samples. Colors represent relative ranking of the genes according to the two methods. Curves represent posterior fits calculated by LimoRhyde2. Each gene’s expression is centered at zero to highlight differences in amplitude. (C) Interrater agreement, as quantified by Cohen’s κ, of top-ranked genes based on LimoRhyde2 posterior amplitude and BooteJTK -log10 adjusted p-value. Shape and color represent tissue.

https://doi.org/10.1371/journal.pone.0292089.g003

As a typical use case, we further analyzed the posterior fits and statistics from LimoRhyde2 for each dataset. Here, we also used LimoRhyde2’s ability to sample from the posterior distributions to quantify uncertainty as 90% credible intervals (Fig 4). As expected, a noisy fit, such as that of Nr1d1 in the SCN, led to a credible interval for amplitude that spanned zero (Fig 4A–4C). Consistent with prior work [40], most clock genes showed high amplitudes in each tissue (S3 Fig), although some of the highest-amplitude genes were tissue-specific (Fig 4D). Contrary to previous findings [41], the highest-amplitude genes tended to be moderately expressed, not the most highly expressed (S4A Fig). Furthermore, among the top 25% of genes by amplitude, the joint distribution of amplitude and phase, as well as the marginal distribution of phase, differed widely by tissue (S4 Fig). Overall, these results illustrate the value of LimoRhyde2’s approach to quantifying rhythmicity.

Fig 4. LimoRhyde2 quantifies uncertainty in rhythmicity using credible intervals.

(A) Time-courses of expression for selected genes (indicated by color and at top) in the SCN. Points represent samples. Curves represent fits from 200 draws from the posterior distributions. (B) Time-courses of expression, where lines represent posterior means and ribbons represent 90% credible intervals calculated from the posterior draws. (C) Density plots of peak-to-trough amplitude based on the posterior draws. Dashed line indicates 0 amplitude. Shaded regions represent lower and upper bounds of the 90% credible intervals. Vertical colored lines represent posterior means of amplitude. Horizontal colored lines represent 90% credible intervals. (D) Amplitudes and corresponding 90% credible intervals for the top 20 genes ranked by amplitude in each tissue (indicated at top). Color represents peak phase.

https://doi.org/10.1371/journal.pone.0292089.g004

Discussion

Genomic data have yielded valuable insights into circadian systems and their relevance to human health, but previous methods for genomic analysis of biological rhythms have multiple limitations, perhaps the most severe being an overreliance on p-values and statistical significance.

LimoRhyde2’s posterior estimates (fitted curves, resulting statistics, and credible intervals) account for uncertainty in the raw estimates and for different patterns of rhythmicity. This aspect of LimoRhyde2 is enabled by Mash [14], which uses shrinkage to share information among genomic features. This shrinkage of the coefficients and their standard errors is distinct from that applied by limma to residual variances, which in practice has little effect given the relatively large sample sizes of circadian experiments. Such sharing by Mash may reduce goodness of fit for any one genomic feature, but is a well-validated strategy to prevent overfitting and improve estimates [14, 20], while avoiding the danger that “statistically significant” results can imply overestimated effect sizes [42]. Furthermore, LimoRhyde2’s approach bypasses the need to consider both raw amplitude and p-value, which would require another arbitrary cutoff [43, 44]. Thus, in LimoRhyde2, genome-scale data are neither a burden (for multiple testing) nor a curse (of dimensionality), but rather, an advantage.

Traditionally, efforts to understand the core circadian clock in various species have focused less on rhythm amplitude than on period and phase. However, the goal of genomic studies is often not to understand the core clock itself, but rather to determine how the clock and daily rhythms influence physiology. In this respect, amplitude may be as relevant as period and phase. For example, a protein whose mRNA shows a rhythm amplitude of 3 log2CPM may be a more promising target for circadian medicine than one whose mRNA shows an amplitude of only 0.3.

Given the distinct goals of LimoRhyde2 compared to previous methods (Table 2), the relevant differences between the methods’ output are not quantifiable in terms related to binary classification (precision, recall, false positive, etc.). Therefore, we opted to analyze real circadian transcriptome data rather than simulated data, whose generation requires many simplifying assumptions. Although the true effect sizes in these data are unknown, our results indicate that LimoRhyde2 efficiently prioritizes large effects that have functional significance in the circadian system yet would have been underappreciated by methods dependent on p-values and statistical significance.

Table 2. Comparison of several methods for genome-scale rhythmicity analysis.

https://doi.org/10.1371/journal.pone.0292089.t002

For example, among the genes ranked considerably higher by LimoRhyde2 (amplitude) than by BooteJTK in the liver were Rgs16, Dhrs9, and Mt2. Rgs16 regulates daily rhythms of G-protein signaling in the SCN [45] and substrate oxidation in hepatocytes [46]. Dhrs9 belongs to a set of genes whose expression forms a robust biomarker of internal circadian time from human blood [47]. Mt2 (metallothionein 2) was previously shown in a targeted study to have a dramatic diurnal rhythm [48]. Among genes ranked highly by LimoRhyde2 in the lung were Fkbp5 and Adamts4. The former encodes a negative-feedback regulator of glucocorticoid signaling [49], which plays an important role in synchronizing daily rhythms across tissues [50], while the latter is a clock-controlled gene in mouse cartilage [51] and human corneal endothelial cells [52]. Among genes ranked highly in the SCN were Nfkbib, which encodes an inhibitor of NF-κB signaling and whose expression is regulated in microglia by the clock gene Nr1d1 [53], and Id1, which may play a role in photic entrainment of the circadian system [54].

Despite its advantages, LimoRhyde2 still has limitations and opportunities for future improvements. First, we have only validated LimoRhyde2 for quantifying rhythmicity within a single condition, not for quantifying differences in rhythmicity between conditions. Second, LimoRhyde2 assumes that the rhythms have a user-specified period shared by all genomic features. Although it is possible to run LimoRhyde2 multiple times on the same dataset, varying the period each time, we recommend instead using a single period and allowing the periodic spline to capture non-sinusoidal (and potentially ultradian) rhythms. As with other methods, the ability to reliably detect and quantify ultradian rhythms will depend on the sampling interval and the signal-to-noise ratio. Third, LimoRhyde2 assumes that each feature’s rhythm is fixed. Consequently, LimoRhyde2 does not model amplitude decay [55], although it can model time-dependent trends in mesor.

Just as prior methods cannot determine what level of adjusted p-value qualifies as “significant” (the conventional level of 0.05 being arbitrary), LimoRhyde2 cannot determine what magnitude of a given rhythm statistic is biologically meaningful. Such values likely vary from one gene to another anyway, so it remains possible that LimoRhyde2 could deemphasize biologically meaningful rhythms of low amplitude. When performing follow-up computational analyses, we recommend using the full distribution of a given statistic rather than drawing artificial cutoffs such as “rhythmic” and “arrhythmic”. As a convenient reference for rhythm amplitudes, our results indicate that the core clock genes in wild-type mouse liver have a median peak-to-trough amplitude of ~3.3 (log2CPM in RNA-seq). In addition, LimoRhyde2 cannot determine what range of credible interval is most biologically relevant. In the current analysis, we elected for simplicity and ranked genes only by the point estimates of amplitude. Future work can explore how credible intervals should inform follow-up analyses and experiments. However, no amount of statistical wizardry is likely to overcome low biological replicability [56].

By directly estimating biological rhythms and their uncertainty, LimoRhyde2 seeks to shift the focus of an analysis from detecting statistical significance to interpreting biological relevance. Although we have so far only validated LimoRhyde2 on bulk transcriptome data, recent work showed that the same state-of-the-art methods that underlie LimoRhyde2 are well-suited to analysis of single-cell RNA-seq data [43]. Thus, LimoRhyde2 may provide a basis for using various genomic techniques to improve our understanding of biological rhythms.

Supporting information

S1 Fig. LimoRhyde2 moderates amplitude based on standard errors of genes.

Scatterplots of difference between raw and posterior peak-to-trough amplitude vs log2 mean standard error of the raw fit for genes in each tissue. Points represent genes.

https://doi.org/10.1371/journal.pone.0292089.s001

(TIF)

S2 Fig. LimoRhyde2’s spline-based model is more flexible and tends to give higher amplitude than the cosinor model.

(A) Scatterplots of cosinor posterior peak-to-trough amplitude vs. spline posterior peak-to-trough amplitude for each tissue (indicated at top). Points represent genes. Dashed lines indicate y = x. (B) Time-courses of expression of genes labeled in (A) in the respective tissue (indicated at right). Points represent samples. Curves represent posterior fits for the two models.

https://doi.org/10.1371/journal.pone.0292089.s002

(TIF)

S3 Fig. LimoRhyde2 identifies generally strong rhythms of core clock genes.

Posterior peak-to-trough amplitudes and corresponding 90% credible intervals for core clock genes in each tissue. Points represent genes, color represents peak phase for each gene. Dashed lines indicate 0 amplitude.

https://doi.org/10.1371/journal.pone.0292089.s003

(TIF)

S4 Fig. Distributions of rhythmicity based on LimoRhyde2 posterior statistics.

Scatterplots of (A) peak-to-trough amplitude vs. mesor and (B) peak-to-trough amplitude vs. peak phase for genes in each tissue (indicated at top). Points represent genes. (C) Histograms of peak phase. All plots include only the top 25% of genes based on amplitude.

https://doi.org/10.1371/journal.pone.0292089.s004

(TIF)

Acknowledgments

We thank Layla Aref and Jeffrey Tatro for helpful comments on the manuscript.

References

  1. Patke A, Young MW, Axelrod S. Molecular mechanisms and physiological importance of circadian rhythms. Nat Rev Mol Cell Biol. 2020;21: 67–84. pmid:31768006
  2. Yoo S-H, Yamazaki S, Lowrey PL, Shimomura K, Ko CH, Buhr ED, et al. PERIOD2::LUCIFERASE real-time reporting of circadian dynamics reveals persistent circadian oscillations in mouse peripheral tissues. Proc Natl Acad Sci U S A. 2004;101: 5339–5346. pmid:14963227
  3. Finger A-M, Jäschke S, Del Olmo M, Hurwitz R, Granada AE, Herzel H, et al. Intercellular coupling between peripheral circadian oscillators by TGF-β signaling. Sci Adv. 2021;7. pmid:34301601
  4. Crosby P, Hamnett R, Putker M, Hoyle NP, Reed M, Karam CJ, et al. Insulin/IGF-1 Drives PERIOD Synthesis to Entrain Circadian Rhythms with Feeding Time. Cell. 2019;177: 896–909.e20. pmid:31030999
  5. Sinturel F, Gos P, Petrenko V, Hagedorn C, Kreppel F, Storch K-F, et al. Circadian hepatocyte clocks keep synchrony in the absence of a master pacemaker in the suprachiasmatic nucleus or other extrahepatic clocks. Genes Dev. 2021. pmid:33602874
  6. Qu M, Qu H, Jia Z, Kay SA. HNF4A defines tissue-specific circadian rhythms by beaconing BMAL1::CLOCK chromatin binding and shaping the rhythmic chromatin landscape. Nat Commun. 2021;12: 6350. pmid:34732735
  7. Guan D, Xiong Y, Trinh TM, Xiao Y, Hu W, Jiang C, et al. The hepatocyte clock and feeding control chronophysiology of multiple liver cell types. Science. 2020;369: 1388–1394. pmid:32732282
  8. Ruben MD, Wu G, Smith DF, Schmidt RE, Francey LJ, Lee YY, et al. A database of tissue-specific rhythmically expressed human genes has potential applications in circadian medicine. Sci Transl Med. 2018;10. pmid:30209245
  9. Singer JM, Hughey JJ. LimoRhyde: A Flexible Approach for Differential Analysis of Rhythmic Transcriptome Data. J Biol Rhythms. 2018; 748730418813785. pmid:30472909
  10. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev Camb Philos Soc. 2007;82: 591–605. pmid:17944619
  11. Halsey LG. The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol Lett. 2019;15: 20190174. pmid:31113309
  12. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15: 550. pmid:25516281
  13. Zhu A, Ibrahim JG, Love MI. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics. 2019;35: 2084–2092. pmid:30395178
  14. Urbut SM, Wang G, Carbonetto P, Stephens M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat Genet. 2019;51: 187–195. pmid:30478440
  15. Wang Y, Ke C, Brown MB. Shape-invariant modeling of circadian rhythms with random effects and smoothing spline ANOVA decompositions. Biometrics. 2003;59: 804–812. pmid:14969458
  16. Lachmann G, Ananthasubramaniam B, Wünsch VA, Scherfig L-M, von Haefen C, Knaak C, et al. Circadian rhythms in septic shock patients. Ann Intensive Care. 2021;11: 64. pmid:33900485
  17. Thaben PF, Westermark PO. Differential rhythmicity: detecting altered rhythmicity in biological data. Bioinformatics. 2016;32: 2800–2808. pmid:27207944
  18. Parsons R, Parsons R, Garner N, Oster H, Rawashdeh O. CircaCompare: A method to estimate and statistically support differences in mesor, amplitude, and phase, between circadian rhythms. Bioinformatics. 2019. pmid:31588519
  19. Hughey JJ, Hastie T, Butte AJ. ZeitZeiger: supervised learning for high-dimensional data from an oscillatory system. Nucleic Acids Res. 2016;44: e80. pmid:26819407
  20. Stephens M. False discovery rates: a new deal. Biostatistics. 2017;18: 275–294. pmid:27756721
  21. Hutchison AL, Allada R, Dinner AR. Bootstrapping and Empirical Bayes Methods Improve Rhythm Detection in Sparsely Sampled Data. J Biol Rhythms. 2018;33: 339–349. pmid:30101659
  22. Rubio-Ponce A, Ballesteros I, Quintana JA, Solanas G, Benitah SA, Hidalgo A, et al. Combined statistical modeling enables accurate mining of circadian transcription. NAR Genom Bioinform. 2021;3: lqab031. pmid:33937766
  23. Thaben PF, Westermark PO. Detecting rhythms in time series with RAIN. J Biol Rhythms. 2014;29: 391–400. pmid:25326247
  24. Weger BD, Gobet C, David FPA, Atger F, Martin E, Phillips NE, et al. Systematic analysis of differential rhythmic liver gene expression mediated by the circadian clock and feeding rhythms. Proc Natl Acad Sci U S A. 2021;118. pmid:33452134
  25. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15: R29. pmid:24485249
  26. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43: e47. pmid:25605792
  27. Schoenbachler JL, Hughey JJ. The seeker R package: simplified fetching and processing of transcriptome data. bioRxiv. 2022. p. 2022.08.30.505820. pmid:36389425
  28. 28. Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 2005;33: e175. pmid:16284200
  29. 29. Krueger F, James F, Ewels P, Afyounian E, Schuster-Boeckler B. FelixKrueger/TrimGalore: v0.6.7—DOI via Zenodo. 2021.
  30. 30. Andrews S. FastQC: A quality control analysis tool for high throughput sequencing data. Github; https://github.com/s-andrews/FastQC
  31. 31. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32: 3047–3048. pmid:27312411
  32. 32. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14: 417–419. pmid:28263959
  33. 33. Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4: 1521. pmid:26925227
  34. 34. Stolarczyk M, Reuter VP, Smith JP, Magee NE, Sheffield NC. Refgenie: a reference genome resource manager. Gigascience. 2020;9. pmid:31995185
  35. 35. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Series B Stat Methodol. 1995;57: 289–300. http://www.jstor.org/stable/2346101
  36. 36. Haspel JA, Chettimada S, Shaik RS, Chu J-H, Raby BA, Cernadas M, et al. Circadian rhythm reprogramming during lung inflammation. Nat Commun. 2014;5: 4753. pmid:25208554
  37. 37. Pembroke WG, Babbs A, Davies KE, Ponting CP, Oliver PL. Temporal transcriptomics suggest that twin-peaking genes reset the clock. Elife. 2015;4. pmid:26523393
  38. 38. Janich P, Arpat AB, Castelo-Szekely V, Lopes M, Gatfield D. Ribosome profiling reveals the rhythmic liver translatome and circadian clock regulation by upstream open reading frames. Genome Res. 2015;25: 1848–1859. pmid:26486724
  39. 39. Hughes ME, Hogenesch JB, Kornacker K. JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms. 2010;25: 372–380. pmid:20876817
  40. 40. Zhang R, Lahens NF, Ballance HI, Hughes ME, Hogenesch JB. A circadian gene expression atlas in mammals: implications for biology and medicine. Proc Natl Acad Sci U S A. 2014;111: 16219–16224. pmid:25349387
  41. 41. Laloum D, Robinson-Rechavi M. Methods detecting rhythmic gene expression are biologically relevant only for strong signal. PLoS Comput Biol. 2020;16: e1007666. pmid:32182235
  42. 42. Gelman A, Carlin J. Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspect Psychol Sci. 2014;9: 641–651. pmid:26186114
  43. 43. Lück S, Westermark PO. Circadian mRNA expression: insights from modeling and transcriptomics. Cell Mol Life Sci. 2016;73: 497–521. pmid:26496725
  44. 44. Pelikan A, Herzel H, Kramer A, Ananthasubramaniam B. Venn diagram analysis overestimates the extent of circadian rhythm reprogramming. FEBS J. 2022;289: 6605–6621. pmid:34189845
  45. 45. Doi M, Ishida A, Miyake A, Sato M, Komatsu R, Yamazaki F, et al. Circadian regulation of intracellular G-protein signalling mediates intercellular synchrony and rhythmicity in the suprachiasmatic nucleus. Nat Commun. 2011;2: 327. pmid:21610730
  46. 46. Bai X, Liao Y, Sun F, Xiao X, Fu S. Diurnal regulation of oxidative phosphorylation restricts hepatocyte proliferation and inflammation. Cell Rep. 2021;36: 109659. pmid:34496251
  47. 47. Wittenbrink N, Ananthasubramaniam B, Münch M, Koller B, Maier B, Weschke C, et al. High-accuracy determination of internal circadian time from a single blood sample. J Clin Invest. 2018. pmid:29953415
  48. 48. Zhang D, Jin T, Xu Y-Q, Lu Y-F, Wu Q, Zhang Y-KJ, et al. Diurnal-and sex-related difference of metallothionein expression in mice. J Circadian Rhythms. 2012;10: 5. pmid:22827964
  49. 49. Wochnik GM, Rüegg J, Abel GA, Schmidt U, Holsboer F, Rein T. FK506-binding Proteins 51 and 52 Differentially Regulate Dynein Interaction and Nuclear Translocation of the Glucocorticoid Receptor in Mammalian Cells*. J Biol Chem. 2005;280: 4609–4616. pmid:15591061
  50. 50. Oster H, Challet E, Ott V, Arvat E, de Kloet ER, Dijk D-J, et al. The Functional and Clinical Significance of the 24-Hour Rhythm of Circulating Glucocorticoids. Endocr Rev. 2017;38: 3–45. pmid:27749086
  51. 51. Gossan N, Zeef L, Hensman J, Hughes A, Bateman JF, Rowley L, et al. The circadian clock in murine chondrocytes regulates genes controlling key aspects of cartilage homeostasis. Arthritis Rheum. 2013;65: 2334–2345. pmid:23896777
  52. 52. Nakai H, Tsuchiya Y, Koike N, Asano T, Ueno M, Umemura Y, et al. Comprehensive Analysis Identified the Circadian Clock and Global Circadian Gene Expression in Human Corneal Endothelial Cells. Invest Ophthalmol Vis Sci. 2022;63: 16. pmid:35579906
  53. 53. Griffin P, Dimitry JM, Sheehan PW, Lananna BV, Guo C, Robinette ML, et al. Circadian clock protein Rev-erbα regulates neuroinflammation. Proc Natl Acad Sci U S A. 2019;116: 5102–5107. pmid:30792350
  54. 54. Duffield GE, Watson NP, Mantani A, Peirson SN, Robles-Murguia M, Loros JJ, et al. A role for Id2 in regulating photic entrainment of the mammalian circadian system. Curr Biol. 2009;19: 297–304. pmid:19217292
  55. 55. De Los Santos H, Hurley JM, Collins EJ, Bennett KP. Circadian Rhythms in Neurospora Exhibit Biologically Relevant Driven and Damped Harmonic Oscillations. ACM BCB. 2017;2017: 455–463. pmid:31844846
  56. 56. Brooks TG, Manjrekar A, Mrcˇela A, Grant GR. Meta-analysis of Diurnal Transcriptomics in Mouse Liver Reveals Low Repeatability of Rhythm Analyses. J Biol Rhythms. 2023; 7487304231179600. pmid:37382061