Fig 1.

The nycthemeral transcriptome is the set of genes whose mRNAs show periodic variations with a 24h period; these are called rhythmic genes. To detect these rhythmic genes, we applied seven methods to time-series datasets; the methods produced different density distributions of p-values.

a) Simplified diagram of the entrainment of nycthemeral gene expression. Environmental cues include the light-dark cycle, food intake, sleep-wake behavior, social activities, or any other 24h periodic event. b) Density distribution of p-values obtained before (raw) and after the default correction (software) for the seven methods applied to mouse liver data (microarray), sub-categorized into: i. randomized data, which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes (99 genes from KEGG “circadian entrainment”, among which we expect a large proportion to show rhythmic mRNA accumulation). The default p-values of ARS, GeneCycle, and LS are uncorrected. Mouse image credit: Anthony Caravaggi (license CC BY-NC-SA 3.0).
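
The randomized and quartile-restricted subsets in panel b can be illustrated with a minimal sketch. It assumes that "randomized" means shuffling each gene's values across time-points independently, and that quartiles are taken over genes ranked by median expression; the caption does not spell out either procedure, and the function names are hypothetical.

```python
import random


def randomize_time_series(expression, seed=0):
    """Shuffle each gene's values across time-points (assumed null model)."""
    rng = random.Random(seed)
    randomized = {}
    for gene, values in expression.items():
        shuffled = list(values)
        rng.shuffle(shuffled)  # destroys any temporal ordering for this gene
        randomized[gene] = shuffled
    return randomized


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def first_and_fourth_quartile_genes(expression):
    """Return (lowest quarter, highest quarter) of genes by median expression."""
    ranked = sorted(expression, key=lambda g: median(expression[g]))
    k = len(ranked) // 4
    lowest = ranked[:k]
    highest = ranked[-k:] if k else []
    return lowest, highest
```

Shuffling preserves each gene's expression level and variance while removing temporal structure, which is why the quartile restriction can still be applied to the randomized data.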

Table 1.

Raw, default, and BH.Q in JTK algorithm.
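
For readers unfamiliar with the third quantity, the following is a minimal sketch of the Benjamini-Hochberg adjustment, assuming that BH.Q denotes Benjamini-Hochberg q-values computed from the raw p-values. The function name is hypothetical and this is not the JTK implementation itself.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each adjusted value is the raw p-value scaled by the number of tests over its rank, capped at 1 and made non-decreasing in rank.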

Fig 2.

Fewer time points per cycle lead to a weaker detection of rhythmic patterns even if the transcriptome profiling quality is better.

a) Bhlhe41, Npas2, and Per1 expression over time, from data of the same mouse experiment [2] using two transcriptome profiling techniques: microarray vs RNAseq. The number of time-points with data is 24 for microarray and 8 for RNAseq. b) Restricting the microarray time-series to the same time-points as in the RNAseq series produces p-value distributions similar to those obtained with RNAseq. This suggests that temporal resolution plays a major role in method results, whereas the difference between RNAseq and microarrays plays a minor one. Mouse image credit: Anthony Caravaggi (license CC BY-NC-SA 3.0).

Fig 3.

Datasets with one replicate per time-point over a single cycle of 24 hours do not provide enough information to detect rhythmicity.

Methods lose statistical power for detecting rhythmic patterns in gene expression when the number of 24h cycles decreases, or when the number of time-points sampled decreases. a) Default p-value distributions obtained for ARS, GeneCycle, and empJTK applied to different datasets, sub-categorized into: i. randomized data, which represents the null hypothesis; ii. randomized data restricted to the first and fourth quartiles of the median gene expression level, to check for the impact of expression level under the null; iii. the full original dataset; iv. the first and fourth quartiles of the median gene expression level of the original data; and v. a subset of known cycling genes (8 to 99 genes according to species, see Methods). For each dataset, the number of time-points with data and the temporal resolution are illustrated around a 24h clock. For the same number of time-points, performance seems better with two cycles than with only one cycle (zebrafish vs baboon). b) Reducing the number of time-points of the mouse liver microarray dataset leads to increasingly weak rhythm detection by ARS, GeneCycle, and empJTK, shown by a flattening of the p-value distribution on the full dataset (red arrow). GeneCycle showed no difference between a few time-points over two cycles and more time-points over a single cycle (black arrow). c) Scatter-plots of p-values obtained before and after down-sampling (every 2h over 48h vs. every 2h over 24h) for the full dataset. Each point is a gene. R is the Pearson correlation; p-value < 2.2e-16 in all cases. After down-sampling, the rhythmic signal is retrieved for the same genes. Image credits: Anthony Caravaggi (mouse), Ian Quigley (zebrafish), Wikipedia GNU GPL Muhammad Mahdi Karim (baboon), and Public Domain for other images (from PhyloPic).
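
The down-sampling in panels b and c can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the caption does not say whether the Pearson correlation was computed on raw or transformed p-values.

```python
import math


def downsample(times, values, max_time=None, step=1):
    """Keep time-points strictly before max_time, then every step-th point."""
    kept = [(t, v) for t, v in zip(times, values)
            if max_time is None or t < max_time]
    kept = kept[::step]
    return [t for t, _ in kept], [v for _, v in kept]


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With sampling every 2h over 48h, `downsample(times, values, max_time=24)` keeps one 24h cycle at the same resolution, while `downsample(times, values, step=2)` keeps both cycles at 4h resolution. `pearson_r` can then compare per-gene p-values obtained on the full and reduced series.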

Fig 4.

Methods detect the same top-ranked rhythmic genes, but with inconsistencies in the meaning of their p-values.

Upset diagrams show the number of rhythmic genes called in common by the methods. Each intersection is exclusive, i.e. one gene can appear in only one intersection. (a,b) Upset diagram for the mouse liver dataset (microarray) (a) and the baboon liver dataset (b) for p-value thresholds of 0.05 (black) or 0.01 (grey) for calling genes rhythmic. The Venn diagram (a) illustrates the upset diagram with, for instance, 2343 genes called rhythmic by all methods. (c,d) Upset diagram for the mouse liver dataset (microarray) for the first 1000 (c) or 6000 (d) genes detected as rhythmic by each method. With a smaller number of top rhythmic genes, the overlap between methods is weaker. Image credits: Anthony Caravaggi (mouse) and Wikipedia GNU GPL Muhammad Mahdi Karim (baboon).
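
The exclusive intersections described above can be computed with a short sketch: each gene is assigned to the exact set of methods that call it rhythmic, so it is counted once. The function names and the input layout (a dictionary of gene to p-value per method) are assumptions made for illustration.

```python
from collections import Counter


def called_at_threshold(p_values, threshold):
    """Genes with p-value <= threshold for one method."""
    return {gene for gene, p in p_values.items() if p <= threshold}


def top_n(p_values, n):
    """The n genes with the smallest p-values for one method."""
    return set(sorted(p_values, key=p_values.get)[:n])


def exclusive_intersections(calls):
    """Count genes per exact combination of methods.

    calls maps a method name to the set of genes it calls rhythmic.
    """
    membership = {}
    for method, genes in calls.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(method)
    # Each gene contributes to exactly one combination of methods.
    return Counter(frozenset(methods) for methods in membership.values())
```

Because every gene falls in exactly one combination, the counts sum to the number of distinct genes called by at least one method.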

Fig 5.

Signal of evolutionary conservation of rhythmic gene expression.

Orthologous genes detected as rhythmic in the same organ of two species have a stronger statistical signal of rhythmicity than those not detected as rhythmic in at least one species. a) Mouse and zebrafish share orthologous genes, some of which are rhythmic in the homologous tissues. b) Method used for ortholog benchmarking, as in panel d: From all mouse genes, only mouse-zebrafish one-to-one orthologs are kept. Considering the liver, these orthologs are separated into two groups: genes for which the ortholog is detected as rhythmic in zebrafish liver, called rhythmic orthologs; and the remaining one-to-one orthologs. c) Legend chart indicating the method and the threshold used to call genes rhythmic for each condition (species and tissue). d) Density distribution of p-values for rhythmic orthologs vs non-rhythmic orthologs, obtained for the seven methods applied to mouse liver data. Mouse-zebrafish orthologs that are detected as rhythmic in zebrafish liver are significantly enriched in small p-values in mouse liver, for all methods (Kolmogorov-Smirnov test p-values < 0.001). Image credits: Anthony Caravaggi (mouse), Ian Quigley (zebrafish).
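
The comparison in panel d rests on a two-sample Kolmogorov-Smirnov test between the two ortholog groups. The sketch below computes only the test statistic (the largest gap between the two empirical distribution functions), not its p-value; the function names and the input layout are assumptions.

```python
from bisect import bisect_right


def split_by_ortholog_status(mouse_p_values, rhythmic_in_other_species):
    """Split mouse p-values into rhythmic orthologs and remaining orthologs."""
    rhythmic = [p for g, p in mouse_p_values.items()
                if g in rhythmic_in_other_species]
    others = [p for g, p in mouse_p_values.items()
              if g not in rhythmic_in_other_species]
    return rhythmic, others


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The maximum gap is reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A large statistic indicates that the p-values of rhythmic orthologs are distributed differently from those of the remaining orthologs. In practice a statistics library would be used to obtain the associated p-value.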

Fig 6.

Only strong rhythmic signals of gene expression are relevant.

Methods designed for rhythm detection in gene expression show an advantage only for genes with a strong rhythmic signal, i.e. those with very small p-values. For a fixed number of top genes called rhythmic, all the methods, despite their design differences, retrieve approximately the same proportion of biologically functional rhythmic genes, and approximately the same genes. a) Method used to obtain panel b: For a given p-value threshold, each method detects a certain number of rhythmic genes (genes with p-value ≤ threshold). At each threshold, we calculate the proportion of orthologs rhythmic in species2 (A) among one-to-one species1-species2 orthologs (B). The benchmark set is composed of one-to-one orthologs detected as rhythmic in the second species (using method ARS, GeneCycle, or empJTK), called rhythmic orthologs. b) Variation of the proportion of rhythmic orthologs among all orthologs in mouse, as a function of the number of mouse orthologs detected as rhythmic, for each method applied to the mouse lung dataset. The benchmark gene set is composed of mouse-rat orthologs detected as rhythmic in rat lung by the GeneCycle method with default p-value ≤ 0.01. The black line is the Naive method, which orders genes according to their median expression levels (median of time-points), from the most highly expressed to the least expressed gene, then, for each gene, calculates the proportion of rhythmic orthologs among those with higher expression. The proportion of the benchmark set among one-to-one orthologs is higher for highly expressed genes (4th quartile) than for lowly expressed genes (1st quartile) (∼60% vs ∼20% respectively). Diamonds correspond to a p-value threshold of 0.01. c) Upset diagram showing the number of rhythmic orthologs (panel a) called in common by the methods among the first 1000 mouse-rat orthologs that are called rhythmic in mouse lung. Image credits: Anthony Caravaggi (mouse) and Public Domain for other images (from PhyloPic).
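
The curve in panel b can be sketched as below, under one reading of panel a: orthologs are ranked by a score (the p-value of a method in species 1), and for each number of top-ranked orthologs we compute the fraction that belongs to the benchmark set of orthologs rhythmic in species 2. The function name and input layout are hypothetical, and ties between p-values are not grouped here.

```python
def benchmark_curve(scores, benchmark):
    """Proportion of benchmark orthologs among the top-n ranked orthologs.

    scores maps each one-to-one ortholog to a score where lower ranks first
    (for example a p-value). benchmark is the set of orthologs detected as
    rhythmic in the second species. Returns a list of (n, proportion).
    """
    ranked = sorted(scores, key=scores.get)
    curve = []
    hits = 0
    for n, gene in enumerate(ranked, start=1):
        if gene in benchmark:
            hits += 1
        curve.append((n, hits / n))
    return curve
```

The Naive line can be obtained from the same function by passing the negated median expression level as the score, so that the most highly expressed genes rank first.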

Table 2.

t-test comparing expression levels between rhythmic genes (p-value ≤ 0.005) and non-rhythmic genes (the same number of genes, randomly chosen among those with p-value > 0.01) in the mouse liver dataset (microarray).
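
The group construction and comparison described in this table title can be sketched as follows. The caption does not state which t-test variant was used; Welch's unequal-variance statistic is used here as an assumption, and the function names and inputs (one p-value and one expression level per gene) are hypothetical.

```python
import math
import random
import statistics


def build_groups(p_values, expression_level, seed=0):
    """Rhythmic genes (p <= 0.005) and an equal-size random non-rhythmic set."""
    rhythmic = [g for g, p in p_values.items() if p <= 0.005]
    pool = sorted(g for g, p in p_values.items() if p > 0.01)
    rng = random.Random(seed)
    non_rhythmic = rng.sample(pool, min(len(rhythmic), len(pool)))
    return ([expression_level[g] for g in rhythmic],
            [expression_level[g] for g in non_rhythmic])


def welch_t(a, b):
    """Welch's t statistic (assumed variant); needs >= 2 values per group."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error
```

Genes with p-values between 0.005 and 0.01 are left out of both groups, which keeps a margin between the rhythmic and non-rhythmic sets.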
