The transcriptome in a cell is finely regulated by a large number of molecular mechanisms able to control the balance between mRNA production and degradation. Recent experimental findings have evidenced that fine and specific regulation of degradation is needed for proper orchestration of a global cell response to environmental conditions. We developed a computational technique based on stochastic modeling, to infer condition-specific individual mRNA half-lives directly from gene expression time-courses. Predictions from our method were validated by experimentally measured mRNA decay rates during the intraerythrocytic developmental cycle of Plasmodium falciparum. We then applied our methodology to publicly available data on the reproductive and metabolic cycle of budding yeast. Strikingly, our analysis revealed, in all cases, the presence of periodic changes in decay rates of sequentially induced genes and co-ordination strategies between transcription and degradation, thus suggesting a general principle for the proper coordination of transcription and degradation machinery in response to internal and/or external stimuli.
The amount of a given transcript in a cell is determined by a finely tuned balance of production and degradation in a complex regulatory network. Regulation of transcription controls when transcription occurs and how much mRNA is created, whereas regulation of degradation controls the rate at which messengers are destroyed. The latter mechanism has recently gained attention due to the increasing evidence of its key role in the overall co-ordination of gene expression. A long lifetime of mRNA enables a cell to produce more proteins from that mRNA. By contrast, a short lifetime allows protein levels to change rapidly in response to changing needs. Measuring mRNA stability is a complex and expensive experiment and, given the condition-specific response of the degradation pathway, it would be desirable to take advantage of the large variety of expression experiments stored in public databases. To this end, we developed a stochastic model to infer each specific mRNA lifetime from gene expression data. Predictions were validated using malaria data. We then applied our methodology to the reproductive and metabolic cycle of budding yeast and found, in all cases, the presence of a general principle for the proper coordination of transcription and degradation machinery.
Citation: Cacace F, Paci P, Cusimano V, Germani A, Farina L (2012) Stochastic Modeling of Expression Kinetics Identifies Messenger Half-Lives and Reveals Sequential Waves of Co-ordinated Transcription and Decay. PLoS Comput Biol 8(11): e1002772. https://doi.org/10.1371/journal.pcbi.1002772
Editor: Teresa M. Przytycka, National Center for Biotechnology Information (NCBI), United States of America
Received: May 5, 2012; Accepted: September 23, 2012; Published: November 8, 2012
Copyright: © 2012 Cacace et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors are partially supported by The Epigenomics Flagship Project (Progetto Bandiera Epigenomica) EPIGEN funded by Italian Ministry of Education, University and Research (MIUR) and the National Research Council of Italy (CNR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Appropriate and timely changes in gene expression are essential for cell life. The transcriptome is finely regulated by a large number of molecular mechanisms able to adjust the balance between mRNA production and degradation. Every aspect of transcript life is subject to elaborate control but, traditionally, the focus of research has been on transcriptional regulation. However, whereas mRNA abundance results from the dynamic interplay between transcription and degradation, the speed at which cells can adjust their mRNA levels is critically dependent on the rate of mRNA turnover. As a result, small changes in mRNA stability may drive rapid and dramatic variations of transcript abundance. Efforts to understand the underlying principles of mRNA decay and transcription co-ordination are very important, since the balance between transcription and decay influences most, if not all, cell responses to endogenous and exogenous signals.
The current widespread interest in this topic has been fostered by the finding of specific regulatory mechanisms of mRNA stability such as, for example, RNA binding proteins and small RNAs. Regulation of transcript stability cannot be considered a simple “disposal system” but a sophisticated tool for the proper orchestration of the global cell response to internal and external stimuli. Remarkably, a key role of mRNA stability has been reported in cancer, inflammatory diseases and Alzheimer's. In recent years there has been a surge in empirical studies that measured, on a genome-wide scale and in a variety of environmental conditions, messenger half-lives of many organisms, including plants, mammals and fungi. The discovery of such a new regulatory layer has clarified that, in order to obtain a clear picture of the underlying regulatory machinery, it is necessary to complement the traditional time-course experiment, which measures the cell transcriptional response under certain conditions far from steady state, with decay rate data obtained under the same conditions.
Experimental procedures for the evaluation of mRNA decay rates are based on measuring gene expression upon inhibition of transcription or on pulse-chase RNA labeling protocols. Such protocols have critical drawbacks (see Figure 1 for a comparison among the different studies summarized in Table 1) since, for instance, transcriptional shut-off blocks growth and has a profound effect on cellular physiology, as well as on mRNA metabolism. In fact, the Wang et al. and Grigull et al. datasets show a low value of the Pearson correlation (), and no correlation at all can be found () between the Munchel et al. and Wang et al. datasets (see Figure 1A and Figure 1B respectively). Despite the same experimental conditions (asynchronous growth), the two independent half-life measurements obtained by Wang et al. and Munchel et al. are uncorrelated, probably due to differences in the shut-off protocol (pulse chase for Munchel et al. and thermal inactivation for Wang et al.), whereas the Grigull et al. and Wang et al. measurements appear significantly correlated, probably because the same shut-off protocol was used (thermal inactivation).
The three genome-wide studies considered are Grigull et al., Wang et al. and Munchel et al. (A) Scatterplot of the Wang et al. and Grigull et al. datasets; (B) Scatterplot of the Munchel et al. and Wang et al. datasets.
It has been shown that genes having the same biological function are likely to share similar half-life values. Consistently, by averaging over functional groups, we found an increase in correlation between the Wang et al. and Grigull et al. datasets (, see Supplementary Figure S1A), and still no correlation between the Munchel et al. and Wang et al. datasets (, see Supplementary Figure S1B).
Here, we developed a stochastic computational model of the expression kinetics to identify condition-specific mRNA stabilities using only experimental mRNA time profiles. We assumed that degradation rates are gene-specific but approximately constant over the experiment time course. Predictions of our algorithm, termed DRAGON (Decay RAtes from Gene expressiON), were validated on experimental mRNA abundance and turnover data, both collected during the Intraerythrocytic Developmental Cycle (IDC) of Plasmodium falciparum. The estimates were in line with the experimental measurements. Remarkably, the DRAGON estimated half-lives were consistent with the finding of a peculiar pattern of mean half-life values along the wave of sequentially induced genes in subsequent stages of P. falciparum development. We also applied our methodology to public time-series datasets for which half-life data, under the same experimental conditions, have not been experimentally measured. In particular, we focused on budding yeast reproductive and metabolic cycle data. In fact, for the yeast Saccharomyces cerevisiae, only half-life data under asynchronous growth are publicly available. Our study showed the presence of the same periodic pattern of mean half-life values in all datasets, thus suggesting that such behavior may be a general feature, not limited to the Plasmodium falciparum IDC.
mRNA kinetics and half-life
Experimental evidence suggests that the majority of mRNAs are degraded with a first-order decay rate. This allows mRNA disappearance time profiles to be characterized by a first-order rate equation

$$\frac{dx(t)}{dt} = -k\,x(t) + u(t) \qquad (1)$$

where $k$ is the decay rate (or, equivalently, the half-life $\tau$, with $\tau = \ln 2 / k$), $x(t)$ is the mRNA concentration and $u(t)$ is the promoter activity (the rate of production of new mRNAs). It is worth noting that the degradation rate $k$ cannot be estimated from the concentration time profile $x(t)$ of a single gene, since the term $u(t)$ is not usually available in the typical time-course microarray experiment. The measurement of the promoter activity time profile would require additional experiments but, in this paper, we will assume that only mRNA abundance time-series data are available. At steady state $x$ and $u$ are constant, so that $dx/dt = 0$ and, consequently, we obtain

$$x = \frac{u}{k} \qquad (2)$$

From the above equation, it is clear that at steady state an increase (decrease) in mRNA concentration can be produced either by an increase (decrease) of transcription or by a decreased (increased) value of the decay rate: the two regulatory strategies therefore have an equivalent outcome. As a result, steady-state measurements cannot reveal the relative contribution of transcription and degradation or, most importantly, their co-ordinated activity. By contrast, the whole kinetics of induction and relaxation, as measured by time-course experiments, depends on the degradation and production rates in different ways: increasing (decreasing) the production rate results in a proportionally increased (decreased) mRNA abundance, whereas the rise time (i.e. the time required for the response to rise from a lower to an upper fraction of its final value, conventionally from 10% to 90%) is not affected. Increasing the decay rate results in a shorter rise time in both the induction and relaxation phases, whereas decreasing it results in a longer rise time. This key point is illustrated in Figure 2A and in Supplementary Figures S2A and S2B.
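The dependence of the kinetics on the decay rate can be made concrete with a minimal sketch. It assumes a promoter that switches ON at time zero with constant activity u (a step input, chosen here only for illustration), for which equation (1) has the closed-form solution x(t) = (u/k)(1 - exp(-k t)). The function names, the 10%-90% rise-time convention and the numerical values are assumptions, not taken from the paper.

```python
import math

def half_life_to_rate(half_life):
    """Convert a half-life into a first-order decay rate k = ln(2) / half-life."""
    return math.log(2.0) / half_life

def induction_profile(u, k, t):
    """mRNA level at time t after the promoter turns ON with constant activity u,
    starting from x(0) = 0: x(t) = (u / k) * (1 - exp(-k * t))."""
    return (u / k) * (1.0 - math.exp(-k * t))

def rise_time(k, lo=0.1, hi=0.9):
    """Time needed to go from a fraction lo to a fraction hi of the final value.
    It depends only on k, not on u: ln((1 - lo) / (1 - hi)) / k."""
    return math.log((1.0 - lo) / (1.0 - hi)) / k

# Two hypothetical transcripts with the same steady state u / k = 10
# but different half-lives (arbitrary time units).
k_fast = half_life_to_rate(5.0)    # unstable transcript
k_slow = half_life_to_rate(20.0)   # stable transcript
u_fast, u_slow = 10.0 * k_fast, 10.0 * k_slow

for name, u, k in [("unstable", u_fast, k_fast), ("stable", u_slow, k_slow)]:
    print(name, "steady state:", u / k, "rise time:", round(rise_time(k), 2))
```

Both transcripts reach the same plateau, so a steady-state measurement cannot tell them apart, while the rise time scales with the half-life and is unaffected by the production rate.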
Panel A shows in silico experiments to illustrate some basic features of gene induction kinetics. The “ON” and “OFF” regions correspond to the turning “ON” or “OFF” of the promoter activity. (A) Induction kinetics of transcripts having different half-lives and synthesis rates but reaching the same steady-state concentration. The time profile plotted in red corresponds to an unstable transcript and displays a fast induction and relaxation profile. By contrast, the blue one has a higher half-life value, resulting in a slower response. (B) Cascade of transcription factors resulting in waves of sequentially induced genes. The timing of expression peaks is modulated by transcriptional serial regulation. (C) Sequentially induced genes generated by a single transcription factor and a stability “gradient”. The timing of expression peaks is modulated by post-transcriptional regulation. Early induced genes are those with a low half-life value, late induced ones are those with a high half-life value.
Another important consequence of half-life specificity is the regulation of the timing of gene induction, as pointed out by Elkon et al. In fact, an expression wave, i.e. the sequential activation of genes, is usually interpreted as resulting from the corresponding activation of a multi-step transcription factor cascade (as illustrated in Figure 2B). Whereas such a mechanism is certainly very important, there is also an alternative way to obtain an expression wave, by means of a “stability gradient”. As illustrated in Figure 2C, a single transcription factor may initiate transcription of a set of target genes and their peak of induction can be modulated by a stability “gradient”, i.e. by specifically adjusted decay rates. More precisely, early induced genes would have short half-lives and late responding genes would have long half-lives. Clearly, both mechanisms may well act in cells, thus generating a wide spectrum of responses.
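The stability gradient can be illustrated with a small worked example. It rests on an assumption made here purely for illustration: all target genes share a sinusoidal promoter activity u(t) = 1 + 0.5 sin(ωt) and each follows the first-order kinetics dx/dt = -k x + u. Under that assumption, each transcript oscillates with the same period but peaks arctan(ω/k)/ω after the promoter peak, so longer half-lives give later peaks. The function names and parameter values are hypothetical.

```python
import math

def peak_delay(half_life, period):
    """Analytical delay between the promoter peak and the mRNA peak for
    dx/dt = -k x + u(t) with sinusoidal u(t): arctan(omega / k) / omega."""
    k = math.log(2.0) / half_life
    omega = 2.0 * math.pi / period
    return math.atan(omega / k) / omega

def simulated_peak_delay(half_life, period, dt=0.01, cycles=30):
    """Numerical check: integrate the kinetics with forward Euler and locate
    the mRNA peak within the last simulated cycle."""
    k = math.log(2.0) / half_life
    omega = 2.0 * math.pi / period
    steps_per_cycle = int(round(period / dt))
    total_steps = cycles * steps_per_cycle
    x = 1.0 / k                       # start at the mean mRNA level
    best_x, best_step = -math.inf, 0
    for n in range(total_steps):
        # x currently holds the value at time n * dt
        if n >= total_steps - steps_per_cycle and x > best_x:
            best_x, best_step = x, n
        u = 1.0 + 0.5 * math.sin(omega * n * dt)
        x += dt * (-k * x + u)
    mrna_peak = (best_step * dt) % period
    promoter_peak = period / 4.0      # sin(omega * t) peaks at a quarter period
    return (mrna_peak - promoter_peak) % period

# Hypothetical half-lives (minutes) driven by one promoter with a 60 min period.
for half_life in (5.0, 10.0, 20.0, 40.0):
    print(half_life, round(peak_delay(half_life, 60.0), 2))
```

In this simplified setting the delay grows monotonically with the half-life and saturates at a quarter of the period, so a graded set of decay rates is enough to spread the expression peaks of genes that share a single regulator.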
Time-courses are a very common design for microarray analysis, which allows researchers to follow the dynamics of the cellular response to perturbations. Such data are available for a very large number of experimental conditions and organisms: the Stanford Microarray Database alone includes, to date, 1545 time-course datasets. Among the examples illustrated later in the paper, it is worth mentioning the genome-wide gene expression time-series obtained during the reproductive cell cycle, the metabolic cycle and the P. falciparum IDC. The time-series datasets used in this paper are summarized in Table 2.
DRAGON–an algorithm for half-life estimation
The goal of the DRAGON methodology is to derive a robust estimate of each mRNA species half-life starting from all available gene expression pairs. The rationale for the algorithm mainly draws on properties of pairs exhibiting a certain degree of common promoter activity. In addition, DRAGON infers common promoter activity using a statistical model that simulates both gene-specific and common effects.
The rate of change of mRNA concentration for a generic pair of genes, say gene $i$ and gene $j$, is:

$$\frac{dx_i(t)}{dt} = -k_i\,x_i(t) + u_i(t), \qquad \frac{dx_j(t)}{dt} = -k_j\,x_j(t) + u_j(t) \qquad (3)$$

where the symbols $x_i(t)$ and $x_j(t)$ represent the mRNA time profiles of the gene pair $i$ and $j$, $u_i(t)$ and $u_j(t)$ are the promoter activities, and $k_i$ and $k_j$ are the mRNA degradation rates of gene $i$ and gene $j$, respectively. The terms $u_i(t)$, $u_j(t)$ are not known since we considered the case in which only mRNA abundance is measured.
We modeled promoter activities as the sum of two terms, the first one common to the pair and the other one specific for each gene:

$$u_i(t) = c_i\,f(t) + w_i(t), \qquad u_j(t) = c_j\,f(t) + w_j(t) \qquad (4)$$

where $f(t)$ is the common part, scaled by constants $c_i$ and $c_j$, whereas $w_i(t)$ and $w_j(t)$ are gene-specific independent stochastic processes with zero mean, that is $E[w_i(t)] = 0$, $E[w_j(t)] = 0$. Equations (3)–(4) encompass the case of:
- $u_i$, $u_j$ fully correlated (correlation $|\rho| = 1$), for which $w_i = 0$ and $w_j = 0$
- $u_i$, $u_j$ partially correlated (correlation $0 < |\rho| < 1$), for which $c_i c_j \neq 0$ and $w_i$, $w_j$ are not both zero
- $u_i$, $u_j$ un-correlated (correlation $\rho = 0$), for which either $c_i = 0$ or $c_j = 0$.
Equations (3)–(4) can therefore be written for all available gene pairs; thus, for a set of $N$ genes, we have $N(N-1)/2$ pairs to analyze. For each gene pair DRAGON provides an estimate of the time profile of $f(t)$, of all the parameters $k_i$, $k_j$, $c_i$, $c_j$, and of the covariance matrix of the stochastic processes. For each gene we therefore have $N-1$ estimates of the decay rate $k_i$, one for each pair containing that gene.
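The bookkeeping behind the pairwise analysis can be sketched as follows, assuming unordered gene pairs, so that N genes give N(N-1)/2 pairs and N-1 decay-rate estimates per gene. The per-pair estimates below are toy numbers, and the median used to summarize them is a hypothetical aggregation rule chosen for illustration, not necessarily the one used by DRAGON.

```python
from itertools import combinations
from statistics import median

def gene_pairs(genes):
    """All unordered gene pairs: N * (N - 1) / 2 pairs for N genes."""
    return list(combinations(genes, 2))

def collect_estimates(pair_estimates):
    """pair_estimates maps (gene_i, gene_j) -> (k_i, k_j), the decay rates
    estimated from that pair. Returns gene -> list of its per-pair estimates."""
    per_gene = {}
    for (gene_i, gene_j), (k_i, k_j) in pair_estimates.items():
        per_gene.setdefault(gene_i, []).append(k_i)
        per_gene.setdefault(gene_j, []).append(k_j)
    return per_gene

def summarize(per_gene):
    """Hypothetical aggregation step (an assumption): median of the estimates."""
    return {gene: median(values) for gene, values in per_gene.items()}

# Toy example: every pair returns the same made-up rate for each gene.
genes = ["g1", "g2", "g3", "g4"]
toy_rate = {"g1": 0.1, "g2": 0.2, "g3": 0.05, "g4": 0.4}
pairs = gene_pairs(genes)
pair_estimates = {(a, b): (toy_rate[a], toy_rate[b]) for a, b in pairs}
per_gene = collect_estimates(pair_estimates)
summary = summarize(per_gene)
print(len(pairs), {g: len(v) for g, v in per_gene.items()})
```

A robust summary such as the median is one plausible way to limit the influence of pairs whose promoter activities turn out to be uncorrelated.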
Notice that equations (3)–(4) yield a pair of linear stochastic differential equations. Since measurements of mRNA concentrations are available only at given time points, it is necessary to transform (3)–(4) into a pair of discrete-time stochastic equations. The exact discretization of (3)–(4) is possible since they are linear. The Kalman filter is used on the resulting discrete equations and a maximum likelihood algorithm is exploited to generate the best possible estimate of the parameters.
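The combination of exact discretization, Kalman filtering and maximum likelihood can be sketched on a deliberately simplified single-gene case. This is not the DRAGON pair model: it assumes that x is the deviation of one transcript from its mean level and that its promoter fluctuation is white noise of known intensity sigma, so that dx = -k x dt + sigma dW. The observation noise variance r, the grid search and all function names are likewise assumptions made for illustration.

```python
import math
import random

def discretize(k, sigma, dt):
    """Exact discretization of dx = -k x dt + sigma dW over a sampling step dt.
    Returns (a, q) such that x[n+1] = a * x[n] + w[n] with Var(w[n]) = q."""
    a = math.exp(-k * dt)
    q = sigma ** 2 * (1.0 - a * a) / (2.0 * k)
    return a, q

def kalman_loglik(y, k, sigma, r, dt):
    """Gaussian log-likelihood of noisy samples y[n] = x[n] + v[n], Var(v) = r,
    computed with a scalar Kalman filter."""
    a, q = discretize(k, sigma, dt)
    mean, var = 0.0, sigma ** 2 / (2.0 * k)    # stationary prior on x[0]
    loglik = 0.0
    for obs in y:
        innovation = obs - mean
        s = var + r                            # innovation variance
        loglik -= 0.5 * (math.log(2.0 * math.pi * s) + innovation ** 2 / s)
        gain = var / s
        mean, var = mean + gain * innovation, (1.0 - gain) * var  # measurement update
        mean, var = a * mean, a * a * var + q                     # time update
    return loglik

def fit_decay_rate(y, sigma, r, dt, grid):
    """Maximum likelihood estimate of k by exhaustive search over a grid."""
    return max(grid, key=lambda k: kalman_loglik(y, k, sigma, r, dt))

def simulate(k, sigma, r, dt, n, seed=0):
    """Draw n noisy samples from the discretized model (synthetic data only)."""
    rng = random.Random(seed)
    a, q = discretize(k, sigma, dt)
    x = rng.gauss(0.0, math.sqrt(sigma ** 2 / (2.0 * k)))
    samples = []
    for _ in range(n):
        samples.append(x + rng.gauss(0.0, math.sqrt(r)))
        x = a * x + rng.gauss(0.0, math.sqrt(q))
    return samples

# Synthetic check with a known decay rate.
y = simulate(k=0.5, sigma=1.0, r=0.01, dt=1.0, n=2000, seed=1)
grid = [0.1 + 0.05 * i for i in range(39)]
k_hat = fit_decay_rate(y, sigma=1.0, r=0.01, dt=1.0, grid=grid)
print("estimated half-life:", math.log(2.0) / k_hat)
```

Because the continuous-time model is linear, the discrete-time coefficients are exact for any sampling step, which is why no small-step approximation is needed; in a realistic setting the noise intensities would be estimated together with the decay rate rather than assumed known.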
A complete description of the mathematical model and of the discretization and parameters estimation procedure is given in the paragraph Stochastic modeling of expression kinetics and Kalman filtering of the Materials and Methods section.
Performance evaluation on malaria IDC experimental data
The IDC is characterized by four morphologic stages: ring, trophozoite, schizont and late schizont. The cycle begins with the red cells invasion by merozoites followed by a remodeling of the host cell in the ring stage . The merozoites then develop into trophozoites. During the schizont stage, after a period of growth, the trophozoite undergoes an asexual dividing process and the parasite is ready for the next round of invasion by new merozoites (late schizont phase).
Bozdech et al. , using microarrays, measured genome-wide mRNA abundance profiles across 48 h during one cycle of P. falciparum IDC, collecting one sample per hour. Later on, Shock et al.  measured mRNA half-lives of 2774 transcripts of the IDC using chemical inhibitors to reach transcriptional shut-off.
The simultaneous availability of gene expression and decay data during the same biological process (IDC) represents a natural test bed for the validation of the DRAGON algorithm. Therefore, we applied DRAGON to the Bozdech et al. dataset to obtain mRNA stability estimates (provided in Supplementary Table S6) to be compared with the Shock et al. measurements for performance evaluation. The resulting Pearson correlation between in vitro and in silico measures is (P value ), and the first principal component accounts for of the variability, thus showing a good performance for the DRAGON algorithm (see Figure 3). However, since gene expression and decay data have been measured by different groups, we can speculate that of unexplained variability may be partly due to inherent biological variability and to transcriptional inhibition stress. As a further analysis, we computed average mRNA half-lives in both studies for functional categories (see Supplementary Figure S3). We found that the two studies are in better agreement when half-lives are averaged over all genes within any given functional category (Pearson correlation ).
Scatterplot of mRNA half-lives for 616 genes estimated by DRAGON versus experimentally measured by Shock et al. .
Remarkably, Shock et al. found progressive stage-dependent average increases in mRNA stability and suggested that this phenomenon is a major determinant of mRNA accumulation (see Figure 4A). The same feature is also found using DRAGON estimated half-lives (see Figure 4B). To investigate in further detail the behavior of the average half-life of genes sequentially induced during the IDC, we computed for each gene the time point corresponding to its peak of expression (see the Data processing paragraph of the Materials and Methods section for details) and selected groups of genes having their peak of expression at each hourly time point over the hours monitored by Shock et al. For each gene group we computed the mean and standard deviation of the half-lives and found a high correlation with the corresponding curve obtained using experimental data (Pearson correlation , P value ; see Figure 4C). Early responding genes are characterized by high instability, whereas late responders are more stable, as also reported by Elkon et al. when studying mammalian cells. A possible explanation for the presence of stable mRNAs at the schizont stage, suggested by Shock et al., is that it may be important for the merozoite to receive a carefully regulated “starting package” that would allow rapid activation of the IDC following the next round of invasion. By contrast, the initial low mRNA stability values may be an indication of the fast dynamic remodeling after merozoite invasion. To evaluate the probability of obtaining such behavior by chance, we randomized the gene expression matrix and used DRAGON to estimate half-lives (see Figure 4D). Consistently, the estimation of half-lives using random data does not produce any correlation with experimental data (Pearson correlation ).
(A) Histograms of mRNA half-lives for genes induced at each stage of the P. falciparum IDC as experimentally measured by Shock et al. and (B) estimated by the DRAGON algorithm. The inset panels show the mean and standard deviation of half-lives during each stage. Both studies show an increase of average transcript stabilities of sequentially induced genes during the P. falciparum IDC. (C) Average experimental and estimated half-life values (red and blue dots, respectively) corresponding to genes having the same expression peak timing, indicated on the x-axis. Standard deviations are drawn as pale blue and pale red stripes. The two curves both show a maximum of the average half-life value during the schizont stage and a minimum during the ring stage. A sharp increase of average half-life occurs during the trophozoite stage. The Pearson correlation between the experimental and DRAGON estimated curves is , thus showing a good agreement between the two studies. (D) Effect of randomizing the gene expression matrix on DRAGON estimated half-lives.
Half-lives estimation during reproductive cycle in S. cerevisiae
Gene expression during the yeast cell cycle has been recently measured by Pramila et al. using alpha-factor synchronization and by Orlando et al. using centrifugal elutriation for synchronization. We obtained a high consistency of DRAGON estimates using data for 569 transcripts over replicate datasets (Pearson correlation for Pramila et al. dataset and Pearson correlation for the Orlando et al. dataset; see Figure 5A–B). The larger variability in half-life estimates may be explained by the inconsistencies between replicate time-series in the Pramila et al. dataset with respect to the Orlando et al. dataset (see Figure 5C). All half-life estimates obtained with the DRAGON algorithm are provided in Supplementary Tables S1 and S2 (Pramila) and in Supplementary Tables S3 and S4 (Orlando). Notwithstanding significant differences in synchronization procedures, we also found a high correlation of DRAGON half-life estimates over the two datasets (Pearson correlation , P value ; see Figure 5D) where the first principal component accounts for of the overall variability. We can speculate that of unexplained variability may be partly due to the different synchronization methods used. In fact, Orlando et al. obtained a cell cycle duration of about 2 hours, with 8 samples per cycle, whereas Pramila et al. obtained a cell cycle duration of about 1 hour, with 12 samples per cycle. Consistently, most of the transcripts display higher half-lives during the slower cycle than during the faster cycle (see Figure 5D).
(A) Scatterplot of the DRAGON estimated half-lives using two replicates taken from Pramila et al. dataset  (denoted by alpha30 and alpha38). (B) Scatterplot of the DRAGON estimated half-lives using two replicates taken from Orlando et al. dataset  (denoted by Orlando wtr1 and Orlando wtr2). (C) Histograms of the Pearson correlation values between time series relative to each gene in two replicates. Orlando et al. dataset shows a better consistency between replicates with respect to the Pramila et al. dataset. (D) Scatterplot of the DRAGON estimated half-lives using the Pramila et al.  and the Orlando et al. dataset . The half-lives obtained by replicate datasets have been averaged. The half-lives estimated using Orlando dataset show slightly higher values with respect to those obtained using Pramila dataset, as shown by the deviation from the bisector line (dashed blue line). This is consistent with the slower cell cycle in Orlando experiment (2 hours) compared to that of Pramila experiment (1 hour).
GO annotations of genes with extreme half-lives in S. cerevisiae
In this paragraph we briefly discuss functional annotations (obtained using the GOrilla software) of the novel half-lives predicted by the DRAGON algorithm using yeast reproductive and metabolic cycle time series. For the yeast cell cycle we normalized the half-life log-distribution (Z-score) for each dataset and then computed the geometric mean to obtain a single half-life value for each gene. Notably, the averaging has the effect of reducing the impact of the different synchronization stress responses. The list of normalized half-life values (geometric mean value equal to 1) for genes common to all datasets is provided as Supplementary Table S7 in the Half-life estimation paragraph of the Materials and Methods section.
Unstable genes are enriched with replication fork complex (p-value ) and stable genes (histones HA1-2,HB1-2) are enriched with nucleosome (p-value ). This is consistent with the need to produce a large number of histones during the DNA replication process, so that stable histone mRNAs contribute to a higher translation efficiency. Moreover, DNA replication timing requires first the formation of the replication fork, then the production of the histones needed for chromatin assembly: such a temporal sequence of events is consistent with a rapid turnover of the replication complex genes and a slow turnover of the histone genes (see Supplementary Figure S4). Among unstable genes we also found the G1/S transition cyclins and among stable ones we found the G2/M transition cyclins (see Supplementary Figure S5). In this case, the temporal sequence of events is the progression of the cell cycle from DNA replication to mitosis.
For the yeast metabolic cycle (half-lives estimations using DRAGON algorithm are provided in Supplementary Table S5) we found many stable mRNA species involved in the organic acid and arginine metabolism and protein catabolic processes. Among unstable messengers we found genes involved in DNA repair (p-value ), DNA metabolism (p-value ) and chromatin silencing (p-value ).
Periodic behavior of average half-lives of sequentially induced genes
The increasing pattern of average half-life found during P. falciparum IDC (shown in Figure 4A) motivated us to investigate whether a periodicity could be found also in other cyclical biological processes. We focused on the reproductive cell cycle and the metabolic cycle in Saccharomyces cerevisiae, for which high resolution time series measurements are available on public repositories (see Table 2).
To study whether a periodic pattern of average half-life of sequentially induced genes exists along the cell cycle progression, for each gene we computed the time point at which maximal expression is attained (see the Data processing paragraph of the Materials and Methods section for details). Thus, we obtained, for each time point, the list of genes having their expression peak at that time and computed the corresponding mean and variance of the DRAGON estimated half-life values. Indeed, we found a cyclic behavior along sequentially induced genes in both datasets (see Figure 6A for the Pramila et al. dataset and Figure 6B for the Orlando et al. dataset). Synchronization methods, cell cycle duration and number of samples differ between the two studies but, reassuringly, the phases of the cell cycle at which the mean half-life is minimal or maximal are consistent. In fact, for both datasets we observed a cyclical increase of mean half-life from G1 phase to M phase and a subsequent decrease back to G1. The figure shows that the minimal mean half-life is reached at the G1/S transition, whereas the maximal value corresponds to the G2/M phase for both cycles and datasets. The latter is consistent with the observation that, in higher eukaryotes, mitosis is accompanied by global repression of nuclear RNA synthesis, indicating that mRNAs must be stable to be inherited by daughter cells.
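The grouping step described above can be sketched as follows. The sketch assumes that the peak of each gene is simply the sample with the highest expression value (the paper's own peak detection is described in its Materials and Methods and may differ), and it reports the standard deviation rather than the variance. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def peak_time_index(profile):
    """Index of the sample at which the expression profile is maximal."""
    return max(range(len(profile)), key=lambda i: profile[i])

def half_life_by_peak(expression, half_life):
    """Group genes by the time index of their expression peak and return
    {time index: (mean half-life, standard deviation)} in time order."""
    groups = defaultdict(list)
    for gene, profile in expression.items():
        if gene in half_life:               # skip genes without an estimate
            groups[peak_time_index(profile)].append(half_life[gene])
    return {t: (mean(v), pstdev(v)) for t, v in sorted(groups.items())}

# Toy data: expression profiles over three time points and made-up half-lives.
expression = {
    "g1": [1.0, 5.0, 2.0],
    "g2": [0.0, 4.0, 1.0],
    "g3": [3.0, 1.0, 0.0],
    "g4": [0.0, 1.0, 6.0],
}
half_life = {"g1": 10.0, "g2": 20.0, "g3": 5.0, "g4": 40.0}
result = half_life_by_peak(expression, half_life)
for t, (m, s) in result.items():
    print(t, m, s)
```

Plotting the mean against the peak time index gives the kind of curve shown in Figure 6, with the standard deviation as the shaded band.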
Average DRAGON estimated messenger half-life values corresponding to genes having the same expression peak timing, indicated on the x-axis (dark blue points). Standard deviations are drawn as pale blue stripes. (A) yeast cell cycle using Pramila dataset, (B) yeast cell cycle using Orlando dataset and (C) yeast metabolic cycle. Strikingly, in all datasets the maximal average half-life is attained for genes induced during G2/M phase, including the metabolic cycle. The minimal average half-life, in all datasets, is attained at the G1/S transition phase. The bar charts show mean and standard deviation of half-lives during each stage.
The yeast metabolic cycle has been recently studied by Tu et al. using a continuous culture system: after a brief starvation period, the culture spontaneously began persistent respiratory cycles of about 5 hours. In the same study, a genome-wide microarray gene expression measurement was performed. Samples were taken every 25 minutes for 3 consecutive cycles. Using the DRAGON algorithm, we estimated half-lives from data on 1043 transcripts. Surprisingly, also in this case we found a cyclical pattern for the mean half-life of sequentially induced genes. The maximum is located at the RC phase and the minimum at the RB phase (see Figure 6C).
Integrated analysis–sequential waves of co-ordinated transcription and decay
Recently, a number of studies have revealed the fundamental role of stability regulation in shaping an appropriate cell response. A key point has been recently addressed by Shalem et al., who have shown the dynamic co-ordinated interplay between transcription and degradation. They found two basic regulatory strategies in the yeast response to stress. More precisely, they measured changes of mRNA abundance and decay rates in a yeast population subjected to oxidative and DNA damage stress. By grouping genes according to the time point at which the maximal (minimal) fold change is attained and combining normalized (mean and variance) mRNA abundance and decay rate data, they constructed a “stability versus folding” (SF) diagram where the change in mRNA stability relative to a reference state (mean value in our case) is plotted against the maximal fold change. Using yeast expression time-course data obtained in response to oxidative stress and DNA damage, they were able to reveal two different strategies: a) a “counteracting regulation” strategy (see Figure 7A), characterized by genes in which an increase (decrease) in degradation rates counteracts an increase (decrease) in mRNA abundance, i.e. repressed genes are stabilized and induced genes are destabilized; b) a “synergistic regulation” strategy (see Figure 7B), characterized by genes in which an increase (decrease) in degradation rates is associated with a decrease (increase) in mRNA abundance, i.e. induced genes are stabilized and repressed genes are destabilized.
The stability/folding diagrams (SF), introduced by Shalem et al. in , show the change in mRNA stability relative to the average value plotted against the maximal fold change. (A) Counteracting strategy (negative correlation): induced genes are destabilized and repressed genes are stabilized. (B) Synergistic strategy (positive correlation): induced genes are stabilized and repressed genes are destabilized.
Shalem et al. also found that, progressing from early time points forward, the negative correlation (counteracting) was replaced with a positive correlation (synergistic). Such co-ordination strategy may permit crosstalk between different steps of mRNA biogenesis, providing a mechanism to control the order and timing of events . The work of Shalem et al. has shown the importance of combining expression data with decay rates under the same experimental condition to reveal the underlying strategy of co-ordination of the two “regulatory arms”, namely transcription and degradation. Uncovering such relationships is certainly a fundamental task, since the underlying reciprocal influences between mRNA production and degradation are largely unexplored . The DRAGON algorithm, by estimating half-lives directly from gene expression data under specific conditions, allows the computational integration of mRNA abundance and decay rates data, making this powerful combined analysis possible when experimentally measured half-lives are not available.
We computed SF diagrams for P. falciparum IDC, yeast cell cycle (Pramila et al. dataset) and metabolic cycle (shown in Figure 8). In panels A,C and E each blue dot corresponds to a Pearson correlation of the SF diagram at the peak time point indicated on the x-axis, for the three datasets. In panels B,D and F the SF diagrams corresponding to the correlation values indicated by the arrows in panels A,C and E are displayed. The arrows point to maximal negative (red dots in panels B,D,F and red arrows in panels A,C,E) and maximal positive correlation values (green dots in panels B,D,F and green arrows in panels A,C,E).
In all cases under study, we found a progressive shift from an inverse (counteracting) to a direct (synergistic) relationship. (A) Plasmodium falciparum IDC, (C) yeast cell cycle (Pramila et al. dataset) and (E) yeast metabolic cycle. In panels A, C and E each blue dot corresponds to the Pearson correlation of the SF diagram at the peak time point indicated on the x-axis, for the three datasets. Panels B, D and F display the SF diagrams corresponding to the correlation values indicated by arrows in panels A, C and E. The arrows point to the maximal negative (red dots in panels B, D, F and red arrows in panels A, C, E) and maximal positive correlation values (green dots in panels B, D, F and corresponding green arrows in panels A, C, E).
Strikingly, in all cases we reached the same conclusions as Shalem et al.: early induced genes show counteracting regulation, whereas late induced genes show synergistic regulation.
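The correlation plotted for each SF diagram can be illustrated with a minimal sketch. It assumes that the stability change is expressed as the log ratio of each half-life to the group average and that the fold change is a signed log value; both conventions, the function names and the toy numbers are hypothetical and are not taken from the DRAGON implementation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sf_correlation(half_lives, fold_changes):
    """Correlate relative stability change with maximal fold change.

    half_lives:   half-life of each gene in the group (toy input).
    fold_changes: signed maximal log fold change of each gene (toy input).
    """
    mean_hl = sum(half_lives) / len(half_lives)
    # Assumption: stability change is the log ratio to the group average.
    rel_stability = [math.log2(h / mean_hl) for h in half_lives]
    return pearson(rel_stability, fold_changes)

fold = [2.0, 1.0, -1.0, -2.0]
# Induced genes (positive fold change) are the least stable ones.
counteracting = sf_correlation([5.0, 8.0, 20.0, 30.0], fold)
# Induced genes are the most stable ones.
synergistic = sf_correlation([30.0, 20.0, 8.0, 5.0], fold)
```

In this toy example the first call gives a negative coefficient (counteracting strategy) and the second a positive one (synergistic strategy).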
Advantages and disadvantages of the method
The main advantage of the DRAGON algorithm is that it estimates mRNA half-lives directly from gene expression time-courses recorded during condition-specific experiments. Moreover, it estimates the correlation between the promoter activities of pairs of genes. Another advantage of the algorithm lies in its robustness. Specifically, we observed that although the accuracy of the absolute values of the estimated half-lives can be influenced by many factors (such as the number of points in the time series, the accuracy of the measurements, the time interval between samples, the choice of the thresholds for the outliers, etc.), the ranking of half-lives is insensitive to these factors.
The main disadvantages are the following. DRAGON can work only with time series recorded under the same experimental condition and cannot handle steady-state values under different conditions. As a general rule, a reliable estimate requires at least 10–12 time samples, i.e. a number significantly larger than the number of parameters to be estimated (this rule is obviously not always applicable, as the required number of points depends strongly on the signal-to-noise ratio), and a sampling time not larger than the expected average half-life. If no information is available about the correlation of promoter activities, as a rule of thumb, a set of at least 50–100 time series must be processed together in order to obtain reliable half-life estimates. One basic hypothesis is that the half-life of a transcript is approximately constant during the time course of the experiment, so a substantial change of its value would yield an unreliable estimate. These problems can be handled by performing more measurements with a shorter sampling time, or by considering moving time windows. The computational overhead can be significant: for a sample of 1000 time series, the analysis of all pairs requires about 150 hours of computation on a medium-speed single-processor machine capable of analyzing 2 pairs per second.
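As a rough consistency check of the running time quoted above, the stated throughput and duration can be converted into a number of analyzed pairs. The interpretation in terms of ordered pairs is our assumption and is not stated in the text.

```latex
\[
150\ \text{h} \times 3600\ \frac{\text{s}}{\text{h}} \times 2\ \frac{\text{pairs}}{\text{s}}
  = 1.08 \times 10^{6}\ \text{pairs} \approx 1000^{2},
\qquad
\frac{1000 \cdot 999}{2} = 499\,500\ \text{unordered pairs} \;\Rightarrow\; \approx 69\ \text{h}.
\]
```

The quoted figure is therefore closer to analyzing on the order of $1000^2$ pairs, i.e. each pair in both orders, than to analyzing each unordered pair once.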
Our analysis supports and strengthens the conclusions of Shalem et al. about the coordination of transcription and mRNA degradation in the cell in response to stress. We have shown that during periodic processes, such as the P. falciparum IDC, the reproductive cell cycle and the metabolic cycle, the alternative interplays between changes in mRNA stability and changes in mRNA abundance are activated by periodically switching from a counteracting to a synergistic regulation. In light of these results, the classical view of periodic processes as the result of the sequential activation of transcription factors should be reconsidered from a broader point of view that includes post-transcriptional regulation and coordination.
Materials and Methods
Stochastic modeling of expression kinetics and Kalman filtering
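We defined $x_i(t)$ as the time profile of the expression of gene $i$ at time $t$. The underlying conservation equation simply stems from the observation that the rate of change of $x_i(t)$ with time, i.e. its time derivative $\dot{x}_i(t)$, must equal the difference between the production and degradation terms. Based on experimental evidence, the degradation is well described by a first-order term. The dynamics of the $i$-th transcript is therefore described by

$$\dot{x}_i(t) = -\lambda_i x_i(t) + u_i(t) \qquad (5)$$

where $\lambda_i$ is the mRNA decay rate of the $i$-th messenger. This value is linked to the half-life $\tau_i$ of the transcript by the relation $\tau_i = \ln 2 / \lambda_i$. The term $u_i(t)$ is the promoter activity of the $i$-th gene, regulated by transcription factors; such regulation occurs by triggering or suppressing the transcription of the $i$-th gene. Moreover, the observed measure $y_i(t)$ is also a noisy time series, thus we have

$$y_i(t) = x_i(t) + \sigma_i v_i(t) \qquad (6)$$

where $v_i(t)$ is a unit white noise and $\sigma_i$ is the standard deviation of the measurement noise (see supplementary material Text S1 for an example of the identification procedure). We considered a generic pair $(x_i, x_j)$ of expression time profiles characterized by the presence of two terms: a stochastically correlated promoter activity $u(t)$ and a gene-specific term $\eta_i(t)$. We then considered the case:

$$u_i(t) = k_i u(t) + \eta_i(t) \qquad (7)$$

where $k_i$ is a scaling factor accounting for the relative contribution of the common term to the overall promoter activity of gene $i$. The term $\eta_i(t)$ models the part of the promoter activity which is not common to the pair. We model this part by means of a noise term, which is assumed to be a white noise. The common part is modeled as a Wiener process:

$$\dot{u}(t) = w(t) \qquad (8)$$

where $w(t)$ is white noise. The complete mathematical dynamic model for two transcripts $i$ and $j$, together with their respective measurement equations, is

$$\begin{aligned}
\dot{x}_i(t) &= -\lambda_i x_i(t) + k_i u(t) + \eta_i(t) \\
\dot{x}_j(t) &= -\lambda_j x_j(t) + k_j u(t) + \eta_j(t) \\
\dot{u}(t) &= w(t) \\
y_i(t) &= x_i(t) + \sigma_i v_i(t) \\
y_j(t) &= x_j(t) + \sigma_j v_j(t)
\end{aligned} \qquad (9)$$

We can rewrite the linear dynamic system (9) using a compact matrix notation

$$\dot{X}(t) = A X(t) + N(t), \qquad Y(t) = C X(t) + G V(t) \qquad (10)$$

where

$$X = \begin{bmatrix} x_i \\ x_j \\ u \end{bmatrix}, \quad Y = \begin{bmatrix} y_i \\ y_j \end{bmatrix}, \quad N = \begin{bmatrix} \eta_i \\ \eta_j \\ w \end{bmatrix}, \quad V = \begin{bmatrix} v_i \\ v_j \end{bmatrix}$$

and

$$A = \begin{bmatrix} -\lambda_i & 0 & k_i \\ 0 & -\lambda_j & k_j \\ 0 & 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad G = \begin{bmatrix} \sigma_i & 0 \\ 0 & \sigma_j \end{bmatrix}.$$

Since the dynamic system (10) is linear, it can be exactly discretized (see [24]) for a given time interval $\Delta$, corresponding to the time interval between two consecutive measurements.

The $k$-th measurement corresponds to time $t_k = k\Delta$, thus in the discretized system we can use $X_k$ in place of $X(t_k)$, to keep the notation simple.

The solution of the linear dynamic system (10) is

$$X(t) = e^{A(t - t_0)} X(t_0) + \int_{t_0}^{t} e^{A(t - s)} N(s)\, ds \qquad (11)$$

and its discretized form is

$$X_{k+1} = A_d X_k + N_k, \qquad Y_k = C X_k + G V_k \qquad (12)$$

where $A_d = e^{A\Delta}$ and $Q$ is the covariance matrix of the discrete state noise $N_k$, defined by

$$Q = \int_{0}^{\Delta} e^{As}\, \Psi\, e^{A^{T} s}\, ds \qquad (13)$$

with $\Psi$ denoting the intensity matrix of the continuous-time white noise $N(t)$. The unknown parameters of the model to be estimated, collected in the vector $\theta$, are the decay rates, the scaling factors and the noise intensities of the two transcripts. The state variables of the system are $x_i$, $x_j$ and $u$. For each given choice of the parameters we used the Kalman filter [25] to estimate the state variables.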
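The link between decay rate, half-life and exact discretization can be illustrated on a single transcript. The sketch below is a simplified scalar analogue of the model, not the pairwise stochastic system itself: it assumes a deterministic promoter activity held constant over each sampling interval (zero-order hold), and all function names and numerical values are hypothetical.

```python
import math

def half_life_to_decay_rate(half_life):
    """Decay rate lambda = ln(2) / half-life."""
    return math.log(2.0) / half_life

def discretize(decay_rate, dt):
    """Exact discretization of dx/dt = -decay_rate * x + u.

    Assumes u is constant over each interval of length dt.
    Returns (a, b) such that x[k+1] = a * x[k] + b * u[k].
    """
    a = math.exp(-decay_rate * dt)
    b = (1.0 - a) / decay_rate
    return a, b

def simulate(x0, decay_rate, inputs, dt):
    """Propagate the discretized model over a sequence of input values."""
    a, b = discretize(decay_rate, dt)
    xs = [x0]
    for u in inputs:
        xs.append(a * xs[-1] + b * u)
    return xs

lam = half_life_to_decay_rate(10.0)                 # toy half-life: 10 minutes
decay = simulate(100.0, lam, [0.0, 0.0], dt=10.0)   # promoter switched off
induction = simulate(0.0, lam, [1.0] * 200, dt=10.0)  # promoter switched on
```

With the promoter off, the level halves at every step because the sampling interval equals the half-life; with a constant promoter activity the level converges to the steady state $u/\lambda$, so a longer half-life gives a higher plateau but a slower approach to it.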
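The Kalman filter equation uses a feedback control strategy. It contains a prediction term for projecting forward (in time) the current state to obtain the a priori estimate, and a correcting term for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate

$$\hat{X}_{k+1|k} = A_d \hat{X}_{k|k-1} + K_k \left( Y_k - C \hat{X}_{k|k-1} \right) \qquad (14)$$

where $\hat{X}_{k|k-1}$ is the a priori estimate of the state at sample $k$, $Y_k$ is the $k$-th measurement, $A_d$ and $C$ are the state-transition and output matrices of the discretized system (12), and $K_k$ is the prediction Kalman gain, which depends on the parameters of the stochastic equation.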
For each choice of the parameter vector $\theta$ we ran the Kalman filter. A probability value is associated with the resulting estimate: this value measures the probability that the current parametrization of the model generates the measured time series. Denoting by $e_k = Y_k - C \hat{X}_{k|k-1}$ the innovation of the stochastic process, i.e. the difference between the $k$-th measurement and its a priori prediction, $\{e_k\}$ is a sequence of independent Gaussian random variables with covariance $S_k$. The optimal set of parameters is therefore chosen according to a maximum likelihood criterion as the choice corresponding to the maximum of the a priori probability density of the innovation sequence. This corresponds to the minimum of the negative log-likelihood function

$$J(\theta) = \sum_{k=1}^{M} \left[ \ln \det S_k + e_k^{T} S_k^{-1} e_k \right]$$

where $M$ is the number of samples. We are interested in the half-life $\tau_i$ of the $i$-th messenger. To use all the available information and make the method robust with respect to measurement and estimation errors, we designed the following algorithm (see Supplementary Figure S6):
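The maximum likelihood step can be sketched on a deliberately reduced example. The code below assumes a single transcript with no shared promoter state, fixed noise variances and a simple grid search over candidate decay rates; these simplifications, the function names and the toy data are ours and do not reproduce the three-state pairwise filter used by DRAGON.

```python
import math

def innovation_cost(ys, decay_rate, dt, q=0.01, r=0.01, p0=1.0):
    """Run a scalar Kalman filter and return sum(log(s_k) + e_k**2 / s_k).

    This is -2 * log-likelihood of the innovations, up to a constant.
    Assumed model (a reduced analogue of the pairwise model):
        x[k+1] = a * x[k] + w[k],  w ~ N(0, q),  a = exp(-decay_rate * dt)
        y[k]   = x[k] + v[k],      v ~ N(0, r)
    """
    a = math.exp(-decay_rate * dt)
    x_est, p_est = ys[0], p0           # initialise on the first sample
    cost = 0.0
    for y in ys[1:]:
        # Prediction: project the state and its error variance forward.
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # Innovation and its variance.
        e = y - x_pred
        s = p_pred + r
        cost += math.log(s) + e * e / s
        # Correction: incorporate the new measurement.
        gain = p_pred / s
        x_est = x_pred + gain * e
        p_est = (1.0 - gain) * p_pred
    return cost

def estimate_decay_rate(ys, dt, candidates):
    """Pick the candidate decay rate with the smallest innovation cost."""
    return min(candidates, key=lambda lam: innovation_cost(ys, lam, dt))

true_rate = 0.3
ys = [10.0 * math.exp(-true_rate * k) for k in range(15)]  # noise-free toy decay
grid = [0.05 * i for i in range(1, 21)]
best = estimate_decay_rate(ys, 1.0, grid)
half_life = math.log(2.0) / best
```

On this noise-free toy series the innovations vanish only for the correct decay rate, so the grid search recovers it. A practical implementation would replace the grid by a numerical optimizer over all model parameters.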
- Given a set of $N$ mRNA time profiles, perform the maximum likelihood estimation for every pair $(i, j)$ and compute the corresponding half-life estimates of gene $i$ and gene $j$.
- For each pair $(i, j)$ compute the ratio matrix $R$ whose elements $r_{ij}$ are the ratios between the half-lives of gene $i$ and gene $j$. The matrix is generally not symmetric due to the presence of outliers and numerical sensitivity. Thus we defined two ratio estimates, one by row and one by column: denoting by $\tau_i$ the half-life of gene $i$, the matrix $R$ contains all the ratios $\tau_i/\tau_j$ on the $i$-th row, and all the ratios $\tau_j/\tau_i$ on the $i$-th column. Let us denote by $S^{r}_i$ the sum of the $i$-th row, by $S^{c}_i$ the sum of the $i$-th column, and by $\langle \cdot \rangle$ the mean operator, that is,

$$S^{r}_i = \sum_{j=1}^{N} r_{ij} \qquad (15)$$

$$S^{c}_i = \sum_{j=1}^{N} r_{ji} \qquad (16)$$

$$\langle \tau \rangle = \frac{1}{N} \sum_{i=1}^{N} \tau_i \qquad (17)$$

- Given $R$, delete outliers to obtain a final matrix $\tilde{R}$. First, compute the probability density (using a smoothing kernel approach) of all the entries and delete those values whose probability of occurring in the distribution falls below a fixed threshold. Second, since ideally $r_{ij} r_{ji} = 1$, we considered as outliers those pairs for which this product deviates from one by more than a fixed tolerance.
- On the resulting matrix compute for each transcript two estimates of its half-life, one based on the row sum and one based on the column sum, using equations (15), (16) and (17) (equations (18) and (19)). This computation requires the value of the average half-life $\langle \tau \rangle$. When this value is known for the group of transcripts under analysis, the measured value can be used. Otherwise, letting $\langle \tau \rangle = 1$, one can obtain half-life values that are relative to the average half-life of the group. However, we followed a third approach: all the results reported in this paper were obtained by replacing $\langle \tau \rangle$ with a geometric mean (equation (20)). The final estimate of the half-life of the $i$-th gene is computed as the weighted average of the row-based and column-based estimates, using as weights the respective variances, obtained from the standard deviations of the two estimates (equation (21)).
- We considered as a quality index for each estimated half-life a quantity computed from the noise variances $q_{ii}$, $q_{jj}$ of the discrete system (12) and from the mutual covariance $q_{ij}$ (see equation (13)) of the state noise between time series $i$ and $j$. Thus, high values of the index imply the presence of a correlation between $u_i$ and $u_j$ in equation (9). We removed the half-lives whose quality index was smaller than a fixed percentile of its distribution.
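The consolidation of pairwise ratios into one half-life per gene can be sketched as follows. The sketch assumes that entry $(i, j)$ of the ratio matrix approximates $\tau_i/\tau_j$, uses a hypothetical reciprocity tolerance in place of the thresholds of the method, omits the kernel-density filter, and combines the row and column estimates by a plain average rather than by the variance-based weighting; the result is therefore only a relative half-life (unit mean), and none of the names below come from the DRAGON code.

```python
def filter_reciprocal_outliers(ratio, tol=0.5):
    """Blank out pairs whose two ratio estimates are mutually inconsistent.

    Ideally ratio[i][j] * ratio[j][i] == 1; pairs deviating by more than
    `tol` (a hypothetical tolerance) are replaced by None.
    """
    n = len(ratio)
    cleaned = [row[:] for row in ratio]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(ratio[i][j] * ratio[j][i] - 1.0) > tol:
                cleaned[i][j] = cleaned[j][i] = None
    return cleaned

def unit_mean(values):
    """Rescale a list so that its arithmetic mean equals one."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def relative_half_lives(ratio):
    """Combine row-based and column-based estimates of tau_i / <tau>."""
    n = len(ratio)
    row_est, col_est = [], []
    for i in range(n):
        row = [ratio[i][j] for j in range(n) if ratio[i][j] is not None]
        col = [ratio[j][i] for j in range(n) if ratio[j][i] is not None]
        row_est.append(sum(row) / len(row))   # ~ tau_i * <1/tau>
        col_est.append(len(col) / sum(col))   # ~ tau_i / <tau>
    row_est, col_est = unit_mean(row_est), unit_mean(col_est)
    # Simplification: plain average instead of a variance-based weighting.
    return [(r + c) / 2.0 for r, c in zip(row_est, col_est)]

taus = [2.0, 4.0, 6.0]                      # toy half-lives
ratio = [[ti / tj for tj in taus] for ti in taus]
relative = relative_half_lives(filter_reciprocal_outliers(ratio))

corrupted = [row[:] for row in ratio]
corrupted[0][1] = 10.0                      # inconsistent with corrupted[1][0]
cleaned = filter_reciprocal_outliers(corrupted)
```

For a perfectly consistent ratio matrix both estimates coincide with $\tau_i/\langle\tau\rangle$; multiplying by a measured average half-life, when available, would turn the relative values into absolute ones.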
Public experimental data used throughout the paper are described in Table 1 (experimental half-life measurements) and in Table 2 (gene expression time series). Pramila et al. [18] and Orlando et al. [19] experimentally measured genome-wide gene expression data during the reproductive cell cycle. We considered the ranking provided by the combined test developed by de Lichtenberg et al. [29] for each replicate of the two datasets and, among the 1000 genes with the highest ranking, we selected those common to all datasets. We ended up with a list of 569 genes that we used for half-life estimation. Tu et al. [20] experimentally measured genome-wide gene expression data during the metabolic cell cycle. We selected 1000 genes with the best periodicity score according to . DRAGON produced half-life estimates for 939 of these 1000 genes.
Half-life determination of genes induced during each stage of the P. falciparum IDC
Shock et al. [17] experimentally measured genome-wide decay rates for each gene in each of the four stages of the IDC. To obtain a single half-life value for each messenger, we performed a k-means clustering of microarray gene expression data [16] by considering 5 stages (according to : early ring, ring, trophozoite, schizont and late schizont). Then, we merged the early ring cluster with the ring cluster to obtain the same stages as in Shock et al. Among the 4488 genes in [16] we chose the 1000 genes with the best periodicity score (power signal/power total ratio) according to . DRAGON produced half-life estimates for 967 of these 1000 genes, and experimental half-lives are available for 675 of them. Both estimated and experimental values are available for a set of 616 genes.
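One plausible reading of a "power signal/power total" periodicity score is the fraction of spectral power carried by the dominant non-zero frequency of the mean-centered profile. The sketch below implements that reading with a direct discrete Fourier transform; the exact definition used for gene selection is given in the cited reference and may differ, and the function name and toy profiles are hypothetical.

```python
import cmath
import math

def periodicity_score(series):
    """Fraction of spectral power at the dominant non-zero frequency.

    Assumption: evenly sampled profile; the mean is removed before the
    discrete Fourier transform so that the zero frequency is ignored.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    powers = []
    for f in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * f * t / n)
                    for t, c in enumerate(centered))
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    return max(powers) / total if total > 0 else 0.0

n = 24
# A profile with exactly two cycles over the time course.
periodic = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
# A profile mixing two frequencies with equal amplitude.
mixed = [math.sin(2 * math.pi * 2 * t / n) + math.sin(2 * math.pi * 5 * t / n)
         for t in range(n)]
score_periodic = periodicity_score(periodic)
score_mixed = periodicity_score(mixed)
```

A single-frequency profile scores close to one, whereas a profile whose power is split between two frequencies scores about one half, so ranking genes by this score favors clean single-period oscillations.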
Expression peak timing estimation
To estimate peak timing for a given noisy gene expression time profile, we first applied the smoothing algorithm presented by Bar-Joseph et al. [30]. The algorithm employs two parameters: grid (number of spline curves) and classes (number of classes to use for clustering). Both parameters were set separately for the Pramila, Orlando, Tu and malaria datasets.
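The reason for smoothing before locating the peak can be shown with a small sketch. It substitutes a centered moving average for the spline-based algorithm of Bar-Joseph et al., which is a considerably simpler technique used here only for illustration; the window size, function names and toy profile are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def peak_time(times, values, window=3):
    """Time of the maximum of the smoothed expression profile."""
    smoothed = moving_average(values, window)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return times[best]

times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
# Broad expression peak around t = 30 plus a single-sample outlier at t = 70.
profile = [0.0, 2.0, 4.0, 5.0, 4.0, 2.0, 0.0, 9.0, 0.0, 0.0]
smoothed_peak = peak_time(times, profile, window=3)
raw_peak = peak_time(times, profile, window=1)
```

Without smoothing (window of one sample) the isolated outlier is taken as the peak, whereas the smoothed profile places the peak on the broad maximum.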
Matlab code will be provided upon request.
Additional data and information can be found at the web site http://www.dis.uniroma1.it/~farina/dragon.
Functional categories analysis in yeast S. cerevisiae during asynchronous growth measured by three laboratories. Three genome-wide studies are considered: Grigull et al., Wang et al. and Munchel et al. (A) Average mRNA half-lives in the Wang et al. and Grigull et al. datasets for 111 functional categories from the yeast GO Biological Process database (http://www.geneontology.org) that are represented in the set of 2863 transcripts by 5 or more members. (B) The same comparison for the Munchel et al. and Wang et al. datasets.
Kinetics of gene induction. Panels A–B show in silico experiments to illustrate some basic features of gene induction kinetics. The reference time profile with unity steady state is plotted in black. The “ON” and “OFF” regions correspond to the turning “ON” or “OFF” of the promoter activity. (A) Induction kinetics of transcripts having the same half-life value and, as a consequence, the same speed of response. The higher (or lower) steady-state value of the red and blue time profiles is due only to an increased (or decreased) transcription rate. (B) Induction kinetics of transcripts having different half-lives. The time profile plotted in red corresponds to an unstable transcript. It has a faster induction and relaxation profile but a lower steady-state value. By contrast, the blue one has a higher half-life value, resulting in a higher steady-state value but a slower response. The example illustrates that, to obtain both a fast response and a high steady-state value, the regulatory strategy must destabilize transcriptionally up-regulated genes.
Functional categories analysis for DRAGON estimations using P. falciparum IDC data. Average mRNA half-lives estimated by DRAGON versus those experimentally measured by Shock et al., for 12 functional categories from the P. falciparum GO annotation database (http://www.geneontology.org) that are represented in the set of 616 transcripts by 5 or more members.
GO annotations of genes with extreme half-lives in S. cerevisiae. DNA replication timing requires first the formation of the replication fork, then the production of the histones needed for chromatin assembly: such a temporal sequence of events is consistent with a rapid turnover of the replication complex genes and a slow turnover of the histone genes.
GO annotations of genes with extreme half-lives in S. cerevisiae. Among unstable genes we also found the G1/S transition cyclins and among stable ones we found the G2/M transition cyclins. In this case, the temporal sequence of events is the progression of the cell cycle from DNA replication to mitosis.
DRAGON algorithm pipeline.
DRAGON estimated half-lives using alpha30 dataset.
DRAGON estimated half-lives using alpha38 dataset.
DRAGON estimated half-lives using Orlando replicate 1 dataset.
DRAGON estimated half-lives using Orlando replicate 2 dataset.
DRAGON estimated half-lives using metabolic cycle dataset.
DRAGON estimated half-lives using P. falciparum dataset.
DRAGON estimated normalized half-lives using all yeast datasets.
Example of parameter estimation through the Kalman filter.
We would like to thank Prof. Pino Macino and Dr. Teresa Colombo for helpful suggestions on the manuscript and the “Consorzio interuniversitario per le Applicazioni di Supercalcolo Per Università e Ricerca” (CASPUR) for providing computing resources and Dr. Alessandro Federico for support.
The authors are partially supported by The Epigenomics Flagship Project (Progetto Bandiera Epigenomica) EPIGEN funded by Italian Ministry of Education, University and Research (MIUR) and the National Research Council of Italy (CNR).
Conceived and designed the experiments: LF AG. Performed the experiments: FC PP VC. Analyzed the data: PP LF. Wrote the paper: FC PP AG LF. Made the figures: LF.
- 1. Garneau N, Wilusz J, Wilusz C (2007) The highways and byways of mRNA decay. Nat Rev Mol Cell Bio 8: 113–126.
- 2. Munchel S, Shultzaberger R, Takizawa N, Weis K (2011) Dynamic profiling of mRNA turnover reveals gene-specific and system-wide regulation of mRNA decay. Mol Biol Cell 22: 2787–2795.
- 3. Keene J (2010) The global dynamics of RNA stability orchestrates responses to cellular activation. BMC Biol 8: 95.
- 4. Gerber A, Herschlag D, Brown P (2004) Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biol 2: 342–354.
- 5. Hogan D, Riordan D, Gerber A, Herschlag D, Brown P (2008) Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting extensive regulatory system. PLoS Biol 6: e255.
- 6. Houseley J, Tollervey D (2009) The many pathways of RNA degradation. Cell 136: 763–776.
- 7. Cheneval D, Kastelic T, Fuerst P, Parker C (2010) A review of methods to monitor the modulation of mRNA stability: a novel approach to drug discovery and therapeutic intervention. J Biomol Screen 15: 609–622.
- 8. Narsai R, Howell K, Millar A, O'Toole N, Small I, et al. (2007) Genome-wide analysis of mRNA decay rates and their determinants in Arabidopsis thaliana. Plant Cell 19: 3418–3436.
- 9. Sharova L, Sharov A, Nedorezov T, Piao Y, Shaik N, et al. (2009) Database for mRNA half-life of 19977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic cells. DNA Res 16: 45–58.
- 10. Shalem O, Dahan O, Martinez M, Furman I, Segal E, et al. (2008) Transient transcriptional responses to stress are generated by opposing effects of mRNA production and degradation. Mol Syst Biol 4: 223.
- 11. Wang Y, Liu C, Storey J, Tibshirani R, Herschlag D, et al. (2002) Precision and functional specificity in mRNA decay. Proc Natl Acad Sci U S A 99: 5860–5865.
- 12. Grigull J, Mnaimneh S, Pootoolal J, Robinson M, Hughes T (2004) Genome-wide analysis of mRNA stability using transcription inhibitors and microarrays reveals posttranscriptional control of ribosome biogenesis factors. Mol Cell Biol 24: 5534–5547.
- 13. Coller J (2008) Methods to determine mRNA half-life in Saccharomyces cerevisiae. Methods Enzymol 448: 267–284.
- 14. Rabani M, Levin J, Fan L, Adiconis X, Raychowdhury R, et al. (2011) Metabolic labeling of RNA uncovers principles of RNA production and degradation dynamics in mammalian cells. Nature Biotechnol 29: 436–442.
- 15. Garcia-Martinez J, Aranda A, Perez-Ortin J (2004) Genomic run-on evaluates transcription rates for all yeast genes and identifies gene regulatory mechanisms. Mol Cell 15: 303–313.
- 16. Bozdech Z, Llinas M, Pulliam B, Wong E, Zhu J, et al. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1: E5.
- 17. Shock J, Fischer K, DeRisi J (2007) Whole-genome analysis of mRNA decay in Plasmodium falciparum reveals a global lengthening of mRNA half-life during the intra-erythrocytic developmental cycle. Genome Biol 8: R134.
- 18. Pramila T, Wu W, Miles S, Noble W, Breeden L (2006) The forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev 20: 2266–2278.
- 19. Orlando D, Lin C, Bernard A, Wang J, Socolar J, et al. (2008) Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature 453: 944–947.
- 20. Tu B, Kudlicki A, Rowicka M, McKnight S (2005) Logic of the yeast metabolic cycle: temporal compartmentalization of cellular processes. Science 310: 1152–1158.
- 21. Ross J (1995) mRNA stability in mammalian cells. Microbiol Rev 59: 423–450.
- 22. Elkon R, Zlotorynski E, Zeller K, Agami R (2010) Major role for mRNA stability in shaping the kinetics of gene induction. BMC Genomics 11: 259.
- 23. Farina L, Santis AD, Salvucci S, Morelli G, Ruberti I (2008) Embedding mRNA stability in correlation analysis of time-series gene expression data. PLoS Comput Biol 4: e1000141.
- 24. Kailath T (1980) Linear systems. Prentice-Hall. 682 p.
- 25. Kalman R (1960) A new approach to linear filtering and prediction problems. Transactions of the ASME–Journal of Basic Engineering 82: 35–45.
- 26. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z (2009) GOrilla: A tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10: 48.
- 27. Shermoen A, O'Farrell P (1991) Progression of the cell cycle through mitosis leads to abortion of nascent transcripts. Cell 67: 303–310.
- 28. Dahan O, Gingold H, Pilpel Y (2011) Regulatory mechanisms and networks couple the different phases of gene expression. Trends Genet 27: 316–322.
- 29. de Lichtenberg U, Jensen L, Fausboll A, Jensen T, Bork P, et al. (2004) Comparison of computational methods for the identification of cell cycle-regulated genes. Bioinformatics 21: 1164–1171.
- 30. Bar-Joseph Z, Gerber G, Gifford D, Jaakkola T, Simon I (2003) Continuous representations of time-series gene expression data. J Comput Biol 10: 341–356.