Figure 1.
Particles representing ribosomes move along a one-dimensional lattice (the mRNA chain) in which each site represents a codon. For the sake of illustration, a particle covers 3 codons in the sketch, whereas in the model particles occupy 9 codons [15]. (A) Schematic representation of ribosome dynamics: along the mRNA, a ribosome with its A site on a given codon captures the cognate tRNA at a given rate, then retains it and advances at a second rate, provided that the following codon is empty. (B) The entire translation process can be viewed as particles moving on a lattice. Ribosomes attempt to initiate translation at an initiation rate. They then move according to the dynamical rules introduced above, and at the end of the lattice they detach at a termination rate. Particles can queue if the bottlenecks in the lattice cannot support the incoming flow.
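The dynamics sketched in Figure 1 can be illustrated with a minimal stochastic simulation. The block below is a sketch under stated assumptions, not the simulation code used for the results: the function name, the argument names (`alpha`, `gamma`, `beta`, `capture_rates`), the choice of the leftmost covered codon as the reading site, and the rule that a ribosome on the last codon detaches without a further capture step are all illustrative choices. It uses a Gillespie-type update and returns the current (terminations per unit time) and the time-averaged ribosome density.

```python
import random


def simulate_translation(capture_rates, alpha, gamma, beta,
                         footprint=9, n_events=20000, seed=0):
    """Gillespie sketch of a two-step exclusion process with extended particles.

    capture_rates[i] is the (hypothetical) tRNA capture rate on codon i;
    alpha, gamma, beta are initiation, advance and termination rates.
    Returns (current, density) = (terminations per unit time,
    time-averaged ribosomes per codon).
    """
    rng = random.Random(seed)
    n = len(capture_rates)
    ribos = []            # [position, loaded], sorted by increasing position
    t = 0.0
    finished = 0
    occupancy_time = 0.0  # integral of the ribosome number over time

    for _ in range(n_events):
        # List every move that is currently allowed, with its rate.
        events = []
        if not ribos or ribos[0][0] >= footprint:
            events.append((alpha, "init", -1))
        for j, (pos, loaded) in enumerate(ribos):
            if pos == n - 1:
                events.append((beta, "term", j))
            elif not loaded:
                events.append((capture_rates[pos], "capture", j))
            elif j == len(ribos) - 1 or ribos[j + 1][0] - pos > footprint:
                events.append((gamma, "step", j))

        # Advance time by an exponentially distributed waiting time.
        total = sum(rate for rate, _, _ in events)
        dt = rng.expovariate(total)
        t += dt
        occupancy_time += len(ribos) * dt

        # Choose one event with probability proportional to its rate.
        x = rng.random() * total
        for rate, kind, j in events:
            x -= rate
            if x <= 0:
                break

        if kind == "init":
            ribos.insert(0, [0, False])
        elif kind == "term":
            ribos.pop(j)              # only the leading ribosome can terminate
            finished += 1
        elif kind == "capture":
            ribos[j][1] = True        # tRNA captured, ready to advance
        else:
            ribos[j][0] += 1          # advance one codon
            ribos[j][1] = False       # a new tRNA is needed on the next codon

    return finished / t, occupancy_time / (t * n)
```

Calling the function for a range of `alpha` values on the same `capture_rates` gives a rough picture of how current and density respond to the initiation rate; the rate values themselves must be supplied by the user.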
Figure 2.
Distribution of the estimated initiation rates in S. cerevisiae.
The mean initiation rate is and the median is
. Most of the mRNAs have an estimated initiation rate
and therefore we show only this range.
Figure 3.
Scatter plot of the ORF lengths against the estimated initiation rates.
The log-log scatter plot shows possible signatures of a power-law dependence.
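A power-law dependence appears as a straight line on log-log axes, so a candidate exponent can be read off as the slope of a least-squares line through the log-transformed data. The helper below is a hypothetical illustration of that check (the function name and the synthetic numbers in the usage note are assumptions, not values from the paper).

```python
import math


def power_law_fit(x, y):
    """Least-squares fit of y = c * x**a on log-log axes; returns (a, c)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    prefactor = math.exp(my - exponent * mx)
    return exponent, prefactor
```

For instance, synthetic data generated as `y = 5 * x**-0.5` returns an exponent of -0.5 and a prefactor of 5, up to rounding. On real, scattered data the fitted slope is only suggestive and does not by itself establish a power law.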
Figure 4.
Outcomes for some mRNAs obtained by stochastic simulations of the model.
Panels (A) and (C) show a sketch of the two different behaviours one can obtain for the density of ribosomes and the current, respectively, as the initiation rate is varied. The genes are divided into two categories according to the shape of the density curve, as shown in (A): abrupt mRNAs (red, colour online) present a steep increase of the polysome size as the initiation rate increases. On the other hand, smooth sequences (blue, colour online) do not show this feature. The current (C) is also affected, with abrupt genes exhibiting a sudden change, or ‘kink’, in the current, while the current of smooth mRNAs does not suddenly saturate. Panels (B) and (D) show the outcome of numerical simulations of real sequences from S. cerevisiae. Genes YGL103W and YBL027W encode ribosomal proteins, while YHR030C and YBL105C encode kinase regulatory proteins.
indicates the saturation value of the current (see text).
Figure 5.
Scatter plot of the estimated initiation rates versus the slope of the protein production rate
evaluated at
.
The distributions of both initiation rates and slopes have been subdivided into quartiles (dashed lines), defining 16 regions. Boxed annotations indicate the GO categories that are overrepresented in each quartile sector (P: GO process; F: function; C: component), with the P-value given as a power-of-10 exponent (E). The enrichment in each region is indicated in square brackets as xx/yy, where ‘xx’ is the number of genes with that specific annotation and ‘yy’ is the total number of genes in the region.
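The partition into 16 regions can be sketched as follows. This is an illustrative example only: the function names are hypothetical, the quartile cut points use the default method of Python's `statistics.quantiles` (the paper does not state which convention was used), and the enrichment P-values are not computed here. The output is the xx/yy pair for each occupied region.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def quartile_index(values):
    """Return, for each value, the index (0..3) of the quartile it falls in."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) for v in values]


def sector_counts(rates, slopes, annotated):
    """Count genes per (rate quartile, slope quartile) sector.

    annotated[i] is True if gene i carries the GO annotation of interest.
    Returns {sector: (xx, yy)}: xx annotated genes out of yy genes in total.
    """
    sectors = list(zip(quartile_index(rates), quartile_index(slopes)))
    total = Counter(sectors)
    hits = Counter(s for s, flag in zip(sectors, annotated) if flag)
    return {s: (hits[s], total[s]) for s in sorted(total)}
```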
Figure 6.
Normalised histogram of the simulated protein production rates of the YPL106C randomised ensemble.
We constructed the randomised ensemble by shuffling the YPL106C codon choice at each sequence position, generating 2,000 different variants, each time keeping the amino acid sequence and overall codon composition constant. For example, for the chosen gene, CAI = 0.521. The value of the chosen initiation is .
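One way to randomise codon choice while keeping both the amino acid sequence and the overall codon composition fixed is to permute codons only among positions that encode the same amino acid. The sketch below illustrates this idea; it is an assumption about how such an ensemble could be built, not the authors' script. The function name is hypothetical, and the codon table is the standard genetic code written in the usual TCAG ordering.

```python
import random
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon (DNA alphabet) -> one-letter amino acid.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def shuffle_synonymous(orf, rng):
    """Permute codons among positions that encode the same amino acid."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)               # reorder synonymous codons only
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)
```

An ensemble of 2,000 variants would then be `[shuffle_synonymous(orf, rng) for _ in range(2000)]` with a single `rng = random.Random(seed)`. Because each variant reuses exactly the same codons, composition-based indices such as the CAI are unchanged across the ensemble.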
Figure 7.
Scatter plots of different estimators of protein production rates.
(A) Model prediction versus abundance of proteins. The mRNA abundances are from [30] and the experimentally measured protein levels from [51]. The plot shows a clear correlation between the model prediction of the amount of proteins in the cell and the experimental values. (B) CAI from [30] versus protein abundance. (C) and (D) show different variants of the tRNA adaptation index, tAIc and tAIp from [30], versus protein abundance. Our approach yields a better correlation between the predicted and measured protein abundance.
Figure 8.
Initiation rate: summary of the findings.
(A) For a given ‘physiological’ number of ribosomes we found mRNA-specific initiation rates, distributed over a broad range of values (Figure 2). Different regions of the distribution can be mapped to certain GO annotations. For example, mRNAs with a small physiological initiation rate encode regulatory proteins, while genes involved in translation have a larger initiation rate. (B) Changes in initiation (induced, for instance, by variations in the ribosomal pool, e.g. an increase in the number of available ribosomes) are estimated by our model and are, in theory, perceived by transcripts in different ways, according to their current-initiation relationship. In particular, some mRNAs, such as those encoding regulatory proteins, have a large gearing factor, while other messengers, such as translation-associated ones, are less sensitive to changes in the initiation rate. (C) For very large initiation rates the protein production rates reach a maximal, elongation-limited value, i.e. one that depends only on the sequence of codons. We find that translation-associated genes have a larger maximal production rate than other mRNAs, such as those encoding regulatory proteins, whose production might need to be capped. (D) In general we find two main groups of sequences, classified according to their current-initiation relationship. Abrupt sequences, usually those of regulatory proteins, present an abrupt ‘kink’ in this relationship, meaning that the protein production rate can quickly saturate above specific, sequence-dependent values of the initiation rate. Genes involved in translation, such as ribosomal proteins, are instead classified as smooth sequences, since their sequences are such that this abrupt crossover does not exist.