Physical modeling of ribosomes along messenger RNA: Estimating kinetic parameters from ribosome profiling experiments using a ballistic model

Gene expression is the synthesis of proteins from the information encoded on DNA. One of the two main steps of gene expression is the translation of messenger RNA (mRNA) into polypeptide sequences of amino acids. Here, by taking into account mRNA degradation, we model the motion of ribosomes along mRNA with a ballistic model where particles advance along a filament without excluded volume interactions. Unidirectional models of transport have previously been used to fit the average density of ribosomes obtained by the experimental ribo-sequencing (Ribo-seq) technique in order to obtain the kinetic rates. The degradation rate is not, however, accounted for and experimental data from different experiments are needed to have enough parameters for the fit. Here, we propose an entirely novel experimental setup and theoretical framework consisting in splitting the mRNAs into categories depending on the number of ribosomes from one to four. We solve analytically the ballistic model for a fixed number of ribosomes per mRNA, study the different regimes of degradation, and propose a criterion for the quality of the inverse fit. The proposed method provides a high sensitivity to the mRNA degradation rate. The additional equations coming from using the monosome (single ribosome) and polysome (arbitrary number) ribo-seq profiles enable us to determine all the kinetic rates in terms of the experimentally accessible mRNA degradation rate.


Introduction
The translation of messenger RNA (mRNA) is, together with DNA transcription, one of the two main steps of gene expression.The information encoded into mRNAs is translated into a polypeptidic chain through the action of ribosomes processing along mRNA.Ribosomes match a triplet of nucleic acids (a codon) to the corresponding amino acid.Different kinetic rates are involved during translation: the initiation rate, i.e. the rate at which a ribosome assembles at the start codon of mRNA and starts the translation, the termination rate, i.e. the rate at which the ribosome releases the protein and disassembles from mRNA when it achieves translation, and the hopping rates from codon to codon (elongation rates) which depend on the translated codon (and thus on the position along the transcript) (1).
Despite its importance in gene expression, and more generally in the metabolism of the cell, much remains to be learned about the kinetics of translation.The onset of the highthroughput technique known as ribosome-sequencing (Riboseq) gives access nowadays to genome-wide ribosome density profiles along mRNA (2).Ribo-seq is an experimental technique based on the sequencing of ribosome-protected fragments of mRNA after fixation of ribosomes and degradation of unprotected mRNA.Although this approach has revolutionized the study of translation, one still needs a deeper physical understanding of the molecular processes in order to turn Ribo-seq data into quantitative modeling tools.Ribo-seq data could then be used to probe the microscopic parameters tuning gene expression at the translational level and to understand the role of deregulation of translation in diseases like cancer (3).
Several approaches have been proposed to describe quantitatively Ribo-seq data by means of physical modeling involving one dimensional transport (2,(4)(5)(6)(7)(8)(9).However, fitting the kinetic model parameters from Ribo-seq data currently faces several major limitations.Firstly, estimating kinetic rates from ribosome profiles is hindered by the lack of sufficient independent experimental data.Since this is a typical inverse problem (10), parameters can be under/over-determined creating technical difficulties for making reliable fits.Secondly, most quantitative models assume that mRNAs have infinite lifetimes.The question therefore arises concerning the possible influence of finite mRNA lifetime on the evaluation of ribosome kinetic rates (4)(5)(6).
In this article, we model the motion of ribosomes along mRNA using a ballistic model first introduced by Valleriani et al. (4) of unidirectional transport along a filament including finite mRNA lifetimes.We go beyond previous approaches, however, by modeling a new experimental procedure where the mRNA population is sorted out into k-somes having a fixed number k of ribosomes, going from 1 to 4, present at the instant of the sequencing.Our goal is to demonstrate how to exploit this experimental data to establish a reliable method for an absolute evaluation of kinetic parameters (i.e., in time units).
Heyer et al. (11) previously presented an interesting pioneering Ribo-seq study on Saccharomyces cerevisiae that was restricted to monosomes and polysomes.According to these authors mRNA translation could in general be divided into three classes.Although they discuss the influence of mRNA lifetime and conclude that their data reveal a relationship between it and the monosome occupancy, they do not propose a mechanistic explanation, but simply note that genes enriched in monosomes had a lower median mRNA half-life than those without enrichment.Despite this finding they do not attempt to integrate mRNA lifetime into their classification.We present here a more quantitative analysis that reveals the crucial importance of mRNA lifetime in classifying genes.Our main result is that although mRNA lifetime does not affect the polysome footprint, it does have a strong impact on the monosome one, despite the low probability of having a monosome.
The paper is organized as follows.In the second section, we define the ballistic model coupled to an mRNA degradation process.In the third section, we solve it analytically by calculating the exact k-some probability distribution and density profiles as functions of the kinetic parameters.In the fourth section, we perform a parametric analysis of these quantities showing that: 1) the smaller the value of k, the more these quantities are sensitive to mRNA degradation; 2) for typical biological values of the parameters, low order k−somes are more sensitive to mRNA degradation than the full population of mRNAs (polysomes).In the fifth section, we use the model to analyse real Ribo-seq data obtained from a human histonic gene.For this particular gene, we give -for the first time to our knowledge -evidence of degradation on Ribo-seq data.These results validate our approach and provide motivation

D R A F T
for setting up and testing a quantitative method in the sixth section.That method rests on combining k-some (monosome actually) data with polysome data while taking advantage of the sharply increased sensitivity of mRNA degradation on low order k-somes with respect to polysomes.As a proof-ofconcept, we first apply our fitting method to data generated from the ballistic model itself.By using the experimentally accessible degradation rate to fix the time scale, our approach leads to an improved method for obtaining absolute kinetic rates of ribosomes.In the seventh section, we rationalize our findings by proposing a global picture of the different dynamical regimes in the form of a phase diagram based on the ballistic model.It presents a condensed global view of the different mRNA degradation dependent regimes and allows at a glance a general qualitative analysis for any given gene.We model the translation of mRNA by ribosomes, and the degradation of mRNA, by means of a ballistic model with degradation as sketched in Figure 1.The mRNA is modeled by a filament of length L where ribosomes are treated as point-like particles.i.e. without excluded volume interactions.This is motivated by the low density of ribosomes on mRNA.Particles get attached at the entrance of the filament with a stochastic rate α (initiation rate at the start codon) and subsequently progress in one direction along the filament with a local deterministic velocity p(x) (related to the local elongation rate), that depends on the position x along the transcript.We shall simply use p(x) = p for a constant (codon independent) hopping rate.Particles leave the filament with velocity p(L) (termination rate at the stop codon).We emphasize that our model does not account for excluded volume interactions, thus traffic jam effects induced in the Totally Asymmetric Simple Exclusion Process -TASEP (12), the paradigmatic model for translation modeling, are not possible.The ballistic model, however, remains a good approximation to TASEP for long enough mRNA lifetimes in the low density regime, which is the relevant regime for translation (13).

Definition of the ballistic model and k-somes
Let T (x) be the time for a ribosome to go from genomic coordinate 0 to x.Since the ballistic trajectory of a ribosome, X0(t), obeys dX0(t)/dt = p(x), we obtain . [1] The particular value T (L) corresponds to the time needed for a ribosome to cross the entire mRNA.This quantity plays an important role in the determination of the average density profiles of ribosomes along mRNA.
Our model includes the degradation of mRNA at a constant rate ω as in Ref. (4).For the sake of simplicity, we assume instantaneous mRNA degradation -translation is therefore instantaneously aborted -but other, more progressive, mechanisms exist (4,6).We assume that mRNAs are synthesized and degraded in the cytosol in such a way that their population is constant on average.In what follows, we shall refer to the "age", a, of an mRNA as the time elapsed between its synthesis and the time at which the Ribo-seq experiment is actually done.The mRNA lifetime θ is modeled by a random variable with probability density function φ θ (θ).The assumption of stationarity of the mRNA population leads to a relation between the mRNA age distribution, φa(a), and the lifetime distribution φ θ (θ) ( 14) where θ = ∞ 0 θφ θ (θ)dθ is the average lifetime of an mRNA.As in (4), we shall consider an exponential mRNA lifetime distribution φ θ (θ) = ωe −ωθ , θ ≥ 0, with average lifetime θ = ω −1 .The mRNA age distribution is thus also exponential and can be written as φa(a) = ωe −ωa , a ≥ 0 . [3] In a biological cell, mRNA can be found in different states: newly synthesized mRNA empty of ribosomes, mRNA in a transient state evolving from an empty to stationary state, and finally mRNA in the stationary state of translation.We divide the whole population of mRNA into sub-populations, denoted k-somes containing exactly k ribosomes.The first four categories containing from one to four ribosomes, respectively, are called monosomes, disomes, trisomes and tetrasomes.
We will show in the next sections that these k-somes are more sensitive to mRNA degradation than the whole population of intracellular mRNAs (polysomes) and that these independent density profiles provide sufficient information for estimating all relative translation kinetic rates in a single experiment.

Theoretical results
In this section we present an analysis of the k−somes within the framework of the ballistic model.
A. Distribution of the k-some population.The number of ribosomes on a particular mRNA depends on its age a: if a < T (L), ribosomes present on the mRNA at the time of the experiment have been loaded during the time interval a preceding the experiment while for a > T (L), they have been loaded during the time interval T (L) before the experiment.The loading process being of the Poisson type, the probability that an mRNA of age a be loaded with k ribosomes is therefore given by L) , a ≥ T (L) . [4] Using Eq. ( 3), the probability P k that an mRNA be a k-some irrespective of its age is then given by (see SI, section 3.A, for

D R A F T
an explicit analytical expression) B. k-some density profile along mRNA.Another important result about Poisson processes is that, knowing that k events have occurred during a given time interval, the probability density of their occurrence times is uniform on that time interval (see for example Ref. (15) for a rigorous statement of that theorem).Given the one-to-one correspondence between the ribosome initiation time and its location at the time of the Ribo-seq experiment, using the ribosome equation of motion, dT (x)/dx = 1/p(x), the density profile ρ k (x|a) of a k-some knowing that its age is a can be written as , a ≥ T (L) , [6] where H(x) stands for the Heaviside function.In the first line of Eq. ( 6), x(a) is the maximal distance that a ribosome can possibly travel along an mRNA with an age a < T (L).
It is given implicitly by a = x(a) 0 dy/p(y).Eq. ( 6) is easily interpreted: for recently synthesized mRNA such that a < T (L), a ribosome reaches at most the distance x(a) along the mRNA.The density is therefore zero beyond that limit (hence the Heaviside function) and the system behaves as if the mRNA had an effective length x(a) < L.Moreover, the ribosome density is proportional to the time ribosomes spend at location x which is inversely proportional to their local velocity p(x).
According to Eq. ( 6), it is possible to distinguish two states for mRNA with translating ribosomes: a transient state, for which young mRNA with a < T (L) have densities ρ k (x|a) that depend explicitly on the age a, and a stationary state for which a > T (L) and ρ k (x|a) no longer depends explicitly on a.We will see later that the proportion of mRNA in one state versus the other is critical in characterizing the age-averaged density.Integrating any of these densities over the entire mRNA length yields, as expected, the number of ribosomes, k.
The density profile of a k-some irrespective of mRNA age -which is experimentally accessible for small values of k -is given by where p(a|k) is the age distribution among k-somes given by Bayes theorem, p(a|k) = φa(a)P (k|a)/P k .The analytical expression for ρ k (x) is somewhat complex and shall be provided and analysed in section 4.C.2.

C. Polysome density profile along mRNA.
The ribosome density profile obtained for the entire mRNA population (polysomes) may be calculated as the average over k of the k-some density profiles Integrating over x leads to the average number of ribosomes k loaded on an mRNA The two former equations, Eq. ( 8) and Eq. ( 9), are in agreement with Ref. (4), showing the consistency of our approach.Note that the result Eq. ( 9) can also be obtained from k = ∞ k=0 kP k .Note also that, in the limit of infinite lifetime (ω = 0), the polysome density Eq. ( 8) becomes ρ ∞ (x) = α/p(x), a result that simply expresses the conservation of the local particle current (j(x) = ρ(x)v(x) = cst) in the stationary regime, and that can indeed be derived in the TASEP low density phase (16).

Analysis of finite lifetime effects
In this section we rationalize the parametric dependence of the previous expressions in the relevant physical regimes.We shall use T (L) as a time scale throughout the rest of the paper.This allows us to define dimensionless rates (with tildes) as original ones multiplied by T (L) α = αT (L) ; ω = ωT (L) ; p(x) = p(x)T (L) . [10] A. Values of kinetic rates in various organisms.Since we are studying the effects of finite mRNA lifetime, we report in table 1 some literature values for the kinetic parameters and in table 2 their dimensionless counterparts that prove essential in determining typical biological regimes.We note that typical mRNA lifetimes for different species can span two orders of magnitude, while molecular quantities vary significantly less (e.g. a factor 7 for p, a factor 2 for L, a factor 10 for T (L)).B. Polysomes are not sensitive to mRNA degradation.In this subsection, we discuss the effect of degradation on the whole population of mRNA.In doing so, we shall define f (ω) as the average number of ribosomes k normalized by the same quantity in the infinite lifetime limit, k ∞ = α (see Eq.( 9)): [11]

D R A F T
This function can be interpreted in terms of a birth and death process (see SI, section 2).It only depends on ω and is plotted in Figure 2  As expected, f (ω) decreases with increasing ω: the average number of ribosomes per mRNA decreases as the mRNA lifetime becomes shorter.Nonetheless, biological values from the literature presented in Tab. 1 and 2 show that in yeast and bacteria, ω ∼ 10 −2 , while in mammals ω ∼ 10 −3 .Therefore, ω 1 for the majority of genes and f (ω) ∼ 1 − ω/2 remains very close to unity, showing that degradation has very little impact on k , the most relevant parameter characterizing polysomes.
To strengthen even more this argument, let us discuss the polysome density profile, ρ(x), which is one of the experimentally accessible quantities measured in Ribo-seq experiments, and compare it to its infinite lifetime limit, ρ ∞ (x) = α/p(x).We see from Eq. ( 8) that the influence of mRNA degradation on ρ(x) is maximal at the last site (i.e., x = L), which leads to the following bound where the approximation holds for ω 1.For ω = 10 −1 , an upper bound of the values indicated in table 2, replacing the polysome density by its infinite lifetime limit would lead to a relative error of no more than 10%.Therefore, no modifications due to mRNA degradation should be detectable in polysome Ribo-seq experiments.This analysis shows that previous works (2, 8) estimating the hopping rates in S. cerevisiae from polysomes with a model that does not take into account the mRNA finite lifetime provide a valid approximation.At variance, we shall demonstrate in the next sections that, in the same conditions, k-somes are more sensitive to degradation than polysomes and constitute a key tool to determining kinetic rates.
C. k-some analysis.C.1.k-some distribution.In terms of the dimensionless parameters α and ω, the probability P k (see SI, section 3.A) that an mRNA be a k-some writes where details about the incomplete Gamma function, γ(k, z), may be found in the SI, section 1.In Figure 3, this distribution is plotted for typical values of the parameters: T (L) = 200s, L = 100 codons, α = 0.06s −1 ( α = 12) and different values of ω between 0 (no degradation) and 1 (high degradation).A log-lin scale is used in the main graph to better observe the evolution of the k-some population with small k.As ω increases, we see that the populations of empty mRNA (zerosomes) and of the mono-through tetrasome increase quasi linearly with ω and that the distribution progressively displays a plateau for low values of k.This result is best understood by carrying out a linear expansion of P k in ω for α k: For values of α ∼ 10, the first term of this expansion, P ∞ kthat is of the Poisson type, as expected given the nature of the ribosome loading process -remains very small compared to the linear term in ω.We therefore have P k ∼ ω/ α, a result both linear in ω and independent of k that explains the observed plateau.For ω < 10 −1 , when k increases, however, the k-some distribution becomes poissonian again and P k ∼ P ∞ k , as we may observe in Fig. 3.

D R A F T
In the inset, the same data are plotted on a lin-lin scale and with a broader range for k.For ω < 10 −1 , we observe a nearly symmetric Gaussian distribution that mainly corresponds to the Poisson term P ∞ k .It is centered around a maximum close to k = k ∞ = α = 12, the value of the average number of ribosomes in the infinite lifetime limit.As expected, most mRNAs are then loaded with that number of ribosomes when ω < 10 −1 .For higher values of ω, however, typically when ω > ωinv where ωinv = ln α 2π + 1 2 (see SI, section 3.D), an inversion of the populations occurs: the proportion of small order k-somes takes over the population of those for which k ∼ α (local maximum).This result corresponds to a regime where mRNAs do not live long enough for ribosomes to reach the exit, hence reducing their number and favoring k-some populations with small k.For biologically relevant values ( α ∼ 10), ωinv is on the order of unity, in good agreement with the value ω ∼ 1 for which the polysome density is significantly impacted by degradation (see Eq. ( 12)).
C.2. k-some densities.One might suspect from the previous results that, due to their statistical prominence when ω ∼ 1, mono-through tetrasomes (k-somes with small k) might play a decisive role in the determination of the kinetic parameters only once that condition is fulfilled.This is however not true for, as we shall see, k-some densities become sensitive to mRNA degradation for smaller values of ω than those ensuring their dominance.Therefore, inasmuch as a statistically relevant sample of them (say thousands) is available in Riboseq data, densities may be analysed to determine the kinetic parameters.Hundreds of millions of mRNA sequences are typically involved in Ribo-seq experiments.This means that for α ∼ 10 and as soon as ω > 10 −4 , thousands of monosomes and even more of k-somes with k > 1 are available for analysis, enough to reach a good statistical accuracy.
The first one represents the contribution to the k-some density of mRNAs that have reached their stationary state of translation in the sense that a ribosome may have had the chance to exit them prior to the experiment.We shall refer to that term as S k (x).The second term corresponds to mRNAs that are still in their transient state of translation: the time elapsed between their synthesis and the Ribo-seq experiment is too short for a ribosome to reach their exit codon.We shall call that term F k (x).
To quantify the influence of mRNA degradation on k-somes, we shall define a transient to stationary mRNA density ratio as R k (x) = F k (x)/S k (x).A mathematical analysis of this ratio is done in the SI, section 5. Its maximal value, achieved at x = 0, is given by where P ∞ k is given in Eq. ( 14).The last approximation is valid for small values of ω and α k but the exact expression, valid for all values of α and ω, allows us to analyse the impact of degradation in the entire parameter space.The fact that R k (x) achieves a maximal value at x = 0 means that k-some densities are more sensitive to degradation near the entrance site than towards the end, at variance with polysomes for which it is the opposite (see section B).We shall see below that the approximate value of R k (0) appears as a natural parameter in the comparison of the k-some density to its infinite lifetime limit.
To illustrate the global trends of k-some densities without further complications, we shall now assume a constant (codon independent) elongation rate * , p(x) = p.In that case, the dimensionless time is simply τ (x) = x/L and the sole term of Eq. ( 15) that depends on x is F k (x).This term is responsible for the global exponential decay of the densities observed in Figure 4 where ρ k (x) has been plotted for k = 1 to 4. In that figure, density profiles in solid lines are given by Eq. ( 15) (the discrete version of which can be found in the SI, section 6) with L =100 codons, α = 0.06s −1 , p = 0.5s −1 and thus T (L) = 200s.Dimensionless parameters are thus α = 12 and p = 100.
In each graph, dashed lines represent the density profiles for an infinite mRNA lifetime (ω = 0) which are flat with magnitude ρ ∞ k = k/L.In that same limit, the polysome density is also flat, with magnitude ρ ∞ = α/p.As an important consequence, when ω = 0, all k-some and the polysome profiles are linearly related.We shall see in section 5 that this precludes the determination of all kinetic parameters in that limit.
From top to bottom, the values of ω used in the successive panels are spaced by a factor of two orders of magnitude: ω = 10 −4 , ω = 10 −2 and ω = 1 corresponding to 555h, 5h and 3 min lifetimes, respectively.Below, we briefly describe the main features of the densities ρ k (x).A more detailed analysis is provided in the SI, section 4.
In the first case, all density profiles are close to the ω = 0 limit and we barely notice a small deviation for monosomes, at the start of the filament.Thus, in this regime of low degradation, we do not expect k-somes to bring more information than the polysomes in a usual Ribo-seq experiment.
In the second case, we do not see any effect on the polysome profile, but do see a clear effect on the k-somes.Note that, as k-some profiles are no longer linearly related, a fit of the kinetic rates is now possible.This decoupling indeed provides extra information for determining the unknown model parameters.Importantly, this intermediate regime (ω ∼ 10 −2 ) falls into the range of biological values.Additionally, we observe that the ordering of k-some densities change with the position along the mRNA: while densities obey ρ1 > ρ2 > • • • > ρ4 close to the initiation site, they switch in reverse order after a few sites.
In the last case, degradation is high (ω = 1).All density profiles are modified, even the polysomes.This regime is probably not relevant for typical translation processes, but we describe it for the sake completeness and because it might be relevant under extreme cellular conditions.
Let us now turn to the sensitivity of k-some density profiles to mRNA degradation.Taking advantage of the smallness of ω

D R A F T
compared to other parameters, we carry out a linear expansion of ρ k (x) in ω.Yet, as we may observe in Fig. 4, k-some densities are more sensitive to degradation near the ribosome initiation site (x = 0) than further along the mRNA.More details about the subtle question of the position dependent sensitivity of k-some densities are provided in the SI, section 4.B, where the intersection of the k-some density profiles with their infinite lifetime limit (flat profiles) is analysed.Hereafter, we evaluate their sensitivity to ω by calculating their relative difference with their infinite lifetime limit at x = 0.For α k, we obtain Thus, in the biologically relevant limit of small ω values, the transient to stationary mRNA ratio R k (0) controls the impact of degradation on k-some densities near the initiation site.
In conclusion, the ballistic model mainly displays three regimes of degradation: (i) low degradation (LD), where ksomes and polysomes are linearly related and where k-somes are therefore not expected to provide more information than polysomes; (ii) an intermediate regime (ID), into which some genes are likely to fall, where k-some densities are already sensitive to degradation and no longer linearly related while polysomes are still unaffected; (iii) a regime of high degradation (HD), not relevant for typical translation processes, in which all profiles are strongly modified by degradation, polysomes included.

Comparison with experiments
A. Ribo-seq experiments with k−somes.In this section, we compare the phenomenology of the ballistic model for k-somes to experimental data for a gene coding for histones.This comparison will show that the effect of degradation is visible in certain genes and the sensitivity to degradation is enhanced for k-somes with respect to polysomes, thus justifying the introduction of this new experimental setup.
Figure 5 displays the k-some densities, ρ k (for k=1 to 4), versus the genomic coordinate obtained from Ribo-seq profiling experiments for the histone gene HIST1H2BM (symbols) and from a fit of these data using the homogeneous ballistic model (solid lines).
The polysomes were fractionated following the method of ( 26) in order to separate the mRNAs bound either to one, two, three or four ribosomes.This yields four subsets of mRNA respectively termed mono-, di-, tri-, and tetra-somes (Figure 5(A)).For each fraction, we extracted all mRNA segments covered by a ribosome (called Ribosome Protected Fragments (RPF)), and prepared a library for sequencing as described in (27).Then short read sequencing was performed as for RNA-seq using an Illumina platform.The sequencing data for each library (mono-, di-, tri-, and tetra-somes), which contains raw sequencing reads, are processed the same way, one independently of the others (using a method adapted from ( 28)).For a given mRNA/gene, the goal of bioinformatic processing is to tabulate how many reads cover a given position in the mRNA coding sequence.After cleaning, each read sequence is aligned to the mRNA sequence to determine the position of origin of the RPF -a classical task of sequence comparison called mapping -using efficient software (29).Only reads having a high quality alignment and a unique origin  15): monosome (blue), disome (orange), trisome (green) and tetrasome (red).Black curves correspond to polysome densities.Parameters are α = 0.06s −1 , p = 1/2s −1 and L = 100 codons ( α = 12).Solid lines are obtained from Eq. ( 15) for increasing finite values of ω = 5.10 −7 s −1 (ω = 10 −4 ), ω = 5.10 −5 s −1 (ω = 10 −2 ) and ω = 5.10 −3 s −1 (ω = 1) from top to bottom; while dashed lines are obtained for ω = 0s −1 (infinite mRNA lifetime).location are further processed.One records the matching position of the first base of the read.On the first position of each read, we applied a shift of 12 nucleotides to assign the read to the P-site position of ribosomes (28).It has been observed from previous analyses that 12 nucleotides is by far the most frequent distance between the read start and the P-site (30), which determines the position of the codon being translated.Finally, we record the counts for each P-site covered position of the mRNA in a table.
In the intermediate panel of Fig. 5, Ribo-seq data (cross symbols) have been subsequently smoothed over 19 codons by averaging the signal within a sliding bin and normalized such that the integral of each k-some density ρ k corresponds to k.As the genomic coordinate increases, we observe a RNA populations were separated on a 15-50% sucrose gradient under 35,000rpm centrifugation for 3.5h.Free RNA, 40S ribosomal subunit, 60S ribosomal subunit, mRNA charged with 1,2,3. . .ribosomes were separated through a 254 nm UV spectrometer and collected with an ISCO fractionation system.Fractions corresponding to mRNA charged with 1,2,3 or 4 ribosomes were pooled and subjected to RPF extraction and cDNA construction.(middle) k-some densities ρ k (k=1 to 4) obtained from the homogeneous ballistic model (solid lines) and from Ribo-seq profiling experiments (symbols) are plotted versus the genomic coordinate (in codon unit).The polysome density is plotted in black for the ballistic model only (lack of experimental data).Experimental data correspond to the histone gene HIST1H2BM and the fit is based on the ballistic model (see text for details) with fixed parameters, α = 0.06s −1 and ω = 1/60 min −1 ≈ 3.10 −4 s −1 .The fit then yields T (L) = 178s and hence p = 0.7 codons/s.(bottom) Same k-some densities are normalized to 1, i.e. ρk = ρ k /k, in order to compare the discrepancies with each-other and show that we retrieve the characteristic crossing between ballistic k-somes curves.

D R A F T
global exponential-like decay of the densities, a phenomenon predicted by the ballistic model when mRNA degradation is taken into account (see Fig. 4).Furthermore, another clear signature of mRNA degradation may be observed in the lowest panel of Fig. 5 where we display the Ribo-seq k-some densities normalized to unity.These profiles show clearly that near the entrance of the mRNA (over a genomic distance of 30-40 codons), k-some density profiles decrease with increasing k while they increase with k beyond that distance.This crossing of the k-some density curves is another prediction of the ballistic model in the presence of mRNA degradation.
Encouraged by the similarities between k-some Ribo-seq data and the corresponding ballistic trends noted above, we shall now try to compare them more quantitatively.We shall do so by restricting this first comparison to the homogeneous version of the ballistic model, i.e., p(x) constant independent of the codon, and to smoothed experimental data.This choice is partly dictated by its simplicity of implementation, but also and foremost, by the lack of polysome data in the Ribo-seq set available for this histone gene (polysome data prove to be crucial in determining the codon-dependent elongation rates, pi, as we shall explain in the next section).In their absence we choose to fit an average (and thus constant) elongation rate p while fixing the two other parameters, α and ω, to reasonable values found in the literature.To do so, we use the specific half-life of histone mRNA (τ 1/2 = 40 min) to obtain (by definition) ω = ln(2)/τ 1/2 ≈ 1/60 min −1 ≈ 3.10 −4 s −1 (31) and use the initiation rate for Homo sapiens given in (25): α = 0.06s −1 .Finally, p is estimated through T (L) = L/p in order to optimize the fit for all four studied values of k.We find † T (L) = 178 s, a value that leads to an average number of ribosomes on the mRNA in the stationary regime k = αT (L) = 0.06 × 178 ≈ 10.
Overall, we observe that the qualitative trends exhibited by the Ribo-seq data are described by the ballistic model and that the inversion of the k−some density profiles near the start of the mRNA gives a clear signature that translation in these experiments is influenced by mRNA degradation.Finally, in contrast to the k-somes, the theoretical density for polysomes (black solid line in the intermediate panel of Fig. 5), which has no experimental counterpart here, is nearly flat and barely influenced by degradation: this is in agreement with the regime of intermediate degradation presented in Fig. 4.This result provides strong further motivation for the need to split the population of mRNA into k−somes in order to increase the sensitivity to mRNA degradation.Note that from the analysis of f (ω) done in section B, it was expected that in the present case mRNA degradation would have little influence on the polysomes density since ω = 3.10 −4 × 178 ≈ 0.05 1.Finally, we conclude this section by remarking the need to introduce an inhomogeneous p(x) to explain the strong and systematic fluctuations along the profiles that cannot be accounted for by experimental fluctuations.In the absence of polysome data to complete the k-some set, we deem it premature to attempt such a fit here (experiments in progress).

Fitting the kinetic rates from ribosome density profiles
A. Limitations in using solely polysomes to fit kinetic rates.
Usual Ribo-seq experiments involve polysomes only and their modeling does not account for the mRNA degradation rate, ω.This is justified inasmuch as, for most living cells, the polysome density is barely sensitive to degradation and this approximation does not lead to significant errors in the estimation † This value is the average of T (L) values estimated for specific values of k: T (L) = 147 s for monosomes, 168 s for disomes, 190 s for trisomes and 206 s for tetrasomes.Given the crudeness of this first approach where the codon-dependency of the elongation rates is not taken into account, the dispersion of these results seems reasonable.

D R A F T
of the elongation rates.Yet, the main problem in estimating the kinetic rates from polysomes alone is that there are more unknowns than equations (or equivalently, more model parameters than experimentally measured quantities).Considering henceforth the inherent discreteness of the mRNA, we shall now use pi as the elongation rate at site (codon) i ∈ {1, . . ., L} and ρi for the polysome density at that site instead of their continuous version.Now, even if we assumed the degradation rate ω to be known from independent experiments, we would still need to estimate (L + 1) parameters: the set of {pi} and the initiation rate α.As the polysome density ρi, on the other hand, only provides L data, we are left with an extra unknown.
Another problem in estimating the kinetic parameters is that the number of mRNAs from which the Ribo-seq profile is obtained is not known, which prevents a proper normalization of the Ribo-seq density.A parametric estimation must therefore rely on a density ρ(x) normalized to unity, L 0 ρ(x)dx = 1, instead of k .This density is given by ρ 1 − e −ωT (L) . [18] Although this density no longer depends on α, this does not solve the problem, since the new normalisation constraint, whose discrete version is L i=1 ρ(i) = 1, also removes one equation for the input parameters.A strategy found in the literature to circumvent this problem is to normalize Ribo-seq profiles by a mean ribosome velocity measured from independent experiments (2).This method suffers from several drawbacks, however: (i) the ribosome velocity that is used is averaged over an entire genome instead of being gene specific, (ii) results are typically obtained for species different from the one involved in the Ribo-seq experiment and (iii) experiments are done independently in different conditions.Altogether, the normalization obtained from this procedure is expected to be subject to large uncertainties.B. k-some improved fit of kinetic rates.Half-lives (τ 1/2 ) of individual mRNAs have been successfully measured in mouse embryonic stem cells (19).By applying the same method, any mRNA half-life can be measured as well.We shall henceforth consider ω = ln(2)/τ 1/2 as a known kinetic time scale.The normalised polysome density does not depend on α (Eq.18), but on the pi's only and, as we have shown, it is not strongly affected by mRNA degradation.It therefore constitutes a suitable quantity to determine the elongation rates, pi, independently of the other parameters.Furthermore, in section 4.C.2, we have shown that a k-some Ribo-seq data analysis provides additional information on the kinetic parameters, especially when the mRNA lifetime decreases for, in that case, k-some densities become less and less linearly related.Yet, when the mRNA degradation becomes too important (large ω), k-some density profiles vanish near the end of the mRNA rendering their analysis quite difficult.The same is true of polysome densities.This, however, happens for parameters far outside the biological range.
In a first approach, we therefore decide to combine the information provided by both the polysome normalized density and the monosome density since among k-some profiles, it is the most sensitive to the mRNA lifetime (see section 4.C.2 and SI, section 5).C. Optimizing the fit with respect to the degradation rate.We now implement the strategy outlined above by detailing how to fit the initiation rate α and the elongation rates pi of all the codons constituting the mRNA.To find these kinetic parameters, we minimize the quadratic difference between the ballistic model predictions and, simultaneously, the Ribo-seq data for the normalized (to one) polysome density and the monosome density.To be more specific, we minimise the quantity where χ 2 polysome is the sum of the relative quadratic errors of the normalized polysome density on each codon : and where χ 2 1−some has a similar form for monosomes.The minimisation, performed by the L-BFGS algorithm (an adaptation of the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method for using a limited amount of computer memory) encoded in the Python 3 library Scipy (32), yields at once α and the L parameters pi.
As we shall see, the accuracy of our fitting method improves when the values of the ratio R1(0) = ω e ω+ α − 1 /(ω + α) [21] (see Eq. ( 16) and SI section 5) increase.As a first feasibility test, the minimisation procedure is performed on artificial data created by the ballistic model itself.The goal is to measure its accuracy in recovering the known input parameters as a function of R1(0).The relative error between a parameter value found by minimisation, Y * , and its exact value, Y , is defined by We call it a "score function".Beside Errα, and to avoid the wealth of score functions related to every single elongation rate pi, we shall estimate two quantities: an average score function Errmean = i Errp i /L, and the maximum of them all, Errmax = maxi Errp i .
To test our minimisation procedure, we fix α = 0.08 s −1 .We then generate ten sets of L random pi values drawn from D R A F T (ID) regime, the finite mRNA lifetime has a strong impact on k−somes densities, but not on polysome densities (and therefore little impact on k ); in the high degradation (HD) regime, all densities, as well as k , are affected by the finite mRNA lifetime.The HD regime is delimited by ω = 1 for which the time for a ribosome to cross the mRNA, T (L), is equal to the inverse of the degradation rate ω.This limit probably does not correspond to any usual biological regime, which leads us to the conclusion that polysome densities should probably never be affected by finite mRNA lifetime (except perhaps in rare extreme conditions).
Given that the ID regime is the biologically relevant one, we can base a viable fitting strategy on the monosome and normalized polysome densities alone.Because the finite lifetime strongly affects the monosome density, and only slightly the polysome one, the two densities give access to complementary information.Whereas the monosome density exhibits the typical shape expected from finite mRNA lifetime (exponentially decreasing profiles) depending on α, ω and T (L), the normalized polysome density is mainly sensitive to pi variations.Precisely, the polysome density no more depend on α when being normalized, and as ω is measured by another experiment, unknowns are only pi.Furthermore, we have identified a quantity directly related to the effectiveness of the method, namely the ratio of transient to stationary monosome densities, R1(0), Eq. ( 21).Since the experiment presented in this article involve measuring the density profiles for both the polysomes and the k-somes with k = 1, . . ., 4, the model predictions for the disomes, trisomes, and tetrasomes (not used in the fitting procedure) provide a test of the consistency of the modeling approach we propose in ID regime.
Using our ballistic model, we have analyzed preliminary Ribo-seq data for a human histonic gene.It is clear from this analysis that the Ribo-seq data display the same phenomenology predicted by the ballistic model in the ID regime: a decrease of the k−some Ribo-seq profiles at the end of the mRNA due to the degradation of mRNA in the late stages of translation.Thus we show for the first time the effect of degradation on Ribo-seq data.
In Fig. 7, we present a phase diagram of the ballistic model in the ( α, ω) plane as a summary of the main results of the finite lifetime effect study and the fitting method.This phase diagram displays the three regimes: LD, ID and HD, the ID regime considered as the biological regime as well as the regime where our fitting method is effective.
In the modeling of real experimental data it will also be important to evaluate the influence of experimental noise on the reliability of our fitting method.It would be interesting to look at the effect of degradation with finite resources of ribosomes (33) and finite diffusion of ribosomes (34,35).

Fig. 1 .
Fig. 1.Graphical illustration of ribosome transport with the ballistic model including degradation.The mRNA is represented as a red filament of length L. The ribosomes are point-like particles processing along the filament with a position dependent rate p(x) (where x ∈ [0; L] is the position along the filament).Bound and unbound ribosomes are respectively represented by dark blue and light blue discs.A ribosome can bind at the entry of the filament at a rate α.A ribosome located at the exit unbinds at a rate p(L) = β.The mRNAs get degraded at a rate ω.

Fig. 3 .
Fig.3.The k-some population distribution P k for finite values of ω (from 0 to 1) and other parameters fixed: T (L) = 200s, L = 100 codons and α = 0.06s −1 .The value k = α = 12 corresponds to the maximum of the distribution at ω = 0.In the main graph (lin-log scale) we can appreciate the increase of the population of empty mRNA (k = 0) and small value of k as ω increases.In the inset (lin-lin), we can see that the two maxima (at k = α and k = 0) equate for ω = 0.8.Expression(14) for P k provides an excellent approximation for k ≤ 4, except for the highest two values of ω (for which the approximation remains good with less than 20% error for k ≤ 1).

Fig. 6 .
Fig.6.Score functions Eq. (22) of the fit with the ballistic model when α = 0.08s −1 and for 10 ballistic density profiles obtained by a random drawing from a uniform distribution of the elongation rates pi ∈ [0.2, 4[ s −1 (pale colors).Results averaged over the 10 profiles are displayed in vivid colors.R1(0), Eq. (21), is increased by increasing ω and T (L) 81.3 s on average.

Table 1 . Translation kinetic rates and mRNA properties in bacteria, yeast and mammals. Medians of gene length L are calculated from the entire genomes set available on the NCBI database. mRNA half- lives
t 1/2 (17-20), elongation rates p (21-24) and initiation rates α (8, 25) are taken from the literature.T (L) is approximated by L/p (

Table 2 . Calculated dimensionless parameters in bacteria, yeast and mammals from parameters taken from TABLE 1. The quantity
R 1 (0) will be defined in the next section, see Eq.(21).