This is an uncorrected proof.
Abstract
The process of translation is both energetically costly and relatively error-prone compared to transcription and replication. Nonsense errors during translation occur when a ribosome drops off a transcript before reaching a stop codon, resulting in energetic investment in an incomplete and likely non-functional protein. Nonsense errors impose a potentially significant energy burden on the cell, making it critical to quantify their frequency and energetic cost. Here, we present a model of ribosome movement for estimating protein production, elongation, and nonsense error rates from high-throughput ribosome profiling data. Applying this model to an exemplary ribosome profiling dataset in S. cerevisiae, we find that nonsense error rates vary substantially between codons and that these types of errors place an energetic burden on cells comparable to ribosome pausing. Overall, we present multiple lines of evidence that selection against nonsense errors is a prominent force shaping protein-coding sequence evolution and codon usage bias, in particular.
Author summary
The process of translating mRNA into a protein is both energetically expensive and relatively error-prone. As such, natural selection is thought to shape the evolution of protein-coding genes to reduce the cost of these errors when they occur. Nonsense errors (NSEs) occur when a ribosome stops translation before completing a functional protein, resulting in wasted energy on a non-functional product. Despite their functional consequences, NSEs and their effects on coding sequence evolution are generally understudied compared to other types of translation errors. This is in part due to the challenge of quantifying these errors from omics-scale data. We present a model for quantifying codon-specific estimates of elongation and NSE rates from ribosome profiling data, which gives a snapshot of the actively translating ribosomes in a cell. Although it is well-established that sense codons vary in their elongation rates, we find evidence that codons also vary in their NSE rates. Using our parameter estimates, we find multiple lines of evidence for selection against NSEs shaping patterns of codon usage bias. Our results suggest the cost of NSEs is comparable to the cost of ribosome pausing, and thus may play a greater role in coding sequence evolution than previously appreciated.
Citation: Cope AL, Pak D, Gilchrist MA (2026) The importance of nonsense errors: Estimating the rates and implications of ribosome drop-off during protein synthesis. PLoS Genet 22(6): e1012162. https://doi.org/10.1371/journal.pgen.1012162
Editor: Jianzhi Zhang, University of Michigan, UNITED STATES OF AMERICA
Received: July 31, 2025; Accepted: May 11, 2026; Published: June 9, 2026
Copyright: © 2026 Cope et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Raw sequencing reads were obtained from the SRA (SRR1049521, SRR7241903, SRR23242245, SRR23242246, SRR5766382, SRR5766388). Processed data and scripts for performing model fits and subsequent analyses are available at https://github.com/acope3/Yeast_Nonsense_Error_Analysis.
Funding: This work was supported by the NIH-funded Rutgers INSPIRE IRACDA Postdoctoral Program (grant #GM093854 to ALC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Protein synthesis, i.e., the transcription and translation of mRNAs, is one of the most energetically costly cellular processes [1]. For example, 66% of ATP usage is dedicated to protein synthesis in E. coli [2]. While both transcription and translation are critical to protein synthesis, the cost of translation is estimated to be 125 times the cost of transcription [3]. Translation involves both direct (GTPs used to charge tRNAs and drive each elongation cycle) and indirect costs (overhead cost of the translation infrastructure, such as the ribosome and tRNAs) [4–6]. Furthermore, mRNAs generally undergo multiple rounds of translation. Given the energetic cost of translation and the finite energy budget of a cell, genotypes reducing the cost of protein production are expected to have a selective advantage that scales with a gene’s average protein production rate. A classic example of this is the bias of highly expressed genes toward synonymous codons recognized by more abundant tRNA, which is hypothesized to be an adaptation due to natural selection to reduce the indirect cost of ribosome pausing [5–8].
In addition to being energetically costly, translation is relatively error-prone compared to DNA replication and transcription [9]. Translation errors potentially result in non-functional proteins. Much of the literature has focused on the impact of ribosomes misreading codons, resulting in the incorrect amino acid being inserted into the growing peptide chain (i.e., a missense error) [10–13]. The incorrect amino acid sequence may result in a misfolded protein that is potentially corrected via interactions with chaperone proteins [14,15]. In contrast, a nonsense error (a.k.a. a premature termination error or processivity error) results in an incomplete protein [16,17]. As an incomplete protein is expected to have little, if any, functional capacity and is unlikely to be rescued via chaperones, nonsense errors may contribute substantially to the overall cost of translation [18]. Based on work in E. coli and S. cerevisiae, nonsense errors are estimated to occur at an approximate frequency of 1 per 10,000 codons [19–23].
Each ribosome elongation step can be thought of as having two possible outcomes: a nonsense error or successful elongation. Assuming the amount of time a ribosome spends waiting before elongation or an error occurs is exponentially distributed, the probability of a nonsense error at a given codon depends on both its nonsense error rate (NSE rate, i.e., the background rate at which a nonsense error event is triggered) and its elongation rate. It is well-established that ribosome elongation rates generally vary across codons, often due to differences in the tRNA availability and codon-anticodon base-pairing effects (e.g., wobble) [24–29]. As a result, slower codons are expected to be more prone to nonsense errors, even if NSE rates are uniform across codons [18,30–32].
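The dependence of the nonsense error probability on both rates can be written out explicitly. As a sketch under the stated assumption of exponentially distributed waiting times, let c denote a codon's elongation rate and b its NSE rate (symbols chosen here for illustration). The two events then compete, and whichever occurs first determines the outcome:

```latex
\[
\Pr(\text{NSE}) = \frac{b}{b + c},
\qquad
\Pr(\text{elongation}) = \frac{c}{b + c} = 1 - \Pr(\text{NSE}).
\]
```

For a fixed b, decreasing c increases Pr(NSE), so a slower codon has a higher nonsense error probability even when the underlying NSE rate is identical across codons.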
Many theoretical studies of translation dynamics assume the NSE rate is uniform across codons. Theoretical studies making this assumption, while still allowing variable ribosome elongation rates across codons, predict that the probability of a ribosome completing translation varies substantially across transcripts as a function of amino acid sequence, protein length, and codon usage bias [30,33,34]. Conflicting with the assumption of uniformity, studies in both E. coli and S. cerevisiae indicate sense codons differ in their capacity to pair with release factors [35–37], often by mimicking the structural mechanism of stop codon recognition upon release factor binding [38]. Additionally, other mechanisms that lead to nonsense errors, such as peptidyl-tRNA drop-off [39–41] and ribosomal frameshifting [42,43], are expected to vary across codons. This limited evidence suggests NSE rates differ across codons, which could amplify or dampen the differences in codon-specific elongation rates. If codons vary in their NSE probabilities, then it follows that genes can evolve to reduce the cost of nonsense errors via codon usage. As before, the selective advantage for synonymous genotypes reducing the cost of nonsense errors should increase with a gene’s average protein production rate. Thus, we expect to see increasing evidence of adaptation in codon usage bias to reduce the cost of protein production with a gene’s average protein production rate.
Ribosome profiling is a powerful technique for quantifying steady-state translation across the transcriptome [44]. Compared to mass spectrometry-based approaches (including those specifically intended to quantify translation), ribosome profiling provides better sequence coverage, a broader dynamic range, and reveals actively translating ribosomes at codon-level resolution [45]. Due to these advantages and the decreasing monetary cost of next-generation sequencing, ribosome profiling is a powerful approach for studying translation on an omics scale [46]. However, ribosome profiling does not directly track ribosome movement along a transcript, requiring mathematical models to extract biological information from these empirical measurements. Previous attempts to quantify nonsense errors from ribosome profiling data have typically averaged over codons and genes, ignoring variation in elongation, nonsense error, and protein production rates [21,23]. To explicitly account for this variation, we extend our previous model of nonsense errors and ribosome movement along a transcript [18,30] to estimate codon-specific elongation and NSE rates using a high-quality ribosome profiling dataset in S. cerevisiae [27].
Using our estimates, we evaluate and contextualize the estimated distribution and costs of nonsense errors across the S. cerevisiae transcriptome. We find compelling evidence that NSE rates and probabilities differ across codons and amino acids. As genes differ in amino acid usage, protein length, and codon usage, our results indicate the probability that a ribosome completes translation of a transcript varies across genes (interquartile range 0.87 – 0.95). We identify multiple lines of evidence that the S. cerevisiae genome is extensively adapted to reduce the cost of nonsense errors during translation. Our evidence includes an overall bias towards nonsense error-prone codons near the 5’-ends of the coding DNA sequences (CDS) and an avoidance of nonsense error-prone codons in highly expressed genes. Consistent with previous theoretical studies relying upon tRNA-based proxies of elongation rates [30,33], we find approximately 60% of genes exhibit signals of adaptation to reduce the cost of nonsense errors in S. cerevisiae. Despite these adaptations to reduce their cost, we estimate nonsense errors impose an energetic burden comparable to, if not greater than, ribosome pausing. Although nonsense errors are believed to be less frequent than missense errors, because a preponderance of nonsense errors (but only a fraction of missense errors) are likely to disrupt a protein’s functionality, our work suggests natural selection against nonsense errors plays a substantial, important, and underappreciated role in protein-coding sequence evolution.
Materials and methods
We are interested in calculating the probability of observing a ribosome footprint (RFP, i.e., a mapped sequencing read) at a codon within an mRNA transcript as measured via ribosome profiling. We assume there is a pool of RFP generated from the transcriptome and that the mRNAs in this pool are close to steady-state in terms of ribosome initiation and completion of translating a transcript. Below we give an overview of our model formulation and usage, with additional details found in S1 Text. Definitions of all model parameters can be found in Table 1.
Ribosome pausing and nonsense error model
Below, we outline the assumptions behind our ribosome PAusing and NonSense Error (PANSE) model and the formal likelihood function they lead to.
PANSE model assumptions.
For a given gene g, $m_g$ represents its equilibrium mRNA transcript concentration within a cell and $\psi_g$ represents the average ribosome initiation rate for each of its mRNAs. Gene g consists of $n_g$ codons, and the elongation and nonsense error rates at codon position i are $c_{g,i}$ and $b_{g,i}$, respectively. We assume the elongation rate $c_{g,i}$ varies between sites such that

$$c_{g,i} \sim \text{InverseGamma}(\alpha_c, \lambda_c),$$

where the shape parameter $\alpha_c$ and scale parameter $\lambda_c$ can vary between the 61 sense codons. The across-site heterogeneity of a given sense codon’s elongation rates reflects a variety of factors that alter ribosome elongation [22,47]. Although the nonsense error rate $b_{g,i}$ likely varies with codon and position, for simplicity we focus solely on the variation between codons and, thus, treat the codon-specific nonsense error rate $b_c$ as a constant, having the same value across all codons of type c. Based on these assumptions, it is possible to model the distribution of footprints as coming from a negative binomial distribution, i.e.,

$$Y_{g,i} \sim \text{NegBinom}(r, p),$$

with $r = \alpha_c$, $p = \lambda'_c / (\lambda'_c + \psi'_g \sigma_{g,i})$, where $\lambda'_c = \lambda_c Z / Y$. The composite parameter $\lambda'_c$ is a rescaling of a codon’s specific rate parameter $\lambda_c$ by the ratio of the partition function for the RFP distribution, Z, from which the footprints are sampled, and the total number of observed footprints in a dataset, Y. Thus, Y/Z is a measure of the sampling efficiency of the experiment.

Given $Y_{g,i}$ represents the observed number of ribosome footprints (RFP) at codon i of gene g, it follows that the likelihood of observing $Y_{g,i}$ is

$$L(Y_{g,i}) = \frac{\Gamma(\alpha_c + Y_{g,i})}{\Gamma(\alpha_c)\, Y_{g,i}!} \left(\frac{\lambda'_c}{\lambda'_c + \psi'_g \sigma_{g,i}}\right)^{\alpha_c} \left(\frac{\psi'_g \sigma_{g,i}}{\lambda'_c + \psi'_g \sigma_{g,i}}\right)^{Y_{g,i}}, \tag{1}$$

where $\psi'_g = \psi_g m_g$ is a rescaling of the mRNA$_g$ specific ribosome initiation rate $\psi_g$ by the density of mRNA$_g$ transcripts within the cell, $m_g$. Assuming independence between elongation steps, it follows that the probability of a ribosome successfully elongating at codons 1 through $i-1$ is

$$\sigma_{g,i} = \prod_{j=1}^{i-1} \int_0^\infty \frac{c}{c + b_j}\, f(c \mid \alpha_j, \lambda_j)\, dc,$$

where $f(c \mid \alpha_j, \lambda_j)$ represents the PDF of the InverseGamma distribution for the ribosome elongation rate for the codon at position j.

Because the nonsense error rate (NSE rate) b of a sense codon is likely to be several orders of magnitude less than a given $c_{g,i}$ and we will be working with our likelihood function on the log scale, we can greatly speed up our evaluation of the log of Eqn. (1) by approximating the integrals in $\sigma_{g,i}$ using a 2nd order Taylor series approximation around a mean NSE rate for all codons, $\bar{b}$.
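The negative binomial form of the footprint likelihood can be illustrated numerically. The sketch below assumes a gamma-Poisson mixture: waiting times 1/c follow a Gamma distribution with shape `alpha` and rate `lambda_prime`, and counts are Poisson with mean `psi_prime * sigma * (1/c)`. The function name, the argument names, and this particular parametrization are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math


def nb_log_likelihood(count, alpha, lambda_prime, psi_prime, sigma):
    """Log-probability of `count` footprints under a gamma-Poisson mixture.

    Assumed (hypothetical) parametrization:
      alpha        -- shape of the waiting-time distribution for the codon
      lambda_prime -- rescaled rate parameter for the codon
      psi_prime    -- rescaled initiation rate of the gene
      sigma        -- probability a ribosome reaches this codon
    """
    flux = psi_prime * sigma          # expected ribosome flux reaching the codon
    denom = lambda_prime + flux
    return (
        math.lgamma(alpha + count) - math.lgamma(alpha) - math.lgamma(count + 1)
        + alpha * (math.log(lambda_prime) - math.log(denom))
        + count * (math.log(flux) - math.log(denom))
    )


if __name__ == "__main__":
    # Illustrative values only; they are not estimates from any dataset.
    for y in (0, 1, 5, 20):
        print(y, nb_log_likelihood(y, alpha=2.0, lambda_prime=1.5, psi_prime=3.0, sigma=0.9))
```

Under this parametrization the expected count is `alpha * psi_prime * sigma / lambda_prime`, so a lower probability of reaching a codon (smaller `sigma`) lowers the expected footprint count at that codon.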
Analysis of ribosome profiling data
Raw ribosome profiling reads for S. cerevisiae were downloaded from the Sequence Read Archive (SRA Run Accession: SRR1049521) [27]. These data were generated using a flash-freeze protocol to halt ribosome elongation, avoiding many of the technical artifacts caused by cycloheximide [27,48]. The raw reads were processed using riboviz2 [49] and a transcriptome FASTA file containing annotated protein-coding sequences from the Saccharomyces Genome Database (Release R64-2-1), as well as 250 nucleotides upstream and downstream of the start and stop codons, respectively. Aligned reads of length 28–30 were extracted and assigned to codons in the A-site, assuming a 15-nucleotide offset relative to the 5’-end of the read [44] (S1 Fig). We note that using different A-site offsets based on the riboWaltz R package had little impact on parameter estimation (S2 Fig).
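The A-site assignment described above can be sketched as follows. This is a minimal illustration, not the riboviz2 implementation: the function name, the 0-based coordinate convention, and the assumption that the offset points at a nucleotide inside the A-site codon are all hypothetical choices made for the example.

```python
def a_site_codon(read_start, read_length, cds_length_nt,
                 flank=250, offset=15, min_len=28, max_len=30):
    """Return the 0-based codon index of the A-site, or None if the read is not used.

    read_start    -- 0-based transcript coordinate of the read's 5' end
    read_length   -- read length in nucleotides
    cds_length_nt -- length of the coding sequence in nucleotides
    flank         -- nucleotides of upstream sequence before the start codon
    offset        -- distance from the read's 5' end to the A-site
    """
    if not (min_len <= read_length <= max_len):
        return None                      # read length outside the accepted window
    a_site_nt = read_start + offset - flank
    if a_site_nt < 0 or a_site_nt >= cds_length_nt:
        return None                      # A-site falls in a UTR flank
    return a_site_nt // 3


if __name__ == "__main__":
    # A 28-nt read whose 5' end sits on the first nucleotide of the start codon.
    print(a_site_codon(read_start=250, read_length=28, cds_length_nt=900))
```

Summing the returned indices per gene yields the per-codon footprint counts that serve as model input.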
A flat file was created that includes the number of ribosome footprints (i.e., counts) assigned to each codon within a gene. This file was used as the primary input into our implementation of PANSE. We employed many filtering criteria to remove genes that violate the assumptions of the model (see S1 Text and S3 Fig). This led to a final dataset consisting of RFP counts for 3,112 S. cerevisiae protein-coding genes, representing approximately 50% of genes found in the genome.
Fitting the pausing and nonsense error model
The PANSE model was fit via a Markov chain Monte Carlo (MCMC) algorithm to the codon-level RFP counts for the genes included in the final data set. The MCMC was run for 50,0000 iterations, keeping every 5th iteration to reduce autocorrelation. Convergence was assessed by comparing the results of two separate MCMCs that were started at random points in parameter space. Posterior means and 95% highest density intervals (HDIs) were calculated for each parameter of PANSE based on the MCMC traces. Consistent with our previous work [7], gene-specific initiation rates are assumed to follow a lognormal distribution with a mean initiation rate of 1. This is accomplished by fixing the log-scale mean of the lognormal distribution to be $-s^2/2$, where $s$ is the log-scale standard deviation of the lognormal distribution. The shape and scale parameters, $\alpha$ and $\lambda$, for each codon-specific elongation rate c, and the codon-specific NSE rates b, were assumed to have flat priors with ranges (0, 100), (0, 100), and (1E-100, 1E-1), respectively, on the natural scale. We note these are particularly broad distributions, but we wanted to ensure that adequate parameter space could be explored. We fit PANSE (1) assuming no nonsense errors occur, (2) assuming uniform NSE rates across codons, and (3) allowing NSE rates to vary across codons. These three model fits were compared using the Deviance Information Criterion (DIC) [50] to determine statistical support for variation in NSE rates across codons.
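The constraint that initiation rates have a mean of 1 can be checked directly. A lognormal distribution with log-scale mean μ and log-scale standard deviation s has mean exp(μ + s²/2), so setting μ = −s²/2 fixes the mean at 1 for any s. The sketch below uses hypothetical function names and only the Python standard library.

```python
import math
import random


def log_scale_mean_for_unit_mean(s):
    """Log-scale mean that gives a lognormal with log-scale sd `s` a mean of 1."""
    return -0.5 * s * s


def lognormal_mean(mu, s):
    """Analytical mean of a lognormal with log-scale mean `mu` and sd `s`."""
    return math.exp(mu + 0.5 * s * s)


def sample_initiation_rates(n, s, seed=0):
    """Draw `n` illustrative initiation rates with expected value 1."""
    rng = random.Random(seed)
    mu = log_scale_mean_for_unit_mean(s)
    return [rng.lognormvariate(mu, s) for _ in range(n)]


if __name__ == "__main__":
    s = 1.0
    rates = sample_initiation_rates(200_000, s)
    print(lognormal_mean(log_scale_mean_for_unit_mean(s), s))  # analytical mean
    print(sum(rates) / len(rates))                             # sample mean, close to 1
```

Fixing the mean in this way removes one degree of freedom, which matters because initiation rates and the rescaled codon rate parameters would otherwise trade off against each other.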
The ribosome profiling data we used were prepared with a flash-freeze protocol to halt elongation, but there is still an observable increase in ribosome density in the first 200 codons. Although this increased density was less than observed in ribosome profiling measurements using cycloheximide to halt elongation, it is unclear if this reflects true biology or a technical artifact [27]. As it is plausible that ribosome counts for the first 200 codons are impacted by unknown technical biases, we fit PANSE masking the ribosome counts for this region in the likelihood calculation for each gene. However, we did include these codons in our calculation of the expected probability of elongation up to codon i.
We compared parameter estimates of PANSE to (1) independent empirical data such as mRNA abundances and tRNA-based proxies for elongation rates [27], and (2) parameter estimates from the Ribosomal Overhead Cost version of the Stochastic Evolutionary Model of Protein Production Rates (ROC-SEMPPR), which estimates protein production rates and codon-specific selection coefficients from codon usage bias patterns [7]. See previous work regarding parameter estimation with ROC-SEMPPR for S. cerevisiae [7,51].
Analyzing variation in NSE rates b across codons
To better understand the variation in NSE rates b across codons, we regressed the posterior mean NSE rates b on properties of each codon. These included the number of stop codons one nucleotide mutation away from the codon (0, 1, or 2), the missense error probability of the codon [52], and whether or not the codon is prone to frameshifts [42]. We weighted each codon in the regression by the standard deviations estimated from the MCMC traces.
Quantifying variation in translation completion probabilities across genes
We used our estimates of elongation and NSE rates, c and b, for each of the 61 sense codons to calculate the translation completion probability of each gene. Because these estimates are codon-specific, we could apply this calculation to all genes in the S. cerevisiae genome, regardless of whether or not they were included in the model fit. We assessed how translation completion probabilities varied across genes as a function of length and gene expression, measured as mRNA abundances (in units of RPKM) taken from previous work [27]. Additionally, we evaluated how well our inferences of translation completion probabilities made directly from empirical ribosome profiling data agreed with theoretical estimates based on simulations from a Totally Asymmetric Exclusion Process (TASEP) model of translation [34].
Quantifying elongation probability variation within and across genes
As before, $c_i$ and $b_i$ represent the expected elongation and NSE rates, respectively, of the codon at position i. Given that there are only two possible outcomes in our model (elongation or a nonsense error), it follows that the probability a ribosome elongates the growing peptide chain at codon i is

$$\Pr(\text{elongation at } i) = \frac{c_i}{c_i + b_i}.$$

The probability a ribosome reaches codon i by successfully elongating at the previous codons is

$$\prod_{j=1}^{i-1} \frac{c_j}{c_j + b_j}.$$
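These two probabilities can be computed along a coding sequence with a short sketch. It assumes the per-codon elongation probability is c/(c + b), as implied by two competing exponential processes; the function names and the rate values in the usage example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def elongation_probability(c, b):
    """Probability of elongating rather than suffering a nonsense error."""
    return c / (c + b)


def reach_probabilities(codons, rates):
    """Cumulative probabilities along a coding sequence.

    codons -- list of codon strings in translation order
    rates  -- dict mapping codon -> (elongation rate c, NSE rate b)

    Entry i of the result is the probability a ribosome reaches the codon at
    0-based position i; the final extra entry is the probability of
    completing translation of all codons.
    """
    probs = [1.0]
    for codon in codons:
        c, b = rates[codon]
        probs.append(probs[-1] * elongation_probability(c, b))
    return probs


if __name__ == "__main__":
    # Made-up rates, far larger NSE rates than are biologically plausible.
    toy_rates = {"AAA": (1.0, 1.0), "GGG": (3.0, 1.0)}
    print(reach_probabilities(["AAA", "GGG"], toy_rates))
```

Because the result is a running product, every additional codon can only lower the completion probability, which is why longer genes are expected to have lower completion probabilities, all else being equal.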
To evaluate how elongation probabilities vary with codon position i, we calculated the average probability a ribosome elongates at a given position as the geometric mean of the elongation probability across genes. We then regressed the average elongation probability on log(position i) to quantify how the elongation probability changes as a function of position. To test if the observed slope was greater than expected under the null model of no selection against nonsense errors, we generated 1000 different permuted sets of genes for each of 3 different possible nulls: (1) the synonymous codons of an amino acid were permuted within a gene (randomized by amino acid), (2) amino acids and codons were permuted within a gene’s coding region (randomized by CDS), and (3) codons were permuted across genes (randomized across CDS). The slope estimated from the real set of genes was then compared to each of the 3 null distributions, with the p-value = (k + 1)/(1000 + 1), where k is the number of occurrences in which a slope from a permuted set of genes was greater than the slope from the real set of genes. A similar analysis was performed on the real data after binning genes based on mRNA abundances into the lower quartile (low expression), interquartile (moderate expression), and upper quartile (high expression) on the log scale.
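The first null (randomized by amino acid) and the permutation p-value can be sketched as below. The helper names and the toy codon table are hypothetical; the slope statistic itself is left to the caller, so the sketch covers only the shuffling and the (k + 1)/(N + 1) calculation.

```python
import random
from collections import defaultdict


def permute_synonyms(codons, codon_to_aa, rng):
    """Shuffle codons among the positions that encode the same amino acid.

    The amino acid sequence and the gene's codon counts are preserved; only
    the order of synonymous codons changes.
    """
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[codon_to_aa[codon]].append(idx)
    permuted = list(codons)
    for idxs in positions.values():
        shuffled = [codons[i] for i in idxs]
        rng.shuffle(shuffled)
        for i, codon in zip(idxs, shuffled):
            permuted[i] = codon
    return permuted


def permutation_p_value(observed, null_values):
    """One-sided p-value: (k + 1) / (N + 1), k = number of null values > observed."""
    k = sum(1 for value in null_values if value > observed)
    return (k + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    toy_table = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}  # partial, for illustration
    gene = ["AAA", "GAA", "AAG", "GAG", "AAA"]
    print(permute_synonyms(gene, toy_table, random.Random(1)))
```

Adding 1 to both numerator and denominator keeps the p-value strictly above zero, so with 1000 permutations the smallest attainable value is 1/1001.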
Identifying codons enriched in the 5’-end
To identify codons enriched in the 5’-end and 3’-end of the CDSs (i.e., near the start and stop codon, respectively), we used both an absolute and a relative definition of the termini. The absolute definition considered the termini to be the first and last 100 codons of a coding sequence. In this case, we restricted our analysis to coding sequences with a minimum of 250 codons. The relative definition considered the termini to be the first and last 25% of codons along a coding sequence. In this case, no minimum length cutoff was used.
To identify codons enriched in the 5’-end, an empirical expectation was determined by calculating the frequency of each codon, relative to its synonyms, in the “middle” (i.e., neither of the termini) of the CDS across all genes. For each codon, we used a one-sided binomial test to determine if the observed frequency in the 5’-ends was greater than expected by chance based on its observed frequency in the middle of the CDS. A similar enrichment test was used for the 3’-ends.
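The one-sided enrichment test can be sketched with the standard library alone. In this illustration the expected frequency comes from the middle of the CDS and the test asks whether the count in the 5’-end is larger than that frequency predicts. The function names and example counts are hypothetical, and an established statistics library would be a reasonable substitute for the hand-written tail sum.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass function, valid for 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binomial_upper_tail(successes, trials, p_expected):
    """P(X >= successes) for X ~ Binomial(trials, p_expected)."""
    total = sum(
        math.exp(log_binom_pmf(k, trials, p_expected))
        for k in range(successes, trials + 1)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: a codon makes up 30% of its amino acid's codons in
    # the middle of genes, but 45 of 120 occurrences in the 5'-ends.
    print(binomial_upper_tail(successes=45, trials=120, p_expected=0.30))
```

Working on the log scale avoids overflow in the binomial coefficient when codon counts run into the thousands.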
Calculating the cost of nonsense errors
To better understand the potential importance of nonsense errors in translation, we calculated the expected cost of producing a complete, functional protein (in terms of the number of hydrolyzed NTPs) from a given transcript that includes the impact of nonsense errors. For simplicity, we excluded the cost of amino acid synthesis from our calculations.
Conceptually, we break the cost of protein synthesis into two overlapping sets of categories: direct vs. indirect costs and fixed vs. variable costs. Direct costs include the NTP used in assembling the small and large ribosome subunits on the mRNA and each elongation step by the ribosome, whereas indirect costs are based on the synthesis cost of the translation infrastructure. In terms of fixed and variable costs, fixed costs are the direct and indirect costs of translating a protein in the absence of a nonsense error, i.e., costs incurred every round of successful translation. Variable costs are the expected direct and indirect costs of translation that are wasted when a nonsense error occurs – these costs are variable because they are only incurred if a nonsense error occurs. As a result, the total cost of producing a particular protein is the sum of fixed and variable costs, each of which is comprised of direct and indirect costs.
From our model, it follows that
where a1 = 4 NTP and a2 = 4 NTP are the direct costs of translation initiation and elongation, respectively, in terms of hydrolyzed phosphate bonds [18].
Combining the contribution of fixed and variable direct costs of protein synthesis yields,
Similarly, combining the contribution of the fixed and variable indirect costs of protein synthesis yields,
The pausing-time term represents the indirect cost of ribosome pausing up to codon i. Because RFP data lack information on the absolute rate of ribosome elongation, we rescaled our estimates of pausing times such that the average elongation rate across the 61 sense codons was ∼9.3 codons/sec or, equivalently, $1/9.3 \approx 0.11$ sec/codon. Assuming the average mRNA has a CDS of 400 codons, the same average elongation rate as above, and that at any given time 80% of the ribosome population is engaged in translating mRNAs [27], the expected initiation rate $c_0$ follows (see S1 Text for more details).
The parameter C converts the indirect cost of ribosome pausing (in units of seconds) into its equivalent cost in terms of NTP and has units of NTP/sec. Although we are unaware of any empirical estimates of C, we used two different approaches to estimate C that yield values differing by less than one order of magnitude. One estimate of C is based on selection coefficients estimated from a ROC-SEMPPR analysis of the S. cerevisiae genome and yields $C_{\text{ROC}}$ = 0.71 NTPs/sec. The second estimate of C is based on the assembly cost (in NTPs) and average lifespan (in seconds) of the ribosome and yields $C_{\text{Ribo}}$ = 5.5 NTPs/sec. See S1 Text for detailed descriptions of these calculations. Given these two independent estimates of C are the only ones we have, we treat $C_{\text{ROC}}$ and $C_{\text{Ribo}}$ as lower and upper bounds on C, respectively. Thus, the average cost of ribosome pausing is approximately 0.076 or 0.59 NTP/codon for $C_{\text{ROC}}$ and $C_{\text{Ribo}}$, respectively, suggesting indirect costs are a fraction of the direct cost of 4 NTP/codon.
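The per-codon pausing cost follows from dividing the conversion factor C (NTP/sec) by the average elongation rate (codons/sec). The short sketch below reproduces that arithmetic for the two bounds on C; the constant and function names are hypothetical labels for the values quoted in the text.

```python
MEAN_ELONGATION_RATE = 9.3      # codons per second, after rescaling
C_LOWER = 0.71                  # NTP per second, ROC-SEMPPR-based estimate
C_UPPER = 5.5                   # NTP per second, ribosome assembly-based estimate
DIRECT_ELONGATION_COST = 4.0    # NTP per codon


def mean_pausing_cost(c_ntp_per_sec, elongation_rate=MEAN_ELONGATION_RATE):
    """Average indirect cost of pausing per codon, in NTP."""
    mean_waiting_time = 1.0 / elongation_rate   # seconds per codon
    return c_ntp_per_sec * mean_waiting_time


if __name__ == "__main__":
    for label, c in (("lower bound", C_LOWER), ("upper bound", C_UPPER)):
        cost = mean_pausing_cost(c)
        share = cost / DIRECT_ELONGATION_COST
        print(f"{label}: {cost:.3f} NTP/codon ({share:.1%} of the direct cost)")
```

Even at the upper bound, the indirect pausing cost is roughly 15% of the 4 NTP direct cost per elongation step.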
In addition to using our cost estimates directly, we also used them to test the hypothesis that the order of synonymous codons along a gene shows evidence of adaptation to reduce the cost of nonsense errors. To do so, we generated a null distribution of the expected cost of translation for each gene when there is no adaptation to reduce the cost of nonsense errors by permuting the order of synonymous codons. These permuted sequences had the exact same amino acid sequence and codon usage as the sequence observed in the genome, but differed in the order of synonyms. For each permutation, we calculated its expected protein production cost using Eqn. (3). We then compared the mean protein production cost of our population of permuted sequences to that of the protein production cost of the observed sequence. We then scored each observed sequence as having a protein production cost either greater than (0) or less than (1) the expected cost under the null hypothesis. To test whether the observed sequences are biased towards lower costs than expected under the null hypothesis, we performed a binomial test (H0: p = 0.5). This allowed us to test if the number of sequences with an expected cost less than the mean expected cost across the permuted sequences was greater than expected by chance. Because the gross cost of nonsense errors scales with the gene’s protein production rate, we performed a logistic regression to test if the probability that a gene has a lower cost than expected increases with gene expression.
Analysis of additional S. cerevisiae ribosome profiling data
To complement our analysis of the Weinberg et al. data, which is generally considered to be high quality, we also analyzed data from two more recent ribosome profiling experiments: Wu et al. 2019 (SRA: SRR7241903) [29] and Ferguson et al. 2023 (SRA: SRR23242245, SRR23242246) [53]. We also analyzed a wild-type replicate (SRA: SRR5766382) and an elp1 deletion strain replicate (SRA: SRR5766388) from Chou et al. 2017 [28] to confirm that changes to tRNA function (in this case, deletion of a tRNA modification enzyme) primarily impact elongation rates c and not NSE rates b. These datasets were processed using the riboviz2 pipeline, with minor differences in the read lengths selected and in defining the A-site (see S1 Text). Finally, to estimate stop codon “NSE rates,” we introduced a slight modification to the Weinberg et al. data, allowing up to 15 codons in empirically determined 3’-UTRs (obtained from [54]).
Results
After PANSE was fit to S. cerevisiae ribosome profiling data from Weinberg et al. [27], we compared parameter estimates to appropriate empirical and theoretical proxies, finding them in good agreement (Fig 1, S1 Table, S2 Table). mRNA abundances and tRNA gene copy numbers (tGCN) are common empirical proxies for translation initiation rates and elongation rates, respectively [24,27]. We expected PANSE estimates of gene-specific translation initiation rates and codon-specific elongation rates c to be well-correlated with these empirical proxies. Indeed, translation initiation rates and independent RNA-Seq-based estimates of mRNA abundances are strongly correlated [27] (Fig 1A, Spearman rank correlation). PANSE estimates of elongation rates c and estimates based on tGCNs (including effects of wobble base pairing) are reasonably well-correlated (Fig 1B, Spearman rank correlation).
Histograms on x and y-axes represent the distributions of the relevant variables. (A) RNA-seq estimates of mRNA abundances (RPKM) (a common proxy for translation initiation rates) and PANSE estimates of initiation rates. (B) tRNA gene copy number (tGCN) based estimates of elongation rates and PANSE estimates of elongation rates. (C) ROC-SEMPPR estimates of protein production rates (based on differences in codon usage) and PANSE estimates of initiation rates. (D) ROC-SEMPPR estimates of selection coefficients and relative ribosome waiting times. Waiting times w are defined as the inverse of the elongation rates, w = 1/c. Selection coefficients and waiting times were set relative to the alphabetically last codon for each amino acid.
Theoretical biophysical models rooted in population genetics principles effectively estimate parameters relevant to the evolution of codon usage bias [6,7,18,55]. One such model, ROC-SEMPPR, estimates protein production rates and selection coefficients solely from genome-wide patterns of codon usage bias. ROC-SEMPPR assumes differences in natural selection on codon usage are due to differences in elongation rates c between synonyms [6,7]. We obtained parameter estimates from a previous application of ROC-SEMPPR to the S. cerevisiae CDSs [51]. We note that protein production rates equal initiation rates multiplied by translation completion probabilities, but ROC-SEMPPR ignores translation errors, such that protein production rates equal initiation rates. As expected, initiation rates and protein production rates are strongly correlated (Fig 1C, Spearman rank correlation). To make elongation rates c comparable to the ROC-SEMPPR selection coefficients (which reflect selection against slow codons), we converted our elongation rates to pausing times w = 1/c and, for each codon i, made them relative to ROC-SEMPPR’s pre-defined reference synonym (with elongation rate $c_{\text{ref}}$) for each synonymous codon family. These estimated relative waiting times and selection coefficients are strongly correlated (Fig 1D, Spearman rank correlation).
To the best of our knowledge, empirical measurements of codon-specific NSE rates b are missing from the literature; however, the frequencies at which ribosomes mistakenly read through stop codons have been quantified using a variety of approaches [54,56], with TAA and TGA being the least and most prone to readthrough, respectively. Stop codon readthrough results from a missense error at a ribosome awaiting a release factor, allowing it to elongate into the 3’-UTR of a transcript. We do not explicitly model the process of translation termination and rare missense error events at stop codons, but we expect stop codon “NSE” rates b to be anti-correlated with their readthrough efficiencies. By slightly modifying our Weinberg dataset to allow for up to 15 codons in the 3’-UTRs, we find that TAA had the highest “NSE” rate b, while TGA had the lowest, consistent with independent estimates of stop codon readthrough efficiency (S4A Fig).
The three stop codons have “NSE” rates b greater (by ≥2 orders of magnitude) than the NSE rates of the sense codons.
As an additional test of PANSE’s robustness, we tested if the deletion of the gene encoding a non-essential tRNA modification enzyme elp1 impacted primarily codon-specific elongation rates c or NSE rates b. The deletion of elp1 is known to decrease the elongation rates of codons AAA, CAA, and GAA [28,57]; thus, we expect this effect to be primarily absorbed into the elongation rates c parameter. By comparing model parameter estimates between the deletion strain and a wild-type strain [28], we observe a clear decrease in elongation rates for the specified codons, but observe no significant change in the NSE rates (S4B, S4C Fig).
NSE rates vary across codons
Theoretical and computational studies often ignore nonsense errors or assume a uniform (background) NSE rate b across all codons [21,30,33,34]. However, empirical studies indicate nonsense errors occur at an appreciable frequency [9,32], with targeted studies suggesting background NSE rates b could vary across codons [35,36]. This naturally leads to two questions: (1) do we see evidence of nonsense errors in ribosome profiling data, and (2) do we see evidence that NSE rates b (and not just NSE probabilities) vary by codon? To answer question (1), we compared the most complex model (NSE rates b vary across codons) to the simplest model (no nonsense errors) using a standard model comparison approach based on the Deviance Information Criterion (DIC) [50,58]. We find that the variable NSE rate b model better fits the ribosome profiling data compared to the no nonsense error model by 2381 DIC units. This indicates that the effects of nonsense errors are detectable in ribosome profiling data and that changes to ribosome density along transcripts are not solely due to differences in codon waiting times.
To answer question (2), we compared the model allowing NSE rates b to vary across codons to a model that assumed all codons had a uniform NSE rate. The uniform NSE rate model still allows for differences in NSE probabilities Pr(NSE) through differences in codon elongation rates. We find that the model allowing for variation in NSE rates b across codons is 187 DIC units better than the model assuming uniform NSE rates, indicating strong support for differences in the NSE rates across codons. This suggests that codon properties other than elongation rate alter the probability of nonsense errors Pr(NSE). NSE rates b varied over multiple orders of magnitude, ranging from 8.87 × 10^−6 to 1.98 × 10^−3 (Fig 2A). When accounting for differences in elongation rates c, the mean NSE probabilities Pr(NSE) across codons were on the order of 10^−4, consistent with previous estimates in E. coli and S. cerevisiae [19–23].
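The distinction between NSE rates and NSE probabilities can be illustrated with a minimal sketch. It assumes a simple competing-rates form, Pr(NSE) = b / (b + c), in which a ribosome at a codon either elongates at rate c or drops off at rate b; this functional form, the codon labels, and all numerical values are illustrative assumptions rather than the fitted PANSE estimates.

```python
def nse_probability(b, c):
    """Probability that drop-off (rate b) occurs before elongation (rate c).

    Assumed competing-rates form; not taken from the model description here.
    """
    return b / (b + c)


# Hypothetical values: one uniform NSE rate, two codons with different elongation rates.
uniform_b = 1.4e-4
elongation_rates = {"fast_codon": 2.0, "slow_codon": 0.5}

# Even with a uniform b, the slower codon has the higher Pr(NSE).
pr_nse = {codon: nse_probability(uniform_b, c) for codon, c in elongation_rates.items()}
```

Under this assumption, a uniform-rate model still produces codon-level differences in Pr(NSE), so support for codon-specific b reflects variation beyond what elongation rates alone explain.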
NSE rates b are reported on a log10 scale. (A) Posterior means and 95% highest density intervals (HDIs) of NSE rates b estimated for each codon. Colors indicate the number of nucleotide mutations away a codon is from a stop codon. Solid and dashed black lines indicate the NSE rate b posterior mean and 95% HDI for a model fit sharing information across codons (i.e., assuming no variation in NSE rates). To be consistent with our previous work, we separated the amino acid serine into two blocks denoted S and Z. Bolded x-axis labels indicate codons that are frameshift competent [42]. (B) Comparison of NSE rates b with empirical missense error probabilities estimated from mass spectrometry data [52]. The Spearman rank correlation is reported. (C) Comparison of NSE rates b between 11 frameshift competent codons (as defined in [42]) and the 50 other sense codons. A Welch’s two-sample t-test is reported. (D) Regression coefficient estimates from a weighted multiple regression of codon properties and the NSE rates b. Variables include the number of stop codon neighbors at each position, whether or not the codon was prone to frameshifts (frameshift competent), and the missense error probability of the codon. The intercept reflects the mean NSE rate b of the reference class of codons. Error bars represent 95% confidence intervals. An * indicates statistical significance (p < 0.05).
Importantly, our model is agnostic to the specific mechanisms associated with nonsense errors. Notable mechanisms that can lead to nonsense errors are release factor binding of sense codons, peptidyl-tRNA drop-off, and frameshift errors [59]. Mismatches between the codon and anticodon resulting from near-cognate or even non-cognate tRNA binding (i.e., missense errors) can increase the chances of peptidyl-tRNA drop-off and frameshift errors [40,43,60]. Consistent with this, we find that our NSE rate b estimates from ribosome profiling are positively correlated with codon-specific missense error probabilities estimated from a large number of mass spectrometry measurements in S. cerevisiae (Spearman rank correlation ,
, Fig 2B) [52]. Previous work in S. cerevisiae identified 11 sense codons that were particularly prone to causing frameshifts when located in the P-site of a ribosome, particularly when followed by a slow codon [42]. Based on our model estimates, these 11 “frameshift competent” codons had generally higher NSE rates b than the other 50 sense codons (on the log10-scale, mean of -3.58 vs. -4.44, Welch’s two-sample t-test, p = 0.0015, Fig 2C).
To better understand how these factors associated with higher NSE rates b contribute to the observed variation, we regressed our estimates against the number of stop codon neighbors (the number of stop codons that are a single nucleotide mutation away from a codon), the average missense error probability, and whether or not the codon is frameshift competent (Fig 2D, S3 Table). We note that we treated the number of stop codon neighbors as a categorical variable, as it is always either 0, 1, or 2. Because the regression includes categorical predictors of log10(NSE rate b), the coefficient for each category represents a change relative to a reference class defined by the intercept.
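The stop codon neighbor counts used as predictors can be derived directly from the standard genetic code. The following sketch (function and variable names are our own, chosen for illustration) counts, for each nucleotide position of a codon, how many single-nucleotide substitutions produce one of the three stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
NUCLEOTIDES = "ACGT"


def stop_neighbors_by_position(codon):
    """Return a 3-tuple with the number of stop codons reachable by a single
    substitution at codon positions 1, 2, and 3, respectively."""
    counts = []
    for pos in range(3):
        n_stops = 0
        for nt in NUCLEOTIDES:
            if nt == codon[pos]:
                continue  # not a mutation
            mutant = codon[:pos] + nt + codon[pos + 1:]
            if mutant in STOP_CODONS:
                n_stops += 1
        counts.append(n_stops)
    return tuple(counts)


# Tryptophan TGG: one neighbor at position 2 (TAG) and one at position 3 (TGA).
trp = stop_neighbors_by_position("TGG")
# Tyrosine TAC: two neighbors at position 3 (TAA, TAG).
tyr = stop_neighbors_by_position("TAC")
```

Because only three stop codons exist, no sense codon has more than two stop codon neighbors at any one position, which is why the count can be treated as a categorical variable with levels 0, 1, and 2.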
One might expect that having more stop codon neighbors increases the NSE rate b of a codon, as there would seem to be a greater chance of being mistakenly recognized by a release factor. Indeed, we find that codons with a single stop codon neighbor at the 3rd position of the codon have an NSE rate b almost 1 order of magnitude greater than codons with no stop codon neighbors at the third position (, p = 0.0012). We did not observe such a relationship for codons with 2 stop codon neighbors at the third position (
, p = 0.884). In contrast, codons with a stop codon neighbor at either the 1st or 2nd nucleotide position showed no significant difference in NSE rates b compared to codons with no stop codon neighbors at these positions. This likely reflects that the ribosome more closely monitors codon-anticodon base pairing at the first 2 positions [61]. Consistent with the independent tests (Fig 2B,2C) and in line with the suspected role of missense errors in contributing to peptidyl-tRNA drop-off and frameshifts, we find a positive effect of a codon’s missense error probability (
,
) and frameshift competency (
,
) on NSE rates b.
Nonsense errors are an unlikely explanation for the “5’-ramp”
Ribosome profiling measurements often exhibit increased ribosome densities at the 5’-end of the CDS relative to the remainder of the mRNA [27]. We simulated ribosome profiling data based on the PANSE model using the parameters estimated from the real data. We observe a moderate correlation between the real and simulated ribosome counts (Spearman rank correlation ,
. On a position-by-position basis (ignoring the first 200 codons), the average log-fold difference between the real and simulated data is approximately 0.015 (One-sample t-test
, S5 Fig), suggesting our PANSE model slightly underestimates the number of counts by about 1.5%, on average.
Based on the metagene profile of the ribosome densities for the real and simulated data, we observe good agreement in the post-200th codon region (Fig 3A, unshaded region). This is expected because this was the region used for calculating the likelihood of the data during model fitting. In contrast, the model poorly predicts ribosome densities near the 5’-end (a.k.a. the 5’-ramp region), which were excluded during the model fitting. The gradual decrease in ribosome densities along the first 200 codons is far more drastic in the real data than expected based on the model parameters estimated using the remainder of the CDSs (Fig 3A, shaded region). This suggests other factors impact ribosome densities in the 5’-ramp, including the possibility that the ramp is a technical artifact [27].
(A) Comparing metagene profiles for real ribosome profiling data (purple) and simulated ribosome profiling data based on the PANSE model (yellow). The first 200 codons are shaded to emphasize these codons were excluded in the likelihood calculation during model fitting. (B) Comparison of NSE rate b estimates for each of the 61 sense codons based on either the first 200 codons or the remainder of the gene.
The discrepancy between real and simulated data raises the question of the necessary NSE rates b to generate the dramatic drop in ribosome density at the 5’-end. We fit PANSE to only the first 200 codons to determine if the parameters estimated were biologically plausible. Although initiation rates and elongation rates c are in good agreement with estimates from the post-200 codon fits (S6 Fig), the NSE rates b are significantly greater, with a mean NSE rate of approximately 0.003 across codons (Fig 3B). These higher NSE rates b translate to a higher average NSE probability Pr(NSE) of approximately 0.004 across the 61 sense codons. An NSE probability Pr(NSE) of 0.004 per codon means only 45% of initiated ribosomes are expected to make it to the 200th codon, which is unrealistically low. Taken together, our results suggest the 5’-ramp in ribosome profiling data is, at best, only partially due to nonsense errors.
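The 45% figure follows from simple arithmetic, sketched below under the simplifying assumption that every codon has the same, independent NSE probability (the function name is ours). The second value uses 10^−4, the order of magnitude of the mean Pr(NSE) reported for the post-200 codon fit, purely for contrast.

```python
def fraction_reaching(n_codons, pr_nse_per_codon):
    """Fraction of initiated ribosomes expected to elongate through n_codons,
    assuming a constant, independent per-codon NSE probability."""
    return (1.0 - pr_nse_per_codon) ** n_codons


# Pr(NSE) implied by fitting only the first 200 codons.
ramp_fit = fraction_reaching(200, 0.004)   # roughly 0.45

# Order-of-magnitude Pr(NSE) from the post-200 codon fit, for contrast.
typical = fraction_reaching(200, 1e-4)     # roughly 0.98
```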
The probability that translation is completed varies greatly between transcripts
Based on our best model fit, codons differ in their elongation rates c and NSE rates b, meaning they also differ in their NSE probability Pr(NSE). As such, variation in codon usage across genes is expected to lead to variation in the probability of a ribosome completing translation
. Across all genes, the median probability of a ribosome completing translation
is approximately 0.92 (interquartile range 0.87 – 0.95). A key factor determining the probability of a ribosome completing translation is the number of codons within a transcript’s CDS. Even if the probability of a nonsense error at any given codon is rare, longer CDSs afford more opportunities for a nonsense error. The length of a CDS and its probability of experiencing a nonsense error
are highly correlated, as expected (Fig 4A, Spearman rank correlation
,
). Unsurprisingly, the S. cerevisiae genes YHR099W, YKR054C, and YLR106C are the only ones considered with a greater than 50% chance of a ribosome experiencing a nonsense error. This likely has little to do with codon usage, as their respective transcripts have > 3700 codons, such that there are many opportunities for any given ribosome to drop off during translation.
Colors represent from ROC-SEMPPR (i.e.,
) for each gene. (A) Relationship between CDS length (in number of codons) and the probability of a nonsense error occurring,
. (B) Comparison of the probability of no nonsense error occurring
with simulation-based estimates of drop-off resilience [34]. The size of the point represents the length (in number of codons) of the gene’s coding region.
A previous study estimated the probability that a ribosome completes translation using a Totally Asymmetric Exclusion Process (TASEP) simulation parameterized by polysome profiling data (translation initiation rates), tRNA counts (codon-specific elongation rates), and a single NSE rate estimate from E. coli [34]. We compared our estimates of translation completion probability based on a model fit to empirical data to theoretical expectations based on this TASEP simulation (referred to as “drop-off resilience”), finding them to be highly correlated (Fig 4B, Spearman rank correlation
,
). We still observe a high correlation between the empirical and theoretical estimates when conditioning on CDS length (partial Spearman rank correlation,
,
), indicating length is not the only cause of the similarity between the two estimates of translation completion probabilities. Estimates of
from PANSE are generally greater than expected based on these TASEP simulations, which assumed a uniform NSE rate b across all sense codons. The NSE rate b used for the TASEP simulations was taken from a study that implicitly assumed the drop in ribosome density along transcripts was solely due to nonsense errors [21]. As we and others [22] have shown, the 5’-ramp region inflates estimates of NSE rates. Regardless, the high correlation between
and theoretical expectations suggests our PANSE model generally captures the across-transcript variability in the probability a ribosome completes translation based solely on empirical data.
Evidence supports adaptation to reduce nonsense errors
As protein synthesis is energetically costly, natural selection is expected to result in genome-level patterns consistent with adaptations to reduce the costs of nonsense errors. We specifically examine two key adaptations: (1) the reduction in frequency of codons with higher NSE probabilities Pr(NSE) along a CDS (5’ to 3’ direction), and (2) an anti-correlation between the frequencies of codons with higher NSE probabilities and gene expression.
Evidence that adaptation increases with position.
As the energetic investment in producing a protein increases as amino acids are added to the peptide chain, natural selection against nonsense errors is expected to be weakest near the start of translation, resulting in position-specific patterns of codon usage in which nonsense error-prone codons are biased toward the 5’-end of a gene’s CDS [17,18,30,33,62]. As such, codons with higher NSE probabilities Pr(NSE) are expected to be enriched in the 5’-ends relative to the remainder of the CDS. To control for the different lengths of CDSs, we assigned codons to the 5’-end based on their relative positions (i.e., their actual position divided by the number of codons in a given CDS) using a relative position cutoff of 0.25. The “middle” of a gene’s CDS was defined as codons falling between 0.25 and 0.75. By comparing synonymous codon frequencies in the 5’-ends to the middle region, we identified codons enriched at the 5’-end (one-sided binomial test, Benjamini-Hochberg adjusted p < 0.05). As expected, codons enriched in the 5’-ends have higher Pr(NSE) than codons showing no difference between the 5’-end and the middle regions (Wilcoxon rank sum test, , Fig 5A). We observe no such pattern if comparing the Pr(NSE) of codons enriched in the 3’-ends to those not enriched relative to the middle of CDSs (Wilcoxon rank sum test p = 0.67, Fig 5B). We obtain a similar result if we define the termini as the first and last 100 codons of each gene’s CDS (S7 Fig). We emphasize that the first 200 codons of each CDS were not considered in the actual parameter estimation (i.e., were not considered in the likelihood calculation).
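The relative-position binning described above can be sketched as follows. The function name, the 1-based position convention, and the handling of codons that fall exactly on a boundary are our assumptions.

```python
def assign_region(position, cds_length, cutoff=0.25):
    """Assign a codon to the 5'-end, middle, or 3'-end of a CDS.

    position is the 1-based codon index; relative position is
    position / cds_length. Boundary handling here is an assumption.
    """
    relative = position / cds_length
    if relative <= cutoff:
        return "5prime"
    if relative > 1.0 - cutoff:
        return "3prime"
    return "middle"


# For a 400-codon CDS, codons 1-100 fall in the 5'-end, 101-300 in the
# middle, and 301-400 in the 3'-end.
regions = [assign_region(i, 400) for i in range(1, 401)]
```

Using relative rather than absolute positions gives every CDS the same three regions regardless of its length. The alternative definition based on the first and last 100 codons uses absolute positions instead.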
(A) Comparison of codon-specific Pr(NSE) (on the log scale) of codons enriched in the 5’-end (the first 25% of codons for each sequence) vs. those not significantly enriched relative to the middle regions of the CDSs. Wilcoxon rank sum test p-value reported. (B) Same as in (A), but using the last 25% of codons. (C) Geometric mean of the probability of successful elongation by the ribosome up to the 500th codon across all CDSs (purple). Solid lines and equations represent the linear regressions relating the log(Position) of a codon to the change in probability of elongation for the real sequences and the various nulls. For the null regression lines, the mean slope and the mean intercept across the 1000 permuted sequences were used.
Natural selection against nonsense errors is expected to strengthen along a CDS, thereby increasing the probability of successful elongation. Except for the first 10 codons, the probability of elongation increases along the CDS before appearing to plateau (Fig 5C). This increase in the probability of elongation as a function of codon position is consistent with adaptations to reduce the energetic cost of nonsense errors. Regarding the apparent decrease in the probability of elongation for the first 10 codons, we note that these results are qualitatively consistent with previous findings that observed a decrease in codon bias immediately following the start codon, followed by a gradual increase [62]. The change in the probability of elongation observed in the true sequences is much greater than observed for the null expectations generated by permuting codons (Figs 5C, S8). This is consistent with natural selection against nonsense errors being generally weaker at the 5’-end.
Evidence that adaptation increases with gene expression.
As highly expressed genes generally undergo more rounds of translation (i.e., take up a larger portion of the cell’s energy budget), a lower probability of completing translation in a highly expressed gene will lead to more wasted NTP. Thus, selection against nonsense errors should increase with gene expression, such that highly expressed genes generally have higher probabilities of completing translation. In our comparisons of CDS length and the probability a nonsense error occurs, we observed a clear gradient in this relationship: highly expressed genes tend to have lower probabilities of experiencing a nonsense error compared to CDSs of similar length but lower expression (Fig 4A). Using mRNA abundances [27] as an independent measure of gene expression, we find a positive correlation between the probability of completing translation
and gene expression (Spearman rank correlation
,
, Fig 6A). This analysis does not control for CDS length, indicating highly expressed genes are more likely to be successfully translated regardless of length. By using a partial Spearman correlation to control for length, we find that the correlation between gene expression and the probability of completing translation
increased (partial Spearman rank correlation
,
). As expected, our results are consistent with selection against nonsense errors being generally stronger in high-expression genes compared to moderate- or low-expression genes of similar length.
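A partial Spearman correlation of the kind used to control for CDS length can be sketched with the first-order partial correlation formula applied to rank correlations. This is one common way to compute it and is not necessarily the implementation used for the analyses reported here; the function names are ours.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y after controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))
```

In the setting above, x would be gene expression, y the probability of completing translation, and z the CDS length. The partial correlation can exceed the marginal one when the controlled variable masks the relationship between the other two.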
(A) Comparing gene expression (mRNA abundance RPKM) with the probability of a ribosome completing translation. Histograms on the x and y-axes represent the distributions of the corresponding variables. Colors indicate the length of the CDS. (B) Comparing relative NSE probabilities Pr(NSE) among synonymous codons to selection coefficients
from ROC-SEMPPR. A higher value of
indicates a codon that is disfavored by selection relative to a reference codon (the alphabetically last codon for each amino acid). Error bars represent the 95% HDIs. (C) Same as in Fig 5C but separating genes into bins based on mRNA abundances (RPKMs). Solid lines represent the linear regression relating codon log(position) to the probability of elongation. Regressions (all coefficients with
) and Spearman rank correlations
are reported.
ROC-SEMPPR assumes selection on codon usage is uniform along a CDS, but codons with higher NSE probabilities are expected to be avoided in high-expression genes due to the increased energetic costs of experiencing frequent translation errors. Under selection against nonsense errors, we expect ROC-SEMPPR’s selection coefficients to be correlated with relative NSE probabilities (i.e., between synonymous codons). We find that
and differences in NSE probabilities Pr(NSE) are well-correlated (Spearman rank correlation
, Fig 6B), indicating that synonymous codons with higher NSE probabilities Pr(NSE) are avoided in high-expression genes.
Consistent with the avoidance of synonymous codons with higher NSE probabilities Pr(NSE) in high-expression genes, we observed that the probability of successful elongation by position is greater in high-expression genes (Fig 6C), suggesting these genes are better adapted to reduce the cost of nonsense errors. Independent of gene expression, we observed that the probability of successful elongation increases along the transcripts.
The energetic costs of nonsense errors are likely substantial
Although nonsense errors are rarer on average than missense errors [9,59], they may be more costly due to the high probability that the truncated protein is non-functional. We calculated the expected cost of translation (in terms of NTP) (see Materials and Methods) that accounts for the direct ATP cost of translation initiation and peptide elongation, the indirect overhead cost due to ribosome pausing, and the direct and indirect costs of nonsense errors. The direct and indirect costs associated with translation initiation and elongation are the fixed costs, as these will be incurred every time a functional protein is produced. In contrast, direct and indirect costs associated with nonsense errors are variable costs as these will only be incurred if a nonsense error occurs (see Eqn. 3). For the cost of ribosome pausing during translation C, we focus on a rough estimate based on ribosome production costs and half-lives
as this estimate is more directly tied to ribosome assembly (see S1 Text). Unsurprisingly, the majority of NTP used during translation is associated with fixed direct costs of translation initiation and ribosome elongation (Fig 7A); however, this is not expected to impact the evolution of codon usage because fixed direct costs do not vary across codons.
(A) Relative costs of translation in terms of direct costs (translation initiation and peptide elongation), indirect costs (overhead costs of ribosome pausing), and variable costs (cost of nonsense errors). Genes are ordered based on total energetic flux: the product of the total costs and protein production rates (as estimated by ROC-SEMPPR, i.e.,
). (B) The log fold-difference between the variable costs (i.e., the direct and indirect costs of nonsense errors) and the fixed indirect cost of elongation (i.e., ribosome pausing). The mean log fold difference is -0.56 (One-sample t-test
. (C) Relationship between gene expression (mRNA RPKM) and the expected cost of translation per codon. The Spearman rank correlation
and the associated p-value are reported. (D) Relationship between gene expression and the probability of whether or not a gene has a lower observed energetic cost than expected based on the permuted nulls. Histograms represent the distribution of gene expression for genes with and without evidence of adaptation against nonsense errors. The solid line and the equation represent the result of a logistic regression. The logistic regression slope is statistically significant (p = 0.00024), but the intercept is not (p = 0.19). (E) Same as in (D), but with length as the predictor variable. Both the logistic regression slope (
) and intercept (
) are statistically significant.
Studies on the evolution of codon usage have primarily focused on the cost of pausing, either explicitly or implicitly. Based on NTP/s, the fixed indirect cost (i.e., the cost of ribosome pausing) is generally greater than variable costs (both variable direct and variable indirect costs) (Fig 7A,7B). However, variable costs (i.e., the direct and indirect costs associated with a nonsense error) are usually within an order of magnitude of the fixed indirect costs (i.e., the total cost of ribosome pausing for producing a single protein) for the majority of genes under consideration (≈86%, Fig 7B). We note this is likely a conservative estimate of variable costs. Using
NTP/s, which is based on comparing ROC-SEMPPR’s selection coefficients
to ribosome elongation rates c, we find the variable costs to be generally greater than the fixed indirect costs (S9 Fig). Consistent with adaptation to reduce the cost of translation, indirect costs (both fixed and variable) generally decrease as a function of total energetic flux (S10 Fig). We suspect our estimates of CROC and
reflect reasonable bounds on the true energetic cost of ribosome pausing. Combined, our results suggest the costs of nonsense errors are comparable to the cost of ribosomal pausing.
As natural selection to reduce energetic costs is expected to be stronger in high-expression genes, we expect the cost of a gene to decrease as gene expression increases. Consistent with this expectation, we observe the expected cost per codon (i.e., , where n is the number of codons) is negatively correlated with gene expression (Fig 7C). To test for evidence of adaptation to reduce the cost of nonsense errors, we generated a null distribution for the expected translation cost for each gene based on 1000 permuted sequences of synonymous codons. These permutations could be viewed as nulls reflecting the absence of natural selection against NSEs, as permutations do not alter the codon-order invariant fixed translation costs. As codon usage for S. cerevisiae is at selection-mutation-drift equilibrium [6], we are effectively testing against a null that assumes adaptive codon usage via natural selection that varies with gene expression but not with position within a gene (e.g., natural selection against ribosome pausing). We find the true expected cost of a sequence
is less than the mean expected cost of the permuted sequence
for 59% of genes, which is greater than the 50% expected by random chance (two-sided binomial test,
). The average difference between the true cost and the mean cost of the permuted sequences is approximately 4 NTPs, roughly the same cost as initiating another ribosome or the direct cost of translation elongation.
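The logic of the permutation null can be sketched as follows. The cost function is a deliberately simplified stand-in for the full expected-cost calculation (it counts only elongation steps spent on peptides lost to a nonsense error), and the codon probabilities, the toy gene, and all function names are hypothetical.

```python
import random


def expected_wasted_codons(codons, pr_nse):
    """Expected number of elongation steps invested in peptides that are
    lost to a nonsense error, per initiation (simplified stand-in cost)."""
    survive = 1.0
    wasted = 0.0
    for i, codon in enumerate(codons):
        p = pr_nse[codon]
        wasted += survive * p * i  # i codons already elongated at drop-off
        survive *= 1.0 - p
    return wasted


def permute_synonymous(codons, amino_acid_of, rng):
    """Shuffle codons among positions encoding the same amino acid, so the
    protein sequence and codon counts are preserved."""
    positions_by_aa = {}
    for i, codon in enumerate(codons):
        positions_by_aa.setdefault(amino_acid_of[codon], []).append(i)
    permuted = list(codons)
    for positions in positions_by_aa.values():
        shuffled = [codons[i] for i in positions]
        rng.shuffle(shuffled)
        for i, codon in zip(positions, shuffled):
            permuted[i] = codon
    return permuted


# Hypothetical toy gene: the error-prone lysine codon sits at the 5'-end.
amino_acid_of = {"AAA": "K", "AAG": "K"}
pr_nse = {"AAA": 1e-3, "AAG": 1e-5}  # illustrative values only
toy_gene = ["AAA"] * 10 + ["AAG"] * 10

rng = random.Random(0)
observed_cost = expected_wasted_codons(toy_gene, pr_nse)
null_costs = [
    expected_wasted_codons(permute_synonymous(toy_gene, amino_acid_of, rng), pr_nse)
    for _ in range(1000)
]
mean_null_cost = sum(null_costs) / len(null_costs)
better_adapted = observed_cost < mean_null_cost
```

Because permuting synonymous codons preserves both the amino acid sequence and the codon composition of a gene, any difference between the observed cost and the null reflects codon order alone.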
We have already seen evidence that highly expressed genes are better adapted to reduce the cost of nonsense errors. Furthermore, as nonsense errors are more likely to occur in longer genes, gene length may be another predictor of adaptation to reduce the cost of nonsense errors. By classifying genes as either those that were better adapted relative to the permuted sequences (i.e., 1 if and 0 otherwise), we find highly expressed and longer genes are more likely to be adapted to reduce the cost of nonsense errors (Fig 7D,7E).
Parameter estimates across S. cerevisiae ribosome profiling datasets are consistent
Ribosome profiling, as with any high-throughput experiment, can be subject to technical and biological variation [55]. There are many protocols for performing ribosome profiling, each with possible biases [27,48,63]. To ensure that our model parameter estimates were generally consistent across measurements, we fit PANSE to data from two independent ribosome profiling measurements: Wu et al. [29] and Ferguson et al. [53]. Although these measurements were performed in S. cerevisiae, they used different protocols.
Comparing the metagene plots for the transcripts considered in each of the datasets, Weinberg et al. and Wu et al. exhibit the 5’-ramp; however, this effect is nearly absent from the Ferguson et al. data, with a slight depletion in ribosomes in the region between the 75th and 100th codon (S11A Fig). We emphasize that none of these reads were included in the actual likelihood estimation. The Ferguson et al. data had the fewest transcripts under consideration (1,918 as compared to 2,785 for Wu et al., and 3,112 for Weinberg et al.). The Ferguson et al. data were also much sparser (S11B Fig), even if only considering the same sequences across all datasets (S11C Fig). Despite this, we see overall good agreement between the NSE rate b estimates from these two datasets and the Weinberg et al. estimates (Spearman rank correlation and
for Wu et al. and Ferguson et al., respectively, S11D, S11E Fig).
Notably, the Ferguson et al. dataset has much noisier estimates (S11E Fig), but this is unsurprising given that it is a much smaller dataset. When assuming a uniform NSE rate b across sense codons for the Ferguson et al. data, the estimated NSE rate b is lower than, but comparable to, that from the same analysis of the Weinberg et al. data: 9.29 × 10^−5 (95% HDI: 8.27 × 10^−5 to 1.03 × 10^−4) vs. 1.41 × 10^−4 (95% HDI: 1.34 × 10^−4 to 1.47 × 10^−4). Although estimates of ribosome waiting times w = 1/c (inverse of the elongation rates) are correlated with a proxy based on the tRNA gene copy number, this effect is weakest for the Ferguson et al. data (S11F Fig). Additionally, the distribution of ribosome waiting times w appears narrower than in the Weinberg et al. and the Wu et al. data (S11F Fig). Overall, our results suggest we are detecting a consistent signal of nonsense errors across these independent datasets, but the reduced number of reads in the Ferguson et al. data significantly weakens this signal.
Discussion
The impact of translation errors on coding-sequence evolution has been a major focus for the last 3 decades, with most of this work focused on the impact of missense errors [10,12,16,64]. Recent advances in mass spectrometry technology and proteome bioinformatics make it possible to detect missense errors on a proteome-scale [13,52]. Due to the robustness of the genetic code, missense errors may not necessarily lead to a non-functional protein. In contrast, nonsense errors are likely to almost always result in a non-functional protein. There is no current high-throughput technology to directly identify the location of a nonsense error at a particular codon in a transcript. However, high-throughput ribosome profiling allows for a steady-state measurement of the “translatome,” including the ability to assign ribosomes to particular codons [44]. We developed a model of ribosome movement along an mRNA during translation to quantify codon-specific nonsense error rates and probabilities from ribosome profiling data.
Applying our PANSE model to an exemplary measurement for S. cerevisiae [27], we find evidence that NSE rates b (sometimes referred to as the “background NSE rate”) vary across codons. Prior work generally assumed the probability of a nonsense error occurring at any given codon was solely due to variation in the elongation rates of the codons, with the NSE rate b assumed uniform [18,30,34]. In contrast, we observed that NSE rates b vary over multiple orders of magnitude (10^−6 to 10^−3), suggesting other properties of codons contribute to their propensity to trigger a nonsense error aside from differences in elongation rates c.
Importantly, our model is agnostic to the specific mechanisms that cause nonsense errors; however, our parameter NSE rates b reflect three key properties thought to contribute to nonsense errors. One potential cause of variation in NSE rates b among codons is the propensity for them to be mistakenly bound by release factors [32]. Indeed, codons that are a single nucleotide away from a stop codon (i.e., that have a stop codon neighbor) at the 3rd nucleotide position (cysteine codons TGC/T and tryptophan codon TGG) have a higher NSE rate b, on average, compared to codons that do not have a 3rd position stop codon neighbor. The high NSE rates of the tryptophan codon TGG and arginine codon CGA are particularly notable in the context of more direct studies that indicate eRF1 can bind these sense codons [36,37]. Surprisingly, this was not the case for the amino acid tyrosine, for which both of its codons (TAC/T) each have 2 stop codon neighbors (TAA/G) – although the effect of having 2 stop codon neighbors at the 3rd nucleotide is positive compared to a codon with no stop neighbors, it was not statistically significant. Consistent with our results, previous work found eRF1 binds TAC only weakly [36]. Taken together, our results indicate that being a single nucleotide away from a stop codon at the wobble position can increase the chances of a nonsense error, but it is not in and of itself sufficient to increase the NSE rates b.
We find codons with higher missense error probabilities generally have higher NSE rates b. Previous work suggested mismatches between the codon and anticodon in the P-site can increase the chances of peptidyl-tRNA drop-off and ribosomal frameshifting [39,40,43,60]. Consistent with the latter, codons particularly prone to frameshifts also had higher NSE rates b. Recent work in E. coli found that peptidyl-tRNA drop-off driven by missense errors generally happens in the subsequent round of elongation [60]. Similar to the incorrect binding of release factors to sense codons [32], slower elongation in the A-site due to, e.g., low tRNA availability would seem to increase the probability of peptidyl-tRNA drop-off and ribosomal frameshifting [42].
Other mechanisms may also trigger premature translation termination and be absorbed into the model’s estimates of NSE rates b [65–68]. For example, previous work in E. coli found that a codon-anticodon mismatch for the tRNA in the ribosome P-site was detrimental to the accurate decoding of the codon in the A-site, increasing the probability of a release factor binding to a sense codon [65,69] (although we note this was not observed in yeast using a similar experimental setup [70]). In this case, the probability of a nonsense error at any given codon is not independent, but a function of the missense error rate of its immediate upstream neighbor, among other things. This likely also affects estimates of NSE rates b as they pertain to ribosomal frameshifts and peptidyl-tRNA drop-off. The fact that we find codons with higher NSE rates b tend to be those with higher missense error probabilities and higher ribosome frameshift capacities suggests our model detects these effects, but these estimates may be conservative given that we do not consider the surrounding sequence context.
We observe that simulated ribosome counts across codons and genes generated from PANSE were generally in good agreement with the real data beyond the 5’-ramp region (first 200 codons), indicating PANSE adequately models the underlying processes shaping ribosome profiling data. We emphasize that while the PANSE model does treat elongation rates as a random variable, it does not explicitly account for various potential factors hypothesized to impact local elongation rates, from upstream basic amino acids in the ribosome tunnel to downstream mRNA secondary structure [22,47,71]. We note the correlation between the real and simulated data may be inflated due to the latter being generated from parameters estimated from the former, but also attenuated (i.e., biased toward 0) due to the inherent noise in any sequencing data [72]. Regarding the impact of noise, previous work found generally poor to moderate agreement between the ribosome footprint counts at individual codons across independent ribosome profiling experiments [73]. Similarly, for the 3 ribosome profiling experiments we considered, the Spearman rank correlation between codon counts on a position-by-position basis ranged from 0.45 (Wu vs. Ferguson) to 0.58 (Weinberg vs. Wu). Future extensions of the model will benefit from explicitly accounting for noise in ribosome profiling data by combining information across independent measurements.
Significant efforts have been made to understand the increased ribosome density at the 5’-end of coding regions frequently observed in ribosome profiling data. In combination with the increased frequency of slow codons at the 5’-end, previous work hypothesized slow translation was favored at the 5’-end to prevent downstream ribosome queuing (“the 5’-ramp hypothesis”) [74]. The 5’-ramp hypothesis remains controversial [27,75–80], although recent evidence suggests a very short ramp within the first 5 codons can impact translation efficiency [81]. Our work here is insufficient to directly address the 5’-ramp hypothesis. However, we can say that nonsense errors do not entirely explain the dramatic drop observed in ribosome density at the 5’-end. Given that the expected NSE rates b estimated from this region are unrealistically high, we suspect (but cannot confirm) that the increased ribosome density at the 5’-ends results from an experimental artifact [27] and/or a gradual increase in the rate of ribosome elongation over the first 200 codons.
We find numerous lines of evidence consistent with adaptation to reduce the cost of nonsense errors in intragenic codon usage patterns. Generally, selection against nonsense errors is expected to be weakest at the 5’-end of a CDS, as a nonsense error occurring early in translation will waste fewer NTPs. As a result, the codon usage of 5’-ends is expected to be less adapted than regions further along the CDS. Indeed, codons with higher NSE probabilities Pr(NSE) tend to be enriched in the 5’-ends. Consistent with the avoidance of higher NSE probability Pr(NSE) codons further along the CDS, we find that the probability of successful elongation increases along a CDS.
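The link between per-codon NSE probabilities and the chance of completing translation can be sketched as follows. This is a minimal illustration, assuming that nonsense errors at successive codons are independent, so that the probability a ribosome is still translating after a given codon is the running product of 1 − Pr(NSE) over that codon and all upstream codons. The per-codon values are hypothetical and are not PANSE estimates.

```python
from itertools import accumulate
from operator import mul


def survival_along_cds(nse_probs):
    """Cumulative probability that a ribosome is still translating after each codon.

    Assumes (for illustration only) independent nonsense errors across codons.
    """
    elongation_probs = [1.0 - p for p in nse_probs]
    return list(accumulate(elongation_probs, mul))


# Hypothetical per-codon Pr(NSE): error-prone codons early, safer codons later.
nse_probs = [1e-3, 1e-3, 5e-4, 5e-4, 1e-4, 1e-4]
survival = survival_along_cds(nse_probs)
completion_prob = survival[-1]  # probability of producing the full-length protein
print(survival, completion_prob)
```

Note the distinction between the two quantities: the per-codon elongation probability, 1 − Pr(NSE), can increase along a CDS as reported here, whereas the cumulative survival probability can only decrease with position.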
In general, high-expression genes tend to avoid codons more prone to nonsense errors. As a result, high-expression genes generally have a higher probability of completing translation and a lower expected total energetic cost of protein production. Furthermore, selection coefficients estimated via the population genetics model ROC-SEMPPR (generally assumed to reflect differences in ribosome pausing times) [7] are well-correlated with relative differences in NSE probabilities Pr(NSE), consistent with high-expression genes avoiding nonsense error-prone codons. Selection coefficients estimated via ROC-SEMPPR will average over any form of selection pressure that correlates with gene expression [82]. We cannot say to what extent the selection coefficients reflect selection for a reduction in ribosome pausing as opposed to selection against nonsense errors, as both will lead to an increase in translation efficiency.
Surprisingly, moderately expressed genes showed the greatest change along sequences in the probability of a ribosome successfully elongating compared to high and low-expression genes. Assuming selection on codon usage is at least partially due to selection against nonsense errors, this may seem counterintuitive at first glance. If selection against nonsense errors is strongest in high-expression genes, then selection may be shaping codon usage at the 5’-end. As a result, high-expression genes would be well-adapted even at the 5’-end. In contrast, low-expression genes are expected to be under the weakest selection against nonsense errors, resulting in a lower probability of completion overall that changes very little as a function of position. Moderate-expression genes may have high enough protein production rates such that selection against nonsense errors is expected to be more effective than in low-expression genes, but not so much that the 5’-end is well-adapted. We speculate this explains the greater increase in the probability of elongation along sequences in moderate-expression genes compared to high and low-expression genes.
Although studies on codon usage bias have predominantly focused on selection against ribosome pausing, recent theoretical work concluded the indirect cost of elongation due to ribosome pausing may be less of a factor shaping codon usage bias than other factors, including nonsense errors [71]. Based on a simple model for approximating the energetic costs (in NTPs) for each gene, we find the costs of nonsense errors to be comparable (i.e., within an order of magnitude) to the fixed indirect costs of ribosome pausing across the majority of the transcriptome. The majority (59%) of genes exhibit signals consistent with adaptations to reduce the cost of nonsense errors. However, this is likely a conservative estimate because our permutation test does not represent the expected codon frequencies sans natural selection. More accurately, our test reflects adaptation to reduce the cost of nonsense errors, in which the position of a codon is relevant, relative to adaptation to reduce a cost invariant with position, e.g., ribosome pausing. Although a logistic regression revealed the probability of a gene being adapted increased with gene expression, this was not the case for many high-expression genes (Fig 7D). As discussed previously, high-expression genes have a relatively low probability of experiencing a nonsense error even at the 5’-end (Fig 6C), such that permuting synonymous codons likely has a small impact on the energetic cost for a significant portion of high-expression genes. The expected energetic cost savings of adaptation against NSEs is approximately 4 NTPs, roughly the same as the cost of initiating a new ribosome for translation or a single round of elongation. Noting that translation is generally initiation-limited, adaptation against nonsense errors is expected to result in 1 fewer initiation event to produce a functional protein.
These estimates were based on many assumptions and parameter estimates from previous model fits, and do not include the potential cost of certain quality-control mechanisms that may be triggered by nonsense errors (e.g., nonsense-mediated decay). As such, we suspect our estimates of the energetic cost of nonsense errors are conservative.
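The logic of a position-aware permutation test of this kind can be sketched as follows. This is a simplified illustration, not the implementation used in this study. The cost function, the constants `A1` and `A2` (4 NTPs each for initiation and for one elongation step), all function names, and the toy Pr(NSE) values are assumptions made for the example. Shuffling codons only among positions that encode the same amino acid preserves both the protein sequence and the gene's codon frequencies, so only the positions of the codons change.

```python
import random
from collections import defaultdict

A1, A2 = 4.0, 4.0  # assumed NTP costs: initiation, and one elongation step


def expected_nse_cost(codons, pr_nse):
    """Expected NTPs spent on prematurely terminated peptides per initiation."""
    surviving, cost = 1.0, 0.0
    for i, codon in enumerate(codons):
        p = pr_nse[codon]
        # A ribosome failing at codon i has paid for initiation plus i elongations.
        cost += surviving * p * (A1 + A2 * i)
        surviving *= 1.0 - p
    return cost


def permute_synonymous(codons, codon_to_aa, rng):
    """Shuffle codons among positions that encode the same amino acid."""
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[codon_to_aa[codon]].append(i)
    permuted = list(codons)
    for idx in positions.values():
        shuffled = [codons[i] for i in idx]
        rng.shuffle(shuffled)
        for i, codon in zip(idx, shuffled):
            permuted[i] = codon
    return permuted


def permutation_p_value(codons, pr_nse, codon_to_aa, n_perm=1000, seed=0):
    """Fraction of permuted sequences with cost at or below the observed cost."""
    rng = random.Random(seed)
    observed = expected_nse_cost(codons, pr_nse)
    hits = sum(
        expected_nse_cost(permute_synonymous(codons, codon_to_aa, rng), pr_nse)
        <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Toy gene: two synonymous lysine codons with hypothetical Pr(NSE) values.
codon_to_aa = {"AAA": "K", "AAG": "K"}
pr_nse = {"AAA": 1e-3, "AAG": 1e-4}
adapted = ["AAA"] * 5 + ["AAG"] * 5  # error-prone codons confined to the 5'-end
p_adapted = permutation_p_value(adapted, pr_nse, codon_to_aa)
print(p_adapted)
```

In this toy example, confining the error-prone codon to the 5’-end gives the lowest possible cost for this codon composition, so few permutations match it and the p-value is small. Reversing the gene gives the highest cost and a p-value of 1.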
We emphasize that our results are consistent with natural selection against nonsense errors, but our parameters do not reflect evolutionary parameters of interest in population genetics studies (e.g., selection coefficients). The most notable pattern of natural selection against nonsense errors is the enrichment of codons with higher NSE probabilities Pr(NSE) in the 5’-end, presumably due to weaker selection shortly after translation initiation. Natural selection against nonsense errors and natural selection for efficient translation are correlated across genes, but the latter is not expected to lead to the enrichment of nonsense error-prone codons in the 5’-end [83].
Other selective pressures are hypothesized to lead to unique patterns of codon usage at the 5’-end of CDSs. A key challenge in understanding selection on synonymous codon usage is the potential for correlated selective effects. First, as previously discussed, is the “5’-ramp hypothesis.” In this case, the enrichment of slow codons, which generally have higher NSE probabilities Pr(NSE), could lead to a similar pattern observed for selection against nonsense errors. However, our previous work using a population genetics model to quantify natural selection shaping codon usage in S. cerevisiae and E. coli showed that the enrichment in slow codons in the 5’-end is more consistent with a reduction in the strength of natural selection for fast codons, rather than selection generally favoring slow codons in this region [82,83]. Second, codon usage at the 5’-end is hypothesized to be shaped by selection against mRNA secondary structure to promote efficient translation initiation [75,84]. However, evidence in S. cerevisiae and other species suggests this effect is primarily relevant to the first 10–15 codons [85], whereas we see a gradual decrease in the probability of nonsense errors over a far broader region in the 5’-end. Third, previous work hypothesized natural selection against ribosomal frameshifting would be stronger at the 5’-end [86]. Ribosomal frameshifts often result in premature translation termination due to encountering an out-of-frame stop codon soon after the frameshift [87–89] and can result from the presence of slow codons in the ribosome A-site in combination with a near- or non-cognate codon-anticodon match in the P-site [42,43,86]. We expect selection against ribosomal frameshifts to be strongly correlated with selection against nonsense errors. Along these lines, codons prone to causing frameshifts when located in the P-site [42] had higher NSE rates b. 
Future work will focus on quantifying the evolution of codon usage bias as it relates to selection against nonsense errors.
Our results are based on extracting biological information from empirical data using a computational model. As is obligatory to note, “all models are wrong, but some are useful” [90]. Using the parameter estimates from our model, we were able to test many predictions about the coding sequence patterns expected under selection against nonsense errors, many of which were confirmed, illustrating the model’s utility. Our PANSE model of ribosome movement is applicable to any ribosome profiling measurement. This allows for investigations into the impact of different growth conditions on nonsense error probabilities [21,23]. With the number of species with ribosome profiling measurements growing, it is possible to perform comparative analyses of nonsense error rates and probabilities. Furthermore, using the approaches outlined here, it is possible to quantify the impact of natural selection against nonsense errors on coding-sequence evolution across species and how this may change with the effective population size Ne [91].
Conclusion
By applying a model of translation to an exemplary ribosome profiling dataset in S. cerevisiae, we find multiple lines of evidence that nonsense errors play a significant and largely underappreciated role in protein-coding sequence evolution. Overall, 59% of protein-coding genes exhibit signals of adaptation to reduce the cost of nonsense errors in S. cerevisiae. Natural selection to reduce the cost of ribosome pausing has been the predominant hypothesis to explain codon usage bias, but if the cost of nonsense errors is comparable to, if not greater than, the cost of pausing, then this hypothesis must be revised. Further consideration of the impact and consequences of nonsense errors is critical for understanding the evolution of codon usage bias, which has been observed to varying degrees across all taxa.
Supporting information
S1 Fig. Ribogrid from analysis for Weinberg et al. data using the riboviz2 pipeline [49].
This illustrates the number of ribosome footprints assigned to each nucleotide based on the 5’-end of the read. Darker colors indicate more ribosome footprints assigned to a nucleotide. The nucleotide at position 0 indicates the first nucleotide of the start codon.
https://doi.org/10.1371/journal.pgen.1012162.s002
(PDF)
S2 Fig. Impact of A-site assignment rules on parameter estimates.
Comparison of NSE rate estimates from Weinberg et al. data using either the “standard” A-site 15 nt offsets vs. the offsets estimated by riboWaltz. Spearman rank correlation coefficient is reported.
https://doi.org/10.1371/journal.pgen.1012162.s003
(PDF)
S3 Fig. Factors related to filtering genes from the final analyzed dataset.
(A) Distribution of correlations between position within a gene and ribosome density. (B-C) Genes exhibiting a sudden increase in ribosome densities. Dashed lines indicate ATG codons. (D-E) Genes exhibiting a sudden decrease in ribosome densities.
https://doi.org/10.1371/journal.pgen.1012162.s004
(PDF)
S4 Fig. Confirmation of model’s capacity to estimate NSE rates b and elongation rates c.
(A) Comparison of NSE rates b for stop codons. Data from Weinberg et al. was used, slightly modified to allow for up to 15 codons from empirically determined 3’-UTRs [54]. (B) Log fold-changes of mean-centered waiting times between the elp1 deletion strain and the reference wild-type strain obtained from [28]. Top panel indicates the codons known to be impacted by this tRNA modification enzyme deletion, while the bottom indicates the other 58 sense codons. (C) Same as in (B), but with the NSE rates b.
https://doi.org/10.1371/journal.pgen.1012162.s005
(PDF)
S5 Fig. Deviations between real and simulated ribosome counts across all genes and all positions within the dataset.
Deviations are calculated as the log fold-difference.
https://doi.org/10.1371/journal.pgen.1012162.s006
(PDF)
S6 Fig. Impact of 5’-ramp region on parameter estimates.
Comparison of (A) elongation waiting times and (B) total initiation rates when considering only the first 200 codons (i.e., the 5’-ramp region) vs. the remainder of the genes. Spearman rank correlations are reported. Error bars represent 95% posterior probability intervals.
https://doi.org/10.1371/journal.pgen.1012162.s007
(PDF)
S7 Fig. First 100 codons are enriched in codons with higher NSE probabilities Pr(NSE).
Difference in NSE probabilities between codons enriched in the (A) 5’-ends and (B) 3’-ends of coding sequences (first and last 100 codons). Wilcoxon rank sum test p-values are reported.
https://doi.org/10.1371/journal.pgen.1012162.s008
(PDF)
S8 Fig. Null distributions of slope estimates for regression lines relating codon position to the across-gene average in the probability of a nonsense error per position.
The slope for the real sequences is represented by the dashed line.
https://doi.org/10.1371/journal.pgen.1012162.s009
(PDF)
S9 Fig. Impact of the cost of ribosome pausing C on total cost estimates.
Comparison of proportion of total cost per gene as a function of total energetic flux (cost times the protein production rate) based on (Assembly) and CROC.
https://doi.org/10.1371/journal.pgen.1012162.s010
(PDF)
S10 Fig. Breakdown of energetic costs.
Comparison of proportion of total fixed or variable cost per gene as a function of total energetic flux (cost times the protein production rate) based on (Assembly) and CROC.
https://doi.org/10.1371/journal.pgen.1012162.s011
(PDF)
S11 Fig. Comparison of datasets used as input for PANSE analysis.
(A) Comparison of metagene ribosome densities from Weinberg et al. [27], Wu et al. [29], and Ferguson et al. [53]. (B) Total number of ribosome footprints included in the final PANSE analysis for each dataset (excluding the 5’-ends), expressed as a percentage relative to the Ferguson et al. data. (C) Same as in (B), but only considering the CDSs included in the Ferguson et al. analysis. (D) Comparison of the NSE rate b estimates (on log10 scale) from the Weinberg et al. and Wu et al. datasets. Error bars represent the 95% HDIs. (E) Same as in (D), but using the Ferguson et al. dataset. (F) Comparison of PANSE estimated ribosome waiting times wc and the inverse of codon weights estimated via the tRNA adaptation index (tAI).
https://doi.org/10.1371/journal.pgen.1012162.s012
(PDF)
S1 Table. Gene-specific parameters estimated from PANSE and other models.
Contains parameter estimates relevant to gene-specific values from PANSE (initiation rate, probability of successful protein production) and ROC-SEMPPR (protein production rate), theoretical estimates of ribosome drop-off resilience [34], and gene expression estimates from high-throughput sequencing data (RNA-seq, Ribo-seq) obtained from previous studies [27].
https://doi.org/10.1371/journal.pgen.1012162.s013
(TSV)
S2 Table. Codon-specific parameters estimated from PANSE and other models.
Contains parameter estimates relevant to codon-specific values from PANSE (NSE rate b, ribosome waiting times 1/c) and ROC-SEMPPR (selection coefficients), tRNA gene copy numbers and wobble efficiency, and frameshift competency [42].
https://doi.org/10.1371/journal.pgen.1012162.s014
(TSV)
S3 Table. Regression analysis comparing NSE rates to codon properties.
Properties include missense error rates [52], frameshift competency [42], and number of stop codons one mutation away from the sense codon.
https://doi.org/10.1371/journal.pgen.1012162.s015
(TSV)
Acknowledgments
We thank Premal Shah, Matt Pennell, Edward Wallace, Daohan Jiang, Josh Schraiber, and Antonis Rokas for helpful discussions throughout the course of this project and the writing of this manuscript. Grammarly.com (Free subscription plan) was used to assist the authors in proofreading this manuscript. All suggested changes to spelling and grammar were manually validated by the authors. Grammarly.com was not used to generate any new ideas or sentences for this manuscript.
References
- 1. Wagner A. Energy constraints on the evolution of gene expression. Mol Biol Evol. 2005;22(6):1365–74. pmid:15758206
- 2. Russell JB, Cook GM. Energetics of bacterial growth: balance of anabolic and catabolic reactions. Microbiol Rev. 1995;59(1):48–62. pmid:7708012
- 3. Lynch M, Marinov GK. The bioenergetic costs of a gene. Proc Natl Acad Sci U S A. 2015;112(51):15690–5. pmid:26575626
- 4. Li WH. Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons. J Mol Evol. 1987;24(4):337–45. pmid:3110426
- 5. Bulmer M. The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991;129(3):897–907. pmid:1752426
- 6. Shah P, Gilchrist MA. Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proc Natl Acad Sci U S A. 2011;108(25):10231–6. pmid:21646514
- 7. Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R. Estimating gene expression and codon-specific translational efficiencies, mutation biases, and selection coefficients from genomic data alone. Genome Biol Evol. 2015;7:1559–79.
- 8. Frumkin I, Lajoie MJ, Gregg CJ, Hornung G, Church GM, Pilpel Y. Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci U S A. 2018;115(21):E4940–9. pmid:29735666
- 9. Drummond DA, Wilke CO. The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet. 2009;10(10):715–24. pmid:19763154
- 10. Akashi H. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics. 1994;136(3):927–35. pmid:8005445
- 11. Drummond DA, Raval A, Wilke CO. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006;23(2):327–37. pmid:16237209
- 12. Drummond DA, Wilke CO. Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008;134(2):341–52. pmid:18662548
- 13. Mordret E, Dahan O, Asraf O, Rak R, Yehonadav A, Barnabas GD, et al. Systematic detection of amino acid substitutions in proteomes reveals mechanistic basis of ribosome errors and selection for translation fidelity. Mol Cell. 2019;75(3):427–441.e5. pmid:31353208
- 14. Warnecke T, Hurst LD. GroEL dependency affects codon usage--support for a critical role of misfolding in gene evolution. Mol Syst Biol. 2010;6:340. pmid:20087338
- 15. Warnecke T, Hurst LD. Error prevention and mitigation as forces in the evolution of genes and genomes. Nat Rev Genet. 2011;12(12):875–81. pmid:22094950
- 16. Kurland CG. Translational accuracy and the fitness of bacteria. Annu Rev Genet. 1992;26:29–50. pmid:1482115
- 17. Eyre-Walker A. Synonymous codon bias is related to gene length in Escherichia coli: selection for translational accuracy? Mol Biol Evol. 1996;13(6):864–72. pmid:8754221
- 18. Gilchrist MA. Combining models of protein translation and population genetics to predict protein production rates from codon usage patterns. Mol Biol Evol. 2007;24(11):2362–72. pmid:17703051
- 19. Tsung K, Inouye S, Inouye M. Factors affecting the efficiency of protein synthesis in Escherichia coli. Production of a polypeptide of more than 6000 amino acid residues. J Biol Chem. 1989;264(8):4428–33. pmid:2538444
- 20. Jørgensen F, Adamski FM, Tate WP, Kurland CG. Release factor-dependent false stops are infrequent in Escherichia coli. J Mol Biol. 1993;230(1):41–50. pmid:8450549
- 21. Sin C, Chiarugi D, Valleriani A. Quantitative assessment of ribosome drop-off in E. coli. Nucleic Acids Res. 2016;44(6):2528–37. pmid:26935582
- 22. Dao Duc K, Song YS. The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation. PLoS Genet. 2018;14(1):e1007166. pmid:29337993
- 23. Awad S, Valleriani A, Chiarugi D. A data-driven estimation of the ribosome drop-off rate in S. cerevisiae reveals a correlation with the genes length. NAR Genom Bioinform. 2024;6(2):lqae036. pmid:38638702
- 24. dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004;32(17):5036–44. pmid:15448185
- 25. Kramer EB, Vallabhaneni H, Mayer LM, Farabaugh PJ. A comprehensive analysis of translational missense errors in the yeast Saccharomyces cerevisiae. RNA. 2010;16(9):1797–808. pmid:20651030
- 26. Joshi K, Bhatt MJ, Farabaugh PJ. Codon-specific effects of tRNA anticodon loop modifications on translational misreading errors in the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 2018;46(19):10331–9. pmid:30060218
- 27. Weinberg DE, Shah P, Eichhorn SW, Hussmann JA, Plotkin JB, Bartel DP. Improved ribosome-footprint and mRNA measurements provide insights into dynamics and regulation of yeast translation. Cell Rep. 2016;14(7):1787–99. pmid:26876183
- 28. Chou H-J, Donnard E, Gustafsson HT, Garber M, Rando OJ. Transcriptome-wide analysis of roles for tRNA modifications in translational regulation. Mol Cell. 2017;68(5):978–992.e4. pmid:29198561
- 29. Wu CC-C, Zinshteyn B, Wehner KA, Green R. High-resolution ribosome profiling defines discrete ribosome elongation states and translational regulation during cellular stress. Mol Cell. 2019;73(5):959–970.e5. pmid:30686592
- 30. Gilchrist MA, Wagner A. A model of protein translation including codon bias, nonsense errors, and ribosome recycling. J Theor Biol. 2006;239(4):417–34. pmid:16171830
- 31. Shah P, Gilchrist MA. Effect of correlated tRNA abundances on translation errors and evolution of codon usage bias. PLoS Genet. 2010;6(9):e1001128. pmid:20862306
- 32. Yang Q, Yu C-H, Zhao F, Dang Y, Wu C, Xie P, et al. eRF1 mediates codon usage effects on mRNA translation efficiency through premature termination at rare codons. Nucleic Acids Res. 2019;47(17):9243–58. pmid:31410471
- 33. Gilchrist MA, Shah P, Zaretzki R. Measuring and detecting molecular adaptation in codon usage against nonsense errors during protein translation. Genetics. 2009;183(4):1493–505. pmid:19822731
- 34. Bonnin P, Kern N, Young NT, Stansfield I, Romano MC. Novel mRNA-specific effects of ribosome drop-off on translation rate and polysome profile. PLoS Comput Biol. 2017;13(5):e1005555. pmid:28558053
- 35. Freistroffer DV, Kwiatkowski M, Buckingham RH, Ehrenberg M. The accuracy of codon recognition by polypeptide release factors. Proc Natl Acad Sci U S A. 2000;97(5):2046–51. pmid:10681447
- 36. Chavatte L, Frolova L, Laugâa P, Kisselev L, Favre A. Stop codons and UGG promote efficient binding of the polypeptide release factor eRF1 to the ribosomal A site. J Mol Biol. 2003;331(4):745–58. pmid:12909007
- 37. Wada M, Ito K. Misdecoding of rare CGA codon by translation termination factors, eRF1/eRF3, suggests novel class of ribosome rescue pathway in S. cerevisiae. FEBS J. 2019;286.
- 38. Svidritskiy E, Demo G, Korostelev AA. Mechanism of premature translation termination on a sense codon. J Biol Chem. 2018;293(32):12472–9. pmid:29941456
- 39. Menninger JR. Peptidyl transfer RNA dissociates during protein synthesis from ribosomes of Escherichia coli. J Biol Chem. 1976;251(11):3392–8. pmid:776968
- 40. Caplan AB, Menninger JR. Tests of the ribosomal editing hypothesis: amino acid starvation differentially enhances the dissociation of peptidyl-tRNA from the ribosome. J Mol Biol. 1979;134(3):621–37. pmid:395319
- 41. Farabaugh PJ, Björk GR. How translational accuracy influences reading frame maintenance. EMBO J. 1999;18(6):1427–34. pmid:10075915
- 42. Vimaladithan A, Farabaugh PJ. Special peptidyl-tRNA molecules can promote translational frameshifting without slippage. Mol Cell Biol. 1994;14(12):8107–16. pmid:7969148
- 43. Sundararajan A, Michaud WA, Qian Q, Stahl G, Farabaugh PJ. Near-cognate peptidyl-tRNAs promote +1 programmed translational frameshifting in yeast. Mol Cell. 1999;4(6):1005–15. pmid:10635325
- 44. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 2009;324(5924):218–23. pmid:19213877
- 45. Aviner R, Geiger T, Elroy-Stein O. Genome-wide identification and quantification of protein synthesis in cultured cells and whole tissues by puromycin-associated nascent chain proteomics (PUNCH-P). Nat Protoc. 2014;9(4):751–60. pmid:24603934
- 46. Brar GA, Weissman JS. Ribosome profiling reveals the what, when, where and how of protein synthesis. Nat Rev Mol Cell Biol. 2015;16(11):651–64. pmid:26465719
- 47. Tunney R, McGlincy NJ, Graham ME, Naddaf N, Pachter L, Lareau LF. Accurate design of translational output by a neural network model of ribosome distribution. Nat Struct Mol Biol. 2018;25(7):577–82. pmid:29967537
- 48. Lareau LF, Hite DH, Hogan GJ, Brown PO. Distinct stages of the translation elongation cycle revealed by sequencing ribosome-protected mRNA fragments. eLife. 2014.
- 49. Cope AL, Anderson F, Favate J, Jackson M, Mok A, Kurowska A, et al. riboviz 2: a flexible and robust ribosome profiling data analysis and visualization workflow. Bioinformatics. 2022;38(8):2358–60. pmid:35157051
- 50. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol. 2002;64(4):583–639.
- 51. Cope AL, Shah P. Intragenomic variation in non-adaptive nucleotide biases causes underestimation of selection on synonymous codon usage. PLoS Genet. 2022;18(6):e1010256. pmid:35714134
- 52. Landerer C, Poehls J, Toth-Petroczy A. Fitness effects of phenotypic mutations at proteome-scale reveal optimality of translation machinery. Mol Biol Evol. 2024;41(3):msae048. pmid:38421032
- 53. Ferguson L, Upton HE, Pimentel SC, Mok A, Lareau LF, Collins K, et al. Streamlined and sensitive mono- and di-ribosome profiling in yeast and human cells. Nat Methods. 2023;20(11):1704–15. pmid:37783882
- 54. Mangkalaphiban K, He F, Ganesan R, Wu C, Baker R, Jacobson A. Transcriptome-wide investigation of stop codon readthrough in Saccharomyces cerevisiae. PLoS Genet. 2021;17(4):e1009538. pmid:33878104
- 55. Wallace EWJ, Airoldi EM, Drummond DA. Estimating selection on synonymous codon usage from noisy experimental data. Mol Biol Evol. 2013;30(6):1438–53. pmid:23493257
- 56. Beznosková P, Pavlíková Z, Zeman J, Echeverría Aitken C, Valášek LS. Yeast applied readthrough inducing system (YARIS): an in vivo assay for the comprehensive study of translational readthrough. Nucleic Acids Res. 2019;47(12):6339–50. pmid:31069379
- 57. Nedialkova DD, Leidel SA. Optimization of codon translation rates via tRNA modifications maintains proteome integrity. Cell. 2015;161(7):1606–18. pmid:26052047
- 58. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33:261–304.
- 59. Jørgensen F, Kurland CG. Processivity errors of gene expression in Escherichia coli. J Mol Biol. 1990;215(4):511–21. pmid:2121997
- 60. Nagao A, Nakanishi Y, Yamaguchi Y, Mishina Y, Karoji M, Toya T, et al. Quality control of protein synthesis in the early elongation stage. Nat Commun. 2023;14(1):2704. pmid:37198183
- 61. Ogle JM, Brodersen DE, Clemons WM Jr, Tarry MJ, Carter AP, Ramakrishnan V. Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science. 2001;292(5518):897–902. pmid:11340196
- 62. Qin H, Wu WB, Comeron JM, Kreitman M, Li W-H. Intragenic spatial patterns of codon usage bias in prokaryotic and eukaryotic genomes. Genetics. 2004;168(4):2245–60. pmid:15611189
- 63. Hussmann JA, Patchett S, Johnson A, Sawyer S, Press WH. Understanding biases in ribosome profiling experiments reveals signatures of translation dynamics in yeast. PLoS Genet. 2015;11(12):e1005732. pmid:26656907
- 64. Yang J-R, Liao B-Y, Zhuang S-M, Zhang J. Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A. 2012;109(14):E831-40. pmid:22416125
- 65. Zaher HS, Green R. A primary role for release factor 3 in quality control during translation elongation in Escherichia coli. Cell. 2011;147(2):396–408. pmid:22000017
- 66. Chiabudini M, Tais A, Zhang Y, Hayashi S, Wölfle T, Fitzke E, et al. Release factor eRF3 mediates premature translation termination on polylysine-stalled ribosomes in Saccharomyces cerevisiae. Mol Cell Biol. 2014;34(21):4062–76. pmid:25154418
- 67. Presnyak V, Alhusaini N, Chen Y-H, Martin S, Morris N, Kline N, et al. Codon optimality is a major determinant of mRNA stability. Cell. 2015;160(6):1111–24. pmid:25768907
- 68. Chadani Y, Niwa T, Izumi T, Sugata N, Nagao A, Suzuki T, et al. Intrinsic ribosome destabilization underlies translation and provides an organism with a strategy of environmental sensing. Mol Cell. 2017;68(3):528-539.e5. pmid:29100053
- 69. Nguyen HA, Hoffer ED, Fagan CE, Maehigashi T, Dunham CM. Structural basis for reduced ribosomal A-site fidelity in response to P-site codon-anticodon mismatches. J Biol Chem. 2023;299(4):104608. pmid:36924943
- 70. Eyler DE, Green R. Distinct response of yeast ribosomes to a miscoding event during translation. RNA. 2011;17(5):925–32. pmid:21415142
- 71. Erdmann-Pham DD, Dao Duc K, Song YS. The key parameters that govern translation efficiency. Cell Syst. 2020;10(2):183–192.e6. pmid:31954660
- 72. Sokal RR, Rohlf FJ. Biometry: the principles and practice of statistics in biological research. 3rd ed. W.H. Freeman; 1995.
- 73. Wright G, Rodriguez A, Li J, Clark PL, Milenković T, Emrich SJ. Analysis of computational codon usage models and their association with translationally slow codons. PLoS One. 2020;15(4):e0232003. pmid:32352987
- 74. Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, et al. An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell. 2010;141(2):344–54. pmid:20403328
- 75. Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013;342(6157):475–9. pmid:24072823
- 76. Bentele K, Saffert P, Rauscher R, Ignatova Z, Blüthgen N. Efficient translation initiation dictates codon usage at gene start. Mol Syst Biol. 2013;9:675. pmid:23774758
- 77. Osterman IA, Chervontseva ZS, Evfratov SA, Sorokina AV, Rodin VA, Rubtsova MP, et al. Translation at first sight: the influence of leading codons. Nucleic Acids Res. 2020;48(12):6931–42. pmid:32427319
- 78. Zhao T, Chen Y-M, Li Y, Wang J, Chen S, Gao N, et al. Disome-seq reveals widespread ribosome collisions that promote cotranslational protein folding. Genome Biol. 2021;22(1):16. pmid:33402206
- 79. Sejour R, Leatherwood J, Yurovsky A, Futcher B. Enrichment of rare codons at 5’ ends of genes is a spandrel caused by evolutionary sequence turnover and does not improve translation. Elife. 2024;12:RP89656. pmid:39008347
- 80. Zhang J, Qian W. Functional synonymous mutations and their evolutionary consequences. Nat Rev Genet. 2025;26(11):789–804. pmid:40394196
- 81. Verma M, Choi J, Cottrell KA, Lavagnino Z, Thomas EN, Pavlovic-Djuranovic S, et al. A short translational ramp determines the efficiency of protein synthesis. Nat Commun. 2019;10(1):5774. pmid:31852903
- 82. Cope AL, Gilchrist MA. Quantifying shifts in natural selection on codon usage between protein regions: a population genetics approach. BMC Genomics. 2022;23(1):408. pmid:35637464
- 83. Cope AL, Hettich RL, Gilchrist MA. Quantifying codon usage in signal peptides: Gene expression and amino acid usage explain apparent selection for inefficient codons. Biochim Biophys Acta Biomembr. 2018;1860(12):2479–85. pmid:30279149
- 84. Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009;324(5924):255–8. pmid:19359587
- 85. Gu W, Zhou T, Wilke CO. A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput Biol. 2010;6(2):e1000664. pmid:20140241
- 86. Huang Y, Koonin EV, Lipman DJ, Przytycka TM. Selection for minimization of translational frameshifting errors as a factor in the evolution of codon usage. Nucleic Acids Res. 2009;37(20):6799–810. pmid:19745054
- 87. Clarke CH, Miller PG. Consequences of frameshift mutations in the trp A, trp B and lac I genes of Escherichia coli and in Salmonella typhimurium. J Theor Biol. 1982;96(3):367–79. pmid:6181349
- 88. Seligmann H, Pollock DD. The ambush hypothesis: hidden stop codons prevent off-frame gene reading. DNA Cell Biol. 2004;23(10):701–5. pmid:15585128
- 89. Itzkovitz S, Alon U. The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res. 2007;17(4):405–12. pmid:17293451
- 90. Box GEP. Robustness in the strategy of scientific model building. In: Launer RL, Wilkinson GN, editors. Robustness in statistics. Academic Press; 1979. pp. 201–36.
- 91. Charlesworth B. Effective population size and patterns of molecular evolution and variation. Nat Rev Genet. 2009;10(3):195–205.