Whole cell biophysical modeling of codon-tRNA competition reveals novel insights related to translation dynamics

doi:10.1371/journal.pcbi.1008038

Fig 1.

Overview of the whole cell translation model.

Schematic visualization of the simulated elements at the cell level (A) and at the mRNA level (B), and a simplified view of the state machines at the codon level (C) for initiation and elongation (the detailed state machines appear in F-G Figs in S1 File). At the cell level, a transcriptome of mRNA molecules with a finite pool of ribosomes and tRNAs is simulated, leading to competition for these resources. At the mRNA level, we used a novel generalized deterministic TASEP (totally asymmetric simple exclusion process) model that captures accurate codon-level dynamics as well as their dependence on cellular resources. At the codon level, the dynamics of each codon are dictated by a state machine: an object that holds the current state (e.g. "this codon is anticipating a tRNA") and the rules governing the order and conditions of state transitions (see sub-section Generalized deterministic TASEP and state machines).
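
To make the cell-level competition concrete, the following Python sketch implements a toy version of per-codon state machines coupled to shared ribosome and tRNA pools. It is an illustration under simplifying assumptions, not the authors' implementation: the class names, the three states, a one-codon ribosome footprint and a one-tick decoding step are all hypothetical choices made here for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical codon states (the real model uses richer state machines).
EMPTY, AWAIT_TRNA, READY_TO_MOVE = "empty", "awaiting_tRNA", "ready_to_move"


@dataclass
class Cell:
    """Finite resource pools shared by all mRNAs (illustrative)."""
    free_ribosomes: int
    free_trna: dict  # codon -> number of free tRNAs decoding it
    terminations: int = 0


@dataclass
class MRNA:
    codons: list
    states: list = field(default_factory=list)

    def __post_init__(self):
        # Every codon starts without a ribosome.
        self.states = [EMPTY] * len(self.codons)

    def step(self, cell: Cell) -> None:
        """One synchronous tick, scanning 3' -> 5' so freed codons can be entered."""
        n = len(self.codons)
        for i in range(n - 1, -1, -1):
            state = self.states[i]
            if state == AWAIT_TRNA:
                codon = self.codons[i]
                # Codon-level competition: wait until a matching tRNA is free.
                if cell.free_trna.get(codon, 0) > 0:
                    cell.free_trna[codon] -= 1
                    self.states[i] = READY_TO_MOVE
            elif state == READY_TO_MOVE:
                if i == n - 1:
                    # Termination: return the ribosome and the tRNA to the pools.
                    self.states[i] = EMPTY
                    cell.free_ribosomes += 1
                    cell.free_trna[self.codons[i]] += 1
                    cell.terminations += 1
                elif self.states[i + 1] == EMPTY:
                    # Exclusion: translocate only if the next codon is free.
                    self.states[i] = EMPTY
                    self.states[i + 1] = AWAIT_TRNA
                    cell.free_trna[self.codons[i]] += 1
        # Initiation needs a free first codon and a free ribosome.
        if self.states[0] == EMPTY and cell.free_ribosomes > 0:
            cell.free_ribosomes -= 1
            self.states[0] = AWAIT_TRNA


# Toy transcriptome competing for 3 ribosomes and a few tRNAs.
cell = Cell(free_ribosomes=3, free_trna={"AAA": 2, "GGG": 1})
mrnas = [MRNA(["AAA", "GGG", "AAA"]), MRNA(["GGG", "GGG"])]
for _ in range(20):
    for m in mrnas:
        m.step(cell)
```

Because every ribosome and tRNA is drawn from, and returned to, the shared `Cell` pools, adding mRNAs that use the same codons slows the translation of the others, which is the kind of competition the whole-cell model is built to capture.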

Fig 2.

High correlation between the predictions of the MP-SMTM model and measured data for E. coli.

(A) Steady-state termination rate (predicted by the model) and protein abundance (empirical data); (B) local initiation rate (as estimated for the model) and protein abundance (empirical data); (C) mean ribo-seq read count and mean simulated occupancy; (D) ribosomal density profiles for both the simulation (average occupancy per codon) and ribo-seq (average read count per codon). In (A), (B) and (D), mRNAs with levels lower than 0.2 were omitted from the analysis to avoid discretization errors in the simulation. In (A), (B) and (C), each point represents a single mRNA type. (All terms in this figure are defined in the Methods section, sub-sections System parameters, Prediction of protein synthesis rate and Additional terminology.)
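
The low-level filtering used in (A), (B) and (D) can be sketched as follows. The caption does not state which correlation coefficient is reported, so the Spearman rank correlation used here, the function names and the toy numbers are assumptions for illustration only.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def filtered_correlation(levels, predicted, measured, min_level=0.2):
    """Drop mRNAs whose level is below min_level, then correlate the rest."""
    keep = [i for i, level in enumerate(levels) if level >= min_level]
    return spearman([predicted[i] for i in keep], [measured[i] for i in keep])


# Toy data: the first mRNA (level 0.1) is an outlier that the filter removes.
levels = [0.1, 0.5, 1.0, 2.0]
predicted = [9.0, 1.0, 2.0, 3.0]    # e.g. simulated termination rates
measured = [0.0, 10.0, 20.0, 40.0]  # e.g. measured protein abundance
rho = filtered_correlation(levels, predicted, measured)
```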

Fig 3.

The effect of the codon order and composition on the total termination rate.

(A) Schematic illustration of the different randomization types. (B) Box plots of the distributions of the total simulated termination rate (i.e. the sum of the termination rates of all mRNAs; 10 values each), compared with the un-randomized scenario. The p-values shown are the result of a one-sample two-sided t-test.
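
The test in (B) compares each set of 10 randomized totals against the single un-randomized total. A minimal standard-library sketch of a one-sample two-sided t-test is shown below; the function names are hypothetical, and in practice `scipy.stats.ttest_1samp(samples, popmean)` computes the same statistic and p-value.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the pdf over [0, |t|], via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def one_sample_t_test(samples, popmean):
    """Two-sided one-sample t-test of mean(samples) against popmean."""
    n = len(samples)
    sd = statistics.stdev(samples)  # sample SD with n - 1 denominator
    t = (statistics.fmean(samples) - popmean) / (sd / math.sqrt(n))
    return t, two_sided_p(t, n - 1)
```

Usage would be `t, p = one_sample_t_test(randomized_totals, unrandomized_total)`, where both arguments are hypothetical names for the 10 randomized totals and the un-randomized total.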

Fig 4.

The model promotes understanding and provides an analysis framework for heterologous expression problems.

(A) Total E. coli and GFP gene termination rates for various GFP variants expressed as heterologous genes. Two methods were used to generate the variants, as described in the sub-section Heterologous expression in the Methods. The original variant is shown in orange. Smaller blue dots correspond to variants in which a single codon was chosen to replace all of its synonymous codons (61 such variants in total). The remaining dots represent variants in which a single representative codon was chosen for each amino acid according to an optimality score (TDR, ESDR or inverse occupancy). (B) The two variants marked with arrows in (A), compared in terms of ESDR. Codons whose ESDR differs between the two variants reflect changes in the supply and demand of the associated tRNAs and help explain the results in (A). (C) Per-codon ESDR of three optimized variants, relative to the unoptimized variant (red represents a higher ESDR value). Some codons (such as CCC) exhibit an increase in ESDR in all variants, indicating that increasing the supply/demand for these codons can improve overall translational efficiency.
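
The two variant-generation schemes in (A) can be sketched as below. The genetic-code table is deliberately truncated (a full table has 61 sense codons), and the per-codon scores are made-up placeholders standing in for TDR, ESDR or inverse occupancy; all function names are hypothetical.

```python
# Hypothetical, truncated genetic-code excerpt (amino acid -> synonymous codons).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "F": ["TTT", "TTC"],
    "M": ["ATG"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


def substitute_one_codon(seq, chosen):
    """Method 1: every codon synonymous with `chosen` is replaced by `chosen`."""
    aa = CODON_TO_AA[chosen]
    return "".join(chosen if CODON_TO_AA[c] == aa else c for c in split_codons(seq))


def one_codon_per_aa(seq, score):
    """Method 2: each amino acid is encoded by its highest-scoring codon."""
    best = {
        aa: max(codons, key=lambda c: score.get(c, float("-inf")))
        for aa, codons in SYNONYMS.items()
    }
    return "".join(best[CODON_TO_AA[c]] for c in split_codons(seq))


gene = "ATGAAAGGTTTCAAG"
# One variant per codon (61 with the full genetic code, 9 in this excerpt).
variants = {codon: substitute_one_codon(gene, codon) for codon in CODON_TO_AA}
# Placeholder optimality scores; unscored codons are never preferred.
score = {"AAG": 0.9, "AAA": 0.4, "GGC": 0.8, "GGT": 0.5,
         "TTC": 0.7, "TTT": 0.2, "ATG": 1.0}
optimized = one_codon_per_aa(gene, score)
```

Both schemes change only synonymous codons, so every variant encodes the same protein; only its demand on the shared tRNA pools differs.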

Fig 5.

A multivariable regressor for predicting protein abundance (PA) and optical density (OD) of GFP variants.

(A) Pearson correlation between the model predictions and the measured PA values of the GFP variants, as a function of the number of features selected, for the training and test sets. This graph illustrates the feature-selection approach: 100 times, training (~67%) and test (~33%) sets were randomly selected, and each time the next best feature was chosen as the one that increases R² the most on the test set (for more details, see sub-section 'Regressor features selection' in the Methods section). This result suggests that a model with more than ~10 features will predict poorly due to over-fitting. (B) After the best features were chosen and sorted, this panel shows the correlation between the predicted and measured PA (blue) and OD (red) values for an increasing number of features. A feature set based on ESDR (continuous lines) was compared with a simpler codon-count metric (dashed lines). In both cases (PA and OD), the ESDR-based model performed better and reached a high correlation with the empirical data, demonstrating the importance of our model (data for this model were taken from Kudla et al. [34]).
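
The repeated greedy forward selection described in (A) can be sketched with NumPy as follows. Ordinary least squares is assumed as the regressor and test-set R² is the selection criterion, as stated above; the synthetic data and function names are illustrative only, and the actual regressor is described in the Methods.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination (R^2) on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (assumed regressor)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def forward_selection(X, y, n_features, rng, train_frac=0.67):
    """Greedy selection on one random train/test split."""
    idx = rng.permutation(len(y))
    n_train = int(round(train_frac * len(y)))
    tr, te = idx[:n_train], idx[n_train:]
    selected, scores = [], []
    for _ in range(n_features):
        # Test-set R^2 obtained by adding each remaining feature to the current set.
        r2_by_feature = {
            j: r2(y[te], fit_predict(X[tr][:, selected + [j]], y[tr],
                                     X[te][:, selected + [j]]))
            for j in range(X.shape[1]) if j not in selected
        }
        best = max(r2_by_feature, key=r2_by_feature.get)
        selected.append(best)
        scores.append(r2_by_feature[best])
    return selected, scores


def repeated_selection(X, y, n_features, n_repeats=100, seed=0):
    """Mean test-set R^2 curve over n_repeats random splits."""
    rng = np.random.default_rng(seed)
    curves = [forward_selection(X, y, n_features, rng)[1] for _ in range(n_repeats)]
    return np.mean(curves, axis=0)


# Synthetic example: only features 2 and 5 carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(60, 8))
y = 3 * X[:, 2] - 2 * X[:, 5] + 0.1 * data_rng.normal(size=60)
curve = repeated_selection(X, y, n_features=4, n_repeats=20)
```

Plotting `curve` against the number of features gives the kind of saturation-then-decline shape used in (A) to pick a cut-off before over-fitting sets in.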
