A quantitative model for the rate-limiting process of UGA alternative assignments to stop and selenocysteine codons

Ambiguity in genetic codes exists in cases where certain stop codons are alternatively used to encode non-canonical amino acids. In selenoprotein transcripts, the UGA codon may either represent a translation termination signal or a selenocysteine (Sec) codon. Translating UGA to Sec requires selenium and specialized Sec incorporation machinery such as the interaction between the SECIS element and SBP2 protein, but how these factors quantitatively affect alternative assignments of UGA has not been fully investigated. We developed a model simulating the UGA decoding process. Our model is based on the following assumptions: (1) charged Sec-specific tRNAs (Sec-tRNASec) and release factors compete for a UGA site, (2) Sec-tRNASec abundance is limited by the concentrations of selenium and Sec-specific tRNA (tRNASec) precursors, and (3) all synthesis reactions follow first-order kinetics. We demonstrated that this model captured two prominent characteristics observed from experimental data. First, UGA to Sec decoding increases with elevated selenium availability, but saturates under high selenium supply. Second, the efficiency of Sec incorporation is reduced with increasing selenoprotein synthesis. We measured the expressions of four selenoprotein constructs and estimated their model parameters. Their inferred Sec incorporation efficiencies did not correlate well with their SECIS-SBP2 binding affinities, suggesting the existence of additional factors determining the hierarchy of selenoprotein synthesis under selenium deficiency. This model provides a framework to systematically study the interplay of factors affecting the dual definitions of a genetic codon.


Criteria of estimated parameter selection
The grid search algorithm reports multiple estimated parameters to experimental data under given ρ p . The parameters are sorted by loss function Q 2 , and top 5 combinations are retrieved for each given ρ p . The best estimated parameter values to fit experimental data have to satisfy the following criteria.
1. For each ρ p value, select the five parameter combinations that give the lowest Q 2 . Rule out the ρ p values with substantially higher average Q 2 than others.
2. Among the remaining ρ p values, select the ones with the lowest degeneracy of parameters k 1 , kF, k 3 and T total .
Q 2 represents the fitness of estimated parameters. In the meanwhile, the lower degeneracy of top combinations also reflects the overall fitness of given ρ p . Degeneracy is assessed by the coefficient of variation (CV%) for {k 1 , kF, k 3 , T total }. Since quantities of release factor and tRNA sec are invariant in the four SECIS constructs, kF and T total should be similar among them. Therefore, the consensus of kF and T total is considered as an additional criteria.
We searched parameter values on six-dimensional grids, where boundary values of the grids were determined according physiologically reasonable ranges. The upper boundary of six parameters, {k 1 , kF, k 3 , T total , ρ, ρ p } is {1e2, 1e4, 1e2, 1e3, 1e4, 1e4}. The lower boundary setting is {1e-3, 1e-3, 1e-3, 1e-1, 1e-3, 1e-3}. The intervals between the low and high boundaries in each parameter are equally dissected into 12 subintervals. We evaluated the Q 2 on the grids and selected the parameter configurations according to criteria 1-3. Top solutions are in Table 4, Table S3 and supporting file 8, except for GPX1. The GPX1 solution obtained from the aforementioned search range and criteria has excessively lower kF than other genes. To enforce all solutions satisfy criteria 3, we narrowed the search ranges of {kF, T total } to be {1e4, 80} and {7e3, 60}, respectivly. After searching by refined grid, we obtain the final solutions and summaried as Table 4 and supporting file 8.

Derivation of the mRNA accuracy rates
Denote m, S 1 and S 2 the concentrations of free mRNAs, charged tRNAs, and release factors, and mS 1 , mS 2 as the concentrations of mRNAs bound by tRNA or RF. The rates of mRNA binding with tRNAs and RFs are: (1) Within a fixed short time interval, the probabilities of mRNA binding with tRNA and RF are linearly related to k 1 S 1 and k 2 S 2 respectively. We establish the following assumptions: 1. At the beginning of each time interval, a fixed amount of f 0 mRNAs are generated.
2. Each mRNA binds to tRNA, RF or none with probabilities p 1 ,p 2 ,1−p 1 −p 2 respectively. 3. Each mRNA possesses a state variable denoting the number of synthesis cycles it already undergoes.
4. If an mRNA undergoes no synthesis cycle before and binds to RF, then it is degraded with certainty.
5. If an mRNA undergoes N synthesis cycles, then it is degraded with certainty.
Denote c the state variable of the number of synthesis cycles an mRNA already undergoes. We derive the probabilities P(c=0),P(c=1),⋯,P(c=N) as follows.
At time step t, denote n 0 (t), n 1 (t), …, n N (t) the number of mRNA molecules that already undergo 0, 1, ⋯N synthesis cycles. At time step t+1, there are the following changes: 1. The mRNAs with c=0 include the old mRNAs that do not bind with tRNAs and RFs, and newly supplied mRNAs.
2. The mRNAs with c=1 include the old mRNAs with c=1 at time t and do not bind with tRNAs and RFs in this cycle, and the old mRNAs with c=0 at time t and bind with tRNAs.
3. The mRNAs with c=k, k>1 include the old mRNAs with c=k at time t and do not bind with tRNAs and RFs in this cycle, and the old mRNAs with c=k−1 at time t and bind with tRNAs or RFs.
4. The mRNAs with c=N become degraded.