Design principles for the glycoprotein quality control pathway

Newly-translated glycoproteins in the endoplasmic reticulum (ER) often undergo cycles of chaperone binding and release in order to assist in folding. Quality control is required to distinguish between proteins that have completed native folding, those that have yet to fold, and those that have misfolded. Using quantitative modeling, we explore how the design of the quality-control pathway modulates its efficiency. Our results show that an energy-consuming cyclic quality-control process, similar to the observed physiological system, outperforms alternative designs. The kinetic parameters that optimize the performance of this system drastically change with protein production levels, while remaining relatively insensitive to the protein folding rate. Adjusting only the degradation rate, while fixing other parameters, allows the pathway to adapt across a range of protein production levels, aligning with in vivo measurements that implicate the release of degradation-associated enzymes as a rapid-response system for perturbations in protein homeostasis. The quantitative models developed here elucidate design principles for effective glycoprotein quality control in the ER, improving our mechanistic understanding of a system crucial to maintaining cellular health.


Introduction
Fig. 1. Model of glycoprotein quality control via the chaperone binding cycle. P g represents the monoglucosylated proteins, P c the unfolded chaperone-bound proteins, P cf the folded chaperone-bound proteins, P the proteins lacking a glucose tag, P b the background proteins, and P cb the chaperone-bound background proteins.
recognize proteins in need of folding, it is assumed to be non-specific and to allow the general binding of 'background' proteins onto the chaperones. Such background proteins could include ER-resident proteins, or folded proteins that have not yet been exported. The concentration of these additional background proteins is represented by P b (for free background proteins) and P cb for background proteins bound to the chaperone. We assume each chaperone can bind only one protein at a time.
Overall, the dynamics of the chaperone binding cycle are described by
\begin{align}
\frac{dP_g}{dt} &= k_g P + k_{-c} P_c - (k_c C_A + k_{-g}) P_g + k_p, \tag{1a} \\
\frac{dP_c}{dt} &= (k_c P_g + k_{-r} P) C_A - (k_r + k_{-c} + k_f) P_c, \tag{1b} \\
\frac{dP}{dt} &= k_r P_c + k_{-g} P_g - (k_g + k_{-r} C_A + k_d) P. \tag{1c}
\end{align}
Some proteins entering the chaperone binding cycle are unable to natively fold, as a result of translation errors or mutations [46]. Heat and oxidative stress can also cause proteins to enter states that cannot fold [46], and these stressors may have a differential impact on different proteins. We label these terminally misfolded, unfoldable proteins as simply 'misfolded'. Their dynamics are described by equations similar to Eq. 1a-c, with analogous protein quantities P * g , P * c , and P * . The misfolded protein production rate is defined as k * p and the folding rate is set to zero (k * f = 0). All other rate constants are assumed to be identical for foldable and misfolded proteins.
Both the background proteins (P b ) and misfolded proteins (P * i ) represent proteins capable of binding to and occupying the limited supply of total chaperone (C tot ) available in the cell. The concentration of available chaperones is then given by C A = C tot − P c − P cf − P * c − P cb . In our model, background proteins represent those proteins that can bind weakly to the chaperone in the absence of a glucose moiety flagging them as newly-made proteins requiring folding. These can represent, for example, already folded proteins. They are not subject to the glucosylation and deglucosylation processes of the quality-control cycle. By contrast, 'misfolded' proteins represent those that move through the quality control cycle with the same rate constants as normal proteins but are ultimately incapable of folding. In other words, the enzymes of the quality control cycle cannot distinguish these unfoldable proteins from native proteins [22,29,30]. The total rate of proteins entering the cycle is defined as k pt = k p + k * p with a misfolded fraction m f = k * p /(k p + k * p ) unable to fold. Equations 1 and the corresponding misfolded protein equations are non-dimensionalized by the timescale of glucose trimming for chaperone-bound proteins, setting k r = 1, and by total chaperone number, setting C tot = 1 (see Methods for details).
Fig. 2. Quality control engenders a trade-off between folding accuracy and speed. Shaded regions in the phase diagram represent all combinations of folding fraction f and total steady-state unfolded protein P tot that can be achieved by varying cycle parameters k c , k -c , k r , k -r , k g , and k -g , while keeping a fixed folding rate k f , production rate k pt , misfolded fraction m f , and background protein concentration P b . Solid lines represent the maximal achievable folding fraction f max . Dots represent the efficiency metric f * max .
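As an illustrative check of the production-rate definitions above, the total production rate and misfolded fraction can be inverted to recover the two separate production rates. The numerical values below (k pt = 0.1, m f = 0.001) are the ones quoted for Fig. 3a; the arithmetic itself is only a worked example, not a result from the model.

```latex
\begin{align}
k_p^{*} &= m_f \, k_{pt} = 0.001 \times 0.1 = 10^{-4}, \\
k_p &= (1 - m_f) \, k_{pt} = 0.999 \times 0.1 = 0.0999, \\
k_p + k_p^{*} &= k_{pt} = 0.1 .
\end{align}
```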
For a given set of k i , the steady state protein concentrations P i can be found, as derived in the Methods. We will use this steady-state solution to evaluate performance, on the assumption that protein production and processing parameters remain constant over timescales much longer than the individual cycle time.
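The steady state can also be approached numerically. The following is a minimal sketch, not the procedure used in the Methods: it integrates a reduced version of Eq. 1a-c by forward Euler, assuming no misfolded or background proteins and no chaperone occupancy by folded proteins (P cf = 0), so that C A = C tot − P c , and assuming that degradation removes protein from the state P at rate k d P. The function name `steady_state`, the dictionary keys, and all parameter values are hypothetical choices made for illustration.

```python
def steady_state(k, dt=0.01, t_end=400.0):
    """Forward-Euler integration of a reduced Eq. 1a-c until it has relaxed.

    Assumptions (illustrative only): no misfolded or background proteins and
    P_cf = 0, so the available chaperone is C_A = C_tot - P_c.
    """
    Pg = Pc = P = 0.0
    for _ in range(int(t_end / dt)):
        CA = k["C_tot"] - Pc
        dPg = k["kg"] * P + k["kmc"] * Pc - (k["kc"] * CA + k["kmg"]) * Pg + k["kp"]
        dPc = (k["kc"] * Pg + k["kmr"] * P) * CA - (k["kr"] + k["kmc"] + k["kf"]) * Pc
        # Degradation (kd) is a loss term for the unglucosylated state P.
        dP = k["kr"] * Pc + k["kmg"] * Pg - (k["kg"] + k["kmr"] * CA + k["kd"]) * P
        Pg += dt * dPg
        Pc += dt * dPc
        P += dt * dP
    return Pg, Pc, P


# Hypothetical rate constants in units where k_r = 1 and C_tot = 1.
params = {"kc": 10.0, "kmc": 0.1, "kr": 1.0, "kmr": 0.1, "kg": 1.0,
          "kmg": 0.1, "kf": 0.1, "kd": 0.5, "kp": 0.1, "C_tot": 1.0}

Pg, Pc, P = steady_state(params)
# At steady state, production must balance folding plus degradation.
residual = params["kp"] - params["kf"] * Pc - params["kd"] * P
f = params["kf"] * Pc / params["kp"]  # fraction of produced protein that folds
print(f"P_g={Pg:.4f}  P_c={Pc:.4f}  P={P:.4f}  f={f:.4f}  residual={residual:.2e}")
```

The flux-balance residual is a convenient convergence check: it vanishes only once the total protein content has stopped changing.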

Quality control efficiency and energy input
We begin by considering how the glycoprotein folding system illustrated in Fig. 1 is governed by a trade-off between accuracy and speed. On the one hand, the system needs to achieve robust, error-free quality control. On the other hand, it needs to process incoming proteins sufficiently rapidly to keep up with production and avoid accumulation of unfolded proteins in the cell. We quantify system accuracy using the steady-state fraction of foldable proteins that successfully undergo folding rather than degradation,
\begin{equation}
f = \frac{k_f P_c}{k_p}. \tag{2}
\end{equation}
A higher folding fraction f indicates a more efficient folding process that produces more functional proteins per input of nascent unfolded proteins.
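A short consistency check on the folding fraction, under the assumption that foldable proteins leave the cycle only by folding (flux k f P c ) or by degradation from the unglucosylated state (flux k d P): summing the three rate equations for P g , P c , and P at steady state gives a flux balance, from which the folding fraction can be written in two equivalent ways.

```latex
\begin{align}
0 = \frac{d}{dt}\left(P_g + P_c + P\right) &= k_p - k_f P_c - k_d P, \\
\Rightarrow \quad f = \frac{k_f P_c}{k_p} &= \frac{k_f P_c}{k_f P_c + k_d P}.
\end{align}
```

The second form makes explicit that f is the share of the total exit flux carried by folding, so that 0 ≤ f ≤ 1.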
A second metric for processing efficiency is the total unfolded protein present in the cycle at steady-state: P tot = P g + P * g + P c + P * c + P + P * . Low values of P tot correspond to rapid processing of individual nascent proteins that prevents their accumulation in the system. High concentrations of unfolded proteins can lead to protein aggregation, which impedes cellular function and health [7]. With a typical influx to the ER of 0.1-1 million proteins per minute in each cell [47], proteins accumulate rapidly if the folding system cannot keep up with production. High protein concentrations also induce ERAD and the unfolded protein response to limit the accumulation of protein aggregates, curtailing the throughput of functional proteins [48].
Overall, we aim to understand how glycoprotein quality control can achieve both efficient shunting of foldable proteins towards folding rather than degradation, and rapid processing that limits the accumulation of unfolded proteins. To assess this interplay, we determine the maximum folding fraction for each fixed value of total unfolded proteins, generating a phase-diagram of achievable values for these two metrics (Fig. 2). For fixed values of the production rate k pt , misfolded fraction m f , folding rate k f , and background protein level P b , the cycle rate constants k c , k -c , k r , k -r , k g , and k -g are allowed to vary (details in Methods) to map out the space of accessible efficiency metrics. The curves of maximum folding fraction vs. total unfolded protein represent a Pareto frontier [49] of folding cycle performance, where performance above or to the left of the curves in Fig. 2 is not achievable. In Fig. 2, protein production (k pt ), misfolded fraction (m f ), and background protein concentration (P b ) are fixed for all curves, and each curve has a different protein folding speed (k f ). Faster folding speeds allow for more efficient folding at each given value for the total unfolded protein. The Pareto frontier has a characteristic shape of an increasing f max at low P tot , followed by a plateau in f max at high P tot . These curves demonstrate the trade-off between the two measures for efficient quality control, showing that maximization of folding fraction and minimization of total unfolded protein cannot be simultaneously achieved.
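The Pareto frontier described above can be extracted from any sampled set of (P tot , f) pairs. The sketch below is a generic non-dominated filter and is not the optimization procedure of the Methods; the function name `pareto_frontier` and the sample points are hypothetical.

```python
def pareto_frontier(points):
    """Return the (P_tot, f) points that no other point dominates.

    Lower P_tot and higher f are both preferred, so after sorting by
    increasing P_tot a point is kept only if its f exceeds every f seen so far.
    """
    frontier = []
    best_f = float("-inf")
    # Ties in P_tot are ordered with the higher f first.
    for p_tot, f in sorted(points, key=lambda pt: (pt[0], -pt[1])):
        if f > best_f:
            frontier.append((p_tot, f))
            best_f = f
    return frontier


# Hypothetical samples, e.g. from random draws of the cycle rate constants.
samples = [(0.5, 0.2), (1.0, 0.6), (1.5, 0.5), (2.0, 0.9), (0.8, 0.1)]
frontier = pareto_frontier(samples)
print(frontier)
```

Reading the frontier at a fixed P tot then gives the maximum folding fraction achievable without exceeding that level of unfolded protein.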
The characteristic curve shape in Fig. 2 for f max vs. P tot suggests it is not always feasible to operate the glycoprotein quality control pathway at or near the maximum folding fraction as these high folding fractions can require a very high concentration of unfolded proteins. To assess pathway performance, we choose to limit the total unfolded protein quantity to P tot = 1, corresponding to a total unfolded protein concentration equal to the concentration of chaperones. We then define the folding efficiency (f * max ) as the maximum folding fraction at P tot = 1, serving as an overall utility function to evaluate the performance of the glycoprotein quality control pathway. This metric represents the best efficiency that can be achieved by the pathway without accumulating so many unfolded proteins as to overwhelm the binding capacity of the chaperones.
The consensus physiological model of the glycoprotein quality control pathway forms a cycle (Fig. 1), with proteins proceeding through the various states in a directed fashion. This directed protein flux requires free-energy dissipation [50], representing a cost to cellular resources. To evaluate the impact of this free-energy dissipation on pathway performance, we consider how the folding efficiency depends on the free energy input, for fixed values of protein production rate k pt , misfolded fraction m f , and protein folding speed k f . The free energy driving the quality control cycle is given by [50]
\begin{equation}
\Delta G = k_B T \ln\left(\frac{k_c k_r k_g}{k_{-c} k_{-r} k_{-g}}\right). \tag{3}
\end{equation}
For each value of this driving energy, the cycle rate constants are allowed to vary so as to maximize the folding efficiency f * max . Figure 3a shows that the folding efficiency can increase with the cycle driving energy. In the absence of chaperone-binding background proteins (P b = 0), the optimal folding fraction is independent of the energy input into the system. However, when there are background proteins present (P b > 0), increasing the energy driving the quality control cycle enables more efficient allocation of chaperone resources specifically to foldable rather than background proteins. For example, reducing the rebinding rate of deglucosylated proteins (k -r ) would decrease the fraction of chaperones occupied by background proteins. In the extreme limit k -r → 0, background proteins no longer contribute to the system, and the maximal folding efficiency is achieved. However, fully eliminating binding of unglucosylated proteins would require an infinite energy input to provide a fully irreversible process.
Fig. 3. Nonequilibrium driving improves performance. Each curve adjusts k c , k -c , k r , k -r , k g , k -g to maximize the folding fraction (Eq. 2) while the total cycle energy (Eq. 3) is varied and the total unfolded protein P tot = P g + P * g + P c + P * c + P + P * is constrained to equal one. Other parameters are fixed for each curve. (a) Each curve shows a distinct level of background proteins P b , with fixed k pt = 0.1, k f = 0.1, and m f = 0.001 for all curves. (b) Effects of increasing protein folding speed (red dashed), increasing protein production (blue dotted), and increasing misfolded fraction (green dashed-dotted) relative to the black curve, which is identical to the corresponding curve in (a). Background proteins are fixed to P b = 1 for all curves.
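The link between reverse rates and driving energy can be made concrete with a small sketch. It assumes the standard expression for the affinity of a three-state cycle, namely the logarithm of the product of forward rate constants over the product of reverse rate constants, in units of k B T; the function name `cycle_driving_energy` and the numerical values are hypothetical.

```python
import math


def cycle_driving_energy(kc, kmc, kr, kmr, kg, kmg):
    """Driving energy per cycle in units of k_B T (assumed cycle-affinity form).

    Forward steps: chaperone binding (kc), glucose trimming (kr),
    reglucosylation (kg). Reverse steps: kmc, kmr, kmg.
    """
    return math.log((kc * kr * kg) / (kmc * kmr * kmg))


# With every forward rate equal to its reverse rate the cycle is undriven.
equilibrium = cycle_driving_energy(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)

# Suppressing untagged rebinding (k_-r) raises the required driving energy.
energies = [cycle_driving_energy(10.0, 1.0, 1.0, kmr, 1.0, 0.1)
            for kmr in (1.0, 0.1, 0.01)]
print(equilibrium, energies)
```

Each tenfold reduction in k -r adds ln 10 ≈ 2.3 k B T per cycle in this expression, so the energy grows without bound as k -r → 0.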
In Fig. 3b, faster folding (higher k f ) leads to a higher folding efficiency, because faster folding can better compete with degradation, and folded proteins free chaperones for other proteins by exiting the cycle. Both higher protein production (k pt ) and misfolded fraction (m f ) lead to a lower folding efficiency because fewer chaperones are unoccupied and available for foldable protein binding.

Comparison of performance between models
A finite driving energy for the quality control cycle implies the presence of reverse processes for all the cycle transitions. We proceed to consider how the presence of the non-physiological reverse transitions for chaperone rebinding k -r and unbinding k -c modulates the pathway efficiency. Figure 4 shows that f * max monotonically decreases as k -r increases, for all cases where background proteins are present (P b > 0). This result suggests that removing untagged chaperone binding (i.e. setting k -r = 0) improves the performance of the chaperone cycle, allowing higher folded protein throughput. Removing untagged binding allows only those proteins recognized as foldable to occupy the chaperone. For the moderate level of background proteins assumed here (P b = 1), this effect becomes small when k -r < 1 (corresponding to a rebinding rate smaller than the rate of deglucosylation and chaperone unbinding). However, its importance increases for higher values of P b (see P b = 10 curve in Fig. 4). Removing the untagged rebinding process entirely can protect the quality control system from potential fluctuations in the total levels of untagged background protein that can result in unproductive chaperone occupation. Having demonstrated the detrimental effects of untagged rebinding, we hereafter set k -r = 0, removing this process from the cycle.
Fig. 4. Untagged protein binding is disadvantageous. Maximal achievable folding fraction at fixed unfolded protein, P tot = 1, plotted versus the untagged rebinding rate k -r , as cycle parameters k c , k -c , k r , k g , k -g are free to vary. Other curves show similar behavior when folding and production rates are altered.
We next turn our attention to how quality control efficiency varies with k -c , the rate of protein detachment from the chaperone without removal of the glucose tag. For low production and slow folding rates, the folding fraction is maximized or nearly maximized when k -c is kept low (Fig. 5).
In this regime, it is advantageous for the quality control cycle to operate slowly, and high values of k -c lead to a reduction in the folding fraction by allowing proteins to escape the chaperones before they have a chance to fold.
By contrast, at high production and fast folding rates, the folding fraction peaks at an intermediate k -c value. In this regime, rapid turnover through the quality control cycle is advantageous and altering the unbinding rate k -c leads to two competing effects. On the one hand, more rapid unbinding allows already-folded proteins to be rapidly removed from the chaperones, freeing chaperones to bind other nascent proteins. When the folding process itself is very fast, then already-folded proteins (P cf ) can occupy a significant fraction of the available chaperones (Fig. 5b), leading to a decrease in efficiency for low unbinding rates k -c . This effect is not seen for slowly folding proteins, which can be released from the chaperones sufficiently rapidly by the standard deglucosylation pathway (k r ). On the other hand, if the unbinding rate becomes much higher than the folding rate, then there is a tendency for proteins to detach from the chaperone before they can fold, manifesting as low values of P c (Fig. 5b). Thus, very rapid unbinding reduces the efficiency of the system for both rapidly folding and slowly folding cases (Fig. 5a).
Figure 5 suggests that different rates of chaperone unbinding (k -c ) become optimal in different regimes, depending on whether protein production is sufficiently high and folding is sufficiently slow to overwhelm the available quantity of chaperones. Chaperones in the ER, such as BiP, are thought to be present in excess quantities [51][52][53], to facilitate rapid chaperone binding of nascent proteins. This suggests that the glycoprotein quality control pathway typically operates in the regime of relatively low production (k pt < 1), so that protein release from chaperones can keep up with the incoming proteins and the chaperones do not become overwhelmed. At low protein production, Fig. 5 shows that high values of k -c primarily decrease the maximum folding fraction.
Removing the ability of a protein to detach from a chaperone without glucose trimming should thus improve the performance of the chaperone binding cycle, and folding proteins should be tightly bound to the chaperone until glucose removal. This tight binding may have additional functional importance, such as facilitating recruitment of other enzymes important for folding [30,54] or as a by-product of the high specificity of chaperone-glucose interaction [55].
Figures 4 and 5 demonstrate that removing non-specific chaperone binding (k -r ) and detachment of proteins from the chaperone without glucose trimming (k -c ) improves the performance of the chaperone binding cycle by increasing the maximum folding fraction with a limited accumulation of unfolded protein. The consensus physiological model, with these two processes absent, is thus shown to be more efficient (in the low-production regime) than the full model illustrated in Fig. 1.
We now explore further glycoprotein quality control pathway model variations, including those that are not cyclic (Fig. 6a). The non-cyclic models include all possible variations of a three-state model that lack untagged binding (no k -r ) and are capable of producing a finite steady-state solution. We compare the performance of these models to the consensus physiological model in terms of the efficiency metric f * max , at varying levels of protein production (Fig. 6b).
The WB (weak binding) model allows proteins to bind and unbind from the chaperone, until the glucose tag is removed. The WBSV (weak binding, safety valve) model introduces an additional "safety-valve" pathway where the glucose tag can be removed without chaperone binding. The OS (one shot) model treats chaperone binding and glucose trimming as irreversible, so that each protein only has one chance to attempt folding. These three models share a common feature: they lack the ability to restore a glucose tag once it is removed, irreversibly committing deglucosylated proteins (P ) to degradation. Each of these models performs worse than the physiological system in the regime of low protein production (k pt < 1), with the folding efficiency f * max dropping by approximately a factor of 2 (Fig. 6b). In the regime of high production, the WBSV model is capable of more effectively funneling proteins into a degradation-committed state, allowing it to significantly outperform the physiological model (Fig. 6b). However, as discussed previously, cells are believed to typically operate in a regime of limited protein production levels and excess chaperone capacity, so we focus largely on model performance at low k pt .
In the physiological model, a deglucosylated protein (P ) is more likely to have first passed through chaperone binding than a monoglucosylated protein (P g ). This feature allows glucose moieties to serve as a form of molecular memory: the presence of a glucose tag means the protein is more likely to be newly made; the absence of the tag means the protein is more likely to have already attempted folding. A contrasting non-cyclic model is the NTM (no tag memory) model, which allows chaperone binding and glucose removal to function as independent processes (Fig. 6a). When the fraction of misfolded proteins (m f ) is low, the NTM model performs equivalently to the physiological model. However, when a substantial number of proteins entering the quality control cycle are incapable of being folded (high m f ), the NTM model is at a disadvantage to the physiological system (Fig. 6b). In the presence of such defective unfoldable proteins, the cyclic addition and removal of glucose tags allows the physiological model to have a memory of which proteins already attempted (and failed) folding and thus should be made vulnerable to degradation. Overall, the physiological model outperforms all non-cyclic models in the low-production regime.
The cyclic model with no safety valve (CNSV) exhibits the same cycle as the physiological model, consisting of chaperone binding, deglucosylation upon release, and subsequent reglucosylation (Fig. 6a). However, it lacks the direct transition from the tagged state P g to the vulnerable state P . In the absence of this safety valve pathway, the CNSV model matches the performance of the physiological model at low production rates (Fig. 6b). However, for k pt > 1, proteins cannot be released from the chaperones fast enough to keep up with new protein production, and the CNSV model cannot reach a steady state. In this regime, all chaperones would become clogged with protein and the protein would accumulate indefinitely. A similar behavior is observed for the non-cyclic WB model, which also lacks the safety valve (Fig. 6b).

Performance and robustness of the physiological model
We now explore the performance of the physiological model, as well as the optimal kinetic parameter values under different conditions. Performance is quantified in terms of the folding efficiency f * max (the maximum folding fraction at a total protein content P tot = 1). We treat the total production rate k pt , protein folding rate k f , and misfolding fraction m f as external input conditions for the system. As always, these rates are expressed relative to the rate of chaperone removal (k r = 1 for non-dimensionalization), which is also treated as fixed. The quality control pathway is then allowed to adjust all other kinetic rate constants to optimize the folding efficiency; the resulting optimal folding fraction and the optimized parameters are plotted in Fig. 7.
When the overall production rate is low, the optimal folding fraction approaches one (blue curve in Fig. 7a), indicating that nearly all the foldable proteins that enter the quality control cycle are successfully folded. At higher production (k p > 1), the removal of proteins from chaperones cannot keep up with the flux of incoming proteins. In this regime, the available chaperones in the system are overwhelmed and the folding efficiency drops.
The optimal parameters (red curves in Fig. 7a) describe how the optimized quality control system adjusts to changing production rates. For all conditions explored, the binding rate constant (k c ) is always maximized, allowing nascent or reglucosylated proteins to bind to chaperones as quickly as possible. For low k pt , the reglucosylation rate constant k g is high and the rate constant k -g for glucose removal from free (not chaperone bound) proteins is low. High k g and low k -g indicate that the cycle is quickly removing proteins from the vulnerable state P to prevent degradation, which is expected for low protein production (k pt ) and low misfolded fraction (m f ) as proteins will then usually be provided multiple rounds of chaperone binding. In this regime, the optimal degradation rate (k d ) rises gradually with increasing production in order to maintain a constant amount of unfolded protein P tot = 1. Eventually (when k pt → 1) there will not be sufficient chaperones to fold all proteins, and protein degradation must increase sharply to maintain a fixed level of total unfolded protein.
As the production rate passes k pt ≈ 1, protein reglucosylation (k g ) steeply decreases and glucose removal (k -g ) increases. This switch indicates the activation of the 'safety valve' pathway which moves excess proteins directly into the degradation-vulnerable state P to avoid accumulation of unfolded proteins. As protein production continues to increase, glucose removal via k -g further increases to enhance this safety valve. Overall, there are two regimes: low protein production, where chaperones are available and proteins are quickly tagged for chaperone rebinding to prioritize folding; and high protein production, where chaperones are overwhelmed and rapid deglucosylation and degradation are prioritized.
Figure 7b shows how performance and optimal parameters change as protein folding speed k f is varied. As expected, the folding fraction increases with folding speed. The increased folding speed does not cause significant changes in the optimal parameters, with a modest increase in reglucosylation (k g ) and decreases in glucose removal (k -g ) and degradation (k d ) as faster folding frees up chaperones. Figure 7c shows that increasing the misfolded fraction m f modestly decreases the folding efficiency while leaving optimal parameters largely unchanged. Overall, Fig. 7 demonstrates that maintaining maximum folding efficiency requires large variation in parameters if the protein production level changes, but limited variation in parameters for changes to protein folding speed and misfolded protein fraction.
We next proceed to explore how well the cycle can perform under changing production levels if a single fixed parameter set is used across all values of k pt . The goal is to assess the robustness of this quality control system to changes in protein production, for the case where other parameters cannot be adjusted sufficiently rapidly to keep up with such changes.
Fig. 8. (a) The folding fraction achieved with fixed rate constants is plotted as a fraction of the maximum achievable folding fraction f max for P tot = 1 as the protein production rate k pt is varied. Fixed rate constants k c , k g , k -g , and k d are those that achieve f max with m f = 0.001 at various protein production levels: k pt = 0.1 (red curves and star), k pt = 1 (green), and k pt = 10 (blue). (b) The P tot corresponding to the folding fractions in (a). (c) Analogous plot to (a), with rate constants fixed at the optimal values for specific k pt values, except that the degradation rate constant k d adjusts to maintain P tot = 1. If k d adjustment cannot achieve P tot = 1, then k d adjustment minimizes the difference from P tot = 1. (d) The P tot corresponding to the folding fractions in (c).
We consider the robustness of a fixed quality control system as follows. The rate constants are optimized to give maximal folding efficiency for a given value of input conditions k pt , k f , m f . For those parameters and input conditions, the system gives the highest folding fraction f * max that maintains a fixed protein content P tot = 1. When the input production rate k pt is varied and all remaining parameters are held fixed, the folding fraction will decrease below this optimum value (Fig. 8a) and the total protein content P tot will also change (Fig. 8b). The values plotted in Fig. 8a,b are given relative to the folding fraction and protein content at the point where the system was optimized.
If the parameters are optimized at low protein production (k pt = 0.1), the system continues to achieve close to the optimal folding fraction when the protein production is increased (Fig. 8a, red curves). However, the total accumulated protein increases by orders of magnitude even for a modest rise in the production rate (Fig. 8b, red curves). If the parameters are optimized at high protein production (k pt = 10), and the production rate is lowered significantly, then the folding efficiency is reduced to roughly half of the optimal amount and the accumulated protein is also decreased (Fig. 8a,b, blue curves). A system optimized at intermediate production (k pt = 1) exhibits analogous behaviors (Fig. 8a,b, green curves). If the production rate is lowered, the folded fraction drops below optimal values. If raised, then a massive increase in accumulated protein is observed. These results highlight a general principle. The quality control cycle can be optimized to operate in one of two regimes: one with excess chaperone capacity, and one where the chaperones are overwhelmed. Optimizing for the former requires shutting off the safety valve and prioritizing reglucosylation over degradation. Optimizing for the latter requires enhancing degradation and deglucosylation. The transition between the two regimes occurs when the rate of protein production becomes comparable to the rate at which chaperone-bound proteins detach from chaperones (i.e., at k pt = 1). A system optimized for low production will result in large-scale protein accumulation if the production rate is increased by even a modest amount. A system optimized for high production yields suboptimal folding throughput if shifted to the low-production regime.
Without any flexibility to adjust cycle parameters, the glycoprotein quality control system will perform poorly in one of the two regimes. A natural question is to what extent adjusting a single kinetic parameter will allow the system to compensate for changing production rates and to perform well across a broad range of conditions. Figure 7a shows that the optimal degradation rate (k d ) continuously changes across a range of low protein production levels (k pt ), suggesting k d as a good candidate for an adjustable parameter. Thus we choose to treat the degradation rate k d as capable of adapting to changing production levels, while all other rate constants in the cycle are held fixed. At each value of the production rate, k d is adjusted to maintain a total protein content P tot = 1 whenever possible, with the resulting folding fraction shown in Fig. 8c. For a system optimized at low protein production, an adjustable degradation rate allows the optimum folding fraction to be maintained across all production rates. Even when the fraction of misfolded proteins is increased (dashed curves in Fig. 8c), the optimum folding fraction can be maintained up to intermediate production levels. Strikingly, a system optimized at low protein production can also maintain a fixed total protein P tot = 1 up to intermediate production levels (k p ≲ 0.7) by adjusting the degradation rate k d (Fig. 8d). The ability of this system to maintain fixed total protein content over a broad range of low to intermediate production values is in sharp contrast to the rapidly increasing protein levels that arise when all parameters are held fixed (Fig. 8b). At higher production rates, there is no value of the degradation rate that can maintain the fixed total protein content and we adjust k d as needed to minimize P tot .
Allowing k d to adjust in a system optimized for intermediate or high protein production has little impact on both folding fraction and protein accumulation compared to the fully fixed system (Fig. 8c,d).
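The idea of adapting only the degradation rate can be illustrated with a toy calculation. The sketch below is not the model or optimization used for Fig. 8: it assumes a reduced cycle with k -r = k -c = 0, C tot = 1, no misfolded or background proteins, and no chaperone occupancy by folded proteins, and it assumes degradation removes protein from P at rate k d P. The function names (`steady_state`, `tune_kd`), the bracket for k d , and every rate constant are hypothetical.

```python
def steady_state(kp, kd, kc=10.0, kr=1.0, kg=1.0, kmg=0.1, kf=0.1):
    """Steady state (P_g, P_c, P) of the reduced cycle, found by bisection on P_c."""
    def free_states(Pc):
        Pg = (kr + kf) * Pc / (kc * (1.0 - Pc))  # from dP_c/dt = 0 with C_A = 1 - P_c
        P = (kr * Pc + kmg * Pg) / (kg + kd)     # from dP/dt = 0
        return Pg, P

    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        Pg, P = free_states(mid)
        # The exit flux (folding + degradation) grows with P_c; match it to kp.
        if kf * mid + kd * P < kp:
            lo = mid
        else:
            hi = mid
    Pc = 0.5 * (lo + hi)
    Pg, P = free_states(Pc)
    return Pg, Pc, P


def tune_kd(kp, target=1.0, kd_lo=1e-2, kd_hi=1e2):
    """Find kd giving P_tot = target, or None if the bracket cannot reach it."""
    def p_tot(kd):
        return sum(steady_state(kp, kd))

    if p_tot(kd_hi) > target or p_tot(kd_lo) < target:
        return None
    for _ in range(100):
        mid = (kd_lo * kd_hi) ** 0.5  # bisect on a logarithmic scale
        if p_tot(mid) > target:       # too much unfolded protein: degrade faster
            kd_lo = mid
        else:
            kd_hi = mid
    return (kd_lo * kd_hi) ** 0.5


for kp in (0.1, 0.3, 0.5, 2.0):
    kd = tune_kd(kp)
    if kd is None:
        print(f"kp={kp}: no kd in the bracket holds P_tot at 1")
    else:
        Pg, Pc, P = steady_state(kp, kd)
        print(f"kp={kp}: kd={kd:.3f}, P_tot={Pg + Pc + P:.6f}, f={0.1 * Pc / kp:.3f}")
```

In this toy version a degradation rate that holds P tot = 1 exists for the three lower production rates but not for kp = 2, where the chaperones saturate; the specific threshold is a property of the hypothetical parameters, not of the full model.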
This analysis establishes that the quality control pathway can perform well at typical low production rates, yet be capable of adapting to moderate surges in protein production. Such robust behavior requires only that the protein degradation rate be rapidly adjustable to changing conditions. Other parameters in the quality control cycle can be held constant while allowing near-optimal system performance over an order of magnitude range in protein production. Interestingly, there is evidence that cellular quality control systems do in fact control protein degradation throughput in response to perturbations in protein homeostasis. Specifically, cells maintain a reservoir of ERAD enzymes in ER-associated vesicles that can fuse with the ER lumen in response to an accumulation of unfolded proteins, rapidly upregulating protein degradation [56][57][58].

Discussion
We have investigated the impact of pathway architecture and kinetic parameters on the performance of the glycoprotein quality control cycle in the endoplasmic reticulum. Two metrics are used to evaluate steady-state performance. The fraction of foldable proteins that are successfully folded measures the accuracy of the system. The total quantity of unfolded proteins measures processing speed, with lower protein levels corresponding to more rapid processing.
Broadly, we find that a cyclic quality control process, with protein substrates driven in a preferred direction through three quality control states, leads to improved performance. Energy is required for cyclic driving, and increased driving energy per cycle allows higher protein folding fractions (Fig. 3). A higher folding fraction is achieved by eliminating reverse transitions that are absent from the consensus physiological model of the glycoprotein quality control pathway (Figs. 4 and 5). This matches the directed, cyclic behavior commonly described as occurring for physiological glycoprotein quality control.
The energy-consuming nature of the quality control cycle improves its decision making during the protein folding process, echoing other examples of biomolecular processes cyclically driven out of equilibrium to improve their performance. DNA copying is famously driven out of equilibrium in a 'kinetic proofreading' process that increases its accuracy [3,59]. Similar cyclic nonequilibrium processes increase the accuracy of T-cell signaling [4] and sensing of external concentrations [60].
By exhaustively considering all remaining cyclic and non-cyclic variations of the glycoprotein quality control pathway, we show that the consensus physiological model outperforms all other viable models (Fig. 6). Models lacking a 'safety valve', or a path for protein degradation without chaperone binding, will dangerously accumulate unfolded proteins at high protein production levels. This safety valve requirement aligns with the only two-way transition in the consensus physiological model, which allows glucose tags to be removed from proteins that are not bound to the chaperone, facilitating their degradation.
We find that the optimal tuning of the consensus physiological model varies substantially with protein production level (Fig. 7a). If the cell must choose a particular set of rate constants, it will either sacrifice folded protein throughput at low protein production levels, or induce massive unfolded protein accumulation at higher production levels (Fig. 8a,b). A particularly robust system design requires optimizing parameters for low protein production and allowing a single rate constant (the degradation rate) to adapt to changing production levels. Such a system can successfully maintain both maximum folding efficiency and low unfolded protein accumulation across a range of low-to-intermediate production rates (Fig. 8c,d).
In vivo glycoprotein folding in the ER is thought to operate in a low protein production regime, matching the robust system design. Namely, there is excess protein folding capacity in the ER under basal conditions [51], with abundant chaperones that exceed the requirements of the protein folding load [52,53]. Under conditions of high protein folding load, chaperones are overwhelmed [61,62], and the unfolded protein response is triggered, driving down the effective protein folding load by increasing chaperone quantity [51] and reducing protein translation [63].
The adjustable degradation rate, which alone can maintain both high folding throughput and low unfolded protein accumulation, corresponds to the dynamic behavior observed for some ERAD enzymes that remove proteins from the ER for degradation. Certain mannosidases, important for ERAD targeting, are largely sequestered to quality control vesicles in the absence of ER stress [56][57][58]. When the ER becomes stressed (i.e. unfolded proteins accumulate), these mannosidases converge on the ER, rapidly increasing degradation targeting [56][57][58]. Comparing the timescale for mannosidase convergence to the ER following proteasome inhibition (approximately a couple of hours [51]) with that of the gene expression response to accumulation of unfolded proteins (approximately 5 to 10 hours [57]) indeed suggests that ERAD-mediated degradation may be enhanced relatively quickly.
In contrast to the large variations in optimal rate constants with protein production level, changes in protein folding speed require relatively little variation in the optimal rate constants (Fig. 7b). The glycoprotein quality control pathway must simultaneously process a variety of proteins, which can have folding times ranging from a few minutes to several hours [31]. The ability of a single pathway to near-optimally process this variety of folding speeds appears to be a strength of its design. The efficiency of protein throughput can approach 100%, but ranges down to 25% or lower for slow-folding proteins or proteins with mutations [31]. Fig. 7b suggests that these low efficiencies are not the result of a poorly-tuned quality control process, but are instead an unavoidable consequence of slow folding.
The optimal rate constants also change little with the fraction of produced proteins that are inherently misfolded or unfoldable (Fig. 7c). This suggests that the design of the quality control pathway is robust to the onset of systematic misfolding, which may arise from translation errors, environmental stress, or mutations [46], so long as the total protein production level remains relatively unchanged.
Effective quality control of glycoprotein folding in the endoplasmic reticulum ensures an adequate supply of functional natively-folded proteins and limits the accumulation of misfolded proteins. The failure to provide sufficient natively-folded proteins [31] and the formation of misfolded protein aggregates [9] can both contribute to the onset of disease. Our modeling quantitatively demonstrates how the performance of this pathway under a broad range of conditions is modulated by key kinetic parameters that serve as potential targets for pharmacological or genetic perturbations. This quantitative framework serves as a basic foundation for understanding the glycoprotein quality control pathway, which can be further expanded in future work to account for more complex aspects, such as sequential glycan sugar moiety removal [40,57] and the spatial organization of quality control activities [56].

Non-dimensionalization
We non-dimensionalize all times by the timescale of protein removal from the chaperone via glucose trimming, 1/k r , and all concentrations by total chaperone concentration, C tot . For conciseness of notation, all kinetic parameters in the text refer to non-dimensionalized values. The dimensionless dynamic equations for foldable proteins are then:
The dynamics of misfolded proteins are described by
Note that most rate constants are the same for both foldable and misfolded proteins, except k p changes to k * p to allow different production rates of foldable and misfolded proteins, k * f = 0 (misfolded proteins cannot fold), and P * b = 0 (as only a single comprehensive population of background proteins is considered). The available amount of chaperone is C A = 1 − P c − P cf − P * c − P cb , where C tot = 1 is the dimensionless total chaperone concentration.
This forms a quartic equation for C A , which can be solved with standard root-finding algorithms. Once C A is obtained, Eqs. 15 and 17 give steady state P c , P g , P * c , and P * g , from which Eq. 7 gives steady state P . Eq. 12 gives steady state P * once k * p and k d are selected, without needing other information.
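The root-finding step can be illustrated with a minimal Python sketch. The coefficients of the quartic are not reproduced in this text, so the coefficient list below is a placeholder chosen only to exercise the code, and the function names are hypothetical. The sketch relies on one fact from the definitions above: because C A = 1 − P c − P cf − P * c − P cb with non-negative concentrations, the physically meaningful root must lie in [0, 1]. It scans that interval for sign changes and refines each bracket by bisection, which is one of several standard options, not necessarily the routine used for the results reported here.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest degree to constant."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def physical_roots(coeffs, lo=0.0, hi=1.0, n_scan=2000, tol=1e-12):
    """Return real roots in [lo, hi], located by a sign-change scan and refined by bisection."""
    roots = []
    step = (hi - lo) / n_scan
    a, fa = lo, poly_eval(coeffs, lo)
    for i in range(1, n_scan + 1):
        b = lo + i * step
        fb = poly_eval(coeffs, b)
        if fa == 0.0:
            roots.append(a)  # a scan point landed exactly on a root
        elif fa * fb < 0.0:
            left, right, f_left = a, b, fa
            while right - left > tol:
                mid = 0.5 * (left + right)
                f_mid = poly_eval(coeffs, mid)
                if f_left * f_mid <= 0.0:
                    right = mid  # root lies in [left, mid]
                else:
                    left, f_left = mid, f_mid  # root lies in [mid, right]
            roots.append(0.5 * (left + right))
        a, fa = b, fb
    if fa == 0.0:
        roots.append(a)  # root exactly at the upper end of the interval
    return roots


# Placeholder quartic (NOT the model's): (x - 0.25)(x + 1)(x^2 + 1)
example_coeffs = [1.0, 0.75, 0.75, 0.75, -0.25]
candidate_c_a = physical_roots(example_coeffs)
```

A scan of this kind misses roots of even multiplicity that fall between scan points, so a production implementation would more likely use a companion-matrix polynomial solver and then filter for real roots in [0, 1].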

Optimization of cycle efficiency
For the results in Fig. 2, the maximum folding fraction, independent of total unfolded protein, is first found by allowing k i = k c , k -c , k g , k -g , k -r , and k d to vary within k i ∈ [10^−3, 10^3] and maximizing the folding fraction with the Matlab routine fmincon. The minimum total unfolded protein is then found with fmincon for each fixed folding fraction (at a value less than or equal to the maximum folding fraction), with the folding fraction imposed through the nonlinear constraints option and with k i ∈ [10^−3, 10^3].
For the results in Fig. 3, k i = k c , k -c , k -r , k g , k -g , and k d are allowed to vary to maximize the folding fraction (Eq. 2), while fixing the energy (Eq. 3) at a specific value and fixing the total unfolded protein at P tot = 1. The folding fraction maximization is performed using the Matlab routine fmincon, with energy and total unfolded protein fixed using the nonlinear constraints option. The k i are free to vary within the range k i ∈ [10^−3, 10^3].
The results in Fig. 4 are found similarly to those of Fig. 2, with the fixed folding fraction varied using the bisection method until a P tot ∈ (0.99, 1.01) is found. Results in Figs. 5 and 7 are found with the same method as Fig. 4 with the appropriate k i set to zero and the appropriate k i allowed to vary within k i ∈ [10^−3, 10^3]. Almost all results in Fig. 6 are also found with the method of Figs. 5 and 7. The exception in Fig. 6 is the no tag memory (NTM) model, which lacks the transition represented with rate constant k r , and instead sets k -c = 1.
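The outer bisection over the fixed folding fraction can be sketched as follows. This is a minimal Python illustration, not the Matlab code used for the figures. The callable min_unfolded is a hypothetical stand-in for the inner constrained minimization of total unfolded protein, and the sketch assumes that this minimum increases monotonically with the imposed folding fraction, an assumption that the text does not state explicitly.

```python
def tune_folding_fraction(min_unfolded, f_lo, f_hi, target=1.0, window=0.01, max_iter=100):
    """Bisect on the imposed folding fraction f until min_unfolded(f) is within target +/- window.

    min_unfolded: hypothetical callable returning the minimized total unfolded protein
    at fixed f; assumed (not stated in the text) to increase monotonically with f.
    """
    for _ in range(max_iter):
        f_mid = 0.5 * (f_lo + f_hi)
        p_tot = min_unfolded(f_mid)
        if abs(p_tot - target) < window:
            return f_mid, p_tot  # P_tot falls inside (0.99, 1.01) for the default settings
        if p_tot > target:
            f_hi = f_mid  # too much unfolded protein: lower the imposed folding fraction
        else:
            f_lo = f_mid  # room to spare: raise the imposed folding fraction
    raise RuntimeError("no folding fraction found with P_tot inside the window")


def toy_min_unfolded(f):
    """Toy trade-off curve (purely illustrative): accumulation diverges as f approaches 1."""
    return f / (1.0 - f)


f_star, p_star = tune_folding_fraction(toy_min_unfolded, 0.0, 0.9)
```

With the toy curve, the search stops as soon as the accumulated protein enters the acceptance window, so the returned folding fraction is only as precise as that window allows.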
The f * max and optimizing k * i at particular k pt in Fig. 8 are found with the same method as Figs. 4, 5, 6, and 7. The optimal parameters that achieve f * max are then used as the fixed parameters in Eq. 19 to determine the folding fraction and total unfolded protein in Fig. 8a,b as the protein production is varied. The folding fraction and total unfolded protein in Fig. 8c,d, with only k d free, are found by using the bisection method to vary k d in search of a value that gives P tot = 1. If P tot = 1 cannot be achieved with k d ∈ [10^−3, 10^3], then k d = 10^3 is chosen to minimize P tot .
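The search over k d with its fallback rule can be sketched in Python as below. The callable p_tot_of_kd is hypothetical and stands in for solving the steady state at the fixed optimal parameters. The sketch assumes that P tot decreases monotonically with k d, which is consistent with choosing the largest k d to minimize P tot but is not stated outright in the text. Bisecting in log10(k d) is a choice made here because the allowed range spans six orders of magnitude.

```python
import math

K_MIN, K_MAX = 1e-3, 1e3  # dimensionless bounds on k_d quoted in the text


def adapt_degradation_rate(p_tot_of_kd, target=1.0, log_tol=1e-9, max_iter=200):
    """Return (k_d, reached): k_d solving p_tot_of_kd(k_d) = target, else the K_MAX fallback.

    p_tot_of_kd: hypothetical callable; assumed to decrease monotonically with k_d.
    """
    # If the target is not bracketed by the bounds, follow the stated rule: use k_d = 10^3.
    if not (p_tot_of_kd(K_MIN) >= target >= p_tot_of_kd(K_MAX)):
        return K_MAX, False
    lo, hi = math.log10(K_MIN), math.log10(K_MAX)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if p_tot_of_kd(10.0 ** mid) > target:
            lo = mid  # still too much unfolded protein: degrade faster
        else:
            hi = mid  # at or below target: a slower degradation rate may suffice
        if hi - lo < log_tol:
            break
    return 10.0 ** (0.5 * (lo + hi)), True


# Toy accumulation models (illustrative only): P_tot = production / k_d
moderate_load = adapt_degradation_rate(lambda kd: 5.0 / kd)
overload = adapt_degradation_rate(lambda kd: 5000.0 / kd)
```

In the toy overload case even the largest allowed k d leaves P tot above 1, so the function returns the upper bound together with a flag indicating that the target was not reached.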