BiP Clustering Facilitates Protein Folding in the Endoplasmic Reticulum

The chaperone BiP participates in several regulatory processes within the endoplasmic reticulum (ER): translocation, protein folding, and ER-associated degradation. To facilitate protein folding, a cooperative mechanism known as entropic pulling has been proposed to demonstrate the molecular-level understanding of how multiple BiP molecules bind to nascent and unfolded proteins. Recently, experimental evidence revealed the spatial heterogeneity of BiP within the nuclear and peripheral ER of S. cerevisiae (commonly referred to as ‘clusters’). Here, we developed a model to evaluate the potential advantages of accounting for multiple BiP molecules binding to peptides, while proposing that BiP's spatial heterogeneity may enhance protein folding and maturation. Scenarios were simulated to gauge the effectiveness of binding multiple chaperone molecules to peptides. Using two metrics: folding efficiency and chaperone cost, we determined that the single binding site model achieves a higher efficiency than models characterized by multiple binding sites, in the absence of cooperativity. Due to entropic pulling, however, multiple chaperones perform in concert to facilitate the resolubilization and ultimate yield of folded proteins. As a result of cooperativity, multiple binding site models used fewer BiP molecules and maintained a higher folding efficiency than the single binding site model. These insilico investigations reveal that clusters of BiP molecules bound to unfolded proteins may enhance folding efficiency through cooperative action via entropic pulling.


Supplement Text S2: Global Sensitivity Analysis
For the global sensitivity analysis we ran 100,000 simulations for the 4 noncooperative and the 3 cooperative models. We varied 7 parameters simultaneously: the BiP association rate, the BiP disassociation rate, the aggregation rate, the unfolding rate, the misfolding rate, the folding rate, and the sequestration rate. From the nominal values, each parameter was allowed to vary one order of magnitude in either direction (i.e. one-tenth of the nominal value to ten times). The folding efficiency and chaperone cost metrics were recorded at t=100 seconds, a value by which time steady-state had been established. We used a simple Monte Carlo approach to generate the random values of the input parameters independently in MATLAB. There are many more parameters in the model but we use identical values by assuming that for example the BiP association rate doesn't change whether the chaperone binds to an unfolded protein, misfolded protein, or aggregate. While future evidence may prove this not to be the case, it allowed us to test large models while keeping the model complexity down.
We used the MATLAB toolbox GUI-HDMR [1]. This toolbox uses variance-based sensitivity indices [2,3] that quantify the relative contribution of each parameter to the uncertainties in the outputs. These methods explore the parameter space thoroughly by sampling large numbers of sets of inputs. GSA can be used to rank the importance of the various parameters as well as to discover importance interactions between parameters.
We also varied the initial conditions, though there are only two of main importance: BiP and unfolded proteins. That is because most of the large number of states in the models consist of some combination of these components and/or are derived from them. We allow those states to start at zero concentration and let the system evolve their populations during the simulation based on the reactions that occur.

Method: High Dimensional Model Representation (HDMR)
The high dimensional model representation (HDMR) is a set of tools in order to express the input-output relationship of complex models with a large number of input variables x 1 ,...,x n . The output variables f (x) = f (x 1 , ..., x n ) in the domain R n can be written in the following form: Here f 0 denotes the mean effect (zeroth order), which is a constant. The function f i (x i ) is a first-order term giving the effect of the variable x i acting independently (although generally non-linearly) upon the output f (x). The function f ij (x i , x j ) is a second-order term describing the interactive effects of the variables x i and x j upon the output f (x). The higher-order terms reflect the effects of increasing numbers of input variables acting together to influence the output f (x). If there is no interaction between the input variables, then only the zeroth-order term f 0 and the first-order terms f i (x i ) will appear in the HDMR expansion.
This expansion will be computationally very efficient if higher-order input variable correlations are weak and can therefore be neglected. For many systems a HDMR expression up to second-order already provides satisfactory results and a good approximation of f (x).
In the context of global sensitivity analysis, HDMR conceptually resembles the method of Sobol [4]. In this case, the calculation is of the variance expansion of the output given N random sets of n input parameters obtained by various methods. The variance expansion function consists of partial variances of the output related to first-and second-order influences of the input parameters. That is, the total output variance of a model is the sum of contributions obtained by varying the input variables.
The total variance V can be obtained by and the partial variances V i 1 ,...,is can be calculated from the HDMR expansion given above.
Once the partial variances are determined the sensitivity indices are calculated as follows: The first-order sensitivity index S i measures the main effect of the input variables x i on the output, or in other words the fractional contribution of x i to the variance of f (x). The second-order sensitivity index S ij measures the interaction effect of x i and x j on the output and so on.

Results
The first-order sensitivity indices are presented in Tables G1 and G2 for the output metrics folding efficiency and chaperone cost. Clearly the most important parameters of the set of seven is the sequestration parameter. This is due to the loss of chaperones due to entanglement in inclusion bodies which are inert aggregates that cannot be folded. Another parameter that is highly ranked is aggregation. The misfolding and unfolding kinetic rates have the least effect on the output for nearly all models. Surprisingly, the folding parameter has only a small effect on folding efficiency. The secondorder effects were much smaller and are omitted from the tables for brevity.   Table G2. First-order sensitivity indices for all models (chaperone cost).
Global sensitivity analysis does not measure the absolute size of the output variance, just the relative contributions to it. Thus, we examined the means and variances of all models for the metrics of chaperone cost and folding efficiency.

Means and variances of the output due to varying input parameters
The means of the folding efficiency metric for the non-cooperative scenarios confirm that the single binding site model produces the most folding (Table G3). This is due to the coverage effect as there are fewer binding sites to saturate on unfolded proteins and prevent them from misfolding. However, the cooperativity of multiple BiPs acting in concert through entropic pulling increases the amount of folding over the single binding site model. Thus, our large parameter study confirms our basic findings. In terms of chaperone cost, the non-cooperative models take a larger fraction of total chaperones per unit time to do their folding work. However, when cooperativity is introduced into the model, the chaperone cost of the higher binding site models drops below the single binding site model (Table G4). More chaperones are still bound to unfolded, misfolded, and aggregated proteins, but with the higher kinetic rate introduced by cooperativity, the speed of their action lowers the effective chaperone cost. Thus, our findings with the nominal parameters were confirmed by the large set of parameter values.  Table G4. Mean and variance for all models for the chaperone cost metric.
In fact, the variance tells a very important story. Over 700 thousand parameter sets, the output is quite invariant to changes in these inputs. Instead the results are a consequence of model structure. For each parameter, we rescaled the values from 0 to 1 and divided the input space into tenths (We call this the input percentile). From that restructuring, we calculated the mean of the outputs for the simulations for each of the 10 bins (folding efficiency and chaperone cost). We found that regardless of the region of the single parameter input space, the mean output metric was nearly constant. This was remarkable because one would think that the gulf from the 10th to the 90th percentile with a parameter like sequestration would produce different outputs. But the output was quite invariant. This was true for all seven parameters tested; the small variance in the output metrics for all models validates this. Figures G1-G7 show the folding efficiency for the cooperative and noncooperative models.We defined the input percentile as the folding efficiency mean of each of ten bins. The basic results for folding efficiency were not different from the nominal parameters, but it did show that these results did not depend on parameters at all. We suspect that the number of binding sites (model structure), the coverage effect, and the cooperativity were the main causes of the different folding efficiencies in line with the main results.  Figure G7. Folding efficiency of model C4 as a function of input parameters. Figures G8-G10 show the ratio of folding efficiencies for a cooperative model compared to its non-cooperative counterpart for cooperativity parameter C = 10. As the figures demonstrate, there is a several fold increase in folding efficiency, with very little variance in the findings across input space. The effect is even greater for models 3 and 4 compared 2, because their non-cooperative versions have a lower folding efficiency due to the coverage effect. This confirms our original findings that cooperativity through the concerted action of multiple BiPs is a major effect of our simulations.

Effect on Chaperone Cost Ratio of Varying Input Parameters
Figures G11-G13 (referring to models 2, 3, and 4, respectively) confirm our findings that chaperone cost drastically decreases when cooperativity is introduced into the model. In fact, the metric is less than a tenth of the cost for the comparative non-cooperative model. This ratio is quite invariant across the space of the 7 parameter values tested.  Figure G13. Effect of cooperativity on model 4 chaperone costs as a function of input parameters with a cooperativitiy parameter of 10.

Variation of the Amount of BiP and U Initial Conditions and Its Effect on Folding Efficiency
We also varied the amount of BiP and U molecules in the simulations' initial conditions simultaneously. Each species had a range of 100, 000 − 10 6 (equivalent concentrations), incremented by 100,000 molecules for a total number of 100 simulations for each of the seven models. We measured the folding efficiency metric as a function of the two initial conditions. Figure  G14 shows heatmap plots for each of the 7 models. For the non-cooperative models, the single binding site case produced the most folding over the range compared to models 2,3, and 4. This confirms our previous finding that the coverage effect caused the binding sites to be more saturated for model 1, leading to more folding and prevention of misfolding and aggregation.  Figure G14. Effect of varying BiP and U simultaneously on folding efficiency for all models.