Replication Vesicles are Load- and Choke-Points in the Hepatitis C Virus Lifecycle

Hepatitis C virus (HCV) infection develops into chronicity in 80% of all patients, characterized by persistent low-level replication. To understand how the virus establishes its tightly controlled intracellular RNA replication cycle, we developed the first detailed mathematical model of the initial dynamic phase of the intracellular HCV RNA replication. We therefore quantitatively measured viral RNA and protein translation upon synchronous delivery of viral genomes to host cells, and thoroughly validated the model using additional, independent experiments. Model analysis was used to predict the efficacy of different classes of inhibitors and identified sensitive substeps of replication that could be targeted by current and future therapeutics. A protective replication compartment proved to be essential for sustained RNA replication, balancing translation versus replication and thus effectively limiting RNA amplification. The model predicts that host factors involved in the formation of this compartment determine cellular permissiveness to HCV replication. In gene expression profiling, we identified several key processes potentially determining cellular HCV replication efficiency.


Model Calibration
The final replication model comprises 13 molecular species (two of them implicit, without separate equations), and is parameterized by 16 parameters corresponding to reaction rates, as well as three non-zero initial values and one scaling parameter. As the total amount of Ribosomes in the cell (R ibo tot ) is assumed constant and equal to the sum of free Ribosomes and Ribosomes bound in translation complexes T c , we do not need to include a separate equation for the free ribosomes (R ibo ), but can rather calculate them implicitly as R ibo = R ibo tot -T c . Similarly, the total amount of Host Factor HF tot =HF(0) is assumed constant, and is equal to HF(0) = HF + R Ip . Reaction rates in the model were taken from literature as far as known, or estimated by fitting the model to the experimental data. For simplicity, we assumed that all RNA species in the RC degrade at the same rate !" . Following [3], we used a value of k 2 =100 polyproteins per hour per polysome for protein translation. RNA replication was assumed to occur at a rate of k 4m = k 4p = 1.7 viral RNA molecules per hour per replication complex, assuming plus-and minus-strand synthesis to occur at the same rate [4,5,6]. Based on an estimated half-life of Luciferase of approximately 2 hours, we estimated the corresponding degradation rate to be µ L = 0.35 h -1 [7,8]. We furthermore estimated the NS protein half-life in the cytoplasm to be around 12 hours, corresponding to a rate of µ E cyt = 0.06 h -1 [9,10,11]. We observed from model calibration that the optimization would yield values with µ Tc > µ p cyt , violating the expectation that RNA in translation complexes should be more stable than free RNA in the cytoplasm. We hence added the constraint µ Tc / µ p cyt = 0.5, enforcing a 2-fold higher stability of RNA that is actively translated. We furthermore observed a low sensitivity of model output with respect to parameters k 1 , k c , k 3 and k 5 , compare figure 5 in the main manuscript and supplementary figure S5, and hence estimated these parameters based on manual model analysis. The effect of these equality constraints was thoroughly evaluated to assess the impact of these assumptions, as described below.

Formulation of the Parameter Estimation Problem
For the estimation of the remaining 7 parameters, three initial values and scale factor, we consider the following nonlinear, multi-experiment least-squares optimization problem, which is subject to equality and inequality constraints: Herein, is the state vector of the differential equation system at time in experiment , which in our model is given by the concentrations of the 11 species (R p unp , R p cyt , T c , P, E cyt , R Ip , R ds , E, R Ids , R p and L) at time t in experiment r. The vector p represents the unknown parameters to be estimated, including the scaling factor for the Luciferase measurements, and the initial values for the host factor in the two cell lines, as well as the total number R ibo tot of Ribosomes available for translation. The right hand side function f of the differential equation system is given by the model equations in the cytoplasm and replication compartment: Cytoplasm: !" !" The number m 1 in the optimization problem (1) is the number of different time points available. The number m 3 denotes the number of different experiments, in our case m 3 =2, because we have measurements for high and low permissive cells. The number m 2 denotes the number of different measurement types, which in our case is m 2 =6, because we have measurements for plus-strand RNA, minus-strand RNA, and Luciferase, and three steady-state ratios at t=72, which we include in the objective function [12]: (1) A ratio of 10:1 of plus to minus strand RNA; (2) a ratio of 6:1 of plus to minus strand RNA in the replication vesicles; (3) a ratio of 1:1 of plus strand RNA in the cytoplasm to plus strand RNA in the replication compartment. Further, r ij η denotes the value of the measurement of type j taken at time t i in experiment r. For each measurement type, there is a function h j , called measurement function or measurement model, which simulates the outcome of a measurement of this type at time t i . For plus-strand RNA measurements, the measurement function is given by log 10 (R p cyt + T c + R p + R ds + R Ip + R Ids + R p unp ), for minus-strand RNA measurements, the measurement function is given by log 10 (R ds + R Ids ), and for Luciferase measurements, the measurement function is given by log 10 (L) -log 10  for plus-strand RNA in cytoplasm to plus-strand RNA in the RC.
Finally, the factor r ij w is equal to one whenever a measurement of type j has been taken at time t i in experiment r, and zero otherwise.

Multiple Shooting Parameterization
The optimization problem given above is an infinite-dimensional optimization problem, because the functions ) (⋅ r y occur as optimization variables, which have to fulfill the differential equation (an infinite-dimensional equality constraint). In order to reduce the optimization problem to finite dimension, we use the multiple shooting method [13,14,15], which has successfully been applied to real world parameter estimation problems in many different application areas like chemical engineering (e.g. the denitrogenization of pyridine, see [13], biophysics, e.g. the photosynthesis process [16], and civil space flight, e.g. satellite orbit determination [17]).
Using multiple shooting, the finite dimensional vector of optimization variables is given by

The notation
indicates that the initial value problem solution depends on the initial value and the parameters. In order to ensure that ) (t y r is continuous in the solution of the optimization problem for all experiments r , additional equality constraints -so-called matching conditions -are imposed, which require that the state variables r k s 1 + are equal to the solution of the initial value problem on the preceding interval, i.e. 1 , , . Thus, we arrive at the following finite-dimensional, nonlinear least-squares optimization problem: ,..., 1 , in each experiment r as optimization variable, is derived from the above optimization problem by setting 0 = M and omitting the matching conditions. However, the usage of multiple shooting, i.e. the introduction of additional optimization variables and equality constraints, is known to have many theoretical and practical advantages, like improved convergence behavior through a suitable initialization of the augmented vector of optimization variables [17], decreased nonlinearity of the problem (even down to one-step convergence, see [15]) , and the possibility to estimate parameters in systems with unstable behavior, which cannot be treated using the single shooting method [14]. Collecting all components of F eq and the matching conditions into a single vector, and using the definition ) , , , ,..., , , where the equality and inequality constraints have to be fulfilled component-wise. In our case, all inequality constraints are bounds to the parameter values and were treated by adding suitable smooth potential terms to the least squares objective function, which keep the parameters within biologically plausible ranges.

Generalized Gauss-Newton Method
The remaining equality-constrained nonlinear least squares function is solved using the Generalized Gauss-Newton method, i.e. starting from an initial guess 0 x for the optimization variables, we iterate , where the increment k x Δ is determined as the solution of a linearized constrained least-squares problem: , which is given by where I denotes the identity matrix. For the practical numerical solution, Householder decompositions of the Jacobian matrices are used. By exploiting the block-diagonal structure in the Jacobian J 2 , which arises from the matching conditions, the solution of linear problems is done very efficiently. In order to improve the convergence behavior of the Gauss-Newton method, we use damped iterations , where the damping factor ] 1 , 0 ( ∈ k λ is obtained from the restrictive monotonicity test [18]. Ill-conditioned parameter combinations, which correspond to small diagonal elements in the triangular matrices of the Householder decomposition of J 1 , are regularized.

Computation of Initial Value Problem Solutions and Derivatives
As described in the section "Multiple Shooting Parameterization", the evaluation of the state vectors and needed for the derivative-based optimization method, are computed using Internal Numerical Differentiation. This method allows a very reliable and efficient computation of the derivatives [13,14].

Statistical Analysis
In the solution x* of the parameter estimation problem we perform a statistical analysis based on the covariance matrix C, which can be computed from the generalized inverse by Herein, I denotes the identity matrix of the dimension of the number of least-squares conditions and 2 c β denotes the so-called common factor, which is computed by , ) (

Software
The above described numerical methods for the solution of nonlinear constrained least squares problems and the statistical analyses are implemented in the software package PARFIT, which was used for the parameter estimation of our model. The initial value problem solutions and their derivatives are computed using the solver METANB, which is incorporated in PARFIT. Supplementary Table S5 shows the obtained estimates for the model parameters, together with 90% confidence intervals.

Statistical Model Analysis and Analysis of Model Constraints
We performed an analysis of structural model identifiability, using SensSB [20]. SensSB is a Matlab toolbox that integrates a variety of local and global sensitivity and identifiability analysis methods. We performed a local sensitivity analysis of the full model using SensSB. Results of this analysis are shown in Supplementary Figure S8. High correlation between two parameters means that a change in the model output caused by a change in one parameter can be compensated by an appropriate change in the other parameter.
We furthermore performed a sensitivity analysis of obtained model parameters using the extended Fourier Amplitude Sensitivity Test (eFAST) [21,22], shown in Figure 5 in the main manuscript and supplementary figure S4. Based on these analyses, we observed a low sensitivity of model output with respect to parameters k 1 , k c , k 3 and k 5 for all measured species (+RNA, -RNA and Protein) and hence fixed these parameters manually to plausible values, as given in Supplementary Table S5. We observed from model calibration that the optimization would yield values with µ Tc > µ p cyt , violating the expectation that RNA in translation complexes should be more stable than free RNA in the cytoplasm.
We hence added the constraint µ Tc / µ p cyt = 0.5, enforcing a 2-fold higher stability of RNA that is actively translated. We then examined the effect of these additional constraints in the parameter estimation individually. This is done by removing or changing each constraint individually, recalibrating the full model, and then discussing the changes seen both in the objective function, the obtained fits, as well as changes in obtained model parameters.

Effect of constraint on k1, the formation rate of the translation complex Tc
The effect of the constraint on k 1 is investigated by fixing this parameter to the alternative values 0.5, 10.0, and 20.0.
A small decrease in the objective function can be achieved by changing the constraint to k 1 = 0.5, i.e. below the lower bound (decrease of 1.5%, LSQSUM = 10.579). Recalibration of the full model then shows that the largest changes in parameter values are obtained for k Pin (increases to 2.0e-5), R Ibo tot (decreases to 390), and the scale factor (decreases to 1400). These changes are well within the given confidence intervals.
The parameter values change again only within the confidence intervals.
These results indicate, that essentially changes in the formation rate k 1 of the translation complex T c can be balanced by changing the available number of ribosomes and by changing the import rate of the translation complexes into the replication compartment, then giving comparable results. This parameter is difficult to address experimentally, we hence decided to fix it to a plausible value of 1/(h*molecule).

Effect of the constraint on kc, the polyprotein cleavage rate
The effect of the constraint on k c is investigated by fixing it alternatively to the values of 0.1, 0.5, 2.0 and 5.0. The least-squares sum changes only negligibly (less than 1%) for k c = 0.5, 1, and 2. The values of the other parameters change by less than 20%. For k c = 0.1, the least-squares sum increases to 11.109 (+3.4%). k Pin increases to 1.9e-5, k 0 increases to 5.4e-3, and R Ibo tot increases to 890. All these values are within the confidence intervals. This indicates that this parameter is ill conditioned, and cannot be estimated accurately from the data given.
When considering the structure of the model, this result is not surprising, and due to the linear cascade of processes in the cytoplasm (protein translation followed by cleavage followed by import into the replication compartment). Within this cascade, k c is not rate limiting, hence changes in the cleavage rate hardly affect results. Based on these considerations, we decided to fix k c to a value of 1/h.

Effect of the constraint on k3, the formation rate of the plus strand replication complex within the replication compartment
The effect of the constraint on k 3 is investigated by fixing it on the values 1e-3 and 1e-5, respectively. The least-squares sum decreases slightly for k 3 = 1e-5 to 10.702 (-0.4%), and increases notably to 11.0 (+2.2%) for k 3 = 1e-3. Other parameter values are almost unchanged (less than 10%). The nominal value 1e-4 seems to be an upper bound that is consistent with the measurement data (i.e. lower values are acceptable and lead to a slightly decreased objective function value, whereas higher values lead to a significant increase in the objective function value).

Effect of the constraint on k5, the formation rate of the minus strand replication complex
The effect of the constraint on k 5 is investigated by fixing this parameter to the values 1, 5, 100, and 500. For both the larger and the smaller values, the least squares sum changes only marginally (up to 1.3% for k 5 = 500). The largest deviations for the parameter values are again observed for k Pin (varies between 5e-6 and 2.4e-5), scale factor (between 1600 and 2900), and R Ibo tot (between 420 and 1000). Again, these variations are within the confidence intervals.

Effect of the constraint µTc = 0.5*µp cyt
Based on the biologically motivated assumption that the degradation rate of actively translated RNA in complexes should be less than that of free RNA in the cytoplasm, we included the constraint µ Tc = 0.5*µ p cyt in the model. The effect of this constraint is investigated by simply removing it from the optimization problem. Without the constraint, the objective function decreases to 10.195 (-5.2%). As a result, µ Tc = 0.29 and µ p cyt = 0.23 < µ Tc is obtained. The largest change in any of the other parameters is observed for k Pin (1.4e-5, +50%), whereas all other parameters change by less than 20%.

Effect of Host Factor -Consumed versus Enzymatic
One of the assumptions made in our model is the participation of a crucial host factor involved in the formation of the replication compartment / the structural rearrangements occurring in the rough endoplasmatic reticulum. While the nature of the host factor is not specified in more detail in our model, there are two different possibilities how this host factor might be involved: (1) as an active component of the structural rearrangements, being consumed in the process [consumed HF model], or (2) as an enzyme, that catalyzes the reactions, but is released afterwards and is not consumed [activatory HF model]. We evaluated and compared both of these model alternatives. The dynamics and parameters for both models are very comparable, as shown in Supplementary Figure S3. Table 1 on the main manuscript  shows the corresponding parameters estimated for the activatory model, supplementary table S5 shows the parameters for the consumed model.

Model Selection
We developed our model based on a model published by Dahari and coauthors [3]. This model was developed based on steady state measurements of viral RNA and polyprotein, and we determined that the model was not able to explain the initial replication dynamics as observed in our data. Furthermore, the model cannot explain the differences between high and low permissive cells.
In order to compare possible hypotheses about what differences in the considered Huh cell lines could give rise to the observed dynamics, we implemented different assumptions about differences in highand low permissive cell lines in Dahari's model, and re-fitted the full model simultaneously to the hp and lp cells, requiring all parameters to be identical between the two cell lines, except for the parameters relating to the hypothesis under consideration. Figure S2 shows the resulting time courses, chi-squared values and AIC values obtained.

Ternary interactions for host factor model
A central assumption made in our model is that translation complexes T c , host factor HF and cytoplasmic proteins E cyt act in cis for the initiation of minus strand RNA synthesis. This is implemented in our model through ternary reactions T c + E cyt + HF -> R Ip + R ibo . This is a model assumption that we made to avoid introducing additional parameters in the model. We tested the consequences of these assumptions by splitting the ternary interactions into two sets of binary reactions with intermediate complex C, as follows: (1) We then re-calibrated the three additional model-parameters using an interior-point method. By iterating this from different starting points, we observed that parameter k a consistently obtained values around 3e-4, whereas parameters k b and k c seemed correlated, i.e. increasing values of k c could be compensated by increasing k b and vice versa, without significantly affecting the quality of the obtained fit. To study this further, we fixed k b to different values between 1 and 200, and fitted the model to the data by optimizing parameter k c , compare figure S6. Based on this analysis, we concluded that the introduction of C as an intermediate species with two additional reaction rates renders the model nonidentifiable. Furthermore, reaction rate k b is in all cases about 25 to 26 fold larger than k c , thus making reaction (1) above the limiting reaction for the formation of replication complexes. Since furthermore k a << k c << k b , complexes C can be assumed to be almost constant at a low number, thus warranting an approximation using the ternary reaction T c + E cyt + HF -> R Ip + R ibo .

Stochastic effects in the viral life cycle
Since several of the species in our model attain low values, we assessed whether stochastic effects play an important role and possibly influence the model dynamics. For this purpose, we carried out stochastic simulations using the calibrated model and the implicit tau stochastic solver. Computations were carried out in Matlab, using the SimBiology toolbox. Supplementary figure S7 shows ten stochastic runs for high-and low permissive cells, indicating a very similar behavior to the deterministic differential equation model.