The Evolutionary Dynamics of a Rapidly Mutating Virus within and between Hosts: The Case of Hepatitis C Virus

Many pathogens associated with chronic infections evolve so rapidly that strains found late in an infection have little in common with the initial strain. This raises questions at different levels of analysis because rapid within-host evolution affects the course of an infection, but it can also affect the possibility for natural selection to act at the between-host level. We present a nested approach that incorporates within-host evolutionary dynamics of a rapidly mutating virus (hepatitis C virus) targeted by a cellular cross-reactive immune response, into an epidemiological perspective. The viral trait we follow is the replication rate of the strain initiating the infection. We find that, even for rapidly evolving viruses, the replication rate of the initial strain has a strong effect on the fitness of an infection. Moreover, infections caused by slowly replicating viruses have the highest infection fitness (i.e., lead to more secondary infections), but strains with higher replication rates tend to dominate within a host in the long-term. We also study the effect of cross-reactive immunity and viral mutation rate on infection life history traits. For instance, because of the stochastic nature of our approach, we can identify factors affecting the outcome of the infection (acute or chronic infections). Finally, we show that anti-viral treatments modify the value of the optimal initial replication rate and that the timing of the treatment administration can have public health consequences due to within-host evolution. Our results support the idea that natural selection can act on the replication rate of rapidly evolving viruses at the between-host level. It also provides a mechanistic description of within-host constraints, such as cross-reactive immunity, and shows how these constraints affect the infection fitness. This model raises questions that can be tested experimentally and underlines the necessity to consider the evolution of quantitative traits to understand the outcome and the fitness of an infection.


A.1 The model
We describe the within-host evolutionary dynamics with a hybrid stochastic-deterministic model. Similar approaches have been proposed to model within-host evolutionary dynamics in HIV, showing that one can assume a deterministic behaviour for the population of infected and immune cells whilst keeping a stochastic description of the viral evolution [1,2]. Here, contrary to previous models, we track the dynamics of two viral traits: the replication rate and the antigen characteristic of each viral strain present in the host (for a similar approach with only two strains per host, see [3]).
Due to the complexity of the model, numerical simulations are performed with a code written in C that is available upon request to the authors. In the model, at each time step, the population size of each viral strain and each lymphocyte clone are updated using the system of differential equation 1 given in the main text. This system is solved numerically with a fourth order Runge-Kutta algorithm [4]. A viral strain is removed from the system if the number of cells infected by this strain falls below 1.
Each viral strain is characterised by a replication rate and an antigenic value (see the Model section in the main text). The immune cells are described as a population of a finite number of T-cell clones m each with a receptor that recognises antigens within a range of values defined by the function given in Eq. 2 of the main text. In the simulations we assume m = 20. We assume that infections are generated by a single strain with an initial replication rate ϕ 0 and an antigen value A 0 . This assumption is supported by recent data for HIV infections [5]; little is know about HCV transmission dynamics.
Simulations are ended when the maximum time (T max = 800 days) is reached or when the total numbers of infected cells and immune cells both reach a threshold value of 10 12 that we assume a large value that correspond to the death of the host.

A.2 Mutation process
At each time point after infection, virus mutates with probability µ per day. The probability that a new strain emerges is drawn from a binomial distribution (similar to [2]) B(∆x i , µ), where ∆x i is the number of newly generated infected cells from strain i. We here use the approximation ∆x i ≈ ϕx i ∆t. For each new strain we draw randomly the new traits (the replication rate and the antigenic value). The new replication rate is drawn from a uniform distribution centred around the replication rate of the strain generating the mutation and with a range of values (width) w. We also assume a maximum value for the replication rate, ϕ max = 5. The new replication value is drawn from a finite set of values R max , which is assumed to be 100 in the default case. A new antigen is drawn from a finite set of values, A max that are uniformly distributed. Because the cross-immunity function we use is 2π-periodic, we assume that antigenic values are uniformly distributed on a circle [0, 2π], thus antigens are multiples of 2π/A max . We simulate the case for A max = 1000 (see also Supporting Text S4). In the model we assume a finite number of replication rates and a finite number of antigenic values. Therefore there is a probability that newly generated mutants are already present within the host. For each new mutant, if the newly generated strain has both replication rate and antigen value already in the system in a previously generated strain, then we add these mutant cells to the population infected with the same type of virus. Otherwise, we increment the number of types m(t) = m(t − ∆t) + 1 and the number of differential equations of system 1 in the main text accordingly.

A.3 Transmission dynamics
To assess the heritability of viral traits from one infection to the next, we simulate transmission events during the infection period. To do so, we assume a simple epidemiological model, where virus spread across an homogenous population of susceptible individuals. This assumption is analogous to that done to define the classical R 0 [6]. We assume that the probability of encountering a new susceptible host is homogenous and we neglect heterogeneity in the host population. In this model, we approximate the probability of transmission as a stochastic event that depends on the logarithm of the total viral load within the source. Recent data on HIV show that the probability of transmission during primary infections is at least a linear function of the logarithm of the viral load [7]. In our model, the production of viral particles is at a steady state, therefore the number of viral particle is proportional to the number of infected cells, which is the variable that we follow in the simulations (see the main text).
A simple way to model the probability of transmission is to assume a non-homogenous continuous Poisson process with rate λ(t) ≈ ρ log(X(t))∆t, where X(t) is the total number of infected cells at time t within the source, ∆t = 1 is the time interval between events in the simulations of the model, and ρ is a constant transmission probability (ρ is a free parameter that can be varied to reflect the environmental condition or genetics of the population considered). At any time step we consider the probability that a transmission event occur, which is simply 1the probability that no event occurs: 1 − e λ . On average, the time at which a transmission event occurs is 1 λ . The default case is a situation where transmission events occur on average every 50 days. Upon transmission, we assume that only a single strain is successfully transmitted into the new host. The transmitted strain is chosen proportionally to the number of infected cells present in the host at time t, namely with probability x i P j x j . This strong bottleneck assumption is supported by data on HIV suggesting that 80% of transmitted events involves single types [5]. During the simulations we store the values of the traits of the transmitted strain and the time of the events.

A.4 Virulence events
In the default case we assume that hosts do not die (unless unrealistic values of viral load or immune response are reached (we used a threshold of 10 12 ). As explained in the text, we make this assumption because the scope of this work is to study viral evolution; therefore host death and clearance have a similar effect. However, for completeness, we investigate the effect of virulence in a more detailed analysis, where host death is a stochastic event that occur with a probability that is proportional to the viral load.
As for the transmission dynamics, we describe the virulence event as a stochastic event with a non-homogenous Poisson process. The probability of virulence event occurring at time t is λ vir (t) ≈ η(X(t)), where X(t) is the total number of infected cells in the host at time t. Here the probability that a virulence event occurs is drawn in a parametric distribution (Poisson distribution). Virulence occurs at most once in an infection, it might be more appropriate to model it using a non-parametric probability distribution. However, our approach has the advantage of being a simple and intuitive way of modelling this event, assuming that it depends on the viral load. At any time point during the infection, we compute the probability of virulence as 1 minus the probability that no virulence event occurs. If a virulence event occurs, then the host dies and the simulation is stopped. We record the traits values and the time of death. Results are shown in Table 3 of Supporting Text 4.

A.5 Statistical tests for the correlation analysis on viral life-history traits
To assess the effect of parameter variations on the viral traits and their heritability, we perform statistical tests on multiple simulation runs, where each parameter setting was run 200 times. This value is sufficient to explore the variation for each parameter setting. We also perform simulations where we repeat 500 or 1000 times the same set of parameter values observing little variation in the range of values of life history traits (not shown). Table 2 in the main document summarises the effect of the variation of 12 factors on the infection life-history traits. The factors we vary are the initial replication rate of the infecting strain (ϕ 0 ), the intensity of cross-reactive immunity (q), the mutation rate (µ), the width of the distribution of the replication rates (w), the maximum immune activation rate (c max ), the maximum killing rate σ max , transmission rate (ρ), the virulence (η) and the treatment intensities (τ 1 and τ 2 ). We also study the effect of the initial viral load and of the precursor frequency of immune cells. Simulations are performed by varying only one parameter per setting and keeping all the others at their default values (see Table 1 in the main text). We simulate the model using a range of 22 values (between 0.01 and 2) for the initial replication rate ϕ 0 and with a range of 6 to 10 values for the other factors.
The statistical association analysis is conducted with a multivariate general linear model employing the function "lm" in the R package (freely available at http://cran.r-project.org/). In the model, we assume that each infection life-history trait depends on a linear combination of the 12 factors (see above). We also test statistical associations with a univariate linear model, which assumes dependency of the trait on a single factor. Tests are performed using the same factors. Results are very similar to those obtained with a general linear model (not shown).

A.6 Modelling host treatment
We investigate the effect of the two types of treatments on the life-history traits by adding the treatment efficiency rate as a further parameter of the multivariate linear model. For the treatment tests we simulate again 200 runs for each parameter settings and the treatment efficiency rates τ 1 and τ 2 .
Equation system 1 with treatment becomes: These equations illustrate the difference between τ 1 , which mimics killing of infected cells, and τ 2 , which blocks viral replication, which also affects the proliferation rate of the lymphocytes because it depends on ϕ i . Table 2 shows the result of the test performed on τ 2 . Similar results are obtained from simulations where we vary τ 1 . For both treatment types we simulate several strategies of administration: the treatment can be initiated as early as the infection, or at 30, 90, and 180 days after the infection.