Statistical Mechanics and Thermodynamics of Viral Evolution

This paper uses methods drawn from physics to study the life cycle of viruses. The paper analyzes a model of viral infection and evolution using the "grand canonical ensemble" and formalisms from statistical mechanics and thermodynamics. Using this approach we enumerate all possible genetic states of a model virus and host as a function of two independent pressures–immune response and system temperature. We prove the system has a real thermodynamic temperature, and discover a new phase transition between a positive temperature regime of normal replication and a negative temperature “disordered” phase of the virus. We distinguish this from previous observations of a phase transition that arises as a function of mutation rate. From an evolutionary biology point of view, at steady state the viruses naturally evolve to distinct quasispecies. This paper also reveals a universal relationship that relates the order parameter (as a measure of mutational robustness) to evolvability in agreement with recent experimental and theoretical work. Given that real viruses have finite length RNA segments that encode proteins which determine virus fitness, the approach used here could be refined to apply to real biological systems, perhaps providing insight into immune escape, the emergence of novel pathogens and other results of viral evolution.

can give rise to considerably different decision trees. For self-consistency to hold, the state reached post-infection from this cycle must be the same as that started from. The decisions portrayed in Fig B are simplified, in that they do not show the distribution of viruses (in genetic space). Moreover, and this is an important point which will hold for all of the analytic expressions to follow, all cell occupations are probabilistic and can take any value from zero to 1; likewise the free viruses are characterized by a probability distribution of matches.

Model Infection Process
With the criteria defined in the manuscript, we have analytically derived the overall infection rate. Below we give detail on several implications of this equation. As in the manuscript, Eq (10b), we define the infection rate, λ, as the probability a cell is infected as a function of the number, N, of free viruses in the environment.
where c is the number of target host cells, and B is the probability of a cell not being infected in a single viral pass given the distribution of virus in the environment P m .
Eq (S1) is only valid for integer numbers of free viruses N. In practice, however, just as all viruses in the environment are represented by a vector of probabilities P m for their distribution among all the m's, the total number of viruses N is likewise a real number in this model. A non-integer value of N, with integer part n 0 , can be viewed as a probability N − n 0 that the number of viruses is n 0 +1, and probability n 0 +1− N that the number of viruses is n 0 . It can easily be seen that N represents the most probable value in this case (i.e., the "expectation value").
To calculate the overall infection rate λ(N ) for non-integer N, therefore, we calculate where λ on the right hand side is evaluated using Eq (S1) at the integer values n 0 and n 0 +1 as shown in Eq (S3).

Solution for the Viral Genetic States
We derive the processes that occur when a virus reproduces. In the current model, when a virus reproduces, the cell dies, and a new empty (uninfected) cell takes its place. The distribution of free viruses changes significantly upon reproduction. In this model, the virus reproduces with a fecundity, , where we set = 20.
Each offspring is mutated at a randomly chosen single codon. This collection of mutated viruses is added to the pool of "free virus" available for infection at the next iteration. We call this pool "the virus in the environment." As the system evolves toward steady state, these mutations enable the virus to adapt to the competing factors such as the adaptive immune response and the barrier to infection. The number of virus in the environment fluctuates iteration to iteration, until stability is reached. The analytic solution, Eq (6), yields only the equilibrium or steady-state solution. We also solved the equations numerically and iteratively to track the dynamic approach to equilibrium and test for dynamic instability.
After reproduction the probability of a virus remaining in the cells is (1− e m )ψ Ξ (m) , and the probability to successfully reproduce is e m ψ Ξ (m) . We transform the probability functions above into a virus probability via Each offspring is subject to the possibility of mutation (manuscript Equations 5).
Here, M is a matrix of probabilities formed from manuscript Equations 5. Because the mutation can only change the number of matches by at most +/-1, the resulting matrix is tri-diagonal.
In substituting ψ Ξ (m) from manuscript Equations 6 into Eq (S5), an important point arises. The P m which appears in Equations 6 is in fact the post-mutation viral distribution from the previous iteration. Our stability criterion is that the post-mutation viral distribution must be constant, iteration to iteration. Setting the P m from Eq (6) equal to P postmutation (m) in Eq (S6) and calling them both P m , we obtain, finally, the self-consistent equation for P m , the post-mutation stable viral distribution: Solving Eq (S7) gives the steady-state viral probability distributions of the system.
The tri-diagonal matrix MD m is asymmetric. In order to increase the numerical accuracy of the eigenvalue calculations, we used a similarity transformation on MD m to form a symmetric tri-diagonal matrix S with the same eigenvalues.
The details of a general similarity transformation from nonsymmetric triadiagonal to symmetric tridiagonal matrix is shown as follows.
For a non-symmetric tri-diagonal matrix M with nonzero same-sign entries M ij , the transformation to a symmetric matrix S = B −1 MB with the same eigenvalues as M is obtained by similarity transformation with a diagonal matrix B. To determine B, there is an overall scale factor that is left undefined. For convenience we set B 0 = 1. With this definition, the remaining elements of B are: and in general We note that at a minimum for this transformation to be defined, off-diagonal pairs M n,n-1 and M n-1,n must be of the same sign and nonzero. In closed form this gives: The resulting matrix S has It can be easily shown that S defined in this manner has the same eigenvalues as M. The eigenvectors V of S are related to the eigenvectors P of the original matrix M by It can be noted therefore that if the elements of V n are real and positive definite, that the elements of P n will be as well.
Besides being faster to evaluate, the eigenvectors of S had much cleaner distinctions between physical and nonphysical eigenvalues (see manuscript). Since the transformation from the eigenvectors of S to those of MD m is real and positive definite, the physical eigenvectors of S correspond directly to the physical eigenvectors of MD m .

Order Parameter and Occupancy for General Values of p
As discussed in the text, Equations 6-10 can be solved for values of the probability p, the probability that the virus remains in the cell if not cleared by the immune response, with 0<=p<=1. We find that varying p leads only to a slight rescaling of the temperature parameter, T model , demonstrating universality in the solution. Supplemental Figs C-D compares the order parameter at p=1 with p=0, the limit where all virus exit the cell whether they successfully reproduce or not. The phase transition is observed for all p with only a slight rescaling of temperature.  In Figs E-F we similarly compare the occupancy of the cells for p=0 and p=1.  This boundary equation separates the regime of normal binding from the disordered phase, although the nature of the transition appears to change as Temperature is raised (and maximum immune response is lowered). . This is a singular point in our model which has different solutions depending on whether the system approaches T=0 or A=0 first. If A=0 and non-zero but small T, the steady-state solution in the cells is only the perfect virus. The probability becomes very narrow and peaked at m=50. The perfect virus has a reproductive advantage and no immune pressure. Approaching T=0 from this limit leads to the greatest possible number of virus. On the other hand, if T=0 and non-zero but small I, the steadystate solution is a 50/50 mix of m=49 and m=50. At every cycle the m=50 virus are cleared by the non-zero immune response but the m=49 virus reproduce, with mutation, filling the environment with a mix of m=49 and m=50. The number distribution is such that both successfully infect the cells on the next cycle. The mixed state remains stable approaching A=0 leading to the observed jump along T=0 axis.

Temperature Scale
In this model the effective k B has units of [matches/degree]. The temperature scale here begins at zero for the zero energy ground state, analogous to the Kelvin scale, but the scale itself is arbitrary (as is the energy of a "match"). It varies with effects like immune strength as discussed in the paper. In a physical system the binding energy would be specified in standard units (e.g. SI) and k B and Temperature defined accordingly.

Average Energy
As discussed in the text, the expectation value of the energy of a viral state with N total viruses in the environment at a given temperature and maximum immune response is: We plot this quantity in Fig I, and find all along the T=0 axis, the energy for all immunities is zero, as required.

Specific Heat
The main feature of the specific heat, defined in the manuscript by Eq (13), is the sharp peak along the apparent first order phase transition (Fig J).

Fig J.
Heat Capacity of the Cells, C V /k B T

Thermodynamic Temperature
To determine how effective temperature is related to real thermodynamic temperature, we calculate how the (genetic) states of the virus are distributed in energy.
where (E) is the number of accessible states at energy E. The accessible states represent the entire cohort of N viruses. In the previous sections we calculated the equilibrium viral state and its properties as a function of effective temperature (T model ) and maximum immune response (A). At a given effective T model and A, each state has a welldefined number of viruses, N, and a probability distribution, P m , representing the number of genetic matches (and mismatches) between the virus and target. While N and P m are sufficient to calculate average properties (e.g., average energy), in order to calculate thermodynamic temperature one must enumerate the complete set of realizations of all systems with N viruses, and probability of match distributed as P m. In order to calculate (E), we need to do a careful counting of states as a function of energy. The first step is to translate the probability distribution P m shown in manuscript Fig 3 as a function of matches into the corresponding distribution as a function of energy, P(E). Where we define and w j ∑ (S17c) In the equations above, N j , is the number of viruses in eigenstate j and, n E, is the number of states at energy E. We define w j as the distribution width of eigenstate j; n ij is the occupation of the i th "match" bin (measured in number of viruses) for the j th eigenstate. With these definitions, we can define the accessible states in energy as in manuscript Eq (16).
As discussed in the paper, in order to calculate thermodynamic temperature one must enumerate all realizations of all systems with N viruses, and a probability of match distributed as P m . This would include, for example, a state with N viruses all of which have zero matches but with very small probability. We note that it also includes states with mixed numbers of matches (every combinatorial possibility). For example, for N = 13, one possible state might have a single virus with zero matches, 4 viruses with 13 matches, and a final 8 viruses with 26 matches. The probability of this state is: For the example above nij ={1,4,8}.
Even this state can be represented in several different ways depend on which of the 13 viruses have m=13 vs. m=26, etc. It is clear that even for moderate N, the number of possible combinations is large. In principle enumerating all the states requires 51 nested computational loops.
Fortunately, we observe (see manuscript Fig 5a-d) that all distributions, P m , have a welldefined width. Given the finite distribution width, w, we can reduce the number of possible states by only considering a number of matches where the value of P m is greater than a threshold, which we take to be 10 -4 . With this approximation, no distribution is observed to have a width greater than 26 matches. Thus, the required computation can be done with a maximum of 26 nested logical loops (called by recursion).

Robustness vs. Evolvability
As discussed in the paper, we also plot the order parameter m robust /50, a measure of the evolvability or adaptive diversity of each quasistate, as a function of evolvability measured from the width of the normalized eigenstates P m . We find a universal curve (manuscript Fig 10) where trajectories in temperature and maximum immune response, A, lead away from the phase transition observed in manuscript Figs 5-7. In Fig K we show the normalized eigenstates as a function of immunity at several values of temperature shown inset. This figure is complementary to manuscript Fig 5. The figure indicates increasing immunity, A, by the colors progression black (low), yellow, red, green, blue (high). The phase transition is evident in Fig I as the "gap" between the ordered and disordered regimes at certain temperatures. From the color progression the figure demonstrates that the eigenstates or quasispecies distributions always move away from the gap as immune pressure is increased. At any T and A, there are two different pressures on the virus. Which direction the virus evolves in response to these combined pressures depends on the relative gradient of each environmental factor.

Fig K.
The normalized eigenstates, P(m) as a function of maximum immune response, A, at several values of temperature shown in inset. Increasing immunity, A, are indicated by the color progression black (low), yellow, red, green, blue (high). From the progression with increasing A, the figure shows that the quasispecies distributions always move away from the gap as immune pressure is increased.