Why are viral genomes so fragile? The bottleneck hypothesis

If they undergo new mutations at each replication cycle, why are RNA viral genomes so fragile, with most mutations being either strongly deleterious or lethal? Here we provide theoretical and numerical evidence for the hypothesis that genetic fragility is partly an evolutionary response to the multiple population bottlenecks experienced by viral populations at various stages of their life cycles. Modelling within-host viral populations as multi-type branching processes, we show that mutational fragility lowers the rate at which Muller’s ratchet clicks and increases the survival probability through multiple bottlenecks. In the context of a susceptible-exposed-infectious-recovered epidemiological model, we find that the attack rate of fragile viral strains can exceed that of more robust strains, particularly at low infectivities and high mutation rates. Our findings highlight the importance of demographic events such as transmission bottlenecks in shaping the genetic architecture of viral pathogens.

. . . when the process is initiated by one i-mutant. Because there are no back mutations, it solely depends on r i+j , j 0, and we write F i (r i , r i+1 , . . .).
We assume throughout this paper that the average number of (i + j)-mutants produced by an i-mutant is w i e −u u j /j!. This is achieved for instance if the total number of offspring of an i-mutant follows a Poisson distribution P (w i ), and if each of its offspring accumulates a number of additional deleterious mutations according to P (u). In this particular case, the i-th generating function is

Survival probability
Let m i = w i e −u the average number of i-mutants produced by an i-mutant. It follows from a general result of branching processes theory that if m i 1, any population starting with one i-mutant is doomed to extinction; if m i > 1, this population has positive survival probability. Let K = max{k : m k > 1} the largest number of mutations which can be carried by an individual without leading to its progeny's extinction, and call supercritical any k-mutant with k K. By independence of the lineages, the fate of the population (X n ) n∈N only depends on the initial numbers of supercritical mutants n = (X 0,0 , . . . , X 0,K ). First, we compute for each k K the partial extinction probability p ext,k (n) of all mutants up to type k, given the initial condition n, namely the probability of {X n,i → 0, ∀i k}. It corresponds to the extinction probability of the reducible branching process with finite set of types {0, 1, . . . , k} and generating functions F i (r i , . . . , r k , 1, 1, . . .), i k. As such [2], the partial extinction probability is given by where (q k0 , . . . , q kk ) is the smallest nonnegative solution of the system of equations (see Eq (19) for an explicit solution when F i is given by Eq (1)). The overall survival and extinction probabilities of the population are then obtained as

Muller's ratchet click probabilities
For a population starting with supercritical mutants n and k K, let p k (n) the probability that the fittest surviving individuals carry k mutations. Equivalently, it is the probability that Muller's ratchet clicks k − k 0 times, where k 0 = min{i : n i > 0} is the fittest initial supercritical type. It corresponds to event {X n,k 0, X n,i → 0, ∀i < k} (all i-mutants with i < k go extinct but k-mutants survive), therefore p 0 (n) = 1 − p ext,0 (n) , p k (n) = p ext,k−1 (n) − p ext,k (n) , It comes in particular p surv (n) = K k=0 p k (n), consistent with the fact that any click beyond the threshold K leads to extinction.

Mutant spectra and asymptotic mean fitness
The late-time composition of the population is completely determined by the number of clicks of the ratchet. On the event Ω k that the type of its fittest surviving individuals is k, the asymptotic behavior of the population can be described with the help of the branching process Z n with offspring generating functions G i = F i+k , i 0, which only considers particles with type greater than k. To study this process we use results from the theory on decomposable branching processes [3], also used in [4] in the context of a viral population. The offspring mean matrix of the process is the upper triangular matrix M = (m k+i,k+j ) i,j 0 , where m ij is the average number of j-mutants born from an i-mutant in the original process X n . Its largest eigenvalue is m k > 1, and u = (1, 0, . . .), v = (u/s d ) i /i! i 0 are respectively non-negative right and left eigenvectors of M corresponding to m k , satisfying u · v = 1. Assuming Z 0 = (1, 0, . . .), then m −n k Z n almost surely converges to W v, where W is a non-negative random variable with mean value 1, and {W = 0} = {Z n,0 → 0}. If Z 0 = (z 0 , 0, . . .), then by independence of the lineages m −n k Z n → W * z0 v, where W * z0 is the sum of z 0 independent copies of W . The limit remains the same if Z 0 = (z 0 , z 1 , . . .), since the z i i-lineages with i > 0 have growth rate m k+i < m k and tend to 0 if normalized by m n k .
It follows that for any initial value Z 0 , Z n / j Z n,j → e −u/s d v almost surely on the event {Z n,0 0}. Let N n = i X n,i be the total population size at time n and let We deduce from what precedes that the fraction of i-mutants (i k) in the population satisfies while this proportion converges to 0 if i < k. Since (Ω k ) 0 k K forms a partition of the survival set, the previous result can be stated as follows (setting the frequency to 0 if N n = 0): where Y i is a random variable equal to f i−k with probability p k (n), k min (i, K), and equal to 0 with probability p ext (n) + K k=i+1 p k (n). Therefore, the mean population fitness satisfies lim where w ∞ = i 0 w i Y i is a random variable equal to w k e −u with probability p k (n), 0 k K, and equal to 0 with probability p ext (n). It follows that the expected asymptotic population mean fitness is given by B Survival through multiple bottlenecks

Markov chain model
In order to assess the effect of successive bottleneck events on the viral population, we introduce a stochastic model describing the evolution of the random composition of the population after each size reduction event. If the population reaches its carrying capacity C, a sample of size B is taken, resulting in a new founding population. As described earlier, the fate of this new population only depends on its initial numbers of supercritical mutants n, with |n| = K k=0 n k B (the sample might include some particles which are not supercritical). We choose C large enough such that (i ) the probability of not reaching C is close to the extinction probability of the population, (ii ) if it reaches C, the mutant frequencies in the population follow the asymptotic random distribution Eq (7). The evolution of the viral population going through these multiple size reductions is then entirely determined by the successive states n 0 , n 1 , . . ., where n b describes the supercritical fitness classes after b bottleneck events. We describe this random evolution via a Markov chain with state space S = n ∈ N K+1 , |n| B , and transition probability from state m to state n: where Q k (n) is the probability of obtaining n supercritical mutants in the sample of size B, if the fittest surviving individuals in the population founded by m are of type k (event of probability p k (m)). It follows from Eq (6) that on the latter event, the asymptotic proportion of non-supercritical mutants is 1 − f i }, the last coordinate corresponding to the non-supercritical mutants. Therefore, the probability of getting n supercritical mutants is Extinction probability under multiple bottlenecks Let P = (P (m → n)) m,n∈S the stochastic matrix of this Markov chain, and P b its b-th power. Then the probability for a viral population with initial state n 0 to become extinct after going through at most b bottlenecks is i.e. entry (n 0 , 0) of matrix P b . This probability can thus be computed explicitly as soon as the extinction and click probabilities p ext and p k are known, which is the case if for instance the numbers of offspring and of accumulated mutations follow Poisson distributions (see Eq (19)).

Incubation period distribution
How long does it take for the in-host population carried by an exposed individual E to reach size C? Assuming C is large enough, this random time can be approximated thanks to the asymptotic behavior of the population. Let n the initial numbers of supercritical viral particles in the in-host population. As described earlier, if the population does not go extinct, its limit behavior depends on the type k of its fittest surviving particles. The distribution of the incubation period thus depends on n and k. When a host is infected, we therefore specify these parameters n and k and denote by τ n,k , E n,k and I n,k the corresponding incubation period, exposed and infectious states. More specifically, when a susceptible S meets an infectious individual I m,l , it receives n supercritical viral particles with probability Q l (n), n ∈ S, given by Eq (11). The susceptible individual then either remains in state S with probability p ext (n), or enters in exposed state E n,k with probability p k (n), 0 k K, given by Eq (5). If so, it remains in E n,k for τ n,k time steps Eq (13) before entering infectious state I n,k . Let N n the total viral population size carried by E n,k at time n. We deduce from previously mentioned results that m −n k N n converges almost surely to e u/s d W n,k , where W n,k is a positive random variable whose distribution is given by Eq (15), and explicitly by Eq (21) in the particular case of Poisson distributed numbers of offspring and mutations described by Eq (1). For C large enough, the time needed for N n to reach size C can consequently be approximated by In the general setting, the random variable W n,k is approximately the sum of L independent copies of V , where (i ) L 1 is the random number of surviving k-lineages (i.e. a line of descent of a k-mutant with no k-mutants as ancestors), and (ii ) V is the limiting distribution of m −n k X n on its survival set, if (X n ) n is a monotype branching process modeling only k-mutants. Although not all explicit, we can state the following general results on L and V .
(i ) Let k 0 the fittest initial type in n. If Muller's ratchet does not click, i.e. if k = k 0 , then the number L of surviving k-lineages is at least 1 and at most n k . Since the survival probability of a k-lineage is 1 − q kk , L follows a binomial distribution with parameters n k and 1 − q kk , conditioned on being positive. If k > k 0 however, there might be more than n k surviving k-lineages (namely also those stemming from the k−1 i=k0 n i initial mutants). We therefore approximate the maximum number of surviving k-lineages by the average number of k-mutants born from n after one time-step, conditioned on the event that the surviving individuals are of type k (which we denote Ω k ). This average is equal to k i=k0 n imik , wherem ik is the mean number of k-mutants born from one i-mutant, conditionally on Ω k . In the following computation, the subscript i indicates that the process starts with one i-mutant, and q k stands for the infinite vector (q k0 , . . . , q kk , 1, . . .) defined in Eq (4). It comes from Eq (2) and Eq (5) that P i (Ω k ) = q k−1,i − q ki , and that for any infinite vector x, using the notation x y = i∈N x yi i , P x (Ω k ) = q x k−1 − q x k . With this notation, F i (r) = E i r X1 and r k ∂Fi ∂r k (r) = E i X 1k r X1 . As a result, the meanm ik is obtained as (ii ) Let (X n ) n a branching process with X 0 = 1 and generating function F (r) := F k (r, 1, 1, . . .), r ∈ [0, 1]. It is therefore supercritical with mean offspring number m k > 1. From the theory on monotype branching processes [1] we know that its extinction probability is the smallest nonnegative solution of F (r) = r, which is by definition q kk given by Eq (4). Moreover, m −n k X n → W almost surely, where W is a nonnegative random variable whose Laplace transform ϕ (s) = E e −sW is the unique solution of ϕ (m k s) = F (ϕ (s)). Then the desired random variable V is distributed as W conditioned on W > 0. Since P (W = 0) = q kk , the Laplace transform of V is φ (s) = (ϕ (s) − q kk ) / (1 − q kk ).
To summarize, we assume that the random variable W n,k appearing in Eq (13) is given by where L is a random integer such that and where the V (i) are independent copies of the random variable V with Laplace transform  (1)), an exact computation of (q k0 , . . . , q kk ) can be found, leading to explicit expressions of the extinction and Muller's ratchet click probabilities Eq (4)-Eq (5) and of the expected asymptotic population mean fitness Eq (9). Indeed, system Eq (3) becomes where W [·] stands for the principal solution of the Lambert W function.
In this particular case, we can also compute Eq (14) and Eq (18), and therefore provide the explicit distribution Eq (15) of W n,k involved in the incubation period of the exposed state Eq (13). Indeed, with F i given by Eq (1), we have ∂Fi ∂r k (r) = m ik F i (r), where m ik = w i e −u u k−i / (k − i)!. In addition, since q k satisfies by definition F i (q k ) = q ki , Eq (14) becomes Finally, we show that, approximately, the random variable V involved in Eq (15) follows an exponential distribution with parameter 1 − q kk . Indeed, the latter means that φ (s) = 1−q kk 1−q kk +s , and thus ϕ (s) = 1 − (1−q kk )s 1−q kk +s . We have on the one hand ϕ (m k s) = 1 − m k (1 − q kk ) s 1 − q kk + s , while on the other hand hence for m k close to 1, ϕ (m k s) ≈ F k (ϕ (s) , 1, . . .), leading to our rough approximation. As the sum of independent exponentially distributed variables, W n,k thus follows a Gamma distribution with shape and rate parameters W n,k ∼ Gamma (L, 1 − q kk ) where L is given by Eq (16) with n i m ik q k−1,i − q kk q ki q k−1,i − q ki , k > k 0 .

Binary reproduction
Other reproduction mechanisms can be considered, with the same key assumption that the average number of (i + j)-mutants produced by an i-mutant is w i e −u u j /j!. For example, we can assume a binary reproduction (although less relevant for viral populations) where an i-mutant either dies, or produces two offspring with probability w i /2, each of them accumulating a random number of additional mutations following P (u). In this case, the generating function is quadratic and system Eq (3) comes down to which can be solved explicitly as well.