Effect of the SOS Response on the Mean Fitness of Unicellular Populations: A Quasispecies Approach

The goal of this paper is to develop a mathematical model that analyzes the selective advantage of the SOS response in unicellular organisms. To this end, this paper develops a quasispecies model that incorporates the SOS response. We consider a unicellular, asexually replicating population of organisms, whose genomes consist of a single, double-stranded DNA molecule, i.e. one chromosome. We assume that repair of post-replication mismatched base-pairs occurs with probability , and that the SOS response is triggered when the total number of mismatched base-pairs is at least . We further assume that the per-mismatch SOS elimination rate is characterized by a first-order rate constant . For a single fitness peak landscape where the master genome can sustain up to mismatches and remain viable, this model is analytically solvable in the limit of infinite sequence length. The results, which are confirmed by stochastic simulations, indicate that the SOS response does indeed confer a fitness advantage to a population, provided that it is only activated when DNA damage is so extensive that a cell will die if it does not attempt to repair its DNA.


Introduction
Genetic repair is an essential component of cellular genomes. Without mechanisms for repairing damaged and mutated DNA, genomes could not achieve sufficient information content to code for the variety and complexity of modern terrestrial life [1].
Genetic repair mechanisms fall into two main categories: Those that correct base mis-pairings during the replication cycle of a cell, and those that repair mutated and damaged DNA during the growth (G) phase of the cellular life cycle [1].
Two important examples of the first class of repair mechanisms are DNA proofreading and mismatch repair (MMR). DNA proofreading is a repair mechanism that is built into the DNA replicases themselves. During daughter strand synthesis, an erroneously matched base is excised, and a second attempt at a base pairing is made [1]. Mismatch repair also removes erroneous bases from the daughter strand, but does this shortly after daughter strand synthesis [1].
Two important examples of the second class of repair mechanisms are Nucleotide Excision Repair (NER) and the SOS response [1]. NER protects a cell from damage due to radiation, chemical mutagens, and metabolic free radicals by removing damaged portions of the DNA strand and using the other, presumably undamaged strand as a template for re-synthesis of the excised region [1].
The SOS response is a genomic repair mechanism that only activates when there is extensive damage to the cellular genome. When DNA damage is sufficiently extensive, the cell stops growing, and the SOS repair pathways attempt to restore complementarity to the genome [1]. The SOS response only takes effect when DNA damage is so extensive that it may be impossible to use undamaged template strands to correctly re-synthesize damaged portions of the genome. Thus, although this means that the SOS repair mechanism is highly error prone, it is evolutionary advantageous for the cell to repair the genome and risk fixing deleterious mutations, than it is to leave the damaged genome unrepaired [1][2][3][4].
This paper continues the theme of incorporating various details characteristic of cellular genomes by developing a quasispecies model that takes into consideration the SOS repair mechanism. The model is highly simplified, and therefore only a first step in developing proper evolutionary dynamics equations with SOS repair. Nevertheless, because our model is analytically tractable, we believe it is a useful and important initial approach to mathematically modeling the evolutionary aspects of the SOS repair pathway. A proper modeling of the SOS response is an essential component of developing a quantitative theory of mutation-propagation in cellular organisms, which is important for understanding phenomena such as the emergence of antibiotic drug resistance in bacteria, and cancer in multicellular organisms [2][3][4].

Definitions and Model Set-Up
We consider a unicellular population of asexually replicating organisms, whose genomes consist of a single DNA molecule, i.e. one chromosome. The genome may then be denoted by s,s' f g, where s, s' denote the two strands of the DNA molecule. If the genome is of length L, then we may write s~b 1 . . . b L , s'~b' 1 . . . b' L where each base b i , b' i is chosen from an alphabet of size S (usually~4). If b b i denotes the base complementary to b i (for the standard Watson-Crick bases, the pairings are Adenine A ð Þ{Thymine T ð Þ, Guanine G ð Þ{Cytosine C ð Þ), and s s denotes the strand complementary to s, then s s~ b b L . . . b b 1 . This follows from the antiparallel nature of double-stranded DNA [1].
We let n s,s' f g denote the number of organisms with genome s,s' f g, and we assume that replication occurs with a genomedependent, first-order rate constant, denoted k s,s' f g . The set of all k s,s' f g defines the fitness landscape. It should be emphasized that fitness in this model only refers to replication rate. In particular, this model does not consider cell death.
The semiconservative replication of the DNA genomes happens in three stages: 1. Strand separation, whereby each strand of the chromosome separates to act as a template for daughter strand synthesis. 2. Daughter strand synthesis. We assume a genome and baseindependent mismatch probability e. This error probability e includes all error correction mechanisms, such as proofreading and mismatch repair, that are active during the replication phase of the cell. 3. Lesion repair, where any post-replication mismatches are removed. Here, there is no longer the parent-daughter strand discrimination that was available during daughter strand synthesis, so in contrast to DNA proofreading and mismatch repair, lesion repair has a 50% chance of removing the mutation, and a 50% chance of communicating it to the parent strand and fixing the mutation in the genome (the lesion repair can occur via either Base or Nucleotide Excision Repair) [1]. We also do not assume that lesion repair is perfectly efficient, so that we consider a genome and base-independent probability l of removing a mismatch. We call l the lesion repair efficiency.
In our simplified model, the SOS response is triggered if a given genome has at least l S mismatches. The replication rate of all cells undergoing SOS repair is zero. We assume that removal of mismatches is catalyzed by an enzyme that binds to a mismatch and then eliminates the mismatch at a rate characterized by a firstorder rate constant k SOS . Therefore, the probability that a given mismatch is eliminated over an infinitesimal time interval dt is given by k SOS dt (see Figure 1).
In this paper, we will consider the behavior of the model in the limit of infinite sequence length. If m: L is held constant as L??, then the probability of an error-free daughter strand synthesis is given by 1{ ð Þ L ?e {m . Therefore, fixing m in the infinite sequence length limit is equivalent to fixing the pergenome replication fidelity. It should be noted that m is the average number of mismatches produced per DNA strand per replication cycle.
The assumption of infinite sequence length is a common assumption in quasispecies theory, because it is the mathematical formalization of the long genome-length regime that makes the neglect of backmutations exact. While finite genome length effects need to be considered in dynamic fitness landscapes, where adaptation to specific genomes is important [18], for static landscapes (like the one being considered in this paper), good agreement with the infinite sequence length results may be obtained with genomes as short as ten bases.
Finally, we assume that the fitness landscape is defined by a master genome s 0 , s s 0 f g. Specifically, we define a genome s,s' f g to be viable, with a first-order growth rate constant kw1, if it has fewer than l mismatches, and if it does not differ from s 0 , s s 0 f gby any fixed mutations. Otherwise, the genome is unviable, with a first-order growth rate constant of 1. We recognize that this terminology is somewhat inappropriate, since a genome with a first-order growth rate constant of 1 may still replicate. However, this is standard terminology from quasispecies theory, where ''viable'' and ''unviable'' are taken to be synonymous with ''higher fitness'' and ''lower fitness'' respectively.
The justification for this choice of fitness landscape is as follows: If a genome has a fixed mutation, then neither DNA strand corresponds to either of the master strands s 0 , s s 0 . As a result, the genome does not contain all of the information corresponding to a viable organism, hence the organism is unviable. While this assumption is clearly extreme and oversimplified, it is the analogue of the single-fitness-peak landscape for single-stranded genomes.
However, in the case of a mismatch where one of the bases is the same as the corresponding base in one of the master strands, the information contained in the master genome is still preserved in one of the strands, so that the organism is assumed to remain viable if the total number of such mismatches does not exceed some cutoff value l.
For convenience, Table 1 summarizes the main parameters of the model.

Symmetrized Population Distribution
We can develop the infinite sequence length equations for our model, assuming an initially prepared clonal population consisting entirely of the wild-type (mutation-free) genome s 0 , s s 0 f g, i.e. a population consisting entirely of the fastest replicating genotype.  The Hamming distance between two sequences s 1 and s 2 is simply equal to the number of positions by which they differ. It may be readily shown that the Hamming distance is a metric over the space of sequences [19], so that, in particular, the Hamming distance satisfies the Cauchy-Schwartz Inequality: Given three sequences s 1 , Þ. Now, in the limit of infinite sequence length, it may be shown that, with probability 1, that the Hamming distance between s 0 and its complement s s 0 is infinite [12]. Therefore, if ð Þfor a genome s,s' f g, where it is understood that s is a finite Hamming distance from s 0 and s' is a finite Hamming distance from s s 0 . A given genome s,s' ð Þ may then be characterized by four parameters l C , l L , l R , and l B . We let l C denote the number of sites where s and s' are both complementary, yet differ from the corresponding bases in s 0 and s s 0 . We let l L denote the number of sites where s differs from s 0 , but s' is identical to s s 0 . We let l R denote the number of sites where s is identical to s 0 , but s' differs from s s 0 . Finally, we let l B denote the number of sites where s and s' differ from s 0 and s s 0 , but are not complementary (for an illustration of these parameters, see [7,13]).
Note that the fitness landscape depends only on l C , l L , l R , and l B , and hence the fitness of a given organism may be denoted by k lC ,lL,lR,lB ð Þ , where for our single-fitness-peak landscape we have k lC ,lL,lR,lB ð Þ k if l C~0 and l L zl R zl B ƒl, and 1 otherwise. The condition l C~0 means that there are no mutations fixed in the genome, while the condition l L zl R zl B ƒl means that there are fewer than l lesions.
By the symmetry of the fitness landscape, and by the symmetry of the initial population distribution, we can group all genomes of identical l C , l L , l R , and l B , and derive the dynamical equations of the symmetrized population distribution. We therefore let n lC ,lL,lR,lB ð Þ denote the total number of organisms in the population whose genomes are characterized by the parameters l C , l L , l R , and l B , and we let n SOS ð Þ lC ,lL,lR,lB ð Þ denote the total number of organisms in the population undergoing the SOS response, whose genomes are similarly characterized by the parameters l C , l L , l R , and l B . The corresponding population fractions are denoted z lC ,lL,lR,lB ð Þ and z SOS ð Þ lC ,lL,lR,lB ð Þ , respectively.

Dynamical Equations
To develop the dynamical equations for both the z lC ,lL,lR,lB ð Þ and the z SOS ð Þ lC ,lL,lR,lB ð Þ quantities, we begin by considering a genome s,s' ð Þ, characterized by the parameters l C , l L , l R , and l B . We first consider the case where this genome is not undergoing the SOS response. Then, due to the semiconservative nature of DNA replication, this genome is being destroyed at a rate given by {k lC ,lL,lR,lB ð Þ n lC ,lL,lR,lB ð Þ . This genome, however, is produced by other genomes in the population, as a result of replication. So, consider some other genome s'',s''' ð Þwhich produces s,s' ð Þ upon replication. This can either occur via the s'' template strand, the s''' template strand, or both.
Because sequence lengths are infinite, the probability of a mismatch in one of these bases during daughter strand synthesis is 0. In the remaining sites, let l '' 1 denote the number of mismatches that are not corrected, and l '' 2 denote the number of mismatches that are repaired, but fixed as a mutation in the genome. Then the resulting genome s,s' ð Þ is characterized by: The probability of a given set of mutations corresponding to l '' 1 , sites on s'' remain identical to s 0 , and the corresponding daughter strand sites are identical to s s 0 . The per-site probability of this is the probability of error-free daughter strand synthesis, 1{ , plus the probability of a mismatch, times l, the probability that complementarity is restored, times 1=2, the probability that complementarity is restored correctly. It is assumed that complementarity is restored by various DNA repair mechanisms, such as Nucleotide Excision Repair (NER) and Base Excision Repair (BER) [1]. However, because NER and BER do not distinguish between parent and daughter strands, the probability of correctly removing a mismatch via these mechanisms is 1=2.
The degeneracy is given by Þ !, so in the limit of infinite sequence length the total probability becomes, If s,s' ð Þ is generated by s''', then we have, We also obtain an overall transition probability of ð Þ m . It is important to note from the s'' and s''' results that genomes with l B w0 cannot be generated during replication. Since SOS repair eliminates mismatches, it follows that a population where l B is initially 0 for all genomes will always have a population where l B~0 . Therefore, we may assume in subsequent derivations that l B , l '' B are 0.
Furthermore, note that strands s'' that are a finite Hamming distance away from s 0 can only generate daughter genomes where l L~0 , while strands s''' that are a finite Hamming distance away from s s 0 can only generate daughter genomes where l R~0 . Then for the genomes s,s' ð Þ generated by s'', we have l C~l '' C zl '' L zl '' 2 , and l R~l '' 1 . Therefore, the restriction on Then for the population number n lC ,0,lR,0 ð Þ , we have a contribution from the s'' strands of A similar expression is obtained for the population number n lC ,lL,0,0 ð Þ , except l R is replaced with l L , and the roles of l '' L and l '' R are exchanged.
It should also be noted that, by the symmetry of the fitness landscape, we have that n lC ,lL,lR,lB ð Þ n lC ,lR,lL,lB ð Þ . Another way to note this is that, for a given genome s,s' ð Þ, if we change the ordering of the strands so that the first strand is of finite Hamming distance to s s 0 , and the second strand is of finite Hamming distance to s 0 , then the genome s,s' f g must be represented as s',s ð Þ, and is characterized by the parameters l C , l R , l L , and l B . If n n lC ,lL,lR,lB ð Þ denotes the number of genomes characterized by l C , l L , l R , and l B , with respect to the s s 0 ,s 0 ð Þ strand ordering, then since there is a one-to-one correspondence between genomes s,s' ð Þ with parameters l C , l L , l R , l B with respect to the first ordering, and genomes s,s' ð Þ with parameters l C , l R , l L , l B with respect to the second ordering, it follows that n n lC ,lL,lR,lB ð Þ n lC ,lR,lL,lB ð Þ . However, since the fitness landscape is invariant under strand ordering, we have n lC ,lL,lR,lB ð Þ n n lC ,lL,lR,lB ð Þ , so that n lC ,lL,lR,lB ð Þ n lC ,lR,lL,lB ð Þ . Taking into consideration the contribution to n lC ,0,0,0 ð Þ , we may put everything together and obtain, after changing variables from population numbers to population fractions, the differential equations governing the time evolution of the various population fractions. These equations are, dz (l C ,0,0,0) dt~{ (k (l C ,0,0,0) z k k(t))z (l C ,0,0,0) where k k t ð Þ: The factor of 1=2 appearing in the SOS terms arises from the fact that when a mismatch is removed, it either corrects the daughter strand synthesis error, or it fixes the mismatch as a mutation in the genome. In the former case, the value of l C remains unchanged, while in the latter case it is incremented by 1.
It should be noted that this factor is missing in the contribution to z lC ,0,0,0 ð Þ from SOS repair. The reason for this is that this contribution comes from z we may combine identical terms and eliminate the factor of 1=2.
The factor of lz1 and l in front of the k SOS rate constant arises from the fact that the fraction of genomes whose SOS enzymes are bound to a mismatch is proportional to the total number of mismatches, hence the resulting SOS rate constant is proportional to the total number of mismatches.

Steady-State Behavior
Definitions and basic equations. To obtain the steadystate behavior of our model, we begin by introducing some definitions that will allow us to simplify the calculations.
where we set l~l S {1 whenever l was previously defined as §l S . The differential equations for z 1 , z 2 , z 3 , z 4 , z 5 , and z 6 are readily derived. From the equations, and X ?
l C~0 we obtain, We also have, We can add these equations to obtain, For the purposes of computing the mean fitness at steady-state, we can simplify the system of equations somewhat by defining z z 4~z4 z2z 5 z2z 6 . We obtain, For consistency of notation, in what follows we shall simply let z 4 denotez z 4 .

11
, and z (SOS) . To obtain the steady-state behavior of this system of equations, we begin by first solving for the steady-state of the population undergoing SOS repair. For l~0, . . . ,l S {1 we have at steady-state that, which gives, For l §l S , we have, This expression has the form of the recursion relation, x nz1~an x n {b n . Using mathematical induction, it is possible to where we define P 0 i~1 a i~1 . If we define g l m,l; k k t~?
, then imposing the requirement that lim l?? z (SOS) 0l~0 gives, at steady-state, that, Using a similar argument, we obtain, For the steady-state value of z (SOS) , we have, using the identity k k(t)~kz 1 z2kz 2 z2z 3 zz 4 , Computing the steady-state mean fitness k k t~? ð Þ. Plugging our expressions for k SOS z (SOS) 01 and k SOS z (SOS) 11 into the steady-state population fractions equations, we obtain, From these equations we may derive the equality, Below the error catastrophe, when z 1 , z 2 , z 3 are not all 0, we may cancel k z 1 zz 2 ð Þ zz 3 from both sides of the equation and re-arrange to obtain, k k t~? where, Beyond the error catastrophe, the mutation rate is sufficiently high that the selective advantage for remaining localized about the l C~0 genomes disappears, so that z 1 , z 2 , and z 3 drop to 0. The relevant steady-state equation is then, which may be solved for k k(t~?) to give, The error catastrophe occurs at the mutation rate for which the two expressions for the mean equilibrium fitness become equal. As with previous quasispecies models, the error catastrophe here also corresponds to a localization to delocalization transition over sequence space [5][6][7].
Limiting Cases. We now proceed to consider the behavior of the steady-state mean fitness for a number of limiting cases, in order to better understand our model. We consider the following cases: (1) l~1, corresponding to perfect lesion repair, so that there are non-complementary genomes in the population. (2) l S~? , corresponding to the case where no genome ever undergoes the SOS response. (3) k SOS ??, corresponding to the case where SOS repair happens rapidly, so that there is a negligible fitness penalty associated with undergoing the SOS response.
Case 1: l~1 When l~1, we get for l S w0 that g lS m,l; k k t~? This result is identical with the semiconservative quasispecies equations with perfect lesion repair, which makes sense, since here we assume that any lesion is eliminated instantaneously [13].
Optimal cutoff. If we assume that k&1, and k SOS ??, then it is possible to find the value of l S which maximizes the steadystate mean fitness k k t~? ð Þ. To do this, we define a normalized mean fitness w to be equal to k k t~? ð Þ=k, and if we divide Eq. (19) by k 2 , we obtain that w is the solution to, where, a m,l; w,k SOS ð Therefore, for large k we obtain that w? lim k?? a m,l; w,k SOS ð Þ , which gives, so that maximizing w is equivalent to maximizing f l (m,l){ 2f l S {1 (m=2,l). Now, because l must be re-set to l S {1 whenever we take l S ƒl, we can only vary l S independently of l whenever l S wl. In this regime, the expression f l (m,l){2f lS {1 (m=2,l) is maximized whenever l S~l z1.
In the regime where l S ƒl, l is re-set to l S {1, and so, and so this expression is equal to {1 for l S~1 ,2, and then increases with successive values of l S . Now, because l is re-set to l S {1 for l S ƒl, it follows that we take l~l S {1 for l S ƒlz1. For l~0, we then obtain that w is maximized over l S ƒlz1 for l S~1 , while when l~1, we obtain that w is maximized over l S ƒlz1 for l S~1 ,2. For l §2, we obtain that w is maximized over l S ƒlz1 for l S~l z1.
Therefore, in any case, we can maximize w over l S ƒlz1 by taking l S~l z1. Since we can maximize w over l S §lz1 by setting l S~l z1, it follows that w is maximized when l S~l z1.
We reach the conclusion that, when the fitness penalty for having a non-viable genome is sufficiently great, the SOS response will confer a h maximum selective advantage if it is activated when and only when the genome has sustained sufficient genetic damage so that it will be unviable without SOS repair. However, it should be emphasized that this only holds when m is not near the error catastrophe, so that w is sufficiently larger than 1 for large k that the above analysis holds.

Stochastic Simulations
We developed stochastic simulations of a unicellular population capable of undergoing the SOS response, in order to numerically test the analytical predictions of our model. We consider a constant population of genomes that is cycled over every time step. During each cycle, every genome is allowed to replicate with a probability k fs,s'g Dt, where k fs,s'g is the first-order growth rate constant of genome fs,s'g, and Dt is the length of the time step. We take Dt to be sufficiently small so that the probability of a given genome replicating more than once during a cycle is negligible.
We assume that the population initially consists of a clonal population of wild-type (mutation-free) genomes. The fitness of a given genome fs,s'g is determined by assigning l C ,l L ,l R ,l B parameters to the ordered-pairs (s,s'), (s',s) with respect to the ordered-pair (s 0 , s s 0 ). For each set of l C , l L , l R , and l B parameters, a fitness is assigned based on the fitness landscape defined previously. The fitness of the genome is then taken to be the larger of the two calculated fitnesses. In the limit of infinite sequence length, this prescription for calculating fitnesses becomes identical to the method used in the analytical solution of our model.
If a genome replicates during a cycle, then it is removed from the population, and the two daughters are added to the population of genomes. To maintain a constant population size, another, randomly chosen genome is removed from the population as well. Because this approach is simply the stochastic implementation of the quasispecies dynamics of the system, it converges to the infinite population, continuous time result as the population size gets larger and the time steps get smaller.
If a daughter genome is produced that has at least l S lesions, then it enters the SOS response, and is assigned a replication probability of 0. A genome that has initiated the SOS response continues to undergo SOS repair until all lesions have been removed, and a complementary genome has been restored. During every time step, a genome that is undergoing the SOS response has its lesions scanned, and each lesion is repaired with probability k SOS Dt. In addition to being chosen small enough so that the probability of a given genome replicating more than once during a cycle is negligible, we also choose Dt to be sufficiently small so that the probability that a given genome undergoing the SOS response has more than one lesion repaired during a cycle is also negligible.
The stochastic simulation is allowed to run for a sufficient number of time steps so that the mean fitness of the population does not change significantly, at which point the system is assumed to be at steady-state. Figures 2 and 3 show plots comparing the mean fitness obtained from the analytical solution to the mean fitness obtained from the stochastic simulations. As can be seen from the figures, the agreement between the analytical solution and the stochastic simulation is excellent.

Conclusions and Future Research
This paper developed a quasispecies approach for describing the evolutionary dynamics of a unicellular population that incorporated a simplified model of the SOS response. The model was a generalization of the single-fitness-peak landscape that is often used in quasispecies theory to study various problems in evolutionary dynamics. The model was shown to be analytically solvable, and it was found that the solution led to a maximal selective advantage to the SOS response in a manner that is broadly consistent with the behavior of actual organisms. Specifically, we showed that the SOS response should only be activated in a cell with a sufficiently damaged genome that it will be unviable if the SOS response is not activated. In such a situation, the cell has ''nothing to lose,'' meaning that it is better to attempt to repair the genome and risk introducing deleterious mutations, than it is to leave a highly damaged genome alone.