Similarity Selection and the Evolution of Sex: Revisiting the Red Queen

For over 25 years, many evolutionary ecologists have believed that sexual reproduction occurs because it allows hosts to change genotypes each generation and thereby evade their coevolving parasites. However, recent influential theoretical analyses suggest that, though parasites can select for sex under some conditions, they often select against it. These models assume that encounters between hosts and parasites are completely random. Because of this assumption, the fitness of a host depends only on its own genotype ("genotypic selection"). If a host is even slightly more likely to encounter a parasite transmitted by its mother than expected by random chance, then the fitness of a host also depends on its genetic similarity to its mother ("similarity selection"). A population genetic model is presented here that includes both genotypic and similarity selection, allowing them to be directly compared in the same framework. It is shown that similarity selection is a much more potent force with respect to the evolution of sex than is genotypic selection. Consequently, similarity selection can drive the evolution of sex even if it is much weaker than genotypic selection with respect to fitness. Examination of explicit coevolutionary models reveals that even a small degree of mother–offspring parasite transmission can cause parasites to favor sex rather than oppose it. In contrast to previous predictions, the model shows that weakly virulent parasites are more likely to favor sex than are highly virulent ones.

Parasites have figured prominently in discussions of the evolution of sex, but recent models suggest that parasites often select against sex rather than for it. With the inclusion of small and realistic exposure biases, parasites are much more likely to favor sex. Though parasites alone may not provide a complete explanation for sex, the results presented here expand the potential for parasites to contribute to the maintenance of sex rather than act against it.

As an alternative to describing the population by the extended genotype frequency distribution, we can use the allele frequencies and the patterns of associations of alleles within the extended genotype. The frequency of the upper-case allele (i.e., M, A) at the $k$th locus in the $i$th genome is given by

$$p_{i,k} = \sum_{\Omega} F_{\Omega} \, \frac{1}{2} \sum_{j} \Omega_{i,j,k} \qquad \text{(S1.1)}$$

For example, the frequency of the M allele among individuals of the current generation is $p_{1,1}$, whereas the frequency of M amongst the mothers of the individuals of the current generation is $p_{2,1}$. (For simplicity, in the main text $p_{1,1}$ is denoted simply as $p_M$ and $p_{1,2}$ is denoted as $p_A$.) It is useful to note that in the absence of mutation, allele frequencies in the offspring at the beginning of a generation equal the equivalent allele frequencies among their mothers, i.e., $p_{1,k} = p_{2,k}$. However, after selection this may no longer be the case.
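Equation S1.1 can be sketched in code as follows. This is a minimal illustration, assuming a hypothetical representation in which each extended genotype Ω is stored as a nested tuple `omega[i][j][k]` (0-based indices, allele state 1 for the upper-case allele) and the population is a dictionary mapping extended genotypes to their frequencies $F_\Omega$. The function name `allele_frequency` and the example frequencies are likewise illustrative, not from the source.

```python
from typing import Dict, Tuple

# Hypothetical encoding (not from the source): omega[i][j][k] is the allelic state
# at locus k (0 = M, 1 = A) on haplotype j (0 or 1) of genome i
# (0 = the individual itself, 1 = its mother). State 1 = upper-case allele.
# Indices are 0-based, whereas the text uses 1-based indices.
Haplotype = Tuple[int, ...]
Genome = Tuple[Haplotype, Haplotype]
ExtGenotype = Tuple[Genome, Genome]


def allele_frequency(F: Dict[ExtGenotype, float], i: int, k: int) -> float:
    """Frequency of the upper-case allele at locus k in genome i (cf. equation S1.1).

    Each extended genotype contributes its frequency times the fraction of the two
    haplotypes of genome i that carry the upper-case allele at locus k.
    """
    return sum(freq * sum(omega[i][j][k] for j in range(2)) / 2
               for omega, freq in F.items())


# Usage with two made-up extended genotypes:
omega1 = (((1, 1), (0, 1)), ((1, 1), (1, 0)))  # offspring MA/mA, mother MA/Ma
omega2 = (((0, 0), (0, 1)), ((0, 0), (1, 1)))  # offspring ma/mA, mother ma/MA
F = {omega1: 0.6, omega2: 0.4}
p_M_offspring = allele_frequency(F, 0, 0)  # p_{1,1} in the text's notation
p_M_mothers = allele_frequency(F, 1, 0)    # p_{2,1} in the text's notation
```

The frequencies in the usage example are arbitrary, so offspring and maternal allele frequencies differ here; in the model they coincide at the start of a generation and can diverge only after selection.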
In addition to allele frequencies, it is necessary to quantify the associations among the allelic states in the extended genotype. The symbol $C_{O_1,O_2|D_1,D_2}$ represents the association among the loci in sets $O_1$ and $O_2$ in the offspring and the loci in sets $D_1$ and $D_2$ in the dam. $O_1$ and $D_1$ refer to alleles in the first haplotype of the offspring and maternal genomes, respectively, whereas $O_2$ and $D_2$ refer to alleles in the second haplotype of the offspring and maternal genomes, respectively. For example, $C_{MA,A|\varnothing,\varnothing}$ is an association involving loci only in offspring; specifically, it is the three-way association among the modifier allele and the fitness allele on the first haplotype of the offspring genome and the fitness allele on the second haplotype of the offspring genome. (More simply, this association can be described as the association between the modifier and homozygosity at the A locus in offspring.) This association involves no loci in either haplotype of the maternal genome, as signified by $\varnothing,\varnothing$ in the subscript. In general, the association $C_{O_1,O_2|D_1,D_2}$ is quantified as

$$C_{O_1,O_2|D_1,D_2} = \sum_{\Omega} F_{\Omega} \prod_{k \in O_1} \left(\Omega_{1,1,k} - p_{1,k}\right) \prod_{k \in O_2} \left(\Omega_{1,2,k} - p_{1,k}\right) \prod_{k \in D_1} \left(\Omega_{2,1,k} - p_{2,k}\right) \prod_{k \in D_2} \left(\Omega_{2,2,k} - p_{2,k}\right) \qquad \text{(S1.2)}$$

Products over empty sets are defined as 1; e.g., if $O_1 = \varnothing$, then $\prod_{k \in O_1} (\Omega_{1,1,k} - p_{1,k}) = 1$.

Organisms are semelparous, and the life cycle involves three stages: organisms are born, selection occurs, and organisms reproduce. The pre-subscripts b, s, and r are added to the symbols above to denote these stages. For example, ${}_bF_\Omega$ is the frequency of individuals with extended genotype Ω before selection, ${}_sp_{1,1}$ is the frequency of M in individuals of the current generation after selection, and ${}_rC_{A,A|\varnothing,\varnothing}$ is a measure of homozygosity at the A locus in the offspring of the following generation. Note that values denoted as being "after reproduction" in one generation are equivalent to being "before selection" in the following generation, i.e., ${}_rF_\Omega[t] = {}_bF_\Omega[t+1]$. In the main text, the pre-subscripts are omitted from these symbols for simplicity, as all values presented there refer to values before selection.
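A sketch of how an association measure could be computed from extended genotype frequencies follows. It assumes that $C_{O_1,O_2|D_1,D_2}$ is the central-moment measure, i.e., the frequency-weighted sum over extended genotypes of products of deviations $\Omega_{i,j,k} - p_{i,k}$ over the loci in each set, with empty products equal to 1. It uses a hypothetical nested-tuple encoding (0-based indices, state 1 for the upper-case allele), and the function names are illustrative rather than taken from the source.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: omega[i][j][k], i = genome (0 offspring, 1 dam),
# j = haplotype (0 first, 1 second), k = locus (0 = M, 1 = A); state 1 = upper case.
ExtGenotype = Tuple[Tuple[Tuple[int, ...], Tuple[int, ...]], ...]


def allele_frequency(F: Dict[ExtGenotype, float], i: int, k: int) -> float:
    # p_{i,k}: mean state over the two haplotypes of genome i, weighted by F_Omega.
    return sum(f * (omega[i][0][k] + omega[i][1][k]) / 2 for omega, f in F.items())


def association(F: Dict[ExtGenotype, float],
                O1: Iterable[int], O2: Iterable[int],
                D1: Iterable[int], D2: Iterable[int]) -> float:
    """C_{O1,O2|D1,D2}, assuming the central-moment definition described above."""
    # (genome, haplotype, loci) for each of the four locus sets.
    sets = ((0, 0, tuple(O1)), (0, 1, tuple(O2)), (1, 0, tuple(D1)), (1, 1, tuple(D2)))
    p = {(i, k): allele_frequency(F, i, k) for i, _, loci in sets for k in loci}
    total = 0.0
    for omega, f in F.items():
        term = f
        for i, j, loci in sets:
            for k in loci:  # an empty set leaves the product unchanged (i.e., 1)
                term *= omega[i][j][k] - p[(i, k)]
        total += term
    return total


# Usage: offspring and dam are both MA/MA (frequency 0.5) or both Ma/Ma (0.5).
hom_A = (((1, 1), (1, 1)), ((1, 1), (1, 1)))
hom_a = (((1, 0), (1, 0)), ((1, 0), (1, 0)))
F = {hom_A: 0.5, hom_a: 0.5}
homozygosity_A = association(F, O1=[1], O2=[1], D1=[], D2=[])  # C_{A,A|Ø,Ø}
mother_offspring_A = association(F, O1=[1], O2=[], D1=[1], D2=[])  # C_{A,Ø|A,Ø}
```

Precomputing the allele frequencies once keeps the inner loop to simple products; with every set empty the function simply returns the total frequency.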
The frequency of individuals with extended genotype Ω after selection is given by

$${}_sF_\Omega = \frac{{}_bF_\Omega \, w_\Omega}{\bar{w}}, \qquad \bar{w} = \sum_{\Omega} {}_bF_\Omega \, w_\Omega,$$

where $w_\Omega$ is equation 1 from the main text evaluated with $X_{a,1} = 1 - \Omega_{1,1,2}$, $X_{a,2} = 1 - \ldots$

Reproduction follows selection. Considering the entire population, the fraction of offspring in the following generation produced sexually is $\sigma_T = \sum_\Omega {}_sF_\Omega \, \sigma_\Omega$, where $\sigma_\Omega$ is the fraction of offspring produced sexually by an individual with extended genotype Ω. The frequency of haplotype $x$ amongst the gametes produced by an individual with extended genotype Ω is $\Psi_{\Omega,x}$. These are haplotypes as normally defined (i.e., genotypes of the products of meiosis); they are not "extended haplotypes".

Let the extended genotype $\{\{o_1, o_2\}, \{d_1, d_2\}\}$ represent an offspring carrying haplotypes $o_1$ and $o_2$ from a dam carrying haplotypes $d_1$ and $d_2$. The frequency of such offspring in the next generation is given by a recursion whose asexual terms contain an indicator that equals 1 if the offspring's diploid genotype is the same as its dam's and is otherwise zero. As defined in the main text, $f$ is the fraction of sexually produced offspring that are derived through sporophytic selfing. The first two terms in this recursion represent individuals created through asexual reproduction; there are two terms because the assignment of haplotype order is arbitrary. The latter two terms represent individuals created through sexual reproduction. Within the brackets of each of these latter terms, there are two terms representing individuals created through selfing and outcrossing, respectively.
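The selection step and the fraction of sexually produced offspring can be sketched as below, assuming selection acts through the usual normalization ${}_sF_\Omega = {}_bF_\Omega w_\Omega / \bar w$ and that $\sigma_T$ is the frequency-weighted investment in sex. Because main-text equation 1 is not reproduced here, the fitness and sex-investment functions are placeholders: the example uses a hypothetical fitness that declines with the number of a alleles (selection coefficient `s`) and a hypothetical additive effect of m on investment in sex. All names are illustrative.

```python
from typing import Callable, Dict, Hashable


def select(F_before: Dict[Hashable, float],
           fitness: Callable[[Hashable], float]) -> Dict[Hashable, float]:
    """Return genotype frequencies after selection, normalized by mean fitness."""
    w = {g: fitness(g) for g in F_before}
    w_bar = sum(F_before[g] * w[g] for g in F_before)
    return {g: F_before[g] * w[g] / w_bar for g in F_before}


def fraction_sexual(F_after: Dict[Hashable, float],
                    sigma_of: Callable[[Hashable], float]) -> float:
    """sigma_T: frequency-weighted investment in sex among post-selection parents."""
    return sum(f * sigma_of(g) for g, f in F_after.items())


# Hypothetical placeholders, NOT main-text equation 1: omega[0] is the offspring
# genome (two haplotypes over loci M, A), state 1 = upper-case allele.
def toy_fitness(omega, s=0.5):
    n_a = sum(1 - hap[1] for hap in omega[0])  # number of a alleles in the offspring
    return 1 - s * n_a / 2


def toy_sigma(omega, sigma=0.2, d_sigma=0.2):
    n_m = sum(1 - hap[0] for hap in omega[0])  # number of m alleles in the offspring
    return sigma + d_sigma * n_m / 2


MA_MA = (((1, 1), (1, 1)), ((1, 1), (1, 1)))
ma_ma = (((0, 0), (0, 0)), ((0, 0), (0, 0)))
F_s = select({MA_MA: 0.5, ma_ma: 0.5}, toy_fitness)
sigma_T = fraction_sexual(F_s, toy_sigma)
```

Keeping fitness and sex investment as function arguments means the same two steps can be reused once the actual forms of $w_\Omega$ and $\sigma_\Omega$ are plugged in.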
Using the equations above, the extended genotype frequencies can be calculated at each stage of the life cycle. From these extended genotype frequencies, the allele frequencies and association measures can be calculated at each stage using equations S1.1 and S1.2. The goal is to determine the change in the frequency of the modifier over one generation.

To confirm the accuracy of the quasi-linkage equilibrium (QLE) approximation, simulations of the recursions above were compared to the QLE prediction. In these simulations, the A allele is beneficial and goes from low frequency to high frequency. At each generation during the sweep of A, the change in the frequency of the modifier allele m is calculated exactly from the simulation. The approximate change in the frequency of m is also calculated using the QLE prediction, i.e., equation 3 of the main text. In Figure S1.1, the actual changes are compared to the QLE predictions. The QLE prediction is usually in the same direction as the actual change and is often close to it in magnitude. As expected, the QLE prediction is least accurate when the baseline level of sex is low (e.g., $\sigma = 0.1$) and selection is strong.
Haploid, three-locus model

As described in the main text, the A and B loci affect fitness, and the M locus affects an organism's investment into sexual versus asexual reproduction as well as its recombination rates. The model follows the basic layout of the diploid, two-locus model.
As in the previous model, the extended genotype of an individual is described by the tensor Ω. Element $\Omega_{i,k}$ gives the allelic state of the individual at the $k$th locus ($k \in \{1,2,3\}$ for M, A, and B, respectively) of the $i$th genome ($i \in \{1,2\}$ for the individual's own genotype and its mother's genotype, respectively). Because organisms are haploid, no index is needed to indicate the first or second haplotype within a genome. Allele frequencies and genetic associations are calculated as described above for the diploid model, with the obvious adjustments for haploid organisms, i.e., sums and products are taken over only one haplotype per genome rather than two. Symbols for the allele frequencies and associations are used analogously to the diploid model.
Reproduction follows selection. The fraction of offspring in the following generation that will be produced sexually is $\sigma_T = \sum_\Omega {}_sF_\Omega \, \sigma_\Omega$, where $\sigma_\Omega = \sigma + \delta\sigma \, (1 - \Omega_{1,1})$. The frequency of extended genotype Ω amongst the parents of offspring that will be produced by sex is ${}_sF_\Omega \, \sigma_\Omega / \sigma_T$. The frequency of haploid offspring of haplotype $y$ amongst the progeny produced through sex between a female parent of extended genotype $\Omega^{(1)}$ and a male parent of extended genotype $\Omega^{(2)}$ is $\Psi_{\Omega^{(1)} \times \Omega^{(2)},y}$. Values of $\Psi_{\Omega^{(1)} \times \Omega^{(2)},y}$ are calculated following the rules of meiosis. The recombination rates used in applying these rules depend on the genotypes of the parents. The recombination rates in the M-A interval and the A-B interval are $r_{MA} + \delta r_{MA} \, \big(2 - (\Omega^{(1)}_{1,1} + \Omega^{(2)}_{1,1})\big)$ and $r_{AB} + \delta r_{AB} \, \big(2 - (\Omega^{(1)}_{1,1} + \Omega^{(2)}_{1,1})\big)$, respectively.
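For the haploid model, the sex-investment and recombination rules stated above translate directly into code. The sketch below assumes a hypothetical encoding of a haploid extended genotype as `omega[i][k]` (0-based: i = 0 own genome, i = 1 mother's genome; k = 0, 1, 2 for M, A, B; state 1 for the upper-case allele). Function names, parameter names, and the example frequencies and rates are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical encoding of a haploid extended genotype: omega[i][k].
ExtGenotype = Tuple[Tuple[int, int, int], Tuple[int, int, int]]


def sigma_omega(omega: ExtGenotype, sigma: float, d_sigma: float) -> float:
    # sigma_Omega = sigma + d_sigma * (1 - Omega_{1,1}): an m carrier (state 0)
    # invests sigma + d_sigma in sex, an M carrier invests sigma.
    return sigma + d_sigma * (1 - omega[0][0])


def sexual_parents(F_sel: Dict[ExtGenotype, float], sigma: float, d_sigma: float):
    """Return sigma_T and the frequencies s_F * sigma_Omega / sigma_T of sexual parents."""
    weights = {o: f * sigma_omega(o, sigma, d_sigma) for o, f in F_sel.items()}
    sigma_T = sum(weights.values())
    return sigma_T, {o: w / sigma_T for o, w in weights.items()}


def recombination_rates(omega1: ExtGenotype, omega2: ExtGenotype,
                        r_MA: float, dr_MA: float, r_AB: float, dr_AB: float):
    """Rates in the M-A and A-B intervals for a cross between omega1 and omega2."""
    n_m = 2 - (omega1[0][0] + omega2[0][0])  # number of m alleles in the two parents
    return r_MA + dr_MA * n_m, r_AB + dr_AB * n_m


# Usage with made-up post-selection frequencies:
M_ab = ((1, 0, 0), (1, 0, 0))  # M carrier whose mother also carried Mab
m_AB = ((0, 1, 1), (0, 1, 1))  # m carrier whose mother also carried mAB
sigma_T, parents = sexual_parents({M_ab: 0.5, m_AB: 0.5}, sigma=0.2, d_sigma=0.2)
rates = recombination_rates(M_ab, m_AB, r_MA=0.1, dr_MA=0.05, r_AB=0.2, dr_AB=0.1)
```

Because m carriers invest more in sex in this example, they are over-represented among sexual parents relative to their post-selection frequency.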
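Let the extended genotype $\{o_1, d_1\}$ represent an offspring of haplotype $o_1$ from a dam of haplotype $d_1$. Considering both asexual and sexual reproduction, the frequency of such offspring in the next generation follows from a recursion in which recombination enters through $r_{MAB} = 1 - (1 - r_{MA})(1 - r_{AB})$ and $r_{MB} = r_{MAB} - r_{MA} r_{AB}$. Here $r_{MAB}$ is the probability of at least one crossover among the three loci and $r_{MB}$ is the recombination rate between M and B; both expressions treat crossovers in the two intervals as independent. Using the QLE approximations above, an expression for the change in the modifier frequency is obtained; substitution of equations S1.8a-f into that expression gives equation 6 of the main text.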
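The two recombination expressions can be checked against an explicit three-locus meiosis. The sketch below builds offspring haplotype frequencies (the Ψ values) for a cross between two haploid parents, assuming crossovers in the M-A and A-B intervals occur independently (no interference). The helper names and the haplotype encoding (tuples over loci M, A, B with state 1 for the upper-case allele) are hypothetical. Under this assumption $r_{MB}$ should equal the fraction of offspring whose M and B alleles come from different parents, and $r_{MAB}$ the fraction with at least one crossover.

```python
from itertools import product
from typing import Dict, Tuple

Haplotype = Tuple[int, int, int]  # hypothetical encoding: alleles at M, A, B


def gamete_distribution(h1: Haplotype, h2: Haplotype,
                        r_MA: float, r_AB: float) -> Dict[Haplotype, float]:
    """Offspring haplotype frequencies from a cross h1 x h2 without interference."""
    psi: Dict[Haplotype, float] = {}
    for c_MA, c_AB in product((0, 1), repeat=2):  # 1 = crossover in that interval
        prob = (r_MA if c_MA else 1 - r_MA) * (r_AB if c_AB else 1 - r_AB)
        for first in (0, 1):  # parent supplying the M allele, each with probability 1/2
            source = (first, first ^ c_MA, first ^ c_MA ^ c_AB)
            child = tuple((h1, h2)[source[k]][k] for k in range(3))
            psi[child] = psi.get(child, 0.0) + prob / 2
    return psi


def r_MAB(r_MA: float, r_AB: float) -> float:
    return 1 - (1 - r_MA) * (1 - r_AB)


def r_MB(r_MA: float, r_AB: float) -> float:
    return r_MAB(r_MA, r_AB) - r_MA * r_AB


# Cross between MAB and mab parents:
psi = gamete_distribution((1, 1, 1), (0, 0, 0), r_MA=0.1, r_AB=0.2)
recombinant_MB = sum(f for h, f in psi.items() if h[0] != h[2])  # should match r_MB
any_crossover = 1 - psi[(1, 1, 1)] - psi[(0, 0, 0)]              # should match r_MAB
```

Using parents that differ at every locus makes each crossover pattern produce a distinct offspring haplotype, so both quantities can be read directly off Ψ.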
Extension to other studies of similarity selection

In principle, the methods described here could be used to investigate other models incorporating similarity selection. A key step is defining the extended genotype so that it contains all necessary information. This means that the extended genotype must include information on all other individuals who affect the fitness of the focal individual. For example, in the models above, the extended genotype contained information on an individual's own genotype as well as its mother's genotype. If one were investigating a model in which an individual's fitness depended on its siblings (e.g., the Tangled Bank hypothesis), then the extended genotype would need to include information on the genotypes of siblings. In practice, it would become very difficult to use extended genotypes containing individual genotypes for more than a few siblings. One potential solution would be to use summary statistics to describe the genetic properties of the relevant relatives. For example, rather than listing the genotypes of each sibling as part of the extended genotype, one could simply include the mean allele frequencies among siblings. Fitness must then be defined in a compatible manner, i.e., as a function of these summary statistics rather than of individual sibling genotypes.

[Figure panels: ($\alpha_a = 0.1$, $\iota_A = \alpha_a/2$, $\gamma_A = \alpha_a/100$); ($\alpha_a = 0.01$, $\iota_A = \alpha_a/2$, $\gamma_A = \alpha_a/10$); ($\alpha_a = 0.01$, $\iota_A = \alpha_a/2$, $\gamma_A = \alpha_a/100$); ($\alpha_a = 0.1$, $\iota_A = \alpha_a/2$, $\gamma_A = \alpha_a/10$)]