The Hitchhiking Effect of a Strongly Selected Substitution in Male Germline on Neutral Polymorphism in a Monogamy Population

Comparative genomic studies suggest that a large number of the genes showing the strongest evidence for positive selection in humans are testis- or sperm-specific, possibly as a result of germline selection. We propose a novel selection model in which the germlines of heterozygous males in a monogamous population are under natural selection. Under this model, we study the dynamics of a strongly selected substitution in the male germline and its hitchhiking effect on preexisting linked neutral polymorphism. We show that the expected heterozygosity at the neutral locus is reduced by an amount that depends on c, s and N, where c is the recombination rate between the selected and the neutral locus, s is the selective coefficient of the advantageous allele, and N is the diploid effective population size.


Introduction
The hitchhiking effect [1] commonly refers to the phenomenon whereby a selectively favored allele changes the frequencies of polymorphisms at linked loci on its way to fixation. This effect has been studied extensively over the last several decades for the case of a codominant beneficial allele [1][2][3]. Recently, Teshima and Przeworski examined the hitchhiking effect when the dominance coefficients of advantageous mutations are unknown [4]. To date, most models developed to characterize the hitchhiking effect assume that positive selection acts at the individual level through differential viabilities (Figure 1A). However, positive selection may also act on the male germline (Figure 1B), and this might be common. In a comparative genomic study based on the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site ($d_N/d_S$), Nielsen et al. found a surprisingly large number of testis- or sperm-specific genes to be positively selected in humans and chimpanzees [5]. A possible explanation is sperm competition [5].
Males produce tens of thousands of sperm, but only one of them can fertilize an egg. This severe competition implies potential positive selection on sperm traits involved in spermatogenesis and fertilization, for example motility, the acrosome reaction, penetration, and apoptosis during spermatogenesis. Indeed, evidence for selection on the male germline has emerged thanks to greatly improved techniques in molecular biology. Several independent studies show genotype-dependent chances of fertilizing eggs [6][7][8], and some reproductive proteins have been identified as targets of positive selection [9]. As data on germline selection accumulate, it is critical to develop a model that characterizes the dynamics of positive directional selection due to sperm competition and its effects on linked neutral polymorphism.
This article considers the consequences of strongly selected substitutions in the male germline for preexisting linked neutral polymorphism. For simplicity, we make the following assumptions: (1) absence of viability selection, i.e., individuals with different genotypes have the same viability, and (2) monogamy, i.e., competition occurs only among different sperm haplotypes from the same heterozygous male, and there is no competition between sperm from different males.

Two-Locus Model
We study a two-locus model with one selected locus and one linked neutral locus. At the selected locus, the ancestral allele is denoted a and the mutated beneficial allele A. By assumption, individuals with genotypes AA, Aa and aa have the same viability, which is set to 1. This differs from the traditional additive fitness model, a special case of the arbitrary-dominance model studied by Teshima and Przeworski [4]. Sperm carrying allele A, however, have a selective advantage over sperm carrying allele a from the same heterozygous male; their fitnesses are set to $1+s$ and 1, respectively. Sperm from a homozygous male (AA or aa) have no advantage over other sperm from the same individual, since they are identical, so their fitnesses are set to 1. The alleles at the neutral locus are denoted B and b. We assume that a beneficial mutation A arises at time $t = 0$ and subsequently replaces allele a. The fixation of allele A may alter heterozygosity at the linked neutral locus.
To analyze this model, we follow Ohta and Kimura's treatment that separated the dynamics of the selected locus and that of the neutral locus [2]. Afterwards, we consider only the frequency trajectory of the beneficial allele, which is acceptable when selection is so strong that the dynamics can be treated deterministically [10]. Furthermore, we adopt a modified moment-analysis method [3] to study the effect of the beneficial allele on the neutral locus.

Frequency trajectory of beneficial allele
Assume that the frequencies of individuals with genotypes AA, Aa and aa in the current generation are P, 2Q and R, respectively, where $P + 2Q + R = 1$. In terms of genotype frequencies, the allele frequencies p of A and q of a are

$$p = P + Q, \qquad q = Q + R. \tag{1}$$

Note that $p + q = P + 2Q + R = 1$.
According to the sperm competition model, we can calculate the frequencies of alleles A and a in the next generation. In total, there are nine possible mating pairs (Table 1). Since competition occurs only among sperm, there is no selection at the individual level. Mating is therefore random, and matings take place in proportion to the genotype frequencies. For example, the proportion of matings between Aa and AA is $2Q \times P$. The frequencies of all nine mating pairs are given in Table 1.
Recall that competition occurs only among the gametes produced by a heterozygous male. Thus, a mating of an Aa male with an AA female produces AA and Aa zygotes in proportions $\frac{1+s}{2+s}$ and $\frac{1}{2+s}$, respectively, whereas a mating of an AA male with an Aa female produces $\frac{1}{2}$ AA and $\frac{1}{2}$ Aa zygotes. The frequencies of zygotes produced by all possible mating pairs are listed in Table 1.
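The mating scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `next_generation` and the dictionary layout are hypothetical, and the only model ingredients used are those stated in the text (random mating in proportion to genotype frequencies, and a weight of $1+s$ versus 1 for A and a sperm from an Aa male).

```python
from itertools import product


def next_generation(P, Q, R, s):
    """Return zygote frequencies (AA, Aa, aa) after one generation.

    P, 2Q and R are the adult frequencies of AA, Aa and aa (P + 2Q + R = 1).
    Only sperm from Aa males compete: A sperm have weight 1 + s, a sperm 1.
    (Hypothetical helper written for illustration.)
    """
    geno = {"AA": P, "Aa": 2 * Q, "aa": R}
    # Probability that a gamete carries allele A, by parental genotype.
    egg_A = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}
    sperm_A = {"AA": 1.0, "Aa": (1 + s) / (2 + s), "aa": 0.0}

    zygotes = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Enumerate the nine (male, female) mating pairs.
    for male, female in product(geno, repeat=2):
        pair_freq = geno[male] * geno[female]
        a, b = sperm_A[male], egg_A[female]
        zygotes["AA"] += pair_freq * a * b
        zygotes["Aa"] += pair_freq * (a * (1 - b) + (1 - a) * b)
        zygotes["aa"] += pair_freq * (1 - a) * (1 - b)
    return zygotes["AA"], zygotes["Aa"], zygotes["aa"]


if __name__ == "__main__":
    AA, Aa, aa = next_generation(P=0.2, Q=0.15, R=0.5, s=0.1)
    print("zygote frequencies:", AA, Aa, aa)
    print("new frequency of A:", AA + Aa / 2)
```

Enumerating the pairs explicitly keeps the sketch close to the structure of Table 1; the zygote frequencies sum to one because each mating pair distributes its whole weight over the three genotypes.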
The genotype frequencies of AA, Aa, and aa zygotes in the next generation are denoted $P'$, $2Q'$, and $R'$, respectively. Summing over the mating pairs in Table 1 gives

$$\begin{aligned}
P' &= P^2 + PQ\,\frac{4+3s}{2+s} + 2Q^2\,\frac{1+s}{2+s},\\
2Q' &= PQ\,\frac{4+s}{2+s} + 2PR + 2Q^2 + QR\,\frac{4+3s}{2+s},\\
R' &= R^2 + RQ\,\frac{4+s}{2+s} + 2Q^2\,\frac{1}{2+s}.
\end{aligned} \tag{2}$$

The frequencies of A and a in the next generation, denoted $p'$ and $q'$, respectively, follow from the genotype frequencies:

$$p' = P' + Q' = p + \frac{Qs}{2(2+s)}, \tag{3a}$$

$$q' = Q' + R' = q - \frac{Qs}{2(2+s)}, \tag{3b}$$

where p and q are the allele frequencies given in Equation (1). From Equation (3a), we obtain the frequency change of the A allele after one generation of germline selection,

$$\Delta p = p' - p = \frac{Qs}{2(2+s)}. \tag{4}$$

In traditional positive selection models, random union of gametes guarantees Hardy-Weinberg equilibrium, which means

$$P = p^2, \qquad 2Q = 2pq, \qquad R = q^2. \tag{5}$$

In the germline selection model, however, gametes from the same heterozygous male have different fitnesses, which leads to nonrandom union of gametes. Genotype frequencies therefore deviate from Hardy-Weinberg expectations. Nevertheless, we can use Hardy-Weinberg proportions as a good approximation to calculate the frequency change of allele A:

$$\Delta p \approx \frac{p(1-p)\,s}{2(2+s)}. \tag{6}$$

The difference between Equations (4) and (6) arises from the difference in heterozygote (Aa) frequencies after one generation of germline selection. Without loss of generality, we use the frequency of Aa heterozygotes in the (i+1)-th generation to show that the heterozygote frequencies obtained from Equations (4) and (6) differ by a term of first order in s, i.e., O(s). Equation (6) yields the frequency of allele A after one generation of germline selection as

$$p'_A = p + \frac{p(1-p)\,s}{2(2+s)}. \tag{7}$$

Since $p = P + Q$, the deviation between $Q'$ and $p'_A(1-p'_A)$ can be shown to be of first order in s,

$$Q' - p'_A(1 - p'_A) = O(s), \tag{8}$$

where the explicit expression involves polynomials $f(P,Q)$ and $g(P,Q)$ in P and Q.
Then, Equation (4) can be written as

$$\Delta p = \frac{Qs}{2(2+s)} = \frac{p(1-p)\,s}{2(2+s)} + \frac{\bigl(Q - p(1-p)\bigr)\,s}{2(2+s)} \approx \frac{p(1-p)\,s}{2(2+s)}, \tag{9}$$

where the second term is of order $s^2$ because $Q - p(1-p)$ is of order s. Hardy-Weinberg frequencies are therefore a good approximation. The deviation between the frequency trajectories calculated iteratively from Equation (4) and Equation (9) is shown in Figure 2. For a given selective strength s, the absolute value of this deviation is positively correlated with the frequency of Aa heterozygotes (Figure S1). That is, the more heterozygotes, the larger the deviation, and the deviation attains its maximum when the frequency of heterozygotes is maximized (results not shown). Of note, the maximum deviation increases with the selective coefficient s. However, the deviation is still very small even when s is as large as 0.1 (Figure 2). Moreover, the difference in fixation times between the two frequency trajectories is also negligible (results not shown). Therefore, Equation (9) is a good approximation for the frequency trajectory of allele A.

If the selective advantage of allele A is large, the frequency change of allele A can be treated deterministically as long as its frequency is not very close to either 0 or 1 [11]. Given $0 < \varepsilon \ll 1$, the frequency of allele A at time t can be approximated by $x(t)$ satisfying the differential equation

$$\frac{dx}{dt} = \frac{s}{4}\,x(1-x), \tag{10}$$

where we make a further approximation by using the Taylor expansion of $\frac{1}{2(2+s)}$ and ignoring terms of order $s^2$ and higher. The solution of this ordinary differential equation (ODE) is

$$x(t) = \frac{\varepsilon}{\varepsilon + (1-\varepsilon)\,e^{-s(t - t_\varepsilon)/4}}, \tag{11}$$

where $t_\varepsilon$ is the time at which allele A reaches frequency $\varepsilon$. For convenience, we substitute $\tau = t - t_\varepsilon$. The time it takes for allele A to reach quasi-fixation, i.e., the time needed to increase from frequency $\varepsilon$ to $1-\varepsilon$, is then

$$\hat{\tau} \approx -\frac{8\ln\varepsilon}{s}. \tag{12}$$

Hitchhiking effect on heterozygosity
In order to study the effect of the selected locus on the neutral one, we adopt the method of Stephan et al. [3], which is modified from Ohta and Kimura's moment-analysis method [2].
Ohta and Kimura divide the population into two parts: chromosomes carrying the advantageous allele A and chromosomes carrying the disadvantageous allele a [2]. Let $p_1$ be the frequency of allele B among chromosomes carrying A, and $p_2$ the frequency of B among chromosomes carrying a. With x denoting the frequency of allele A, the frequency of allele B can then be expressed in terms of $p_1$ and $p_2$ as

$$p = x\,p_1 + (1-x)\,p_2. \tag{13}$$

Following Stephan et al.'s method [3], which distinguishes whether the beneficial mutation occurs on a B- or a b-carrying chromosome, we calculate a weighted expectation for an arbitrary polynomial function f of $p_1$ and $p_2$ as

$$E[f(p_1,p_2)] = p_{20}\,E[f \mid p_{10} = 1] + (1 - p_{20})\,E[f \mid p_{10} = 0], \tag{14}$$

where $p_{10}$ is the frequency $p_1$ at time $t = 0$, and $p_{20}$ is the frequency $p_2$ at time $t = 0$. The former equals either one or zero, depending on whether the beneficial mutation occurred on a B or a b background. The expected heterozygosity H at the B/b locus is defined by

$$H = E[2p(1-p)], \tag{15}$$

where p is the frequency of allele B (Equation (13)). A series of differential equations is derived and solved approximately following Stephan et al.'s method [3], which yields an expression for the reduction in expected heterozygosity at the end of selection (Equation (16)). In that expression, $\alpha = 2Ns$, c is the recombination rate between the selected locus and the neutral locus, and $H_{1-\varepsilon}$ and $2p_{2\varepsilon}(1-p_{2\varepsilon})$ are the expected heterozygosities at the times when the frequency of allele A is $1-\varepsilon$ and $\varepsilon$, respectively. Note that this expression has exactly the same form as the one derived by Stephan et al. [3], with s replaced by $s/4$. Intuitively, this is not too surprising, because selection occurs only in heterozygous males. Since males account for only half of the population, and at most half of the males are heterozygous, selection is roughly four times less efficient than in a corresponding viability selection model. The reduction in expected heterozygosity under the two models is shown in Figure 3. If $\varepsilon \le 1/\alpha$, Equation (16) can be approximated by a simpler expression, which implies that the reduction in expected heterozygosity is only weakly dependent on $\varepsilon$.
Figure 3 shows that this is a very accurate approximation for various choices of c.
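The factor-of-four argument can be checked on the deterministic trajectory. The following sketch assumes that the corresponding viability selection model has the logistic form $dx/dt = s\,x(1-x)$, consistent with the statement that the germline results follow by replacing s with $s/4$. The function names and the parameter values `s = 0.05` and `eps = 1e-4` are illustrative assumptions.

```python
import math


def logistic_frequency(t, rate, eps):
    """Solution of dx/dt = rate * x * (1 - x) with x(0) = eps."""
    return eps / (eps + (1 - eps) * math.exp(-rate * t))


def sweep_time(rate, eps):
    """Exact time to rise from eps to 1 - eps under the logistic ODE."""
    return 2 * math.log((1 - eps) / eps) / rate


if __name__ == "__main__":
    s, eps = 0.05, 1e-4          # illustrative values only

    t_germline = sweep_time(s / 4, eps)   # germline selection: rate s/4
    t_viability = sweep_time(s, eps)      # assumed viability model: rate s
    t_approx = -8 * math.log(eps) / s     # quasi-fixation time, -8 ln(eps)/s

    print("germline sweep time :", t_germline)
    print("viability sweep time:", t_viability)
    print("ratio               :", t_germline / t_viability)
    print("approximation       :", t_approx)
    print("x at end of sweep   :", logistic_frequency(t_germline, s / 4, eps))
```

Because the sweep lasts four times longer for the same s, there is correspondingly more time for recombination to break up the association between the selected and the neutral locus, which is the intuition behind the weaker hitchhiking effect.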

Discussion
Exploring alternatives to adaptive evolution driven by differential viabilities, we proposed a model in which sperm from a heterozygous male are under natural selection. This model differs from the classic models in that the male germline, rather than the individual, is the target of selection. Notably, this model is also distinct from fertility selection [12][13][14], in which different mating pairs have different fertilities, and from sexual selection [15,16], in which selection acts through asymmetrical mating preferences. A similar model, referred to as "meiotic drive", has been studied by Chevin and Hospital [17]; however, their model addresses the hitchhiking effect that results from non-random segregation of chromosomes during meiosis rather than from germline competition after meiosis. Thus, their results introduce an additional factor related to the recombination rate to describe the hitchhiking effect, whereas our results are related to selective strength. Compared to other sperm competition models [18][19][20], which consider polyandrous populations, this model focuses on sperm competition within heterozygous males. It is therefore more suitable for describing the evolutionary dynamics of germline selection in monogamous populations, for example for the testis- or sperm-specific genes in the human population [5]. Interestingly, recently evolved new genes are often testis-specific, as documented by Vinckenbosch et al. [21] and Zhang et al. [22]. Moreover, these genes are often associated with strong adaptive signals [22]. As argued by Meiklejohn and Tao, such new genes may originate under the pressure of meiotic drive [23]. However, as the model discussed in this manuscript suggests, such genes may also emerge adaptively under sperm competition.
Here, we studied the dynamics of a strongly selected substitution in the male germline and its effect on preexisting linked neutral polymorphism. Because of selection on the male germline, random union of sperm and eggs is disrupted. Thus, genotype frequencies are no longer in Hardy-Weinberg equilibrium. However, the deviation of genotype frequencies from Hardy-Weinberg proportions is negligible (Figure 2), and we found that Hardy-Weinberg proportions still provide a good approximation for calculating the change in allele frequencies. The dynamics of the selected allele and its effect on linked neutral polymorphism are similar to the results derived by Stephan et al. [3], except that the selective coefficient in our formulas is $s/4$ rather than s, which means that germline selection is weaker than individual selection for the same selection coefficient. This is reasonable because selection occurs only in heterozygous males. However, the selective strength in germlines may often be much larger than that in individuals, which would result in faster evolution of reproductive genes [24][25][26].
Notably, the hitchhiking effect of a beneficial mutation selected in the male germline is approximately identical to that of a beneficial mutation under viability selection in the absence of dominance, but with only a quarter of the selective advantage. Hence, in genome-wide scans for traces of selection, positively selected targets in the male germline might be incorrectly inferred as candidates under moderate viability selection. Furthermore, in such situations, between-population comparison studies designed to verify adaptive evolution might not provide evidence for viability selection, and hitchhiking patterns might be incorrectly attributed to demographic effects. Hence, in populations in which germline selection could potentially occur, experimental setups need to be adjusted accordingly to infer the mechanisms of selection correctly. As a basic model for germline selection, the model can be extended to more complex ones. For example, the fitness of individuals can also be included in the model. Several recent studies of a substitution in fibroblast growth factor receptor 2 (FGFR2) in the male germline showed that a selective germline advantage leads to an unexpectedly high mutant prevalence, although this substitution causes defects in the descendants [27,28]. This indicates that the substitution is positively selected in germlines but negatively selected in individuals, which could lead to overall balancing selection. Hence, models combining these two aspects may give better predictions for diseases and are worth further investigation. Notably, our model assumes a monogamous population. However, it can also be extended to polyandrous populations, in which females mate with different males within a short time period, which certainly leads to more severe sperm competition. This phenomenon is frequently found in social insects, for example bees [29]. In such cases the reduction of linked neutral polymorphism may accordingly be more severe.