Polygenic signals of sex differences in selection in humans from the UK Biobank

Sex differences in the fitness effects of genetic variants can influence the rate of adaptation and the maintenance of genetic variation. For example, “sexually antagonistic” (SA) variants, which are beneficial for one sex and harmful for the other, can both constrain adaptation and increase genetic variability for fitness components such as survival, fertility, and disease susceptibility. However, detecting variants with sex-differential fitness effects is difficult, requiring genome sequences and fitness measurements from large numbers of individuals. Here, we develop new theory for studying sex-differential selection across a complete life cycle and test our models with genotypic and reproductive success data from approximately 250,000 UK Biobank individuals. We uncover polygenic signals of sex-differential selection affecting survival, reproductive success, and overall fitness, with signals of sex-differential reproductive selection reflecting a combination of SA polymorphisms and sexually concordant polymorphisms in which the strength of selection differs between the sexes. Moreover, these signals hold up to rigorous controls that minimise the contributions of potential confounders, including sequence mapping errors, population structure, and ascertainment bias. Functional analyses reveal that sex-differentiated sites are enriched in phenotype-altering genomic regions, including coding regions and loci affecting a range of quantitative traits. Population genetic analyses show that sex-differentiated sites exhibit evolutionary histories dominated by genetic drift and/or transient balancing selection, but not long-term balancing selection, which is consistent with theoretical predictions of effectively weak SA balancing selection in historically small populations. Overall, our results are consistent with polygenic sex-differential (including SA) selection in humans. Evidence for sex-differential selection is particularly strong for variants affecting reproductive success, in which the potential contributions of nonrandom sampling to signals of sex differentiation can be excluded.


SI Appendix, Section A: Theoretical null distributions for FST estimates
In the following sections, we outline null models for between-sex FST metrics that potentially capture sex-differential effects of genetic variation on pre-adult viability ("adult FST"), on adult reproductive success ("reproductive FST"), and on total fitness ("gametic FST"). In each case, we follow bi-allelic loci, each with alleles labelled A1 and A2, and their frequencies in adults of each sex or the gametes contributing to production of offspring.

Adult FST
In the absence of sex differences in viability selection, the frequencies of autosomal alleles, which are equalized at fertilization, remain equal between adults of each sex. Random sampling of individuals included within a panel of sequenced adults will, nevertheless, generate nonzero estimates of between-sex FST (i.e., nonzero adult $\hat{F}_{ST}$). These sex differences arise from error in estimating female and male allele frequencies at each locus. Under a null model in which there are no sex differences in viability selection, and the population is at Hardy-Weinberg equilibrium for the locus, Ruzicka et al. [1] showed that $2 n_H \hat{F}_{ST}$ follows a chi-squared distribution with 1 degree of freedom, where $n_H = 2(1/n_f + 1/n_m)^{-1}$ is the harmonic mean sample size of female- and male-derived sequences for the locus (i.e., $n_f = 2N_f$ and $n_m = 2N_m$, where $N_f$ and $N_m$ are the female and male sample sizes, and the "2" accounts for diploidy) (see their Appendix A). This theoretical distribution for adult $\hat{F}_{ST}$ under the null (along with those developed below) applies well for large datasets (large $n_H$) in which very rare polymorphic loci are excluded prior to analysis.
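The chi-squared null for adult FST can be checked with a small simulation. The sketch below is illustrative only: it assumes the estimator takes the form $(\hat{p}_f - \hat{p}_m)^2 / [4\bar{p}(1-\bar{p})]$, draws allele counts binomially (which presumes Hardy-Weinberg equilibrium), and uses made-up sample sizes and function names.

```python
import numpy as np

def between_sex_fst(p_f, p_m):
    """Between-sex FST from female and male allele frequency estimates.
    Assumed form: (p_f - p_m)^2 / (4 * p_bar * (1 - p_bar))."""
    p_bar = 0.5 * (p_f + p_m)
    return (p_f - p_m) ** 2 / (4.0 * p_bar * (1.0 - p_bar))

def scaled_null_adult_fst(p, N_f, N_m, n_loci, rng):
    """Simulate 2 * n_H * FST at independent loci with no sex difference."""
    n_f, n_m = 2 * N_f, 2 * N_m            # sequences per sex (diploidy)
    p_f = rng.binomial(n_f, p, size=n_loci) / n_f
    p_m = rng.binomial(n_m, p, size=n_loci) / n_m
    n_H = 2.0 / (1.0 / n_f + 1.0 / n_m)    # harmonic mean sample size
    return 2.0 * n_H * between_sex_fst(p_f, p_m)

rng = np.random.default_rng(1)
# Hypothetical sample sizes and allele frequency, chosen only for illustration.
scaled = scaled_null_adult_fst(p=0.3, N_f=130_000, N_m=120_000,
                               n_loci=200_000, rng=rng)
# A chi-squared variable with 1 degree of freedom has mean 1 and variance 2.
print("mean:", scaled.mean(), "variance:", scaled.var())
```

If the null holds, the simulated mean and variance should sit close to 1 and 2, the moments of a chi-squared distribution with 1 degree of freedom.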
Their result can be generalized to cases where the population deviates from Hardy-Weinberg equilibrium, in which case:

$$\hat{F}_{ST} \sim \frac{(1 + F_{IS})\,X}{2 n_H},$$

where $X$ is a chi-squared random variable with 1 degree of freedom, and $F_{IS} = 1 - p_{12}/[2p(1-p)]$ measures the proportional deviation of the heterozygote frequency ($p_{12}$) from its Hardy-Weinberg expectation.

Gametic FST
Let $M_{ij}$ be the total number of offspring produced by males with genotype $ij$ ($ij = 11$ for A1A1 individuals, $ij = 12$ for A1A2, and $ij = 22$ for A2A2). The frequency of the A1 allele in male gametes contributing to offspring will be:

$$p'_m = \frac{M_{11} + x_{12}}{M_{11} + M_{12} + M_{22}},$$

where $x_{12}$ is a binomially distributed random variable with mean and variance of $E(x_{12}) = M_{12}/2$ and $\mathrm{var}(x_{12}) = M_{12}/4$. Thus, the expected frequency of the A1 allele in gametes transmitted by males to their offspring is:

$$\hat{p}'_m = \frac{M_{11} + \tfrac{1}{2} M_{12}}{M_{11} + M_{12} + M_{22}}.$$

Similarly, letting $F_{ij}$ represent the total number of offspring produced by females with genotype $ij$, the expected frequency of A1 in female gametes contributing to offspring is:

$$\hat{p}'_f = \frac{F_{11} + \tfrac{1}{2} F_{12}}{F_{11} + F_{12} + F_{22}}.$$

In the absence of selection, there will be two sources of variability affecting the values of $M_{ij}$ and $F_{ij}$. First, there will be random variability in the numbers of individuals of each genotype within the sample of adults. For example, in a random sample of $N_m$ males, the numbers of individuals with genotypes 11, 12, and 22 (i.e., A1A1, A1A2, A2A2), denoted by the vector $n = (n_{11}, n_{12}, n_{22})$, will follow a multinomial distribution with parameters $N_m$, $p_{11}$, $p_{12}$, and $p_{22}$, where $p_{ij}$ represents the frequency of genotype $ij$ (note that the frequency of the A1 allele is $p = p_{11} + p_{12}/2$ and the frequency of the A2 allele is $1 - p = p_{22} + p_{12}/2$). Second, there will be random variability in the number of offspring produced by each individual in the population.

For the case where the genotype has no effect on reproductive success, the offspring number per male follows a distribution with mean $\mu_m$ and variance $\sigma_m^2$ that is independent of genotype. Likewise, the offspring number per female follows a distribution with mean $\mu_f$ and variance $\sigma_f^2$ that is independent of genotype. Values of $\mu_f$, $\sigma_f^2$, $\mu_m$, and $\sigma_m^2$ can be estimated from the females and males represented in the UK Biobank dataset. The total number of offspring produced by males with genotype $ij$ is:

$$M_{ij} = \sum_{k=1}^{n_{ij}} m_k,$$

where the $m_k$ are IID random variables with mean $\mu_m$ and variance $\sigma_m^2$, corresponding to the mean and variance for the numbers of offspring reported by males from the population.

From the law of total expectation, the expected value of $\hat{p}'_m$ becomes: [equation missing in source]

The variance for $\hat{p}'_m$, conditioned on the numbers of individuals per genotype, is: [equation missing in source]

Equivalent expressions for females are obtained by replacing "m" subscripts with "f".

With large sample sizes, the difference between the projected allele frequencies in the gametes of each sex will be approximately normally distributed: [equation missing in source]
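The two sources of variability in the gametic null model (genotype sampling and variation in offspring number) can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the procedure used in the analysis: genotype frequencies are taken to be at Hardy-Weinberg proportions, offspring numbers are taken to be Poisson distributed (the text only specifies a mean and variance), and the function names and parameter values are hypothetical. The projected frequency is computed as $(M_{11} + M_{12}/2)/(M_{11} + M_{12} + M_{22})$.

```python
import numpy as np

def projected_gametic_freq(offspring_by_genotype):
    """Expected A1 frequency among transmitted gametes, given the total
    offspring counts (M11, M12, M22) of A1A1, A1A2 and A2A2 parents."""
    M11, M12, M22 = offspring_by_genotype
    return (M11 + 0.5 * M12) / (M11 + M12 + M22)

def simulate_null_projection(p, N, mean_offspring, n_reps, rng):
    """Projected gametic A1 frequency for one sex when genotype has no
    effect on reproductive success."""
    # Assumption: Hardy-Weinberg genotype frequencies among sampled adults.
    geno_probs = [p ** 2, 2 * p * (1 - p), (1 - p) ** 2]
    out = np.empty(n_reps)
    for r in range(n_reps):
        # Source 1: multinomial sampling of genotype counts.
        counts = rng.multinomial(N, geno_probs)
        # Source 2: offspring number per individual (assumed Poisson);
        # a sum of n_ij IID Poisson(mu) draws is Poisson(n_ij * mu).
        M = rng.poisson(mean_offspring * counts)
        out[r] = projected_gametic_freq(M)
    return out

rng = np.random.default_rng(2)
proj = simulate_null_projection(p=0.3, N=5_000, mean_offspring=1.8,
                                n_reps=2_000, rng=rng)
print("mean projected frequency:", proj.mean(), "sd:", proj.std())
```

Under this null, the projected frequency is centred on the adult allele frequency, and its spread reflects both sampling steps.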

Reproductive FST
Adult FST potentially captures effects of sex differences in viability selection, whereas gametic FST potentially captures sex differences in selection through any fitness component. To isolate the effect of sex differences in selection through components of adult reproductive success, we require a measure of allele frequency divergence between the sexes that reflects the variation in reproductive success among reproductively mature adults, and which does not include (or removes the effect of) allele frequency differences between sexes in the adult samples.
Specifically, we wish to test for between-sex divergence in projected gametic allele frequencies (i.e., differences between $\hat{p}'_f$ and $\hat{p}'_m$) beyond what can be explained by allele frequency differences between females and males within the sample of adults.
Let $\hat{p}_f$ and $\hat{p}_m$ represent the female and male allele frequencies estimated from the adult samples, and $\hat{p} = \tfrac{1}{2}(\hat{p}_f + \hat{p}_m)$ represent their average. A measure of the amount of allele frequency divergence arising from differential reproduction between the sexes is given by: [equation missing in source] where $\hat{p}'_f$ and $\hat{p}'_m$ are the projected gametic allele frequencies. From this expression, we will establish a null model for the projected gametic allele frequencies of each sex given the estimated allele frequencies in the adults. In our null model (outlined below), we will assume there are no intrinsic differences in reproductive success associated with each genotype or sex.
This null model is similar to the gametic FST null model (above) in that it accounts for random variation in reproductive success. It differs from the gametic FST model by discounting random sampling effects on sex-specific allele frequency estimates from adults (i.e., $\hat{p}_f$ and $\hat{p}_m$ are treated as constants in what follows).
For very large adult samples (large $N_f$ and $N_m$, as in the UK Biobank), the null distributions for $\hat{p}'_f$ and $\hat{p}'_m$ (each conditioned on the allele frequencies in adults, $\hat{p}_f$ and $\hat{p}_m$) will each be approximately normal, with mean and variance for the $j$th sex given by: [equations missing in source]
The null distribution for their difference will, therefore, be approximately normal with mean and variance: [equations missing in source]
The null distribution for reproductive FST is then: [equation missing in source] where $X$ is a chi-squared random variable with one degree of freedom. The result follows from the fact that the difference, divided by the square root of its variance, has (approximately) a standard normal distribution.
In the special case where female and male allele frequencies in the sample are approximately equal (as is the case for the UK Biobank sites that pass quality control in our analysis), the null model for reproductive FST will further simplify to: [equation missing in source]

SI Appendix, Section B: Hitchhiking effects in between-sex FST
For a causal locus that differentially affects female and male fitness, the expected inflation of between-sex FST is given by: [equation missing in source] in which the derivatives capture the effect of the causal locus on female and male fitness (see SI Appendix, Section G).
Polymorphic loci that are physically linked to a given causal locus, and in linkage disequilibrium (LD) with it, will also exhibit inflated FST, on average. Let $x$ refer to the frequency of one of a pair of alleles at a neutral locus that is linked to a selected locus. The expected within-generation change in frequency of the neutral allele will be: [equation missing in source] in females, and: [equation missing in source] in males of the population, where $D$ is the degree of linkage disequilibrium between the neutral and the causal locus. The expected inflation of between-sex FST at the neutral site is given by: [equation missing in source] where $r^2 = D^2 [p(1-p)\,x(1-x)]^{-1}$ is the squared correlation coefficient between the neutral locus and the causal locus, with $p$ the allele frequency at the causal locus. From the final result, we see that each neutral locus in LD with the causal site will hitchhike along with it, leading to an inflation of FST at hitchhiking loci that is proportional to the FST at the causal locus and the square of the correlation coefficient between hitchhiking and causal loci.
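The proportionality between FST at a hitchhiking locus and FST at the causal locus can be checked numerically. The sketch below assumes random mating, viability selection acting only on the causal locus, and a between-sex FST of the form $(a_f - a_m)^2 / [4\bar{a}(1-\bar{a})]$. The fitness values, frequencies, and function names are hypothetical and chosen only for illustration.

```python
def post_selection_freqs(p, x, D, w11, w12, w22):
    """Allele frequencies in one sex after viability selection at the
    causal locus. p: frequency of causal allele A1; x: frequency of the
    linked neutral allele; D: linkage disequilibrium between them."""
    q = 1.0 - p
    w1 = p * w11 + q * w12          # marginal fitness of A1
    w2 = p * w12 + q * w22          # marginal fitness of A2
    w_bar = p * w1 + q * w2         # mean fitness
    p_new = p * w1 / w_bar
    x_new = x + D * (w1 - w2) / w_bar   # neutral allele hitchhikes via D
    return p_new, x_new

def between_sex_fst(a_f, a_m):
    """Assumed form of between-sex FST for population allele frequencies."""
    a_bar = 0.5 * (a_f + a_m)
    return (a_f - a_m) ** 2 / (4.0 * a_bar * (1.0 - a_bar))

p, x, D = 0.4, 0.3, 0.05
r2 = D ** 2 / (p * (1 - p) * x * (1 - x))

# Hypothetical sexually antagonistic fitness values (additive, weak).
p_f, x_f = post_selection_freqs(p, x, D, w11=1.00, w12=0.995, w22=0.99)
p_m, x_m = post_selection_freqs(p, x, D, w11=0.99, w12=0.995, w22=1.00)

fst_causal = between_sex_fst(p_f, p_m)
fst_neutral = between_sex_fst(x_f, x_m)
print("FST ratio (neutral / causal):", fst_neutral / fst_causal, "r^2:", r2)
```

With weak selection, the ratio of the two FST values should be very close to $r^2$, consistent with the verbal statement of the result above.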

FIS estimates arising from SA selection
Deviations from Hardy-Weinberg equilibrium (HWE) potentially reflect artefacts that we wish to eliminate from our analysis. However, SA selection is predicted to generate excess heterozygosity relative to predictions under HWE. We only wish to remove loci with deviations from HWE that are too pronounced to be explained by SA selection. To define the plausible range of HWE deviations under SA selection, we use the $\hat{F}_{IS}$ statistic to define the estimated deviation:

$$\hat{F}_{IS} = 1 - \frac{P_{Aa}}{2\bar{p}(1-\bar{p})},$$

where $P_{Aa}$ is the frequency of heterozygotes at the locus, and $\bar{p}$ is the sex-averaged allele frequency. For a locus under SA selection, let $p_f$ and $p_m$ represent the frequency of the female-beneficial allele in eggs and sperm contributing to fertilization in a given generation (respectively). In a random sample of $n$ individuals from the offspring cohort, $\hat{F}_{IS}$ for the locus will be a random variable from a normal distribution with mean and variance of:

$$E[\hat{F}_{IS}] = -\frac{(p_f - p_m)^2}{4\bar{p}(1-\bar{p})}, \qquad \mathrm{var}[\hat{F}_{IS}] = \frac{1}{n}$$

[1]. Thus, we expect some degree of deviation from HWE owing to sex differences in selection.
For a SA locus at polymorphic equilibrium with additive fitness effects in each sex, the equilibrium allele frequency difference between sexes after selection is: [equation missing in source] [2]. If we let $p$ represent the minor allele frequency, then at equilibrium, we have: [equation missing in source] where $s_{max} = \max(s_m, s_f)$. With sufficiently small $s_{max}$, the last expression can be approximated as: [equation missing in source] which gives us: [equation missing in source] The approximation is accurate for $s_{max} = 0.2$. The following plot shows the exact and approximate results for $n$ = 250,000 and $s_{max}$ = 0.2.
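The heterozygote excess produced by an allele frequency difference between eggs and sperm can be computed directly. The sketch below assumes random union of gametes and the standard definition $F_{IS} = 1 - P_{Aa}/[2\bar{p}(1-\bar{p})]$, and compares the expected deviation with a sampling standard deviation of $1/\sqrt{n}$. The function name and the example frequencies are hypothetical.

```python
import math

def expected_fis(p_f, p_m):
    """Expected F_IS among zygotes formed by random union of eggs (allele
    frequency p_f) and sperm (allele frequency p_m)."""
    p_bar = 0.5 * (p_f + p_m)
    het = p_f * (1.0 - p_m) + p_m * (1.0 - p_f)   # heterozygote frequency
    return 1.0 - het / (2.0 * p_bar * (1.0 - p_bar))

n = 250_000                          # sample size used in the text
p_f, p_m = 0.51, 0.49                # hypothetical gamete frequencies
fis = expected_fis(p_f, p_m)
closed_form = -(p_f - p_m) ** 2 / (4.0 * 0.5 * 0.5)
sd = 1.0 / math.sqrt(n)              # assumed sampling sd of the estimate
print("expected F_IS:", fis, "closed form:", closed_form, "sampling sd:", sd)
```

In this example the expected deviation is negative (excess heterozygosity) and small relative to the sampling standard deviation, which illustrates why only pronounced departures from HWE are implausible under SA selection.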

SI Appendix, Section E: Null Model for unfolded Reproductive FST
The elevation of reproductive FST relative to our null model is a genome-wide signal of sex-differential selection, though in principle, the signal may have arisen because of sex differences in the strength of selection (sexually concordant or SC selection), because of loci with sex-limited effects (SL selection), or because of sex differences in the direction of selection (SA selection).
These three mechanisms can be distinguished as follows. First, consider the divergence of the projected allele frequency in males ($\hat{p}'_m$) relative to the observed frequency ($\hat{p}_m$). Under the null, the genotypes of the locus have no effect on male reproductive success, and therefore: [equation missing in source]
Under the null, the following standardized metric will follow a standard normal distribution: [equation missing in source]
The same applies to females: [equation missing in source]

Beyond sampling effects, the expected allele frequency divergence between the sexes estimated in a sample will increase whenever sex differences in selection generate genuine allele frequency divergence in the population (i.e., $p_f \neq p_m$). Under modest-to-weak selection at a locus (i.e., selection coefficients on the order of 0.1 or less), sex-specific allele frequencies in the population are given by: [equation missing in source] where $\bar{w}_f$ and $\bar{w}_m$ represent the mean relative fitness of each sex with respect to the locus [4].
Note that selection is sexually concordant (SC) when the gradients $\partial \ln(\bar{w}_f)/\partial p$ and $\partial \ln(\bar{w}_m)/\partial p$ have the same sign; selection is sexually antagonistic (SA) when the gradients have opposite signs. The expected value of the estimate of FST becomes: [equation missing in source] It is clear from the final expression for $E[\hat{F}_{ST}]$ that any sex difference in selection will, on average, inflate the estimated allele frequency divergence between the sexes (i.e., $E[\hat{F}_{ST}]$ is inflated whenever $\partial \ln(\bar{w}_f)/\partial p \neq \partial \ln(\bar{w}_m)/\partial p$).
To illustrate how SA and SC selection affect the correlation between $E[\hat{F}_{ST}]$ and the minor allele frequency (MAF) per locus, we consider simple models of selection without dominance, with SA polymorphism maintained near equilibrium under balancing selection and SC polymorphism maintained at mutation-selection balance. We subsequently relax the equilibrium assumption via simulation.

The covariance between FST and MAF under SA selection
Let $p$ refer to the frequency of the female-beneficial allele at a SA locus ($q = 1 - p$ refers to the frequency of the male-beneficial allele). With purely additive fitness effects of the alleles, we have: [equation missing in source] where $s_f$ and $s_m$ represent the selection coefficients for females and males (i.e., the costs to each sex of being homozygous for a SA allele that benefits the other sex). At equilibrium, we have: [equation missing in source] The expected value of the FST estimate becomes: [equation missing in source] The final approximation (i.e., neglecting higher-order terms in $(2p - 1)$ and $\bar{s}$), which is extremely accurate for $\bar{s} \leq 0.1$, can be used to illustrate how SA selection generates a positive covariance between the minor allele frequency per locus (with MAF = min{$p$, $1 - p$}) and $E[\hat{F}_{ST}]$. If the strength of SA selection ($\bar{s}$) is independent of MAF, then it is clear that [expression missing in source]

For each locus, we: (i) randomly sampled female and male selection coefficients from a uniform distribution between 0 and $s_{max}$ ($0 < s_{max} < 1$), where $s_{max} = 0.01$; (ii) retained loci whose sex-specific selection coefficients met conditions for balancing selection (i.e., $s_f/(1 + s_f) < s_m < s_f/(1 - s_f)$; see [5]); and (iii) modelled the population allele frequency for the retained SA locus using its stationary distribution [6]:

$$f(p) = \frac{C}{V} \exp\left(2 \int \frac{M}{V}\, dp\right),$$

where $M$ is the expected change in allele frequency per generation at the locus, $V$ is the variance in allele frequency change, and the constant $C$ ensures that the distribution integrates to one.
Assuming there is no dominance in either sex, selection coefficients are small, mutation rates are equal for each allele, and linkage is autosomal, $M$ and $V$ become: [equations missing in source] where $N_e$ is the effective size of the population, $u$ is the mutation rate of the locus, $p^*$ is its deterministic equilibrium, and the mean relative fitnesses of females and males (respectively) are: [equations missing in source] For each locus, we used a rejection sampling algorithm (described in Smith and Connallon [7]) to randomly sample an allele frequency from the stationary distribution for the locus.
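Steps (i) and (ii) of the simulation can be sketched as follows, taking the balancing-selection condition in step (ii) to be $s_f/(1+s_f) < s_m < s_f/(1-s_f)$. The function name, seed, and number of candidate loci are hypothetical. Step (iii), the draw from the stationary distribution, is not shown here.

```python
import numpy as np

def sample_sa_loci(n_candidates, s_max, rng):
    """Draw sex-specific selection coefficients uniformly on (0, s_max) and
    keep the loci that satisfy the condition for balancing selection."""
    s_f = rng.uniform(0.0, s_max, size=n_candidates)
    s_m = rng.uniform(0.0, s_max, size=n_candidates)
    # Assumed condition for additive SA alleles to be maintained:
    # s_f / (1 + s_f) < s_m < s_f / (1 - s_f)
    keep = (s_m > s_f / (1.0 + s_f)) & (s_m < s_f / (1.0 - s_f))
    return s_f[keep], s_m[keep]

rng = np.random.default_rng(3)
s_f, s_m = sample_sa_loci(n_candidates=200_000, s_max=0.01, rng=rng)
print("retained loci:", s_f.size, "of 200000 candidates")
```

Because the admissible window for $s_m$ has width of order $2 s_f^2$, only a small fraction of candidate loci is retained when $s_{max}$ is small, so many candidates must be drawn to obtain a useful number of balanced SA polymorphisms.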
FST for each locus was calculated as: [equation missing in source] where $p_f$ and $p_m$ correspond to the expected values for sex-specific allele frequencies after selection within the generation: [equations missing in source] Representative simulation output for population FST at SA loci is shown in the following figure. [Figure not included in source]

To explore effects of selection and drift on SC polymorphisms, we carried out simulations with allele frequencies drawn from the following stationary distribution:

$$f(p) = \frac{C}{V} \exp\left(2 \int \frac{M}{V}\, dp\right) \propto p^{4 N_e u - 1} (1 - p)^{4 N_e u - 1} e^{-N_e (t_f + t_m) p},$$

where $M = -\tfrac{1}{4}(t_f + t_m)\, p(1 - p) + u(1 - 2p)$ and $V$ is the same as above. Initially focusing on the simplest case of sex-limited loci, we sampled selection coefficients per locus from a gamma distribution with shape and scale parameters $k$ and $\theta$, respectively (i.e., $E[t] = k\theta$ and $\mathrm{var}(t) = k\theta^2$ for the selected sex). Allele frequencies were simulated by rejection sampling (as above; see Smith and Connallon [7]), using the stationary distribution for each locus. For each set of parameters (i.e., $N_e$, $u$, $k$, $\theta$), we generated 5,000 polymorphic loci with minor allele frequencies greater than 1%.

We then explored the more realistic scenario in which there is a mixture of sex-limited loci and loci that affect the fitness of both sexes, with $f_{SL}$ representing the proportion of loci that are sex-limited. We defined whether a given locus was sex-limited by sampling a random variable from a Bernoulli distribution with success probability $f_{SL}$. Selection coefficients for loci with sex-limited effects were randomly drawn from a gamma distribution, as described above. For loci affecting the fitness of both sexes, we generated selection coefficients in each sex by randomly sampling from a symmetric bivariate gamma distribution, with a cross-sex genetic correlation of $r_{mf}$ (the algorithm for pseudo-random sampling of correlated selection coefficients from a bivariate gamma distribution is presented in Morrow and Connallon [8]).
Allele frequencies for each locus were simulated using a rejection sampler based on the stationary distribution for the locus (i.e., given its selection coefficients). For each set of parameters (i.e., $N_e$, $u$, $f_{SL}$, $k$, $\theta$, $r_{mf}$), we generated 5,000 polymorphic loci with minor allele frequencies greater than 1%. The following figures show results with $r_{mf} = 0.9$ and $N_e = 10^6$, and plausible distributions of fitness effects. [Figures not included in source] Between-sex FST negatively covaries with MAF for every parameter combination that we examined. Overall, models of SC genetic polymorphism consistently predict a negative covariance between MAF and between-sex FST.
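A generic rejection sampler for the SC stationary distribution can be sketched as follows. This is not necessarily the algorithm of Smith and Connallon [7]: it assumes the density is proportional to $p^{4N_e u - 1}(1-p)^{4N_e u - 1} e^{-N_e (t_f + t_m) p}$, proposes from a Beta($4N_e u$, $4N_e u$) distribution, and accepts each proposal with probability $e^{-N_e (t_f + t_m) p}$, discarding draws with MAF at or below 1%. All parameter values and function names are hypothetical.

```python
import numpy as np

def sample_sc_allele_freq(Ne, u, t_f, t_m, rng, maf_min=0.01,
                          max_tries=1_000_000):
    """Draw one allele frequency from the assumed stationary density,
    conditional on the minor allele frequency exceeding maf_min."""
    a = 4.0 * Ne * u               # Beta shape from mutation and drift
    b = Ne * (t_f + t_m)           # strength of selection against the allele
    for _ in range(max_tries):
        p = rng.beta(a, a)         # proposal: neutral stationary density
        if min(p, 1.0 - p) <= maf_min:
            continue               # keep polymorphic loci with MAF > maf_min
        if rng.random() < np.exp(-b * p):
            return p               # accept with probability exp(-b * p) <= 1
    raise RuntimeError("no proposal accepted within max_tries")

rng = np.random.default_rng(7)
k, theta = 0.5, 2e-3                        # hypothetical gamma shape and scale
t_f = rng.gamma(shape=k, scale=theta)       # female-limited locus, so t_m = 0
p = sample_sc_allele_freq(Ne=10_000, u=1e-5, t_f=t_f, t_m=0.0, rng=rng)
print("selection coefficient:", t_f, "sampled allele frequency:", p)
```

Proposing from the neutral Beta density keeps the acceptance probability bounded by one without computing the normalising constant $C$. The cost is that acceptance becomes rare when $N_e (t_f + t_m)$ is large, which would call for a different proposal distribution.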