Skip to main content
  • Loading metrics

Genome-wide sexually antagonistic variants reveal long-standing constraints on sexual dimorphism in fruit flies

  • Filip Ruzicka ,

    Contributed equally to this work with: Filip Ruzicka, Mark S. Hill, Tanya M. Pennell

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom

  • Mark S. Hill ,

    Contributed equally to this work with: Filip Ruzicka, Mark S. Hill, Tanya M. Pennell

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom, Department of Ecology and Evolutionary Biology, The University of Michigan, Ann Arbor, Michigan, United States of America

  • Tanya M. Pennell ,

    Contributed equally to this work with: Filip Ruzicka, Mark S. Hill, Tanya M. Pennell

    Roles Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliations School of Life Sciences, University of Sussex, Brighton, United Kingdom, College of Life and Environmental Sciences, University of Exeter, Penryn, United Kingdom

  • Ilona Flis,

    Roles Investigation, Methodology

    Affiliations School of Life Sciences, University of Sussex, Brighton, United Kingdom, The Pirbright Institute, Pirbright, Surrey, United Kingdom

  • Fiona C. Ingleby,

    Roles Data curation, Investigation, Methodology

    Affiliation School of Life Sciences, University of Sussex, Brighton, United Kingdom

  • Richard Mott,

    Roles Formal analysis, Methodology, Software, Writing – review & editing

    Affiliations Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom, UCL Genetics Institute, University College London, London, United Kingdom

  • Kevin Fowler,

    Roles Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliation Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom

  • Edward H. Morrow ,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – review & editing (MR); (EHM)

    ‡ EHM and MR also contributed equally to this work.

    Affiliation School of Life Sciences, University of Sussex, Brighton, United Kingdom

  • Max Reuter

    Roles Conceptualization, Formal analysis, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing (MR); (EHM)

    ‡ EHM and MR also contributed equally to this work.

    Affiliation Research Department of Genetics, Evolution and Environment, University College London, London, United Kingdom


The evolution of sexual dimorphism is constrained by a shared genome, leading to ‘sexual antagonism’, in which different alleles at given loci are favoured by selection in males and females. Despite its wide taxonomic incidence, we know little about the identity, genomic location, and evolutionary dynamics of antagonistic genetic variants. To address these deficits, we use sex-specific fitness data from 202 fully sequenced hemiclonal Drosophila melanogaster fly lines to perform a genome-wide association study (GWAS) of sexual antagonism. We identify approximately 230 chromosomal clusters of candidate antagonistic single nucleotide polymorphisms (SNPs). In contradiction to classic theory, we find no clear evidence that the X chromosome is a hot spot for sexually antagonistic variation. Characterising antagonistic SNPs functionally, we find a large excess of missense variants but little enrichment in terms of gene function. We also assess the evolutionary persistence of antagonistic variants by examining extant polymorphism in wild D. melanogaster populations and closely related species. Remarkably, antagonistic variants are associated with multiple signatures of balancing selection across the D. melanogaster distribution range and in their sister species D. simulans, indicating widespread and evolutionarily persistent (about 1 million years) genomic constraints on the evolution of sexual dimorphism. Based on our results, we propose that antagonistic variation accumulates because of constraints on the resolution of sexual conflict over protein coding sequences, thus contributing to the long-term maintenance of heritable fitness variation.


The divergent reproductive roles of males and females favour different phenotypes [1,2]. However, responses to these selective pressures are constrained by a shared genome, leading to ‘sexual antagonism’, in which different alleles at given loci are favoured in the two sexes [1,35]. A wealth of quantitative genetic studies has established sexual antagonism as near ubiquitous across a wide range of taxa, including mammals [6] (and humans [7]), birds [8], reptiles [9], insects [10,11], fish [12,13], and plants [14]. Accordingly, sexual antagonism can be considered a major constraint on adaptation [15] and an important mechanism for the maintenance of fitness variation within populations [16].

However, despite its evolutionary importance, we have little understanding of the biological mechanisms underlying this conflict and virtually no empirical data on the identity and evolutionary dynamics of antagonistic alleles [13]. While a small number of individual antagonistic loci have been identified [12,13], these are of limited use for elucidating general properties of loci experiencing sexual antagonism. On a genome-wide scale, previous transcriptomic work in D. melanogaster has associated antagonistic fitness effects with patterns of gene expression [17]. But despite potentially revealing some of the molecular correlates of fitness variation, this approach cannot distinguish between causal antagonistic loci and their downstream regulatory targets. In humans, genome-wide allele frequency differences between males and females have been used to infer sex-specific selection on viability [18], but this approach neglects important reproductive components of fitness, and furthermore cannot distinguish between loci with opposing fitness effects in each sex (sexually antagonistic loci) and loci where the strength of sexually concordant selection differs between the sexes. It is essential that we characterise causal antagonistic loci underlying lifetime reproductive success in order to understand the adaptive limits to sexual dimorphism and mechanisms of conflict resolution.

To address this shortcoming, we identified sexually antagonistic loci across the D. melanogaster genome and characterised their functional and evolutionary properties. Specifically, we measured male and female fitness for over 200 hemiclonal lines that had been extracted from LHM—the outbred, laboratory-adapted population in which sexually antagonistic fitness effects were first characterised [10,19]. Our fitness measurements estimate lifetime reproductive success in both sexes by replicating the regime under which LHM has been maintained for over 20 years [20]. We combined these fitness data with high-coverage genome sequences [21] and performed a genome-wide association study (GWAS) to map the genetic basis of sexual antagonism. We then examined the properties of candidate antagonistic polymorphisms, including their genomic distribution across the X chromosome and autosomes, their functional characteristics, the genes in which they occur, and their population genomic dynamics across a number of wild populations of D. melanogaster and two closely related species, D. simulans and D. yakuba.


Quantitative genetic analyses of sex-specific fitness

We measured the sex-specific fitness of hemiclonal fly lines (N = 223) that had been extracted from LHM as part of a previous study [21]. Individuals from each hemiclonal line carry an identical haploid genome comprising all major chromosomes (X, 2, and 3; i.e., about 99% of the total genomic content) paired with a random chromosomal complement from LHM (see Materials and methods and [22,23] for further details). For each line, we measured male fitness as competitive fertilisation success and female fitness as competitive fecundity. The fitness estimates obtained are based on measurements from 25 individual males and females for each hemiclonal line. Assays closely mimic the rearing regime experienced by flies in the base population, thus providing a good proxy for lifetime reproductive success in each sex.

Quantitative genetic analyses confirmed the presence of significant amounts of genetic variation for male and female fitness among the lines assayed. Estimating the genetic variances and covariances between the lines, we found appreciable heritabilities for fitness in both sexes (female = 0.42, 95% CI 0.30–0.54; male = 0.16, 95% CI 0.04–0.27). Comparable estimates were also obtained by treating single nucleotide polymorphisms (SNPs) as random effects in a linear mixed model and calculating SNP heritability () (female = 0.59, SD 0.13, P < 0.001; male = 0.29, SD 0.16, P = 0.007). The sex-specific heritabilities estimated via both approaches are consistent with previous estimates in this population [17,2426]. The intersexual genetic correlation for fitness (rMF) did not differ significantly from zero in this sample of genotypes (rMF = 0.15, 95% CI −0.21 to 0.46), which is consistent with some [24,26]—but not all [17,19]—previous work in this population. While antagonism is not dominant in this sample of LHM, the absence of a significant positive rMF, the high heritability for fitness, and further results (below and S4 Fig) suggest that antagonistic variation is present but overlaid with sexually concordant variation.

We quantified the antagonistic component of fitness variation by calculating an ‘antagonism index’ (Fig 1A). Specifically, we extracted the position of individual fly lines on the axis ranging from extremely male-beneficial, female-detrimental fitness effects to extremely female-beneficial, male-detrimental fitness effects (see Materials and methods). This approach for defining an antagonism index mirrors previous research in this field [27,28] and is analogous to other indices based on combinations of phenotypic measures, such as the widely applied transformation of human height and weight into a body mass index [29]. The antagonism index itself had high SNP heritability ( = 0.51, SD 0.15, P = 0.001), as expected from the heritability of its sex-specific fitness components.

Fig 1. Genome-wide association mapping of sexual antagonism.

(A) Relative male and female lifetime reproductive fitness estimates for 223 D. melanogaster hemiclonal lines. Fitness measures have been normalised, scaled, and centred (see Materials and methods). Colours denote each line’s antagonism index, i.e., their position along a spectrum (dashed arrow) ranging from male-beneficial, female-detrimental fitness effects (blue) to female-beneficial, male-detrimental effects (red). (B) Association of each SNP with the antagonism index along the five major D. melanogaster chromosome arms, presented as a Manhattan plot in which each point represents the −log10(P) value from a Wald association test. Colours denote chromosome arms; the horizontal line represents the Q-value cutoff (0.3) used to define candidate antagonistic SNPs. Data and code underlying this figure can be found at SNP, single nucleotide polymorphism.

GWAS of sexual antagonism

To identify putative antagonistic SNPs, we performed a GWAS based on the antagonism index and sequence polymorphism data [21] for 765,764 common (minor allele frequency [MAF] > 0.05) and stringently quality-filtered SNPs across 202 of the 223 lines (see Materials and methods; S1 Fig). We employed a linear mixed model that corrects for between-line relatedness and population structure by incorporating a genetic similarity matrix as a random effect [30] (S2 Fig). Fig 1B presents a Manhattan plot of raw P values from SNP-wise association tests along the D. melanogaster genome. The genomic inflation factor ( = 0.967) and analyses using alternative, permutation-based significance tests (see Materials and methods) confirmed that the parametric P values obtained are robust (S3 Fig).

Because our antagonistic phenotype is a linear combination of male and female fitness, our GWAS could potentially capture variation with sex-limited rather than antagonistic fitness effects. To assess this possibility, we compared GWAS P values for the antagonism index with P values for a ‘concordant index’, defined as an orthogonal phenotype to the antagonistic index ranging from extremely detrimental fitness effects to extremely beneficial fitness effects in both sexes (S4 Fig). Sex-limited variants should generate symmetrical effects on both the antagonistic and the concordant fitness indices. In contradiction to this expectation, we found that while variation in the antagonism index was dominated by fewer loci with larger effects, variation in the concordant index was distributed across many SNPs with low effect sizes and, consequently, elevated P values (S4 Fig). The asymmetry in these patterns indicates that fitness variation in LHM is not due to variants with sex-limited effects. Rather, antagonistic and concordant variation are qualitatively different and appear to be maintained through fundamentally different processes—most likely balancing selection and mutation-selection balance, respectively (see, also, Discussion).

Although no individual antagonistic SNP reached genome-wide significance based on stringent Bonferroni correction (P < 6.53 × 10−8), our focus was to characterise broad patterns associated with genome-wide antagonistic variation rather than identifying individual antagonistic sites with high confidence. Accordingly, we applied three main approaches to investigate the general properties of antagonistic SNPs and regions. First, we defined 2,372 candidate antagonistic SNPs (approximately 0.3% of all covered SNPs; henceforth ‘antagonistic SNPs’) as SNP positions with false discovery rate (FDR) Q-values < 0.3 (S4 Fig). This threshold achieves a balance between false positives and false negatives that is suitable for our genome-wide analysis and allowed us to contrast the properties of antagonistic and nonantagonistic (Q-value 0.3) SNPs. Second, we quantified the importance of different classes of SNPs (defined by chromosomal location or function) by partitioning total SNP heritability of the antagonism index (‘antagonistic ’) into the contribution of each class [31,32]. These contributions can then be tested for deviations from random expectations and interpreted without need for defining significance cutoffs for individual SNPs. Finally, we employed set-based association testing in which the joint effect of a set of SNPs (such as those in a chromosomal window) on the phenotype is assessed. This joint analysis alleviates the multiple testing burden and can be used to define antagonistic windows with more stringent support (Q-value < 0.1). Together, these approaches allowed us to characterise the functional properties and evolutionary dynamics of antagonistic genetic variation.

Identity and functional properties of antagonistic loci

We first examined the genomic distribution of antagonistic variants. The 2,372 antagonistic SNPs were significantly clustered along chromosome arms (median distance: 147 bp on autosomes, 298 bp on the X chromosome; permutation test: P < 0.001 for autosomes and X, S5 Fig). Based on observed patterns of linkage disequilibrium (LD) decay in LHM (S1 Fig), we estimated that the antagonistic SNPs form approximately 226 independent clusters. Some previous theory [3,33] and empirical quantitative genetic results [18] suggest that the X chromosome should harbour a disproportionate amount of antagonistic genetic variation because it is hemizygous in males and disproportionately transmitted through females—factors that together favour the invasion and maintenance of X-linked antagonistic polymorphisms in which the male-benefit allele is recessive (but see [34]). This prediction was not borne out by our data. We found that, relative to autosomes, the X chromosome neither contained a disproportionate number of antagonistic SNPs (Z-test, P > 0.05, S6 Fig) nor contributed more antagonistic than expected (Z-test, P = 0.274, Fig 2A).

Fig 2. Genomic distribution and functional characteristics of antagonistic variants.

(A) Relative contribution of different chromosomal compartments to total SNP heritability of the antagonistic index (i.e., estimated for a given compartment divided by total across all compartments). Dots represent estimated contributions (±95% CI), with expected contributions presented as black crosses. (B) Relative contribution of different functional categories to total antagonistic (i.e., estimated for a given functional category divided by total across all categories). Dots represent estimated contributions, with expected contributions presented as black crosses (±95% CI of the empirical null distribution computed through permutation; see Materials and methods). Colours indicate significant under- or overrepresentation (dark blue: P < 0.05; light blue: P > 0.05). Data and code underlying this figure can be found at SNP, single nucleotide polymorphism; UTR, untranslated region.

Our data also provide some of the first insights into the biological functions that underlie sexual antagonism. At the most basic level, our results suggest that antagonism arises primarily because of adaptive conflict over coding sequences. Thus, genomic partitioning revealed that variants that result in missense changes contributed significantly more antagonistic than expected from their proportional genomic representation (Fig 2B) and were significantly overrepresented among antagonistic SNPs (S6 Fig). As expected, intergenic regions were underrepresented among antagonistic SNPs and contributed qualitatively less antagonistic than expected (Figs 2B and S6). However, we found no evidence that SNP functions involved in expression regulation, such as 3′ untranslated region (UTR), intronic, upstream, or splice region variants, were overrepresented among antagonistic SNPs or (Figs 2B and S6).

We next performed a series of analyses to characterise the properties of genes harbouring antagonistic SNPs (one or more antagonistic SNPs within ±5 kb of the gene coordinates). The list of antagonistic genes included some genes known to be involved in sexual differentiation, including male-specific lethal 1, traffic jam, and roundabout 2, the circadian clock gene period, and the Golgi-associated transport protein gene Tango6 that has been previously found to harbour coding sequence polymorphisms shared between D. melanogaster and D. simulans [35] (see S1 Table for a complete list of antagonistic genes).

We first considered the relationship between antagonism and sex-biased gene expression. This relationship remains poorly delineated, with some studies assuming that sexually antagonistic genes are more likely to be sex biased because sex bias is indicative of the action of antagonistic selection [36]. Others, in contrast, infer that less sex-biased genes are more likely to be antagonistic because they have higher intersexual genetic correlations, and polymorphisms will result in more opposed fitness effects under sexually antagonistic selection [37]. Using estimates of sex bias in gene expression extracted from the Sebida database [38], we could address this question directly. Doing so, we found that antagonistic genes had lower sex bias in gene expression than nonantagonistic genes. This pattern was detectable on several levels. In qualitative terms, fewer antagonistic genes than expected by chance were classified as showing significant sex-biased gene expression (observed = 188, expected = 212, 11.3% deficit, = 7.78, P = 0.005). In quantitative terms, antagonistic genes had a lower degree of absolute sex bias than did nonantagonistic genes (W = 1,309,700, P < 0.001, Fig 3A) and the probability of genes being antagonistic peaked at zero sex bias (generalised linear model [GLM] with quadratic term, = 6.20, P = 0.013, Fig 3B). We also tested the predictions of a recent ‘Twin Peaks’ model [39], which proposes that antagonistic genes should be enriched among genes with intermediate sex bias in expression and depauperate among genes with low and high sex bias. Yet, we did not detect any enrichment for antagonistic genes among intermediately sex-biased genes as predicted by this model (comparison of quadratic versus fourth-degree polynomial GLM: = 0.67, P = 0.714).

Fig 3. Sex-biased gene expression among antagonistic genes.

(A) Distributions of the absolute degree of sex-biased expression for antagonistic (blue) and nonantagonistic (grey) genes. (B) Proportion of genes that are antagonistic across bins of expression sex bias (100 genes per bin). Points represent mean expression level in each bin. Blue curve (±SE) shows the best-fit quadratic model for the relationship between antagonism and expression sex bias. Data and code underlying this figure can be found at F, female; M, male.

Gene Ontology (GO) analysis revealed little evidence for preferential association of antagonistic variation with specific biological processes. Only one term, ‘sodium-channel-regulator-activity’, was significant after correction for multiple testing (Q-value = 0.013). However, this annotation is shared by only a few genes (N = 5), a cluster of four of which carry antagonistic SNPs. It thus appears that antagonism is not enriched in genes involved in specific functions. We also did not find a significant overlap between the antagonistic genes identified here and genes that had previously been shown to have sexually antagonistic expression patterns (opposing relationships between expression level and fitness in males and females [17]; observed overlap = 41, expected overlap = 36, = 0.59, P = 0.44). This discrepancy is not necessarily unexpected. Leaving aside the fact that the previous study of antagonistic expression was based on a small sample of genotypes [40] and did not correct for their kinship, large overlap need not exist between loci that show antagonistic expression patterns and those that harbour causal antagonistic sequence variants. For instance, antagonistic expression patterns in a large number of genes can be caused by a very small number of genetic changes in regulatory genes. Equally, the numerous coding sequence variants that we detect can cause antagonism in the absence of expression differences.

Finally, we tested whether antagonistic variation is enriched in genes that are likely to be subject to pleiotropic constraints. Elevated gene pleiotropy—defined here as ‘molecular gene pleiotropy’ [41], in which a gene performs several functions and affects several traits—has been proposed to make the evolution of sex-specific expression more difficult, because altered expression could alleviate antagonism in some traits but have a deleterious effect on other traits mediated by a gene. These pleiotropic effects could then impede the resolution of sexual antagonism via the evolution of sex-biased expression [42]. We did not find support for this hypothesis in our data, as there was no association between antagonism and higher levels of pleiotropy, measured either as tissue breadth [43] (W = 1,264,700, P = 0.70) or as the number of protein–protein interactions (PPIs) [44] (F1,5276 = 2.43, P = 0.12). This implies that pleio-tropy—at least as captured by and PPIs—does not contribute significantly to maintaining sexually antagonistic genetic variation.

Comparative population genomic analyses of balancing selection

In addition to assessing the functional properties of antagonistic loci, we also investigated the population genetic effects of sexual antagonism. Population genetic models predict that the opposing sex-specific fitness effects of antagonistic alleles generate balancing selection [16,4547], resulting in elevated levels of genetic polymorphism at antagonistic loci. Having identified candidate antagonistic variants, we can test this prediction by comparing levels of polymorphism at antagonistic and nonantagonistic loci. Examining levels of polymorphism in LHM, we found that antagonistic sites have higher MAFs than nonantagonistic sites (W = 1,024,100,000, P < 0.001) and that regional polymorphism (Tajima’s D, measured within 1,000-bp windows along the chromosome arms) is higher at antagonistic windows (those with Q-value <0.1 in a window-based GWAS) than nonantagonistic windows (Q-value 0.1; F1,195208 = 279.6, P < 0.001).

However, although these patterns are suggestive, looking at polymorphism within LHM is potentially problematic because variation in MAF will generate ascertainment bias: the power to detect antagonistic effects is higher at more polymorphic sites, and candidates will therefore tend to show above-average polymorphism, even in the absence of significant balancing selection. A more robust approach is to use data from independent populations and ask whether polymorphism in those populations is greater at antagonistic than at nonantagonistic sites, while controlling for a range of potential confounders. Specifically, we compared levels of polymorphism at antagonistic and nonantagonistic sites while controlling for (i) MAF ascertainment bias (i.e., increased GWAS power to detect antagonistic effects at more polymorphic sites in LHM), (ii) differences in genome-wide estimates of linked selection, which could inflate polymorphism near antagonistic sites if they are situated in regions of the genome less affected by selective sweeps or background selection, and (iii) pseudo-replication due to close-by sites in LD showing correlated signals of both antagonistic fitness effects and polymorphism.

We performed analyses controlling for these effects (see Materials and methods for details), first using publicly available polymorphism data from the Drosophila Genetic Reference Panel [48,49] (DGRP), a collection of 205 wild-derived inbred lines. Like LHM, the DGRP was established from a North American D. melanogaster population. Given the relatively recent colonisation of the continent by D. melanogaster (about 150 years [50]), the two populations are closely related. We found that antagonistic SNPs had elevated MAFs in the DGRP. Yet, owing to the close relationship between LHM and the DGRP (and the resulting similarity in allele frequencies), this difference was not statistically significant (empirical P = 0.322, Fig 4A and 4B; = 0.010, P = 0.66, Fig 4D). Nevertheless, we also found that the probability of SNPs being polymorphic (i.e., to have MAF > 0) in the DGRP increased with absolute GWAS effect size ( = 76.23, P < 0.001, Fig 4C). This shows that—once the level of polymorphism in LHM, linked selection, and pseudo-replication are accounted for—LHM SNPs are more likely to also be polymorphic in the DGRP if they are more closely associated with antagonism. This evidence for antagonism-driven balancing selection at individual SNPs was corroborated by patterns of regional polymorphism. Thus, Tajima’s D was significantly higher in 1,000-bp antagonistic windows than in nonantagonistic windows (F1,115477 = 224.6, P < 0.001, Fig 5A). Overall, these analyses show that the heritable phenotypic variation in sex-specific fitness that can be generated and maintained by sexual antagonism is mirrored by a signal of increased polymorphism at the underlying genetic loci.

Fig 4. SNP-based signatures of balancing selection associated with antagonistic variants in three independent populations (DGRP, ZI, and SA).

(A,E,I) Spectra of raw MAF for LD-independent antagonistic (blue) and nonantagonistic (‘Control’, grey) SNPs. (B,F,J) Distribution of mean MAFs for 1,000 sets of LD-independent nonantagonistic SNPs that have been frequency matched and linked-selection matched to LHM antagonistic SNPs (‘Analysis A’; see Materials and methods). Blue line denotes mean MAF of antagonistic SNPs; black dashed line denotes mean MAF of nonantagonistic SNPs without matching for LHM MAF or linked selection. (C,G,K) Odds ratio (±95% CI) that a site is polymorphic (i.e., has the same alleles as in LHM and has MAF >0) as a function of its absolute GWAS effect size (regression coefficient), while controlling for LHM MAF and genome-wide linked selection (‘Analysis B’; see Materials and methods). An odds ratio >1 (dashed horizontal line) indicates that sites with higher absolute effect sizes are more likely to be polymorphic in a given population. Of the LD-independent sites considered, the number and percentage of sites with MAF > 0 were N = 31,092 (91.3%), N = 25,578 (77.3%), and N = 21,659 (68.7%) in the DGRP, ZI, and SA populations, respectively. (D,H,L) Mean MAF across 100 sets of LD-independent SNPs, presented in ascending order by absolute GWAS effect size (‘Analysis C’; see Materials and methods). Each set of LD-independent SNPs has been matched for LHM MAF and genome-wide estimates of linked selection. For visualisation purposes, a linear regression line (±95% CI) is shown. Data and code underlying this figure can be found at DGRP, Drosophila Genetic Reference Panel; GWAS, genome-wide association study; LD, linkage disequilibrium; MAF, minor allele frequency; SA, South Africa; SNP, single nucleotide polymorphism; ZI, Zambia.

Fig 5. Regional and LD-based signatures of balancing selection associated with antagonistic variants in three independent populations (DGRP, ZI, and SA).

(A) Mean (±SE) residual Tajima’s D (i.e., the residuals of a regression of Tajima’s D on genome-wide estimates of linked selection) for antagonistic windows (blue; ‘antagonistic status = 1’) and nonantagonistic windows (grey; ‘antagonistic status = 0’). (B) Mean (±SE) residual FST (i.e., the residuals of a regression of FST on genome-wide estimates of linked selection) for antagonistic and nonantagonistic windows. Because these are residuals of a regression, residual FST does not vary between 0 and 1. (C) LD (r2) in the ZI population between pairs of antagonistic SNPs (blue, ‘Antag./antag.’), pairs of nonantagonistic SNPs (grey, ‘Control/control’), and mixed pairs (black, ‘Antag./control’). Points represent mean r2 across 25-bp bins; r2 is modelled as a declining exponential function of distance (fitted lines). Data and code underlying this figure can be found at Antag., antagonistic; DGRP, Drosophila Genetic Reference Panel; LD, linkage disequilibrium; SA, South Africa; SNP, single nucleotide polymorphism; ZI, Zambia.

A key, yet so far unresolved, question is whether antagonistic polymorphisms are mainly short-lived and population specific or persist over prolonged periods of time. The analyses of polymorphism in the DGRP shed some light on this question, demonstrating that antagonistic polymorphisms are maintained at least over the tens to hundreds of years (hundreds to a few thousand generations) that separate this population from LHM. In order to assess signals of balancing selection over longer time spans, we repeated these analyses with polymorphism data from two populations in D. melanogaster's ancestral sub-Saharan distribution range, in Zambia (ZI, 197 genomes from phase 3 of the Drosophila Population Genomics Project [51]) and South Africa (SA, 118 genomes from South Africa [52]). Just as in the DGRP, we found that antagonism generated a clear signature of balancing selection in these ancestral population samples. Analyses based on binary categories showed that antagonistic SNPs had significant excess MAF in ZI and SA compared with nonantagonistic SNPs (ZI: empirical P = 0.024, Fig 4E and 4F; SA: empirical P = 0.001, Fig 4I and 4J; see also S7 Fig), while analyses based on GWAS effect sizes showed that sites with stronger evidence for antagonistic effects were again more likely to be polymorphic (ZI: = 53.03, P < 0.001, Fig 4G; SA: = 39.41, P < 0.001, Fig 4K) and also have more elevated MAFs (ZI: = 0.047, P = 0.037, Fig 4H; SA: = 0.055, P = 0.014; Fig 4L; see also S7 Fig). At a larger chromosomal scale, antagonistic windows had significantly higher polymorphism (Tajima’s D) than nonantagonistic windows (ZI: F1,116099 = 60.63, P < 0.001; SA: F1,110954 = 4.24, P = 0.039; Fig 5A). Furthermore, they also exhibited lower population differentiation between all pairs of populations (measured as FST; DGRP-ZI: W = 50,667,000, P < 0.001; DGRP-SA: W = 50,975,000, P < 0.001; SA-ZI: W = 55,322,000, P < 0.001; Fig 5B), in line with balancing selection maintaining similar frequencies across distant populations.

In addition to elevated polymorphism in antagonistic regions of the genome, we also found evidence for increased LD—another hallmark of balancing selection [53,54]. We compared local LD (<1,000 bp, measured as r2) between pairs of antagonistic sites, pairs of nonantagonistic sites, and ‘mixed’ site pairs (consisting of an antagonistic and a nonantagonistic SNP) in the ZI population, which is most phylogenetically distant from LHM and where a signal of LD should be weakest in the absence of long-term balancing selection. Consistent with selection, we found that pairs of antagonistic sites had higher LD in this population than pairs of nonantagonistic sites (W = 8,346,500,000, P < 0.001, Fig 5C). They also had higher LD relative to mixed pairs (W = 33,823,000, P < 0.001, Fig 5C). Thus, high LD between antagonistic sites is not an artefact of unusually low levels of recombination near antagonistic regions, but instead reflects the action of long-term balancing selection.

The signal of conserved maintenance of antagonistic polymorphisms across populations of D. melanogaster raises the possibility that sexual antagonism is maintained over yet longer timescales. To assess this, we asked whether antagonistic loci are more likely to be detected as polymorphic than nonantagonistic loci in two closely related species, D. simulans and D. yakuba, that are separated from D. melanogaster by about 1 and 3 million years, respectively [55]. An enrichment of antagonistic loci among such ‘trans-specific’ polymorphisms—while controlling for possible confounders, as previously—would indicate that antagonistic selection maintains allelic variants across species boundaries.

Consistent with this hypothesis, we found that antagonistic loci are enriched among trans-specific polymorphisms in D. simulans. Thus, we detected a significant positive correlation between absolute GWAS effect size in LHM and the probability that a polymorphism is trans-specific among a panel of 170 North American D. simulans genomes [56] ( = 6.13, P = 0.013, Fig 6A) and a panel of 20 sub-Saharan African D. simulans genomes [57] ( = 5.65, P = 0.017, Fig 6A). A similar pattern was detectable when comparing antagonistic/nonantagonistic sites as a binary classification among the dataset of 170 North American genomes (empirical P = 0.001, Fig 6B). In the smaller panel of 20 African genomes, antagonistic loci also displayed elevated polymorphism, but the excess was not statistically significant (empirical P = 0.100, Fig 6C). We finally tested whether antagonistic loci are enriched among trans-specific polymorphisms in D. yakuba, using polymorphism data from a panel of 20 African genomes [57]. In this species, we could not detect an enrichment of antagonistic loci, whether using GWAS effect sizes as a continuous predictor of trans-specific status ( = 1.59, P = 0.207, Fig 6A) or comparing antagonistic/nonantagonistic SNP classes (empirical P = 0.844, Fig 6D).

Fig 6. Signatures of balancing selection associated with antagonistic variants in two sister species, D. simulans and D. yakuba.

(A) Odds ratio (±95% CI) that a polymorphism’s trans-specific status varies with absolute GWAS effect size (regression coefficient), controlling for LHM MAF and genome-wide linked selection (see Materials and methods). An odds ratio >1 indicates that sites with higher effect sizes are more likely to be trans-specific. The relationship between trans-specific status and effect size is presented for three datasets: a panel of 170 North American D. simulans genomes (left), 20 African D. simulans genomes (middle), and 20 African D. yakuba genomes (right). Colours indicate significant under- or overrepresentation (dark blue: P < 0.05; light blue: P > 0.05). Of the LD-independent sites considered in each sample, the number and percentage of trans-specific sites were N = 3,608 (2.2%), N = 7,466 (5.5%) in the 170- and 20-genome D. simulans datasets, respectively, and N = 2,760 (2.7%) in the D. yakuba dataset. (B) Histogram of the proportion of trans-specific polymorphisms for 1,000 sets of LD-pruned nonantagonistic SNPs that have been frequency and linked-selection matched to antagonistic SNPs. Blue line denotes mean proportion of trans-specific antagonistic SNPs; black dashed line denotes mean proportion of trans-specific nonantagonistic SNPs without any correction for LHM MAF or linked selection. Trans-specific status was determined by considering polymorphism data from a panel of 170 North American D. simulans genomes. (C) Same as B, with trans-specific status derived from polymorphism data from a panel of 20 African D. simulans genomes. (D) Same as B., with trans-specific status derived from polymorphism data from a panel of 20 African D. yakuba genomes. Data and code underlying this figure can be found at GWAS, genome-wide association study; LD, linkage disequilibrium; MAF, minor allele frequency; SNP, single nucleotide polymorphism.

Taken together, these comparative population genomic analyses demonstrate that the antagonistic allelic variation identified in LHM is neither recent nor population specific. To a significant degree, balancing selection maintains antagonistic variation over timescales that extend beyond the extension of the species range out of Africa, more than 10,000 years ago [50], and across species boundaries to D. simulans. The elevation in polymorphism generated by antagonism is not, however, indefinite, as indicated by the absence of a signal in the relatively more distant D. yakuba.


Our study provides the first, to our knowledge, analysis of the identity, function, and evolution of genome-wide sexually antagonistic sequence polymorphisms. Remarkably, we find that genetic variation at antagonistic loci is stably maintained across D. melanogaster populations throughout the species’ distribution range, and across species boundaries into D. simulans. These results demonstrate that the targets of antagonistic selection have been largely conserved for many millennia [50,5860]—and hundreds of thousands of generations—and that a number of antagonistic polymorphisms have arisen and persisted since the speciation event between D. melanogaster and D. simulans, approximately 1 million years ago. It is possible that our GWAS only captures a subset of antagonistic variants, i.e., those that remain polymorphic in the constant laboratory environment to which the LHM population has adapted. Nevertheless, the geographical stability and low turnover in antagonistic sequence variation that we detect among these loci imply that a significant proportion of the adaptive conflict between males and females is rooted in a fundamental aspect of the biology of the sexes. As a consequence, it persists even in the face of environmental variation [61] and is relatively unaffected by the adaptation of populations to the range of environmental conditions that they encountered during their colonisation of the globe [50,59,62], or the continuous adaptive evolution that occurs within temperate populations over the course of the seasons [63].

While sexual antagonism can generate balancing selection, the range of parameters over which simple models of antagonistic selection predict this to be the case is restrictive [16]. The fact that antagonistic alleles have persisted over such long timescales therefore suggests that additional forces operate to stabilise polymorphism. One prime candidate for such a force is dominance reversal, in which, at a polymorphic site, the allele with the beneficial effect is dominant in each sex. Such sex-specific dominance drastically widens the range of male and female fitness effects over which antagonistic selection actively maintains polymorphism, improving the prospects for the long-term maintenance of antagonistic allelic variation [34,64]. An empirical example of dominance reversal at a putative sexually antagonistic polymorphism has recently been documented in salmon [13], and a quantitative genetic study in seed beetles inferred a large and significant contribution of sex-specific dominance to sexually antagonistic fitness variation [28]. Based on this evidence, it is plausible to assume that dominance reversal will also be involved in the long-term maintenance of antagonistic polymorphisms in fruit flies.

The long-term stability of sexually antagonistic polymorphisms further suggests that the evolutionary constraints on sexual dimorphism inherent in antagonism are difficult to resolve. While we do not find any evidence that there is elevated pleiotropy among genes experiencing ongoing conflict, the persistence of antagonism fits with our finding that antagonistic polymorphisms are highly enriched for missense variants. While antagonistic selection on expression levels can be accommodated by the gradual evolution of sex-specific gene expression [65], adaptive conflicts over coding sequences can only be resolved through a complex multistep process [66] of gene duplication, sex-specific subfunctionalisation of coding sequences, and the evolution of differential expression of the two paralogues [67] (there is some evidence for this process in D. melanogaster, in which paralogues have been shown to acquire male-biased expression, but there is little correlation between sex-biased paralogue expression and sequence divergence [68]). The requirement for gene duplication, in particular, would be expected to constitute a severely limiting barrier for this route towards resolution, as suitable mutation events will be exceedingly rare. This large barrier to resolution, and the resulting stochasticity in which antagonisms will undergo resolution, may also help to explain the lack of GO enrichment observed among antagonistic genes.

The fact that antagonistic polymorphisms are enriched in coding regions and among genes with lower than average levels of expression sex bias may seem at odds with some recent work, which has placed emphasis on conflicts over gene expression. For instance, sex-biased gene expression has been shown to correlate positively with genome-wide polymorphism in flycatchers [36], while male-biased genes showed lower polymorphism in guppies [69], and alleles under sex-specific viability selection have been shown to exhibit intermediate levels of sex-biased expression in humans [39]. However, the conclusions from these studies are not straightforward to interpret because antagonistic selection is not measured directly; instead, it is inferred from expression bias or polymorphism, and these inferred effects are potentially confounded by partially sex-limited or relaxed selection. Furthermore, the results we have produced here do not imply that antagonism over gene expression levels is absent (indeed, we find a number of highly sex-biased antagonistic genes). Nevertheless, they do suggest that coding regions play a disproportionate role in ongoing and long-term constraints on adaptive evolution between the sexes—a feature that may extend beyond sexually antagonistic polymorphisms and affect other types of trade-offs. Interestingly, this is mirrored in polymorphisms associated with fluctuating selection in Drosophila [63] and with trade-offs between traits in humans [70], both of which are enriched among variants with missense effects. Conversely, genetic loci underlying local adaptation are found in excess among gene regulatory regions relative to coding regions [71], implying that regulatory regions can respond to context-specific selection faster. Thus, our results corroborate an emerging pattern whereby regulatory regions facilitate adaptation while protein-coding regions constrain it, at least over more modest evolutionary time spans.

We find no convincing evidence that the X chromosome is enriched for antagonistic variation, in contradiction to classic theory [3]. This discrepancy could be due to the presence of sex-specific dominance, as mentioned previously, and which is predicted to shift enrichment of antagonism from the X to the autosomes [34]. Antagonistic loci could also interact epistatically—a common process [72] that similarly favours the accumulation of autosomal antagonism [73]. However, while these general effects might explain the lack of X enrichment, our result also contradicts previous empirical findings obtained in the LHM population itself [18], which found that the X chromosome contributed disproportionally to antagonistic fitness variation. The previous study was based on a much smaller sample of genomes, with large uncertainties about the estimated chromosomal contributions. It was also performed more than 10 years ago and much closer to the establishment of LHM as a laboratory population. Accordingly, the discrepancy to our results might in part be explained by stronger genetic drift on the X chromosome relative to autosomes, which could in turn lead to a disproportionate loss of X-linked antagonistic polymorphisms [45]. Finally, given that our GWAS focuses on common variants, the fitness effects of low-frequency X-linked antagonistic variants might not be adequately captured, potentially exacerbating the absence of an enrichment of antagonistic variation on the X.

In contrast to some previous studies [19,25,74] (but see [24,26]), we do not detect a significant negative rMF among this sample of genotypes. One potential explanation for this result could be that current fitness variation in LHM is caused by variants with sex-limited effects. However, this hypothesis is difficult to reconcile with the contrasting properties of variants mapping to the antagonism and concordant fitness indices. Our data instead suggest that while some antagonistic variation may have been lost through genetic drift [45,47,75], the remainder is overlaid with highly polygenic sexually concordant variation [76]. This explanation fits with the observation that most traits in D. melanogaster exhibit strong positive intersexual genetic correlations [37,77,78]. This provides a large target for generating sexually concordant deleterious variants that would subsequently be maintained at mutation-selection balance—with fitness effects that have been documented in numerous experimental studies (see [79] for review). A further contribution to the nonnegative rMF that we observed could come from sexually concordant variants evolving under balancing selection due to their antagonistic pleiotropic effects on unmeasured components of fitness (e.g., juvenile fitness). Finally, we acknowledge that the hemiclonal fitness design does not allow us to capture the totality of fitness variation in LHM. For example, the fact that we measure fitness in a heterozygous background implies that unconditionally deleterious recessive alleles segregating at low frequencies are not adequately screened; additionally, small differences between the assay environment and the rearing regime typically experienced by LHM flies—despite all attempts at minimising such differences—may have contributed some measurement error in our fitness estimates.

In addition to addressing long-standing gaps in our understanding of sexual antagonism, this study also informs broader debates regarding the forces that maintain genetic variation for fitness in natural populations, which remain insufficiently understood [80]. By identifying genome-wide antagonistic variants and linking these loci to signatures of balancing selection in independent populations, our results supplement a growing body of evidence suggesting that balancing selection can influence patterns of genetic variation on a genome-wide scale [81], rather than a very limited number of isolated loci, as is sometimes assumed [82]. For example, previous quantitative genetic analyses have shown that levels of genetic variation observed among populations of Drosophila [79] and other species [83] far exceed the levels predicted under mutation-selection balance alone, implying an important role for balanced polymorphisms. Moreover, a recent genomic study in D. melanogaster has linked candidate loci under opposing selection between seasons with long-term elevations in genome-wide polymorphism [63], emphasising the importance of fluctuating balancing selection across the genome. Sexually antagonistic selection should contribute particularly strongly to the buildup of balanced polymorphisms, given that there is abundant evidence for sex-specific selection in nature [4,84] and that many traits exhibit strong positive genetic correlations between the sexes [37,77]. Together, these factors generate permissive conditions for the evolution of sexually antagonistic polymorphisms relative to alternative sources of balancing selection [46,85].

Our study provides a starting point from which to clarify the relative role of antagonistic and nonantagonistic modes of balancing selection towards the maintenance of genetic variation. But it also provides the foundation for further work on the genetics of sexual antagonism, including elucidating its functional basis and testing theories regarding its resolution (e.g., [86]). Together, these future studies will allow us to better understand how males and females can, or cannot, respond to sex-specific selection and adapt to their respective reproductive roles.

Materials and methods

LHM hemiclones

LHM is a laboratory-adapted population of D. melanogaster that has been maintained under a highly controlled rearing regime since 1996 [23]. A random sample of 223 genetic lines was created from the population [21] using a hemiclonal approach [22]. Individuals of each line carry an identical haploid genome comprising the major chromosomes X, 2, and 3. Crosses with flies from custom stocks allow the generation of many replicate individuals—males and females—that carry a line’s X-2-3 haplotype alongside a random chromosomal complement from the LHM population that can be assayed for fitness. In our experiment, 16 randomly sampled LHM females were used to supply the chromosomal complements for each line, sex, and block.

Fitness measurements

Lifetime adult reproductive fitness of males and females of each line was measured using proven assays designed to mimic the LHM rearing regime [87]. For male fitness, we measured competitive fertilisation success by setting up competition vials containing 5 hemiclonal males from a given line, 10 competitor bw males, and 15 virgin bw females. After two days, bw females were isolated into individual vials containing no additional yeast and left to oviposit for 18 hours. On day 12 post egg laying, progeny were scored for eye colour. Male fitness was calculated as the proportion of offspring sired by the 5 hemiclonal males (those with wild-type eye colour), combining progeny data from the 15 oviposition vials. This assay was repeated five times in a blocked design; estimates for each line were therefore based on fitness measurements from 25 hemiclonal males. As expected for flies carrying a homozygous phenotypic marker mutation, bw males are slightly inferior to wild-type males (they sired approximately 46.4% instead of the expected two thirds of offspring across all fitness assays). However, they provide a meaningful competitive standard for our male fitness measures, as evidenced by the significant heritability in male mating success.

Female fitness was measured as competitive fecundity. Competition vials containing 5 virgin hemiclonal females from a given line, 10 competitor bw females, and 15 bw males were set up. Two days later, the 5 hemiclonal females were isolated into individual vials and left to oviposit for 18 hours. These vials were immediately chilled at 4°C and fecundity was measured by counting the number of eggs laid per female. This assay was replicated five times in a blocked design; each line estimate therefore measured the fitness of 25 hemiclonal females.

Fitness data were subjected to quality control and preprocessing in preparation for quantitative genetic and association analysis. Male fitness data from competition vials in which not all 5 focal males were present at the end of the assay were removed from further analysis. Similarly, we omitted female oviposition vials in which fewer than 2 eggs were present (indicating partial sterility or failure to mate) or in which the female had died over the course of the assay. For each sex, fitness measurements were then first box-cox transformed to be normally distributed within each block, then scaled and centred. To calculate SNP heritabilities and for association analysis, data from each block were averaged to obtain one fitness estimate for each line and sex.

Quality control of whole-genome sequences

We used previously published whole-genome sequences generated from the hemiclonal lines analysed here [21] (available at Details about DNA extraction, library preparation, sequencing, read processing, and SNP calling are provided in the original publication. Prior to the association analysis performed here, further site-level quality filtering steps were performed in vcftools [88] and PLINK [89]. First, individual variant calls based on depth <10 and genotype quality <30 were removed. Second, individuals with >15% missing positions were removed. Third, positions with poor genotype information (<95% call rate) across all retained individuals were discarded. Finally, given the relatively small sample size of the dataset as a whole and the low power of an association test for rare variants, we retained only common variants (MAF > 0.05) for further analysis. From an initial dataset of 220 hemiclones containing 1,312,336 SNPs, this yielded a quality-filtered dataset of 765,980 SNPs from 203 hemiclones.

To detect outliers, we examined LHM’s population structure using principal components analysis (PCA). Overlapping SNP positions from the 203 LHM genomes and from an outgroup population (DGRP [48]) consisting of 205 whole-genome sequenced individuals were used as input to construct a genetic similarity matrix. This set of SNPs was pruned for LD such that no two SNPs with r2 > 0.2 within 10 kb remained. The leading PC axes were extracted in LDAK (‘Linkage Disequilibrium Adjusted Kinships’) [90]. After removal of one outlier (see S1 Fig), the final dataset used for association analysis contained 202 individuals and 765,764 SNPs.

Heritability analyses

We estimated the variance-covariance matrix for fitness and sex-specific residual variances by fitting a model using MCMCglmm [91] implemented in R. Specifically, we fitted the model , where is the scaled and centred fitness of individual k of genotype j and sex i, is the sex-specific random effect of genotype j in sex i, and describes the sex-, genotype-, and individual-specific residual. The genotypic fitness effects in males and females follow a bivariate normal distribution , where is the genetic variance-covariance matrix across sexes (composed of male and female additive genetic variances and and the intersexual genetic covariance ). Residuals follow a normal distribution , where is the sex-specific residual variance, and are assumed to be uncorrelated across sexes.

From these variance estimates, we calculated male and female heritabilities of fitness as , where the subscript i indicates either male or female. The factor 2 in the heritability calculation reflects the fact that with the hemiclonal approach, individuals assayed share half their genetic material (the hemizygous hemiclonal genome). The intersexual genetic correlation was calculated as . The quantitative genetic parameters , and were calculated for each sample from the Monte Carlo Markov chain. From these series of values, we obtained point estimates (averages) and 95% credible intervals (using the function HPDintervals).

As a complementary approach, we estimated the SNP heritability () of male and female fitness in LDAK [90]. This approach uses restricted maximum likelihood (REML) [92] to fit a linear mixed model that expresses the vector of phenotypes Y as a function of genome-wide SNP genotypes, treated as random effects: where K is the kinship matrix, a vector of additive genetic variances for each SNP, the vector of residual variances, and I an individual identity matrix. SNP heritability is then estimated as .

LDAK corrects for local linkage when calculating SNP heritabilities to avoid inflation of in clusters of linked sites that otherwise arises because several SNPs tag the same causal polymorphism. SNPs are weighted inversely proportional to their local linkage, such that SNPs in high LD contribute less to than SNPs in low LD. This model has been shown to substantially improve heritability estimates across a wide range of traits [31]. LDAK also allows us to set the parameter α that determines how SNPs are weighted by their MAF (as ) when calculating the kinship matrix K. We used the default of = −0.25, which provides a steeper relationship between MAF and than the value of −1 that is frequently used in studies on humans. Significance of estimates was assessed by permuting phenotype labels 1,000 times, recalculating on each permutation, as above, and calculating the number of permuted estimates that exceeded the observed.

Quantification and association analysis of sexual antagonism

To identify loci underlying sexual antagonism, we followed the approach employed by Berger and colleagues [27] and subsequent researchers [28] and defined an ‘antagonism index’ (Fig 1A). Specifically, we rotated the coordinate system of the male and female fitness plane by 45 degrees, by multiplying the matrix of fitness coordinates (average male and female fitness estimate for each hemiclonal line) by a rotation matrix:

This transforms the matrix of hemiclonal male and female fitness values into positions on a bivariate coordinate system (of dimension 2 × 202 lines) with one sexually antagonistic and one sexually concordant axis. These positions represent the values of the ‘antagonism index’ used to map antagonistic genetic variation, as well as the ‘concordant index’ used for comparative purposes (see section ‘Comparison of antagonistic and concordant variants’ below).

The univariate approach we employ for measuring antagonistic effects has several benefits. First, it mirrors the approach taken by previous researchers in this field [27,28]. Second, univariate approaches have proved effective in other contexts (e.g., the study of human obesity [29]). Third, although additional power to disentangle sex-limited from antagonistic effects could be gained by employing bivariate analysis, this method would additionally require combining significance measures on two axes to distinguish variants with significant sexually antagonistic effects from those with sex-limited effects. Our approach, in contrast, provides a simple and clear measure of antagonistic effects that is appropriate for our main focus, which is to draw general conclusions about the properties of antagonistic variants.

We calculated the SNP heritability of the antagonism index (‘antagonistic ’) in LDAK, following the same procedure and settings as those for estimating sex-specific SNP heritabilities.

We performed a GWAS by applying a linear mixed model to test the effect of allelic variants at each SNP on the antagonism index, while including the kinship matrix as a random effect to account for the heritable portion of genetic variation attributable to kinship between individuals. This approach has been shown to effectively control the false positive rate and increase power to detect true associations in samples with moderate degrees of population structure and close relatedness, such as LHM [30,93]. The GWAS was implemented in LDAK (settings as above) and a Wald test was used to generate P values for each position (Fig 1B).

To estimate the extent to which genetic confounding affects GWAS P values, we performed two procedures. First, we calculated a genomic inflation factor [94] (median /median using the GenABEL [95] package) (S2 Fig). Second, we used a permutation-based test to compute empirical P values and compared these P values with those computed through a Wald test (S3 Fig). In order to perform permutation testing when individuals have different degrees of genetic resemblance, we used the method in Nicod and colleagues [96]. In brief, we modelled the phenotypic variance-covariance matrix as , where is the genetic similarity matrix, is the identity matrix, and are the genetic and environmental variance components. This is the standard mixed-model decomposition. We then computed the matrix square root and multiplied the phenotypes and genotypes by the matrix to generate a transformed dataset whose variance matrix is the identity . Thus, the transformed phenotypes are all equally related, and so are exchangeable for permutations. We then performed 100,000 permutations of z and performed the mixed-model GWAS (which, in the case of the transformed data, becomes an ordinary least squares model) to determine the empirical P value of each SNP. Empirical and parametric P values were highly correlated (S3 Fig).

Defining candidate antagonistic SNPs and regions

We corrected for multiple testing using an FDR approach and converted P values into Q-values using the method of Benjamini and Hochberg [97]. We defined antagonistic SNPs as sites with FDR Q-values <0.3 and nonantagonistic SNPs as sites with Q-values 0.3. Given the observed distribution of Q-values (see S4 Fig), choosing a cutoff of 0.3 allowed us to achieve a suitable balance between false positives and false negatives. These candidate antagonistic SNPs show a near-perfect overlap with the set of SNPs exhibiting the lowest empirical P values (as obtained through the permutation-based approach), further supporting their robustness as candidates (S3 Fig).

For analyses that consider larger genomic regions (windows), we ran a set-based association test implemented in LDAK (options using ‘–calc-genes-reml’, ‘ignore-weights YES’, and = −0.25). The test calculates set-wide via REML, corrects for local relatedness using the predictors in each window, and computes a P value using a likelihood ratio test (LRT). The sets we used were 1,000-bp windows (500-bp step), defined according to Drosophila Reference 5 genome coordinates and subsequently converted (using the liftOver tool [98]) to Release 6 coordinates. This was a necessary step, as publicly available polymorphism data were mapped to Release 5 of the D. melanogaster genome, whereas the GWAS data were mapped to Release 6. We then calculated window-based Q-values from the LRT P values and defined antagonistic windows as those with a Q-value <0.1.

Comparison of antagonistic and concordant variants

To support our inference that the GWAS of the antagonism index captures genetic variation with antagonistic effects, we compared these GWAS P values with P values estimated from a GWAS of the ‘concordant index’ (S4 Fig), which describes the position of individual fly lines on the axis ranging from extremely male- and female-detrimental fitness effects to extremely male- and female-beneficial fitness effects. We also transformed these P values into Q-values (as described above) and compared Q-values for both indices. We did not analyse sexually concordant variants further, owing to the absence of any sites that are significantly associated with this phenotype (minimum Q-value = 0.78).

Genomic distribution of antagonistic SNPs

To estimate the number of independent antagonistic regions, we performed LD clumping in PLINK [89]. We used a significance threshold of 0.00093 for the index SNP (the maximum, least significant P value across all antagonistic SNPs) and clustered (‘clumped’) neighbouring antagonistic SNPs by specifying an r2 threshold of 0.4 and a distance threshold of 10 kb.

We also quantified the clustering by calculating the median distance between all pairs of adjacent antagonistic SNPs across chromosome arms (S5 Fig). We did this separately for the autosomes and X chromosome, to accommodate for the lower SNP density on the X chromosome. We tested for significant clustering by using a permutation test, in which antagonistic/nonantagonistic labels were permuted among all SNPs, distances between adjacent SNPs labelled as ‘antagonistic’ after permutation were recalculated as before, and the median distance recorded. This process was repeated 1,000 times in order to generate a null distribution of median distances. The significance of clustering among true antagonistic SNPs was calculated as the proportion of median distances in the null distribution that were lower than or equal to the true median distance.

To examine the proportional contribution of autosomal and X-linked antagonistic variants to total , we used two complementary methods. First, we partitioned the genome into X chromosome and autosome subsets, and calculated via REML in LDAK, each subset in turn (settings as above) (Fig 2A). The observed proportion of contributed by each compartment was then compared with the expected proportion (i.e., the proportion of LD-weighted predictors belonging to each compartment). We tested whether the two compartments contributed significantly more than expected using a two-sample Z-test. Second, we compared the proportion of antagonistic SNPs (Q-value < 0.3) with the proportion of all SNPs mapping to each chromosomal compartment, using Z-tests (S6 Fig). The under- or overrepresentation of antagonistic SNPs (deficit or excess of antagonistic compared with all SNPs) in each compartment is therefore unaffected by differences in SNP density between chromosome arms, such as the lower density on the X chromosome.

Functional analyses of antagonistic loci

We used the variant effect predictor (Ensembl VEP [99]) to map SNPs to functional categories. We partitioned total antagonistic into functional subsets, and estimated the observed proportion of contributed by each subset using REML in LDAK (settings as above) (Fig 2B). We then used a permutation test to compare observed and expected for each functional category, in which we shifted genome-wide annotations to a random starting point along a ‘circular genome’. This procedure breaks the relationship between each SNP and its annotation while preserving the order of annotations and their associated LD structure [100]. was recalculated via REML for each of 1,000 permuted datasets, and two-tailed P values were determined as the sums of permuted estimates with more extreme absolute values than the observed. As a complementary approach, we compared the proportion of antagonistic SNPs with the proportion of all SNPs mapping to each functional category (S6 Fig). We then assessed enrichment for each functional category in turn using Z-tests.

We also used the VEP to map SNPs to genes. We included extended gene regions (± 5 kb of gene coordinates, VEP default) in our gene definition. To gain preliminary insights into the functions of antagonistic genes we used the Gorilla [101] GO tool, with FDR correction for multiple testing across GO terms. All genes covered in the final SNP dataset were used as the background set.

To examine the relationship between antagonistic genes and sex-biased gene expression, we used the Sebida online database [38] to annotate genes as having either sex-biased or unbiased expression profiles (meta-class identifier). We then used a test to compare the sex-biased expression status of antagonistic and nonantagonistic genes. We additionally examined the quantitative degree of sex bias using this same dataset (Fig 3A). We took the absolute value of the log2-transformed ‘M_F’ bias variable, such that large values indicate more extreme sex bias in expression, irrespective of whether this bias is towards males or females. We compared the distributions of this variable between antagonistic and nonantagonistic genes using a Wilcoxon rank-sum test. Finally, to examine the shape of the relationship between antagonism and expression sex bias, we modelled the binary candidate status of genes (antagonistic/nonantagonistic) as a function of expression sex bias using GLMs with binomial error structure (logit link function). We tested for a quadratic relationship between sex bias and candidate status (Fig 3B) by comparing a second-degree polynomial model with a model including only the linear effect of sex bias and its square. We also assessed the fit of a fourth-degree polynomial model by comparing it with the second-degree model. Model comparisons were performed using LRTs based on the distribution.

To assess the degree of overlap between antagonistic genes identified here and those associated with sexually antagonistic expression patterns in a previous study [17], we included only genes covered in both datasets, and only those genes in both datasets that were adult expressed. To determine whether genes were adult expressed, we used the Drosophila gene expression atlas (FlyAtlas [102]). Conservatively, we considered a gene ‘adult-expressed’ if its transcript was detected as present in at least one library of one adult-derived sample. We then used a test to assess the degree of overlap between the datasets.

We used the tissue-specificity index () to compare pleiotropy between antagonistic and nonantagonistic genes. We used gene expression data from FlyAtlas [102] to get average expression values for each gene and in each tissue and then calculated as where is the proportional expression level of the gene in tissue , and is the number of tissues. We then excluded sex-limited genes from this dataset by removing those genes that fell into the most extreme 5% quantiles for the female- and male-biased M/F ratio distributions from the Sebida data. Finally, we compared values of for antagonistic and nonantagonistic genes using a Wilcoxon rank-sum test.

As an additional proxy for pleiotropy, we examined the number of PPIs between antagonistic and nonantagonistic genes. We used the physical interactions table from FlyBase [103] to summarise the total number of PPIs for all genes and then compared antagonistic and nonantagonistic genes using a GLM with quasipoisson error structure to account for overdispersion.

Comparative population genomic data

To analyse SNP polymorphism outside the LHM population, we used publicly available population genomic data from three wild D. melanogaster populations. The first is an introduced population from North America (DGRP [48,49]: 205 whole-genome sequences derived from inbred lines). The two others come from D. melanogaster's ancestral distribution range in sub-Saharan Africa (ZI: 197 whole-genome sequences derived from haploid embryos; SA: 118 whole-genome sequences derived from inbred lines, which combine data from subpopulations 'SD' and 'SP' and have very low population differentiation [52]).

All genome sequences were downloaded as FASTA files from the Drosophila Genome Nexus website ( These files had been generated following standardised alignment and quality-filtering steps [51] and were further quality filtered for admixture and identity by descent using scripts provided on the Genome Nexus website. We used SNP-sites [104] to call SNPs and convert the multiple sequence alignments to VCF format. Allele frequencies in the three populations were calculated using vcftools [88]. We further excluded tri-allelic and poorly covered sites (call rate <20).

SNP-based analyses of balancing selection

To test whether antagonistic sites are associated with signatures of balancing selection at the level of individual SNPs, we used three comparison populations (DGRP, ZI, SA) and only considered data for those positions that appeared as SNPs in the LHM. Our analyses focussed on SNP polymorphism (MAF) in relation to antagonistic fitness effects but controlled for a number of potential confounders (the exact details are elaborated on in the following paragraphs). First, they controlled for MAF ascertainment bias, which could be caused by the higher MAF of antagonistic relative to nonantagonistic sites in LHM itself. Second, they controlled for ‘linked selection’, which could differentially affect antagonistic/nonantagonistic sites if each class of site is nonrandomly distributed across the genome. Third, they controlled for pseudo-replication between neighbouring SNPs due to LD. Finally, all SNP-based analyses focussed on within-population comparisons, so demographic differences between populations did not confound our analyses.

We performed three main analyses (A, B, and C). In analysis A, we asked whether antagonistic SNPs showed greater polymorphism than nonantagonistic SNPs and compared MAF (corrected for confounding effects) at the two classes of sites. In analysis B, we assessed the sharing of SNP polymorphism between LHM and a comparison population in relation to antagonistic fitness effects by looking for an association between absolute GWAS effect size and the probability that an SNP shows polymorphism (i.e., has MAF > 0 for the same variants as in LHM) in each non-LHM population. In analysis C, we assessed the relationship between polymorphism and antagonistic fitness effects in a quantitative way and looked for an association between absolute GWAS effect size and MAF in each population.

In our first analysis comparing antagonistic/nonantagonistic MAF (analysis A; Fig 4B, 4F and 4J), we first LD pruned the LHM dataset by clumping (in PLINK) to avoid pseudo-replication due to correlations between SNPs. For antagonistic sites, we used the 226 index SNPs identified in the previous clumping (see section ‘Genomic distribution of antagonistic SNPs’). For nonantagonistic sites, a nonantagonistic SNP was randomly chosen as an index SNP and clumped by clustering all SNPs within 10 kb with r2 > 0.4. Pruning in this manner reduced the original dataset of 765,764 SNPs to 36,316 ‘LD-independent’ SNPs. Note that LD typically decays within 1 kb in wild D. melanogaster populations and that the degree of LD pruning used in the above analyses is therefore highly stringent (analyses that use less stringent thresholds return similar results).

We then used this LD-independent dataset to compare MAF between antagonistic and nonantagonistic SNPs (we assigned MAF = 0 to sites that were monomorphic in a comparison population and those in which a comparison population was polymorphic for variants other than those segregating at that site in the LHM). We did this using a Monte Carlo approach in which, 1,000 times, we drew 226 nonantagonistic ‘control’ SNPs and carefully matched them to the 226 antagonistic SNPs in terms of their LHM MAF and genome-wide estimates of ‘linked selection’ [105] (estimates of linked selection quantify local recombination rates and proximity to functional sequences in D. melanogaster and thereby account for factors that affect polymorphism along the genome, such as background selection and selective sweeps). The matching procedure first corrected LHM MAF for linked selection by taking the residuals of a linear regression of LHM MAF on estimates of linked selection. Then, sets of 226 nonantagonistic SNPs were drawn to match the linked selection-corrected LHM MAF distribution of the 226 antagonistic SNPs, and for each set we calculated the mean MAF in the comparison population. The 1,000 sets generated in this way provided a null distribution of MAFs for nonantagonistic sites in each comparison population. P values for deviations in polymorphism between antagonistic and nonantagonistic sites were then calculated by comparing, in each population, the mean MAF of the 226 antagonistic SNPs with the null MAF distribution.

In our second analysis (analysis B; Fig 4C, 4G and 4K), we used the same LD-pruned dataset but considered SNP polymorphism as a function the whole spectrum of effect sizes, rather than a binary split of SNPs into antagonistic/nonantagonistic categories. We performed a logistic regression, modelling the binary response of whether or not an LHM SNP is polymorphic in a given population (i.e., has MAF > 0 and harbours the same alleles as those found in LHM) as a function of the absolute effect size of the SNP in LHM, while including LHM MAF and estimates of linked selection as covariates. To assess significance, we performed a test comparing a model that did include absolute effect size to one that did not.

Our third analysis (analysis C; Fig 4D, 4H and 4L) was similar to analysis B, but rather than asking whether the presence of polymorphism in another population varied with antagonistic effect size in LHM, it assessed whether MAF in another population varied with effect size. To do so, we used the same LD-pruned dataset and performed binning in two dimensions, by residual LHM MAF (20 quantiles) and GWAS effect size (100 quantiles). We then drew one SNP from each of these MAF/effect size bins (2,000 SNPs in total), recorded the MAF for each in the comparison population of interest, and finally correlated these MAF values with effect sizes using a Spearman’s rank correlation. Under the hypothesis of antagonism-mediated balancing selection, we would expect to see a positive correlation between MAF and effect sizes across these matched sets of SNPs, with SNPs with higher effect sizes tending to be associated with higher MAFs in a comparison population under consideration than SNPs with lower effect sizes.

To provide insights into the functions of long-term balanced antagonistic sites, we repeated the above analyses for each functional category in turn in the ZI population (S7 Fig). Because of the small number of antagonistic SNPs in each category and the low resulting power of this analysis, we did not perform LD clumping prior to binary antagonistic/nonantagonistic comparisons. In effect size–based analyses, in which the pool of possible sites to draw from was larger, LD clumping was performed.

Window-based analyses of balancing selection

We performed genome-wide sliding window analyses (1,000-bp windows, 500-bp step size) to investigate regional signatures of balancing selection (Fig 5A). Tajima's D, which compares SNP polymorphism (nucleotide diversity, ) with SNP abundance (the Watterson estimator, ), was compared for windows defined as antagonistic (Q-value < 0.1) or nonantagonistic (Q-value 0.1) from the set-based analysis (see section ‘Defining candidate antagonistic SNPs and regions’). Under the hypothesis that antagonism generates balancing selection, Tajima’s D is expected to be elevated in antagonistic windows. We calculated Tajima’s D for each comparison population using PopGenome [106] in R. As in SNP-based analyses, we incorporated estimates of linked selection [105] (estimated in 1,000-bp windows) by calculating the residuals of a regression of Tajima’s D on estimates of linked selection. Because estimates of linked selection were not available for windows on the X chromosome, we instead used estimates of the recombination rate on this chromosome [107]. We then used a GLM, assuming Gaussian error structure, to compare residual Tajima’s D between antagonistic and nonantagonistic windows.

We also tested for another signature of balancing selection, reduced population differentiation (Fig 5B). Measures such as FST are often considered problematic because they do not correct for the dependency of FST on local levels of polymorphism [108]. However, the availability of genome-wide estimates of linked selection in D. melanogaster allowed us to incorporate this confounding variable explicitly. We therefore estimated FST over windows, using PopGenome, correcting FST for linked selection in a way analogous to that used for Tajima’s D. Because the distribution of FST values is not normally distributed, we contrasted residual FST between antagonistic and nonantagonistic windows using Wilcoxon rank-sum tests.

Linkage-based analyses of balancing selection

We examined the extent to which antagonistic haplotypes are selectively maintained by investigating whether antagonistic SNPs have unusually high LD in the ZI population, the population that is most distant from LHM and in which levels of LD between antagonistic SNPs should be weakest in the absence of long-term balancing selection. Thus, for all SNPs situated within 1,000 bp of one another in ZI and that were also covered in LHM (i.e., SNPs that could be inferred to be either antagonistic or nonantagonistic), we calculated pairwise LD (r2) in PLINK. We then compared r2 values between pairs of antagonistic SNPs and two control pairs: nonantagonistic pairs and ‘mixed’ pairs (antagonistic/nonantagonistic) (Fig 5C). Comparing pairs of antagonistic SNPs to the mixed pairs allowed us to consider only SNPs located close to an antagonistic SNP, thus effectively controlling for possible nonrandom distributions of antagonistic pairs and nonantagonistic pairs with respect to genome-wide recombination rates.

To test for significant differences in LD between antagonistic pairs and the two control pairs, we modelled variation in r2 as a declining exponential function of chromosomal distance and assessed differences in residual r2 (once distance was regressed out) using Wilcoxon rank-sum tests.

Population genomic analyses for D. simulans and D. yakuba

We analysed polymorphism data from two closely related species from the D. melanogaster species group, D. simulans and D. yakuba, which are estimated to share a common ancestor with D. melanogaster approximately 1.5 million years and approximately 3 million years ago, respectively [55]. Our primary dataset for D. simulans consisted of 170 high-coverage whole-genome sequences from North American D. simulans inbred lines [56] (VCF files downloaded from These SNPs were further quality filtered (call rate >80%, depth >10, biallelic sites only) and allele frequencies estimated using vcftools [88]. The liftOver tool was then used to convert D. simulans to D. melanogaster coordinates.

A secondary dataset consisted of 20 high-coverage genomes from D. simulans and D. yakuba isofemale lines [57], respectively (downloaded from These sequences are derived from flies sampled in each species’ African distribution range (Madagascar and Kenya in the case of D. simulans; Cameroon and Kenya in the case of D. yakuba). Whole-genome sequences from each individual were first aligned to the D. melanogaster Release 5 genome using Mauve [109]. We then filtered the alignments to keep only positions that were polymorphic and whose call rate was 100% across the 20 individuals of each species. We finally converted coordinates from D. melanogaster Release 5 to Release 6 using the liftOver tool.

To test whether antagonistic SNPs are associated with signatures of balancing selection in D. simulans and D. yakuba, we asked whether antagonistic sites were enriched among sites that were trans-specific. A trans-specific site was defined as a position where (i) polymorphism was detectable in LHM and the species under consideration, and (ii) the allelic variants at that polymorphism matched those observed in LHM. If one or both of these conditions were not met, the site was categorised as ‘non-trans-specific’.

To compare the trans-specific status of antagonistic and nonantagonistic sites (Fig 6B, 6C and 6D), we replicated the Monte Carlo approach used to compare antagonistic/nonantagonistic MAFs among populations of D. melanogaster (i.e., analysis A in section ‘SNP-based analyses of balancing selection’)—that is, we sampled a set of LD-pruned nonantagonistic sites (LD clumping using r2 > 0.4, within 1 kb) and matched their (linked selection–corrected) MAFs in LHM to those of antagonistic sites. For each of 1,000 sets of frequency-matched nonantagonistic sites, we recorded the proportion of sites that were trans-specific. The observed proportion of trans-specific antagonistic sites was then compared against this null distribution to generate an empirical P value.

To test whether trans-specific status varied with absolute GWAS effect size (Fig 6A), we mirrored analysis B described in ‘SNP-based analyses of balancing selection’. We performed a logistic regression with trans-specific status as the dependent variable and effect size the independent variable, with MAF in LHM and linked selection included as covariates to account for ascertainment bias. We used a set of LD-pruned polymorphisms (LD clumping using r2 > 0.4, within 1 kb) for this analysis as well.

Statistical software

All statistical analyses were carried out in RStudio (version 1.0.136 [110]). The analysis code is available at

Supporting information

S1 Fig. Population structure and relatedness in LHM.

(A) Scatterplot of the first and second principal components of a PCA constructed from SNPs present among LHM (grey) and DGRP (red) populations. Principal components are computed from common (MAF > 0.05), LD-pruned (r2 > 0.2 within 10 kb) and high-quality (site-level call rate >95%) sites only. One notable outlier individual (black arrow) was removed prior to performing the GWAS. (B) Histogram of off-diagonal genomic relationship values between the 202 LHM individuals retained for GWAS. Our sample consists of individuals with mostly low relatedness, with a small number of pairs of highly related individuals. (C) LD (measured as r2) in LHM between pairs of SNPs situated within 1 kb of each other. Points represent mean r2 across 25-bp bins of distance; line represents a fitted declining exponential relationship between distance and r2. Data and code underlying this figure can be found at DGRP, Drosophila Genetic Reference Panel; GWAS, genome-wide association study; LD, linkage disequilibrium; MAF, minor allele frequency; PCA, principal component analysis; SNP, single nucleotide polymorphism.


S2 Fig. Testing for inflation due to population structure and relatedness.

Q–Q plots of expected and observed P values from SNP-wise Wald χ2 tests for allelic effects on the antagonism index, based on a linear mixed model including the kinship matrix to correct for population structure and relatedness (purple dots), or on a simple linear model without kinship correction (black dots). The genomic inflation factor of the mixed model (λmedian = 0.967) indicates that population structure and relatedness have been well controlled for versus the simple linear model (λmedian = 1.209). Data and code underlying this figure can be found at Q–Q, quantile-quantile; SNP, single nucleotide polymorphism.


S3 Fig. Relationship between empirical and parametric P values.

(A) SNP-wise P values obtained through a Wald χ2 test plotted against empirical P values obtained through 100,000 permutations of the kinship-scaled phenotypic values across individuals (see Materials and methods). The two sets of P values are very highly correlated (Pearson’s r > 0.999) and the regression coefficient (fitted line, pink) is very close to 1 (β = 0.996), indicating that parametric P values are robust. (B) Overlap between 2,372 candidate antagonistic SNPs—as defined from a Wald χ2 test and using a Q-value cutoff of 0.3—and the 2,372 sites with the lowest empirical P values. The near-perfect overlap between these two sets of sites (purple) reflects the strong positive correlation between parametric and empirical P values illustrated in A. and indicates that the candidate antagonistic SNPs defined through a parametric approach are robust. The mean Q-value across the 2,372 candidate antagonistic SNPs is comparable across both approaches, although somewhat higher when estimated from empirical P values (mean Q-value = 0.407) relative to parametric P values (mean Q-value = 0.267). Data and code underlying this figure can be found at SNP, single nucleotide polymorphism.


S4 Fig. Properties of variants mapping to the antagonism and concordant fitness indices.

(A) Relative male and female lifetime reproductive fitness estimates for 223 D. melanogaster hemiclonal lines. Colours denote each line’s concordant index, i.e., their position along a spectrum (dashed arrow) ranging from male-detrimental, female-detrimental fitness effects (red) to male-beneficial, female-beneficial effects (blue). The concordant index is orthogonal to the antagonism index. (B) Histogram of Wald χ2 P values for variants mapped to the antagonism and concordant index. Variants associated with the antagonism index are significantly more enriched for very low P values than those associated with the concordant index. (C) Histogram of Q-values for the antagonism and concordant index. Vertical dashed line represents Q-value cutoff used for defining antagonistic/nonantagonistic sites. Data and code underlying this figure can be found at


S5 Fig. Permutation tests of median distance between antagonistic SNPs.

Density curves depict the distribution of median distances between SNPs labelled 'antagonistic', across 1,000 permutations of labels. Permutation tests were performed separately for the autosomes and the X chromosome. Red lines show the observed median distance between antagonistic SNPs on the autosomes (147 bp) and the X chromosome (298 bp), respectively. Data and code underlying this figure can be found at SNP, single nucleotide polymorphism.


S6 Fig. Enrichment of antagonistic SNPs among chromosomal compartments and variant types.

(A) Enrichment of antagonistic SNPs across individual chromosome arms. Results of Z-tests that compare the number of antagonistic candidate SNPs on each chromosome arm relative to all SNPs covered in the final SNP dataset are shown. (B) Same as A. but grouping autosomal chromosome arms together. (C) Enrichment of variant types among antagonistic SNPs within variant type categories (see for definitions). Shown are the results of Z-tests comparing the number of candidate antagonistic SNPs falling into each functional category against the representation of each category among all SNPs (see Materials and methods). For all plots, dark blue = statistically significant Z-test (P < 0.05), light blue = non-statistically significant Z-test (P > 0.05). Data and code underlying this figure can be found at SNP, single nucleotide polymorphism.


S7 Fig. SNP-based signatures of balancing selection in the ZI population, split by functional annotation.

These analyses replicate Fig 4 but only consider SNPs situated in each functional category in turn. Excess polymorphism among antagonistic sites and positive relationships between excess polymorphism and GWAS effect size indicate that SNPs with effects on the antagonistic phenotype show elevated MAF. This can be the case even if a particular class of function is not more often (or even significantly less often) associated with antagonism than expected by chance (cf. Figs 2B and S6), indicating that the signal of balancing selection is due to the antagonistic effects of individual SNPs and not the general properties of its functional class. Data and code underlying this figure can be found at GWAS, genome-wide association study; MAF, minor allele frequency; SNP, single nucleotide polymorphism; ZI, Zambia.


S1 Table. List of antagonistic genes.

Genes where antagonistic SNPs were found. This includes information on their genomic location, how many antagonistic/nonantagonistic SNPs fall in each gene, and whether the antagonistic/nonantagonistic SNPs have missense effects. SNP, single nucleotide polymorphism.



We are grateful to Aida Andrés for her suggestions on analyses of balancing selection; Doug Speed for his suggestions on the usage of LDAK; and Laurent Keller, Andrew Pomiankowski, and François Balloux for helpful comments on earlier versions of the manuscript.


  1. 1. Bonduriansky R, Chenoweth SF. Intralocus sexual conflict. Trends Ecol Evol. 2009;24:280–8. pmid:19307043
  2. 2. Van Doorn GS. Intralocus sexual conflict. Ann N Y Acad Sci. 2009;1168:52–71. pmid:19566703
  3. 3. Rice WR. Sex chromosomes and the evolution of sexual dimorphism. Evolution. 1984;38:735–742. pmid:28555827
  4. 4. Cox RM, Calsbeek R. Sexually antagonistic selection, sexual dimorphism, and the resolution of intralocus sexual conflict. Am Nat. 2009;173:176–87. pmid:19138156
  5. 5. Pennell TM, Morrow EH. Two sexes, one genome: The evolutionary dynamics of intralocus sexual conflict. Ecol Evol. 2013;3:1819–34. pmid:23789088
  6. 6. Mokkonen M, Kokko H, Koskela E, Lehtonen J, Mappes T, Martiskainen H, et al. Negative frequency-dependent selection of sexually antagonistic alleles in Myodes glareolus. Science. 2011;334:972–4. pmid:22096197
  7. 7. Stulp G, Kuijper B, Buunk AP, Pollet T V, Verhulst S. Intralocus sexual conflict over human height. Biol Lett. 2012;8:976–8. pmid:22875819
  8. 8. Tarka M, Åkesson M, Hasselquist D, Hansson B. Intralocus sexual conflict over wing length in a wild migratory bird. Am Nat. 2014;183:62–73. pmid:24334736
  9. 9. Svensson EI, McAdam AG, Sinervo B. Intralocus sexual conflict over immune defense, gender load, and sex-specific signaling in a natural lizard population. Evolution. 2009;63:3124–35. pmid:19624721
  10. 10. Rice WR. Sexually antagonistic genes: experimental evidence. Science. 1992;256:1436–9. pmid:1604317
  11. 11. Berger D, Berg EC, Widegren W, Arnqvist G, Maklakov AA. Multivariate intralocus sexual conflict in seed beetles. Evolution. 2014;68:3457–69. pmid:25213393
  12. 12. Roberts RB, Ser JR, Kocher TD. Sexual conflict resolved by invasion of a novel sex determiner in Lake Malawi cichlid fishes. Science. 2009;326:998–1001. pmid:19797625
  13. 13. Barson NJ, Aykanat T, Hindar K, Baranski M, Bolstad GH, Fiske P, et al. Sex-dependent dominance at a single locus maintains variation in age at maturity in salmon. Nature. 2015;528:405–8. pmid:26536110
  14. 14. Delph LF, Andicoechea J, Steven JC, Herlihy CR, Scarpino S V., Bell DL. Environment-dependent intralocus sexual conflict in a dioecious plant. New Phytol. 2011;192:542–52. pmid:21726233
  15. 15. Connallon T, Hall MD. Genetic constraints on adaptation: a theoretical primer for the genomics era. Ann N Y Acad Sci. 2018;1422:65–87. pmid:29363779
  16. 16. Kidwell JF, Clegg MT, Stewart FM, Prout T. Regions of stable equilibria for models of differential selection in the two sexes under random mating. Genetics. 1977;85:171–83. pmid:838269
  17. 17. Innocenti P, Morrow EH. The sexually antagonistic genes of Drosophila melanogaster. PLoS Biol. 2010;8(3):e1000335. pmid:20305719
  18. 18. Lucotte EA, Laurent R, Heyer E, Ségurel L, Toupance B. Detection of allelic frequency differences between the sexes in humans: a signature of sexually antagonistic selection. Genome Biol Evol. 2016;8:1489–500. pmid:27189992
  19. 19. Chippindale AK, Gibson JR, Rice WR. Negative genetic correlation for adult fitness between sexes reveals ontogenetic conflict in Drosophila. Proc Natl Acad Sci U S A. 2001;98:1671–5. pmid:11172009
  20. 20. Rice WR, Linder JE, Friberg U, Lew TA, Morrow EH, Stewart AD. Inter-locus antagonistic coevolution as an engine of speciation: assessment with hemiclonal analysis. Proc Natl Acad Sci U S A. 2005;102:6527–34. pmid:15851669
  21. 21. Gilks WP, Pennell TM, Flis I, Webster MT, Morrow EH. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample. F1000Research. 2016;5:e2644.
  22. 22. Abbott JK, Morrow EH. Obtaining snapshots of genetic variation using hemiclonal analysis. Trends Ecol Evol. 2011;26:359–68. pmid:21507504
  23. 23. Rice WR, Linder JE, Friberg U, Lew TA, Morrow EH, Stewart AD. Inter-locus antagonistic coevolution as an engine of speciation: assessment with hemiclonal analysis. Proc Natl Acad Sci U S A. 2005;102:6527–34. pmid:15851669
  24. 24. Long TAF, Miller PM, Stewart AD, Rice WR. Estimating the heritability of female lifetime fecundity in a locally adapted Drosophila melanogaster population. J Evol Biol. 2009;22:637–43. pmid:19210593
  25. 25. Pischedda A, Chippindale AK. Intralocus sexual conflict diminishes the benefits of sexual selection. PLoS Biol. 2006;4(11):e356.
  26. 26. Collet JM, Fuentes S, Hesketh J, Hill MS, Innocenti P, Morrow EH, et al. Rapid evolution of the intersexual genetic correlation for fitness in Drosophila melanogaster. Evolution. 2016;70:781–95. pmid:27077679
  27. 27. Berger D, Grieshop K, Lind MI, Goenaga J, Maklakov AA, Arnqvist G. Intralocus sexual conflict and environmental stress. Evolution. 2014;68:2184–96. pmid:24766035
  28. 28. Grieshop K, Arnqvist G. Sex-specific dominance reversal of genetic variation for fitness. PLoS Biol. 2018;16(12):e2006810. pmid:30533008
  29. 29. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518:197–206. pmid:25673413
  30. 30. Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11:459–63. pmid:20548291
  31. 31. Speed D, Cai N, Johnson MR, Nejentsev S, Balding DJ. Reevaluation of SNP heritability in complex human traits. Nat Genet. 2017;49:986–92. pmid:28530675
  32. 32. Yang J, Manolio TA, Pasquale LR, Boerwinkle E, Caporaso N, Cunningham JM, et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet. 2011;43:519–25. pmid:21552263
  33. 33. Patten MM, Haig D. Maintenance or loss of genetic variation under sexual and parental antagonism at a sex-linked locus. Evolution. 2009;63:2888–95. pmid:19573084
  34. 34. Fry JD. The genomic location of sexually antagonistic variation: Some cautionary comments. Evolution. 2010;64:1510–6. pmid:19922443
  35. 35. Langley CH, Stevens K, Cardeno C, Lee YCG, Schrider DR, Pool JE, et al. Genomic variation in natural populations of Drosophila melanogaster. Genetics. 2012;192:533–98. pmid:22673804
  36. 36. Dutoit L, Mugal CF, Bolívar P, Wang M, Nadachowska-Brzyska K, Smeds L, et al. Sex-biased gene expression, sexual antagonism and levels of genetic diversity in the collared flycatcher (Ficedula albicollis) genome. Mol Ecol. 2018;27:3572–81. pmid:30055065
  37. 37. Griffin RM, Dean R, Grace JL, Rydén P, Friberg U. The shared genome is a pervasive constraint on the evolution of sex-biased gene expression. Mol Biol Evol. 2013;30:2168–76. pmid:23813981
  38. 38. Gnad F, Parsch J. Sebida: a database for the functional and evolutionary analysis of genes with sex-biased expression. Bioinformatics. 2006;22:2577–9. pmid:16882652
  39. 39. Cheng C, Kirkpatrick M. Sex-specific selection and sex-biased gene expression in humans and flies. PLoS Genet. 2016;12(9):e1006170. pmid:27658217
  40. 40. Rowe L, Chenoweth SF, Agrawal AF. The genomics of sexual conflict. Am Nat. 2018;192:274–86. pmid:30016158
  41. 41. Paaby AB, Rockman MV. The many faces of pleiotropy. Trends Genet. 2013;29:66–73. pmid:23140989
  42. 42. Mank JE, Hultin‐Rosenberg L, Zwahlen M, Ellegren H. Pleiotropic constraint hampers the resolution of sexual antagonism in vertebrate gene expression. Am Nat. 2008;171:35–43. pmid:18171149
  43. 43. Yanai I, Benjamin H, Shmoish M, Chalifa-Caspi V, Shklar M, Ophir R, et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics. 2005;21:650–9. pmid:15388519
  44. 44. He X, Zhang J. Toward a molecular understanding of pleiotropy. Genetics. 2006;173:1885–91. pmid:16702416
  45. 45. Mullon C, Pomiankowski A, Reuter M. The effects of selection and genetic drift on the genomic distribution of sexually antagonistic alleles. Evolution. 2012;66:3743–53. pmid:23206133
  46. 46. Connallon T, Clark AG. Balancing selection in species with separate sexes: Insights from Fisher’s geometric model. Genetics. 2014;197:991–1006. pmid:24812306
  47. 47. Connallon T, Clark AG. A general population genetic framework for antagonistic selection that accounts for demography and recurrent mutation. Genetics. 2012;190:1477–89. pmid:22298707
  48. 48. Mackay TFC, Richards S, Stone EA, Barbadilla A, Ayroles JF, Zhu D, et al. The Drosophila melanogaster Genetic Reference Panel. Nature. 2012;482:173–8. pmid:22318601
  49. 49. Huang W, Massouras A, Inoue Y, Peiffer J, Ràmia M, Tarone AM, et al. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res. 2014;24:1193–208. pmid:24714809
  50. 50. Duchen P, Živković D, Hutter S, Stephan W, Laurent S. Demographic inference reveals African and European admixture in the North American Drosophila melanogaster population. Genetics. 2013;193:291–301. pmid:23150605
  51. 51. Lack JB, Cardeno CM, Crepeau MW, Taylor W, Corbett-Detig RB, Stevens KA, et al. The Drosophila Genome Nexus: A population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population. Genetics. 2015;199:1229–41. pmid:25631317
  52. 52. Lack JB, Lange JD, Tang AD, Corbett-Detig RB, Pool JE. A thousand fly genomes: An expanded Drosophila Genome Nexus. Mol Biol Evol. 2016;33:3308–13. pmid:27687565
  53. 53. Kelly JK. A test of neutrality based on interlocus associations. Genetics. 1997;146:1197–206. pmid:9215920
  54. 54. Charlesworth B, Nordborg M, Charlesworth D. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet Res. 1997;70:155–74. pmid:9449192
  55. 55. Obbard DJ, MacLennan J, Kim KW, Rambaut A, O’Grady PM, Jiggins FM. Estimating divergence dates and substitution rates in the Drosophila phylogeny. Mol Biol Evol. 2012;29:3459–73. pmid:22683811
  56. 56. Signor SA, New FN, Nuzhdin S. A large panel of Drosophila simulans reveals an abundance of common variants. Genome Biol Evol. 2018;10:189–206. pmid:29228179
  57. 57. Rogers RL, Cridland JM, Shao L, Hu TT, Andolfatto P, Thornton KR. Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans. Mol Biol Evol. 2014;31:1750–66. pmid:24710518
  58. 58. Lachaise D, Cariou M-L, David JR, Lemeunier F, Tsacas L, Ashburner M. Historical biogeography of the Drosophila melanogaster species subgroup. Evol Biol. 1988;22:159–225.
  59. 59. Li H, Stephan W. Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet. 2006;2(10):e1004775.
  60. 60. Thornton K, Andolfatto P. Approximate Bayesian inference reveals evidence for a recent, severe bottleneck in a Netherlands population of Drosophila melanogaster. Genetics. 2006;172:1607–19. pmid:16299396
  61. 61. Connallon T, Hall MD. Genetic correlations and sex-specific adaptation in changing environments. Evolution. 2016;70:2186–98. pmid:27477129
  62. 62. Pool JE, Corbett-Detig RB, Sugino RP, Stevens KA, Cardeno CM, Crepeau MW, et al. Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. PLoS Genet. 2012;8:e1003080. pmid:23284287
  63. 63. Bergland AO, Behrman EL, O’Brien KR, Schmidt PS, Petrov DA. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. PLoS Genet. 2014;10(11):e1004775. pmid:25375361
  64. 64. Spencer HG, Priest NK. The evolution of sex-specific dominance in response to sexually antagonistic selection. Am Nat. 2016;187:658–66. pmid:27104997
  65. 65. Connallon T, Clark AG. Sex linkage, sex-specific selection, and the role of recombination in the evolution of sexually dimorphic gene expression. Evolution. 2010;64:3417–42. pmid:20874735
  66. 66. Stewart AD, Pischedda A, Rice WR. Resolving intralocus sexual conflict: genetic mechanisms and time frame. J Hered. 2010;101:94–9.
  67. 67. VanKuren NW, Long M. Gene duplicates resolving sexual conflict rapidly evolved essential gametogenesis functions. Nat Ecol Evol. 2018;2:705–12. pmid:29459709
  68. 68. Wyman MJ, Cutter AD, Rowe L. Gene duplication in the evolution of sexual dimorphism. Evolution. 2012;66:1556–66. pmid:22519790
  69. 69. Wright AE, Fumagalli M, Cooney CR, Bloch NI, Vieira FG, Buechel SD, et al. Sex-biased gene expression resolves sexual conflict through the evolution of sex-specific genetic architecture. Evol Lett. 2018;2:52–61. pmid:30283664
  70. 70. Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet. 2016;48:709–17. pmid:27182965
  71. 71. Fraser HB. Gene expression drives local adaptation in humans. Genome Res. 2013;23:1089–96. pmid:23539138
  72. 72. Huang W, Richards S, Carbone MA, Zhu D, Anholt RRH, Ayroles JF, et al. Epistasis dominates the genetic architecture of Drosophila quantitative traits. Proc Natl Acad Sci U S A. 2012;109:15553–9. pmid:22949659
  73. 73. Arnqvist G, Vellnow N, Rowe L. The effect of epistasis on sexually antagonistic genetic variation. Proc R Soc B Biol Sci. 2014;281:20140489.
  74. 74. Prasad NG, Bedhomme S, Day T, Chippindale AK. An evolutionary cost of separate genders revealed by male‐limited evolution. Am Nat. 2007;169:29–37. pmid:17206582
  75. 75. Hesketh J, Fowler K, Reuter M. Genetic drift in antagonistic genes leads to divergence in sex-specific fitness between experimental populations of Drosophila melanogaster. Evolution. 2013;67:1503–10. pmid:23617925
  76. 76. Connallon T. Genic capture, sex linkage, and the heritability of fitness. Am Nat. 2010;175:564–76. pmid:20331359
  77. 77. Cowley DE, Atchley WR. Quantitative genetics of Drosophila melanogaster. II. Heritabilities and genetic correlations between sexes for head and thorax traits. Genetics. 1988;119:421–33. pmid:17246429
  78. 78. Ayroles JF, Carbone MA, Stone E a, Jordan KW, Lyman RF, Magwire MM, et al. Systems of complex genetics in Drosophila melanogaster. Nat Genet. 2009;41:299–307. pmid:19234471
  79. 79. Charlesworth B. Causes of natural variation in fitness: evidence from studies of Drosophila populations. Proc Natl Acad Sci U S A. 2015;112:1662–1629. pmid:25572964
  80. 80. Leffler EM, Bullaughey K, Matute DR, Meyer WK, Ségurel L, Venkat A, et al. Revisiting an old riddle: What determines genetic diversity levels within species? PLoS Biol. 2012;10(9):e1001388. pmid:22984349
  81. 81. Charlesworth D. Balancing selection and its effects on sequences in nearby genome regions. PLoS Genet. 2006;2(4):e64. pmid:16683038
  82. 82. Asthana S, Schmidt S, Sunyaev S. A limited role for balancing selection. Trends Genet. 2005;21:30–2. pmid:15680511
  83. 83. Kelly JK, Willis JH. Deleterious mutations and genetic variation for flower size in Mimulus guttatus. Evolution. 2001;55:937–42. pmid:11430654
  84. 84. Poissant J, Wilson AJ, Coltman DW. Sex-specific genetic variance and the evolution of sexual dimorphism: A systematic review of cross-sex genetic correlations. Evolution. 2010;64:97–107. pmid:19659596
  85. 85. Connallon T, Clark AG. Evolutionary inevitability of sexual antagonism. Proc R Soc B Biol Sci. 2014;281:2013–123.
  86. 86. Connallon T, Clark AG. The resolution of sexual antagonism by gene duplication. Genetics. 2011;187:919–37. pmid:21220356
  87. 87. Ruzicka F (2019). Data from: Genome-wide sexually antagonistic variants reveal longstanding constraints on sexual dimorphism in fruit flies. Zenodo digital repository. Openly available via and
  88. 88. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8. pmid:21653522
  89. 89. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75. pmid:17701901
  90. 90. Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet. 2012;91:1011–21. pmid:23217325
  91. 91. Hadfield JD. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw. 2010;33:1–22.
  92. 92. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, et al. Common SNPs explain a large proportion of heritability for human height. Nat Genet. 2010;42:565–9. pmid:20562875
  93. 93. Astle W, Balding DJ. Population structure and cryptic relatedness in genetic association studies. Stat Sci. 2009;24:451–71.
  94. 94. Yang J, Weedon MN, Purcell S, Lettre G, Estrada K, Willer CJ, et al. Genomic inflation factors under polygenic inheritance. Eur J Hum Genet. 2011;19:807–12. pmid:21407268
  95. 95. Aulchenko YS, Stephan R, Isaacs A, van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23:1294–6. pmid:17384015
  96. 96. Nicod J, Davies RW, Cai N, Hassett C, Goodstadt L, Cosgrove C, et al. Genome-wide association of multiple complex traits in outbred mice by ultra-low-coverage sequencing. Nat Genet. 2016;48:912–8. pmid:27376238
  97. 97. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300.
  98. 98. Casper J, Zweig AS, Villarreal C, Tyner C, Speir ML, Rosenbloom KR, et al. The UCSC Genome Browser database: 2018 update. Nucleic Acids Res. 2018;46:D762–9. pmid:29106570
  99. 99. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–70. pmid:20562413
  100. 100. Cabrera CP, Navarro P, Huffman JE, Wright AF, Hayward C, Campbell H, et al. Uncovering networks from genome-wide association studies via circular genomic permutation. G3 Genes, Genomes, Genet. 2012;2:1067–75.
  101. 101. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48. pmid:19192299
  102. 102. Chintapalli VR, Wang J, Dow JAT. Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet. 2007;39:715–20. pmid:17534367
  103. 103. Gramates LS, Marygold SJ, Dos Santos G, Urbano JM, Antonazzo G, Matthews BB, et al. FlyBase at 25: Looking to the future. Nucleic Acids Res. 2017;45:D663–71. pmid:27799470
  104. 104. Page AJ, Taylor B, Delaney AJ, Soares J, Seemann T, Keane JA, et al. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb Genomics. 2016;2:e000056.
  105. 105. Elyashiv E, Sattath S, Hu TT, Strutsovsky A, McVicker G, Andolfatto P, et al. A genomic map of the effects of linked selection in Drosophila. PLoS Genet. 2016;12(8):e1006130. pmid:27536991
  106. 106. Pfeifer B, Wittelsburger U, Ramos-Onsins SE, Lercher MJ. PopGenome: An efficient swiss army knife for population genomic analyses in R. Mol Biol Evol. 2014;31:1929–36. pmid:24739305
  107. 107. Comeron JM, Ratnappan R, Bailin S. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 2012;8:e1002905. pmid:23071443
  108. 108. Cruickshank TE, Hahn MW. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol Ecol. 2014;23:3133–57. pmid:24845075
  109. 109. Darling AE, Mau B, Perna NT. Progressivemauve: Multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE. 2010;5(6):e11147. pmid:20593022
  110. 110. RStudio Team. RStudio: Integrated Development for R. RStudio [Internet]. RStudio, Inc., Boston, MA. 2015. Available from: