Extraordinarily wide genomic impact of a selective sweep associated with the evolution of sex ratio distorter suppression

Symbionts that distort their host’s sex ratio by favouring the production and survival of females are common in arthropods. Their presence produces intense Fisherian selection to return the sex ratio to parity, typified by the rapid spread of host ‘suppressor’ loci that restore male survival/development. In this study, we investigated the genomic impact of a selective event of this kind in the butterfly Hypolimnas bolina. Through linkage mapping we first identified a genomic region that was necessary for males to survive Wolbachia-induced killing. We then investigated the genomic impact of the rapid spread of suppression that converted the Samoan population of this butterfly from a 100:1 female-biased sex ratio in 2001, to a 1:1 sex ratio by 2006. Models of this process revealed the potential for a chromosome-wide selective sweep. To measure the impact directly, the pattern of genetic variation before and after the episode of selection was compared. Significant changes in allele frequencies were observed over a 25cM region surrounding the suppressor locus, alongside generation of linkage disequilibrium. The presence of novel allelic variants in 2006 suggests that the suppressor was introduced via immigration rather than through de novo mutation. In addition, further sampling in 2010 indicated that many of the introduced variants were lost or had reduced in frequency since 2006. We hypothesise that this loss may have resulted from a period of purifying selection - removing deleterious material that introgressed during the initial sweep. Our observations of the impact of suppression of sex ratio distorting activity reveal an extraordinarily wide genomic imprint, reflecting its status as one of the strongest selective forces in nature.

Introduction

In 1930, Fisher noted that the strength of selection on the sex ratio was frequency dependent [1]. As a well-mixed outbreeding population progressively deviates from a 1:1 sex ratio, selection on individuals to restore the sex ratio to parity becomes stronger. In natural populations, a principal cause of population sex ratio skew is the presence of sex ratio distorting elements, in the form of either sex chromosome meiotic drive [2], or cytoplasmic symbionts [3]. In some cases, these elements can reach very high prevalence, distorting population sex ratios to as much as 100 females per male [4], and producing intense selection for restoration of individual sex ratio to 1 female per male. The most common consequence of this selection pressure is the evolution of systems of suppression: host genetic variants that prevent the sex ratio distorting activity from occurring.

Female H. bolina can carry a maternally inherited Wolbachia symbiont, wBol1, which kills male hosts as embryos [7]. The species also carries an uncharacterised dominant zygotically acting suppression system that allows males to survive infection [5]. Written records and analysis of museum specimens indicate this symbiont was historically present, and active as a male-killer, across much of the species range, from Hong Kong and Borneo through to Fiji, Samoa and parts of French Polynesia [8]. Evidence from museum specimens also indicates that host suppression of male-killing had a very restricted incidence in the late 19th century, with infected male hosts (the hallmark of suppression) being found in the Philippines but not in other localities tested. By the late 20th century, suppression of male-killing was found throughout SE Asia, but not in Polynesian populations where the male-killing phenotype remained active [9]. The most extreme population was that of Samoa, where 99% of female H. bolina were infected with male-killing Wolbachia,
developed, all of which followed the presumed pattern of inheritance of the suppressor: that of presence in all 16 sons and half daughters.

A linkage map for this chromosome was then constructed, with the region within it required for male survival identified by the exclusion of recombinants. This was achieved by examining segregation of alleles from sons of the SE Asia x Moorea cross above that were mated to Wolbachia-infected Moorea (non-suppressor) females (creating a male-informative family). Over 300 recombinant daughters were obtained that were used to create a linkage map of the 12 suppressor-linked markers. These markers covered a 41 cM recombination distance and were syntenic with B. mori (Fig. 2). The suppressor locus was localized to a region within this chromosome by excluding linked loci where the SE Asia derived paternal allele was absent in one or more sons (indicating the genomic region containing the SE Asia allele was not necessary for male survival). Three suppressor-linked alleles, all in the +11 to +12 region, were retained in all 60 sons, whereas the 9 markers proximal and distal to these were excluded by the presence of one or more recombinants (Fig. 2). Binomial sampling rejected the null hypothesis of no association between the +11/+12 genomic region and male survival (p = (1/2)^60). Thus we posit that the suppressor lies between marker C at +8 (excluded by one recombinant) and marker G at +17 (excluded by two recombinants

during embryogenesis, such that the presence of infected males (generated through suppressor action) reduces the fitness of uninfected females relative to infected females (who experience no such reduction in offspring viability following mating with infected males). Despite losing its male-killing ability, the induction of CI through infected males allows Wolbachia to remain at or near fixation.
In consequence, selection on the suppressor is also maintained. Indeed, fixation for Wolbachia in males and females is observed after the spread of the suppressor [10].

The expected dynamics of the suppressor locus in this system are given in Figure 3a, following the trajectory given for r = 0 (zero recombination with the suppressor). We elaborated the model of suppressor spread to quantify expected effects on linked loci under the assumption that the suppressor and linked alleles were cost-free, and that spread was occurring through an unstructured, panmictic population (see Text S1 for full details). We then modified this model to

the allele that had 'swept' alongside the suppressor. An expectation for a selective sweep is that allelic distributions will be disturbed by an increase in frequency of one (or sometimes two) alleles initially associated with the target of selection, which would be paralleled by a co-ordinated decline in frequency of the other alleles. Because the decline is spread across multiple alleles, the expectation is that a swept allele should be the greatest contributor to heterogeneity between allelic frequency distributions, and thus identifiable by the largest standardized residual in heterogeneity tests. We performed this analysis of residuals at 10 of the loci where heterogeneity between samples was observed (one locus was not suitable for analysis in this manner, as there were only two allelic

The pattern of LD also reflected an episode of selection in this genomic region. Our model above predicted that LD between the swept loci and the target of selection (global LD) would exist only in the very early phases of the sweep (10-20 generations), but that the sweep would create multiple local associations between closely linked loci that would be retained over longer periods (local LD) (Fig. 3c, 3d). In accord with this, there was little evidence of LD between loci in the 2001 pre-suppressor sweep sample, but strong associations between variants at closely linked loci (r = 0.01-0.02) in the 2006 sample (Fig. 6).
thus they can be treated as independent (Fig. 6). We believe it is very unlikely that all four alleles existed in the 2001 population but were not sampled. We reason that derivation of alleles selected by correlation to a de novo suppressor mutation would occur in proportion to their frequency in the pre-sweep population. There is an upper confidence limit for the frequency of alleles absent in the 2001 sample of 0.0307 (binomial sampling n = 96 alleles, 95% confidence interval). Taking the conservative assumption that each absent allele existed at the upper confidence limit for frequency, four or more unsampled alleles being present in a sample of 11 as targets of selection occurs with a chance of p = 0.00025 (calculated from binomial sampling distribution:

from continued recombinational erosion, the extent of LD was reduced (Fig. 6c). In terms of allelic profile, the pattern observed was heterogeneous (Fig. 4; Fig. 5

has been widely conjectured that the evolution of sex determination systems might occur in response to the presence of sex ratio distorting microbes, and our data indicate that dramatic changes with associated pleiotropic deleterious effects may nevertheless be favoured if they rescue males. A pressing area for research is to establish the actual nature of the suppressor mutation (e.g. whether it is part of the sex determination cascade), whether there is a cost of carrying a suppressor in the absence of its benefit from rescuing males, and whether any compensatory evolution has occurred where the suppressor has been present for a longer period.

Beyond the locus under selection, linked variants may also be deleterious. This is one explanation for the gain followed by loss of introgressed material.
Our hypothesis is that some introgressed alleles were deleterious on the Samoan genetic background, associated with negative epistasis (Dobzhansky-Muller incompatibility) generated during isolation on Samoa. Selection against linked deleterious recessive alleles provides an alternative explanation, although the complete loss observed for some loci suggests that selection against the material is maintained when rare, arguing against recessive effects. Whatever the precise cause of the loss of introgressed material, an important conclusion from our data is that the initial strength of selection would be seriously underestimated from genetic variability data obtained after the final phase of the sweep alone (e.g. 2010 data). Thus, for cases of strong selection over a wide region, caution is needed when estimating the strength of selection from post-sweep genetic data alone.

Finally, our data highlight the profound loss of evolutionary capacity associated with isolation. In Samoa, at least 100 years of extreme sex ratio bias was resolved by a migratory origin of the suppressor that likely involved individual movement over very long distances across the Pacific, rather than in situ mutation. One question that still requires an answer is 'why 100 years'. This

the suppressor (ss), to create suppressor-heterozygous Wolbachia-infected offspring (Ss) (Fig. 1).

F2 males (n = 60) from this cross were analysed using the same 12 markers. Absence of recombinants in a core subset of markers, flanked by markers with an increasing number of recombinants, indicated the position of the suppressor locus (Fig. 2).

frequency distributions between pairs of time points was estimated using a G test based on allelic frequency distribution.
Where allele distributions were heterogeneous, we ascertained the allele whose frequency change made the greatest contribution to heterogeneity as that with the largest standardized residual within the heterogeneity test. This allele was then removed (it was an allele increasing in frequency in each case), and the data retested to ascertain if the population samples were then homogeneous, or whether there was evidence for a second allele that changed in frequency (a second allele was identified in three cases). We additionally used Fst standardized population genetic differentiation to quantify the magnitude of change in allelic frequency distributions between the two samples. In each case, the rare individuals where sequence could not be obtained for particular alleles, or not inferred accurately, were coded as missing information.

Text S1: Modelling the spread of the suppressor

The model
We coded the model in Mathematica.
We model two linked loci, each with two alleles. The first locus can have the wildtype allele s, or the male-killing suppression gene S. The second locus is linked to the first, and has two selectively neutral alternative alleles denoted A and a. The model tracks the change in gametic frequencies from one generation to the next. There are four different basic gamete types: AS, As, aS, and as. Our individuals are diploid, so these four basic gamete types give nine possible basic individual genotypes: AASS, AaSS, aaSS, AASs, AaSs, aaSs, AAss, Aass, and aass.
The population is infected with Wolbachia at a frequency J. The infection is vertically transferred from mothers to offspring with 100% efficiency. Thus all of the offspring of an infected mother are infected, and none of the offspring of an uninfected mother are infected. Since there is this difference between males and females, we have to distinguish between male and female gametes. We also have to add infection status to the gametic genotypes to give a total of sixteen gamete types: four basic types, each from a male or female, and either infected or uninfected. Infection status and sex are also added to the individual genotypes to give thirty-six possible genotypes (the nine genotypes from above can be either infected or uninfected, and male or female). Note that for notational ease we still refer to these as "genotypes" although infection status is not a property of the individual's genome.
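The class counts given above (four basic gametes, nine basic genotypes, sixteen full gamete types and thirty-six full genotypes) can be checked with a short enumeration. The Python sketch below is illustrative only: the model itself was coded in Mathematica, and every name used here (`basic_gametes`, `full_genotypes`, and so on) is hypothetical.

```python
from itertools import product

LINKED = ("A", "a")        # neutral linked locus
SUPPRESSOR = ("S", "s")    # suppressor locus
SEXES = ("female", "male")
INFECTION = (True, False)  # True = carries Wolbachia


def basic_gametes():
    """Return the four haplotypes: AS, As, aS, as."""
    return [a + s for a, s in product(LINKED, SUPPRESSOR)]


def basic_genotypes():
    """Return the nine diploid genotypes, each locus treated as unordered."""
    linked = ("AA", "Aa", "aa")
    suppressor = ("SS", "Ss", "ss")
    return [l + s for s, l in product(suppressor, linked)]


def full_gametes():
    """Haplotype x sex of the producing parent x infection status."""
    return list(product(basic_gametes(), SEXES, INFECTION))


def full_genotypes():
    """Genotype x sex x infection status."""
    return list(product(basic_genotypes(), SEXES, INFECTION))


print(len(full_gametes()), len(full_genotypes()))
```

Some of the thirty-six classes (infected ss males, for example) exist only as offspring, because they are removed at the selection step described below.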
The life cycle consists of mating, then selection. The gametic frequencies we begin with give rise to offspring genotypic frequencies. These offspring frequencies undergo selection, and thus result in adult genotypic frequencies. The adults produce gametes, which give us the gametic frequencies for the next generation. Along the way we can record the adult genotypic frequencies and consequently make predictions about the allelic frequencies observed in the real world.
In our model it is not generally the case that there are equal numbers of males and females, because Wolbachia affects the two differently. However, there are always equal numbers of male and female gametes (because each mating involves exactly one male and one female). Therefore the female gamete frequencies sum to one, and so do the male gamete frequencies.
Mating is assumed to be at random, and the mating step in our model consists of the transformation of the gametic frequencies into offspring genotypic frequencies. Given a female gamete at frequency f, and a male gamete at frequency m, the frequency of offspring resulting from combining these two gamete types will be the product of their frequencies fm. An added complication arises due to cytoplasmic incompatibility (CI), in which matings between infected males and uninfected females result in fewer offspring than other matings. For simplicity, we assume that CI is total, so that fusions between infected male gametes and uninfected female gametes result in no offspring. We then renormalise the offspring frequencies appropriately so that they still sum to one for both males and females.
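The mating step can be sketched in Python as follows. This is a minimal illustration rather than the original Mathematica code. It assumes that gametes are keyed by a hypothetical `(haplotype, infected)` tuple, that offspring inherit infection status from the mother, and that zygotes are conceived at a 1:1 sex ratio, so that a single frequency table serves for both sexes at this stage.

```python
def mate(female_gametes, male_gametes):
    """Random union of gametes with total cytoplasmic incompatibility.

    Both arguments map (haplotype, infected) -> frequency, each summing to one.
    Returns (maternal haplotype, paternal haplotype, infected) -> frequency,
    renormalised to sum to one after CI losses.
    """
    offspring = {}
    for (f_hap, f_inf), f in female_gametes.items():
        for (m_hap, m_inf), m in male_gametes.items():
            if m_inf and not f_inf:
                continue  # total CI: infected sperm x uninfected egg fails
            key = (f_hap, m_hap, f_inf)  # infection is maternally inherited
            offspring[key] = offspring.get(key, 0.0) + f * m
    total = sum(offspring.values())
    return {key: freq / total for key, freq in offspring.items()}


# Hypothetical example: 90% of eggs infected, half of sperm from infected males.
eggs = {("as", True): 0.9, ("as", False): 0.1}
sperm = {("AS", True): 0.5, ("as", False): 0.5}
zygotes = mate(eggs, sperm)
```

In this example the only uninfected zygotes are those fathered by uninfected males, which is how CI penalises uninfected females once infected males are present.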
Once we have the offspring genotypes, selection occurs, transforming the offspring genotypic frequencies into adult genotypic frequencies. In our model, the only element of selection is male-killing: any infected males lacking the S suppressor gene (i.e. those with genotype ss) are killed by the Wolbachia. Males heterozygous at the suppressor locus (i.e. with the genotype Ss) are killed by their infection 50% of the time. Males homozygous for the suppressor (i.e. those with genotype SS) are not killed. For simplicity we assume no other selective effects. Notably this means that we are modelling the situation in which neither the suppressor nor the linked allele imposes costs on its bearers.
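The selection rule just described reduces to a small survival table. The Python sketch below is an illustration under the stated assumptions, not the original implementation; the key layout `(sex, infected, suppressor_genotype)` and the function names are hypothetical.

```python
# Survival of infected males by suppressor genotype, as stated in the text.
INFECTED_MALE_SURVIVAL = {"SS": 1.0, "Ss": 0.5, "ss": 0.0}


def survival(sex, infected, suppressor_genotype):
    """Probability that an offspring of this class survives to adulthood."""
    if sex == "male" and infected:
        return INFECTED_MALE_SURVIVAL[suppressor_genotype]
    return 1.0  # females and uninfected males are unaffected


def select(offspring):
    """Apply male-killing to a table of (sex, infected, genotype) -> frequency.

    The result is deliberately left unnormalised; rescaling is a separate step.
    """
    return {key: freq * survival(*key) for key, freq in offspring.items()}


# Hypothetical example: all offspring infected, 20% Ss and 80% ss in each sex.
offspring = {
    ("female", True, "Ss"): 0.2, ("female", True, "ss"): 0.8,
    ("male", True, "Ss"): 0.2, ("male", True, "ss"): 0.8,
}
survivors = select(offspring)
```

Because the linked locus has no effect on survival here, it does not need to appear in the key for this step.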
Because females are unaffected by Wolbachia, there are now more females than males in our population. This will affect the observed allelic frequencies, since the allelic frequencies in males and females will differ (s genes, for example, will be more common in females because they are selectively neutral to a female, while males bearing s genes are more likely to die through male-killing). To account for this fact we renormalise the post-selection genotypic frequencies so that the sum of both male and female genotypic frequencies is one. This gives us the adult genotypic frequencies. From this data we get the model's predictions for observed frequency of A and S alleles.
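The renormalisation and the resulting allele frequency can be illustrated as below. This Python sketch reads the passage as pooling both sexes so that all adult frequencies together sum to one; that reading, the key layout `(sex, suppressor_genotype)` and the function names are assumptions made for illustration.

```python
def renormalise(post_selection):
    """Rescale surviving classes so that males and females together sum to one."""
    total = sum(post_selection.values())
    return {key: mass / total for key, mass in post_selection.items()}


def allele_frequency(adults, allele):
    """Pooled frequency of one allele, e.g. 'S', among all adults."""
    return sum(freq * genotype.count(allele) / 2
               for (_sex, genotype), freq in adults.items())


def proportion_male(adults):
    """Adult sex ratio, expressed as the proportion of males."""
    return sum(freq for (sex, _genotype), freq in adults.items() if sex == "male")


# Hypothetical post-selection masses: each sex starts with mass 0.5,
# all infected, 20% Ss and 80% ss; ss males die and half of Ss males die.
post_selection = {
    ("female", "Ss"): 0.10, ("female", "ss"): 0.40,
    ("male", "Ss"): 0.05, ("male", "ss"): 0.00,
}
adults = renormalise(post_selection)
freq_S = allele_frequency(adults, "S")
```

In this example the pooled S frequency is weighted mainly by females, because they make up most of the adult population.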
To complete the generation, we finally transform the adult genotypic frequencies into gametic frequencies. This is trivial in the case of most of the genotypes. However, for the genotype AaSs, things are more complicated. This is because AaSs individuals could have been formed by AS x as crosses, or by As x aS crosses. Denote the probability that a randomly-chosen infected AaSs individual was formed by an AS x as cross by μ, and the probability of recombination between the two loci of interest by r. Then the frequencies of gamete types from infected AaSs individuals are:

AS: ½(μ(1-r)+(1-μ)r)
aS: ½(1-μ(1-r)-(1-μ)r)
As: ½(1-μ(1-r)-(1-μ)r)
as: ½(μ(1-r)+(1-μ)r)

If we denote the probability that a randomly-chosen uninfected AaSs individual was formed by an AS x as cross by ν we can produce similar frequencies. It remains only to find μ and ν. But since we know the previous generation's gamete frequencies this is trivial.
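The gamete frequencies for double heterozygotes can be written as a short Python function. The sketch below is illustrative, not the original Mathematica code. The helper `coupling_probability` is one plausible way to obtain μ (or ν) from the previous generation's gamete frequencies under random mating; it is an assumption, because the text does not give the formula. All names are hypothetical.

```python
def double_het_gametes(mu, r):
    """Gamete output of AaSs individuals.

    mu: probability the individual arose from an AS x as union (coupling phase)
    r:  recombination probability between the two loci
    """
    coupling_type = 0.5 * (mu * (1 - r) + (1 - mu) * r)       # AS and as
    repulsion_type = 0.5 * (1 - mu * (1 - r) - (1 - mu) * r)  # As and aS
    return {"AS": coupling_type, "as": coupling_type,
            "As": repulsion_type, "aS": repulsion_type}


def coupling_probability(female, male):
    """Assumed estimate of mu from the parental gamete pools.

    female, male: haplotype -> frequency among the gametes that can form
    the relevant (infected or uninfected) class of AaSs offspring.
    """
    coupling = female["AS"] * male["as"] + female["as"] * male["AS"]
    repulsion = female["As"] * male["aS"] + female["aS"] * male["As"]
    if coupling + repulsion == 0:
        return 0.5  # no AaSs offspring formed; value is then irrelevant
    return coupling / (coupling + repulsion)


# Hypothetical example: resident eggs are all 'as'; 10% of sperm carry 'AS'.
eggs = {"AS": 0.0, "As": 0.0, "aS": 0.0, "as": 1.0}
sperm = {"AS": 0.1, "As": 0.0, "aS": 0.0, "as": 0.9}
mu = coupling_probability(eggs, sperm)
gametes = double_het_gametes(mu, r=0.1)
```

With μ = ½ every gamete type has frequency ¼ whatever the value of r, whereas with μ = 1 and r = 0 only AS and as gametes are produced.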
In the initial population of size N we suppose that the A and S alleles are absent. With a proportion J of the population infected with Wolbachia, the gamete types and frequencies in the population are therefore infected female as (frequency J), uninfected female as (frequency 1 - J), and uninfected male as (frequency 1). At this point the sex ratio is (1 - J)/(2 - J). We then introduce an immigrant of known sex and genotype into the population. To do this, we multiply the male gamete frequencies by N(1 - J)/(2 - J), and the female gamete frequencies by N/(2 - J) to give a "gamete mass" measurement. Then, given the immigrant's genotype we get the probability that it produces a gamete of each type (we assume for simplicity that μ = ν = ¼). We then add the immigrant gamete probabilities to the gamete mass measurement, and renormalise so that female gamete frequencies sum to one, and so do male gamete frequencies. This gives us the initial gamete frequencies. As an example, in the case of a single infected male immigrant of genotype AASS, the initial gametic frequencies are: