Abstract
Dobzhansky and Muller proposed a general mechanism through which microevolution, the substitution of alleles within populations, can cause the evolution of reproductive isolation between populations and, therefore, macroevolution. As allopatric populations diverge, many combinations of alleles differing between them have not been tested by natural selection and may thus be incompatible. Such genetic incompatibilities often cause low fitness in hybrids between species. Furthermore, the number of incompatibilities grows with the genetic distance between diverging populations. However, what determines the rate and pattern of accumulation of incompatibilities remains unclear. We investigate this question by simulating evolution on holey fitness landscapes on which genetic incompatibilities can be identified unambiguously. We find that genetic incompatibilities accumulate more slowly among genetically robust populations and identify two determinants of the accumulation rate: recombination rate and population size. In large populations with abundant genetic variation, recombination selects for increased genetic robustness and, consequently, incompatibilities accumulate more slowly. In small populations, genetic drift interferes with this process and promotes the accumulation of genetic incompatibilities. Our results suggest a novel mechanism by which genetic drift promotes and recombination hinders speciation.
Author summary
As geographically isolated populations evolve, genetic incompatibilities accumulate between them. Eventually, these incompatibilities may cause the populations to become different species. What determines how quickly species are formed in this way remains unclear. We investigate this question using computer simulations and find that genetic incompatibilities accumulate more slowly among populations with individuals that are robust to genetic perturbations. We identify two factors that influence the accumulation rate via their effects on genetic robustness: the size of the populations and how much recombination takes place in them. Small populations with rare recombination accumulate incompatibilities more quickly.
Citation: Kalirad A, Burch CL, Azevedo RBR (2024) Genetic drift promotes and recombination hinders speciation on holey fitness landscapes. PLoS Genet 20(1): e1011126. https://doi.org/10.1371/journal.pgen.1011126
Editor: Lindi Wahl, University of Western Ontario, CANADA
Received: April 3, 2023; Accepted: January 6, 2024; Published: January 22, 2024
Copyright: © 2024 Kalirad et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code and data are available at https://github.com/Kalirad/HoleyPopulationSpeciation.
Funding: This work was funded by grants from the National Science Foundation DEB-2014566 to RBRA and DEB-2014943 to CLB (https://www.nsf.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors declare no competing interests.
Introduction
The rates of origination of new species vary extensively across the tree of life. For example, some lineages of cichlids are estimated to produce new species approximately 70× faster than some lineages of hawks [1]. A major challenge for evolutionary biology is to explain this macroevolutionary pattern through the action of microevolutionary processes operating within populations. Stanley [2] speculated that speciation is “a largely random process” and that, therefore, “macroevolution is decoupled from microevolution”. There is growing support for the alternative view that macroevolution and microevolution are, in fact, mechanistically coupled (reviewed in [3, 4]). For example, Jablonski and Roy [5] found that gastropod species with broader geographic ranges had lower speciation rates, consistent with the hypothesis that high dispersal ability both promotes broader geographic ranges and hinders speciation. Here we investigate the potential roles of genetic drift and recombination in determining the rate of speciation.
Speciation results from the build up of reproductive isolation (RI) between populations. Thus, variation in the rates at which RI increases between populations as they diverge should be one of the causes of variation in speciation rates. A simple modeling framework used to study the evolution of RI is the holey fitness landscape [6]. In such a landscape, genotypes are either viable or inviable. Many types of holey fitness landscapes have been proposed. For example, Gavrilets et al. [7, 8] analyzed a model where individuals differing at a threshold number of loci were considered incapable of producing viable offspring (i.e., were reproductively isolated).
Dobzhansky and Muller proposed that, as allopatric populations diverge, genetic incompatibilities accumulate between them and generate postzygotic RI [9, 10]. Such Dobzhansky–Muller incompatibilities (DMIs) have been shown to cause low fitness in hybrids between closely related species [11–13]. A particular kind of holey fitness landscape, also known as a neutral network [14, 15], is especially suitable for modeling the evolution of RI by DMIs. A neutral network is a network of viable genotypes connected by mutational accessibility. Two genotypes are mutationally accessible if one genotype can be obtained from the other through a single mutation. Fig 1C shows a neutral network defined by five diallelic loci (yellow, orange, and purple nodes). Inviable genotypes carry combinations of alleles displaying DMIs (gray nodes in Fig 1C). If two populations evolving in allopatry have diverged at k diallelic loci (Fig 1A), then up to k − 1 divergent alleles from one population will not have been tested by natural selection in the genetic background of the other population (see S1 Text). DMIs can be inferred by introgressing each of these k − 1 divergent alleles from one population into the other and counting the number of incompatible introgressions, IIs (Fig 1B).
(A) Two populations diverge in allopatry. Both populations are initially fixed for lowercase alleles (open circles) at five diallelic loci, A–E. Derived alleles are indicated by uppercase letters (closed circles). The first two substitutions (closed circles) were of alleles A and B in population 1 (left, green); the next three substitutions were of alleles C, D, and E in population 2 (right, blue). (B) Genetic incompatibilities can be inferred by introgressing individual divergent alleles from one population into the other and counting the number of incompatible introgressions (IIs). This is illustrated by the introgression of the divergent alleles from population 1 (ABcde) into population 2 (abCDE) to create the four combinations that have not been tested by natural selection. We ignore the introgression of the e allele because it results in the transitional abCDe genotype in population 2, which is known to be viable (red arrow in (A); see S1 Text). Two alleles from population 1 (A and c) are deleterious in the genetic background of population 2 (grey boxes). We refer to these as IIs. Each II implies the existence of one or more DMIs. (C) Holey fitness landscape of the 32 genotypes at loci A–E. Nodes (circles) represent genotypes. Lines connect genotypes that can be reached by a mutation at a single locus. Each genotype has 5 mutational neighbors. 7 of the 32 genotypes (22%) are inviable, shown in light grey. The genotypes generated by the two IIs shown in (B) are marked by arrows. The remaining 25 genotypes (78%) are viable, shown in colors from yellow to blue, and define a neutral network. The different colors indicate the proportion ν of the mutational neighbors of a genotype that are viable, a measure of mutational robustness. The evolutionary trajectories of the populations shown in (A) are indicated by thick blue and green lines.
Consider the simple holey fitness landscape (neutral network) model introduced by Gavrilets [6, 16, 17] in which genotypes are randomly and independently assigned to be either viable or inviable with probabilities p and 1 − p, respectively—dubbed the Russian roulette model. The value of the p parameter determines the global structure of the neutral network [16]. If p is below a critical value pc, known as the percolation threshold, the neutral network is composed of many small disconnected subnetworks. A subnetwork forms a cluster of genotypes that can evolve from each other by a series of single mutations; evolution is more difficult between subnetworks. In contrast, if p > pc there is one giant subnetwork containing most viable genotypes. If genotypes are haploid and viability is determined by L diallelic loci, the percolation threshold is pc ≈ 1/L.
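The Russian roulette model can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names `make_landscape` and `percolation_threshold` are hypothetical, and viability is assigned lazily on first query, an assumption made here because enumerating all 2^L genotypes is impractical when L = 30.

```python
import random

def make_landscape(p, seed=0):
    """Return an is_viable(genotype) function for a Russian roulette landscape.

    Each genotype is declared viable with probability p the first time it is
    queried; the answer is cached so that repeated queries agree.
    """
    rng = random.Random(seed)
    cache = {}

    def is_viable(genotype):
        key = tuple(genotype)
        if key not in cache:
            cache[key] = rng.random() < p
        return cache[key]

    return is_viable

def percolation_threshold(L):
    """Approximate percolation threshold for haploid genotypes with L diallelic loci."""
    return 1.0 / L

is_viable = make_landscape(p=0.5, seed=1)
genotype = (0,) * 30
print(is_viable(genotype), percolation_threshold(30))
```

Caching is what makes the landscape static: a genotype that was a hole once remains a hole for the rest of the simulation.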
In such a landscape, the number of IIs is expected to increase linearly with the genetic divergence between populations (k > 0), according to

$$\mathrm{II} = (k - 1)\,(1 - \nu)\,\frac{L}{L - 1} \tag{1}$$

where k − 1 is the number of divergent alleles from one population that have not been tested by natural selection in the genetic background of the other population (see S1 Text for more details), L is the number of diallelic loci affecting viability, and ν is the proportion of the L mutational neighbors of a genotype that are viable, a measure of the genotype’s mutational robustness; the factor L/(L − 1) corrects for the fact that one of the mutational neighbors is known to be viable (see S1 Text for more details). Eq 1 indicates that the rate of accumulation of genetic incompatibilities depends on genetic robustness. This makes intuitive sense because introgression is a perturbation of the recipient genotype. Thus, microevolutionary processes that cause genetic robustness to evolve have the potential to influence the macroevolutionary rate of speciation.
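The correction for the neighbor that is known to be viable can be motivated as follows. This is a sketch under the assumption that Eq 1 has the form II = (k − 1)(1 − ν)L/(L − 1), which is one reading of the definitions given here and not a quotation of S1 Text. A genotype with robustness ν has νL viable neighbors; removing the one already known to be viable leaves νL − 1 viable neighbors among the L − 1 untested ones.

```latex
\begin{align}
\Pr(\text{untested neighbor is viable}) &= \frac{\nu L - 1}{L - 1}, \\
\Pr(\text{introgression is incompatible}) &= 1 - \frac{\nu L - 1}{L - 1}
  = (1 - \nu)\,\frac{L}{L - 1}, \\
\mathrm{II} &= (k - 1)\,(1 - \nu)\,\frac{L}{L - 1}.
\end{align}
```

For L = 30 the factor L/(L − 1) is 30/29 ≈ 1.034, so the slope per locus stays close to 1 − ν.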
If mutation is sufficiently weak that the time between the appearance of new neutral mutations (1/(NU), where N is the population size and U is the genomic mutation rate) is much longer than the time it takes for a neutral mutation to fix (2N), that is, if 2N²U ≪ 1, then evolution can be modelled as a “blind ant” random walk [18] (see Materials and methods). The expected value of ν in a population evolving under the Russian roulette model with Weak Mutation is simply p, the probability that a genotype is viable, which explains the accumulation of IIs under this model (Fig 2).
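The Weak Mutation condition is easy to check numerically. The helper below is an illustrative sketch (the name `weak_mutation_index` is hypothetical); it returns the ratio of the fixation time of a neutral mutation, 2N, to the waiting time between new mutations, 1/(NU).

```python
def weak_mutation_index(N, U):
    """Return 2 * N**2 * U, i.e. (2N) / (1 / (N * U)).

    Values much smaller than 1 indicate the Weak Mutation regime.
    """
    return 2 * N**2 * U

for N in (10, 100, 1000):
    print(N, weak_mutation_index(N, U=0.1))
```

With U = 0.1, even N = 10 gives an index of 20, so populations of that size are already outside the Weak Mutation regime.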
(A) Under the Russian roulette model incompatible introgressions (IIs) accumulate linearly with genetic distance at a rate inversely proportional to the genetic robustness of genotypes, ν. In all simulations, populations evolved under the Weak Mutation regime, fitness was determined by L = 30 diallelic loci, and genotypes were viable with probability p. Plotted values are means of 1,000 replicate simulation runs at each p calculated for k = 0, 1, …12 genetic differences between populations. Genetic distance was measured by D = k/L. In each run, the numbers of IIs and the values of ν of the two populations were averaged at each time step. If there were multiple observations for a given k, they were then averaged. Divergence was allowed to proceed until k = 15 but data for k > 12 were discarded. The lines are the expectations according to Eq 1 with ν = p. (B) We fit the linear regression model in Eq 3 to each replicate simulation when D > 0. Plotted values are mean values of b/L and ν for the 1,000 simulation runs. The gray line shows as a function of ν with ν = p (Eq 1). In both panels, the 95% CIs are hidden by the points.
The relationship between ν and p changes when mutation is not weak. On a holey fitness landscape, all viable genotypes have the same viability but, in the presence of mutation, they do not necessarily all have the same fitness. If organisms reproduce asexually, the fitness of genotype i with robustness νi is proportional to 1 − U(1 − νi). Thus, if there is variation in ν between genotypes and mutation is not weak (i.e., if 2N²U ≳ 1), then the population will experience selection for higher mutational robustness [18] (Fig 3: U → ν). For example, under the Russian roulette model, genotypes show a variance in ν of Var[ν] = p(1 − p)/L. Typically, recombination further strengthens selection for mutational robustness [19–21] (Fig 3: r → ν). The ability of populations to respond to this selection depends on factors that determine the relative strengths of selection on genetic robustness and genetic drift: mutation rate, recombination rate, and population size (Fig 3: U, r, N → ν) [22].
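To give a sense of scale for this selection, the sketch below evaluates the two expressions in this paragraph. The function names are hypothetical, and fitness is returned only up to its proportionality constant.

```python
import math

def relative_fitness(nu, U):
    """Fitness of an asexual genotype with robustness nu, up to a constant."""
    return 1.0 - U * (1.0 - nu)

def robustness_variance(p, L):
    """Variance in nu among genotypes under the Russian roulette model."""
    return p * (1.0 - p) / L

p, L, U = 0.5, 30, 0.1
sd = math.sqrt(robustness_variance(p, L))
# Fitness advantage of a genotype one SD above the mean nu over one SD below it.
advantage = relative_fitness(p + sd, U) / relative_fitness(p - sd, U) - 1.0
print(f"SD of nu = {sd:.3f}; relative advantage = {advantage:.4f}")
```

Because the advantage scales with U, the same spread in ν is nearly invisible to selection when mutation is weak.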
Ellipses contain the following properties of populations: U, genomic mutation rate; N, population size; r, recombination probability or rate; π, sequence diversity; ν, genetic robustness; IIW and IIB, genetic incompatibilities within and between populations, respectively. Red lines ending in arrows indicate positive effects; blue lines ending in bars indicate negative effects. The thickness of the line indicates the strength of the effect.
Here we use the Russian roulette model to investigate whether and why populations differ in the rate at which they accumulate genetic incompatibilities. We then validate our results using a computational model of RNA secondary structure that includes intrinsic differences in fitness between viable genotypes. We chose the folding of RNA sequences as a simple but nonrandom model of a genotype-phenotype map grounded in biophysical reality [23]. Crucially, this map naturally incorporates epistasis and has been used to study its evolutionary consequences (e.g., for robustness [18, 24], evolvability [25, 26], and molecular evolution [27]). Because introgression is a perturbation of the genotype, we predict that the probability that an introgression is incompatible will be inversely related to the recipient’s genetic robustness, its ability to maintain viability in the face of perturbations to the genotype. We test this prediction by experimentally manipulating population genetic parameters known to promote the evolution of genetic robustness.
Materials and methods
Experimental design
To test our prediction that the rate at which IIs accumulate between populations will be inversely related to their genetic robustness, we first experimentally evolved populations under combinations of population sizes and recombination rates expected to result in different equilibrium levels of genetic robustness. Populations evolved under an individual-based Wright-Fisher model with a selection-recombination-mutation life cycle, with a constant population size (between N = 10 and N = 2000) and recombination rate (between no recombination and free recombination). Our use of the Wright-Fisher model differs from most theoretical research into the accumulation of DMIs [28–31] in that it does not impose a Weak Mutation evolutionary regime. Our implementation of the Wright-Fisher model ensures that multiple mutations segregate simultaneously (2N²U ≳ 1 in all of our experimental populations). This difference in approach has two important consequences for this work. It enables (1) recombination and (2) selection for genetic robustness, neither of which is effective under the Weak Mutation regime [18].
Once these populations reached an approximate mutation-recombination-selection-drift equilibrium, we made exact copies of each population to obtain pairs of identical ancestral populations. These population pairs were then allowed to diverge allopatrically, again evolving under a Wright-Fisher model with a selection-recombination-mutation life cycle and using the same population size and recombination rate used to generate the ancestral populations. We then investigated the effect that the differences in genetic robustness among the ancestral populations had on the rate at which genetic incompatibilities accumulated between the diverging population pairs.
Genotype and phenotype
Russian roulette model: The genotype is a binary sequence of length L. In all simulations summarized in this paper we used L = 30 diallelic sites and assumed that genotypes are haploid. In all simulations reported in this paper we set p ≫ pc ≈ 0.033, the percolation threshold.
RNA folding model: The genotype is an RNA sequence of length L. In all simulations summarized in this paper we used L = 100 nucleotides and assumed that genotypes are haploid. The phenotype associated with a particular genotype is its minimum free-energy secondary structure, as determined using the ViennaRNA package version 2.5.1 with default parameters [32].
Selection
Russian roulette model: A genotype is randomly assigned to be either viable or inviable with probabilities p or 1 − p, respectively [6]. This defines a holey fitness landscape [6] or neutral network [14, 15].
RNA folding model: The fitness of a sequence i is defined as

$$w_i = \begin{cases} 1 - \sigma\,\delta_i & \text{if } \beta_i > \alpha \text{ and } \delta_i \le \alpha \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where βi is the number of base pairs in the secondary structure of sequence i, δi is the base-pair distance between the secondary structure of sequence i and a reference secondary structure, σ is the strength of selection against δi, and α is an arbitrary threshold. This means that the population experiences direct selection to resemble the reference (optimal) structure and indirect selection to stay away from the holes (defined by α). In all simulations summarized in the main text of this paper we used σ = 0.025 and α = 12.
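The fitness function can be illustrated without a folding library. The sketch below assumes that Eq 2 is a linear decline in fitness with δ, truncated to zero when δ > α or β ≤ α, which is how the Fig 6 inset describes it. Structures are supplied as dot-bracket strings; in practice they would come from ViennaRNA, which is not called here. All function names are hypothetical.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs in a dot-bracket structure."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def bp_distance(s1, s2):
    """Base-pair distance: pairs present in one structure but not the other."""
    return len(base_pairs(s1) ^ base_pairs(s2))

def rna_fitness(beta, delta, sigma=0.025, alpha=12):
    """Assumed form of Eq 2: 1 - sigma*delta if viable, 0 otherwise."""
    if beta > alpha and delta <= alpha:
        return 1.0 - sigma * delta
    return 0.0

reference = "((((((((((((((....))))))))))))))"
mutant = "(((((((((((((......)))))))))))))"
delta = bp_distance(mutant, reference)
beta = len(base_pairs(mutant))
print(beta, delta, rna_fitness(beta, delta))
```

Under these parameter values the lowest nonzero fitness is 1 − 0.025 × 12 = 0.7, so the largest possible beneficial effect is 1/0.7 − 1 ≈ 0.429, which matches the maximum reported for the DFE.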
Kalirad and Azevedo [31] investigated an RNA folding model without intrinsic differences in fitness between genotypes (σ = 0) in the Weak Mutation regime. They found that changes to α had only small effects on the mean genetic robustness of genotypes [31]. Genetic robustness in the RNA folding model does not depend on σ. We present the results of simulations on the model with σ = 0 in S1 and S2 Figs.
Recombination
From a population of size N, we randomly sample with replacement two sets of N viable genotypes, S1 and S2. Genotypes from S1 and S2 are paired, such that the ith genotypes from each set recombine.
Free recombination: In the free recombination regime, every offspring is recombinant. Crossovers occur independently with probability 0.5 at each of the L − 1 positions between sites in the genome.
Limited recombination: In the limited (not free) recombination regime, offspring are recombinant with probability r resulting from a single crossover located at random at one of the L − 1 positions between sites in the genome.
No recombination: If there is no recombination, the offspring is a copy of a randomly selected parent.
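The three recombination regimes can be expressed as one function. This is a minimal sketch, not the authors' code: the name `recombine`, the convention that `r=None` means free recombination, and the random choice of which parent the offspring starts copying from are all assumptions made for illustration.

```python
import random

def recombine(p1, p2, rng, r=None):
    """Return one offspring of parents p1 and p2 (equal-length sequences).

    r=None      free recombination: independent crossovers with probability
                0.5 at each of the L - 1 junctions between sites
    0 <= r <= 1 limited recombination: a single crossover with probability r,
                otherwise a copy of a random parent (r=0: no recombination)
    """
    # Assumption: the parent that contributes the first site is chosen at random.
    current, other = (p1, p2) if rng.random() < 0.5 else (p2, p1)
    L = len(p1)
    if r is None:
        child = [current[0]]
        for site in range(1, L):
            if rng.random() < 0.5:
                current, other = other, current
            child.append(current[site])
        return child
    if rng.random() < r:
        cut = rng.randrange(1, L)  # one of the L - 1 junctions
        return list(current[:cut]) + list(other[cut:])
    return list(current)

rng = random.Random(42)
mother, father = [0] * 30, [1] * 30
print(recombine(mother, father, rng, r=1))
print(recombine(mother, father, rng))
```

With L = 100, free recombination produces an expected (L − 1)/2 = 49.5 crossovers per offspring, whereas the limited regime produces at most one.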
Mutation
Mutations arise according to a binomial process where each site mutates with probability equal to the per-site mutation rate μ in each generation. All types of point mutations occur with equal probability. Insertions and deletions are not considered. In most simulations summarized in this paper we used a genome-wide mutation rate of U = μL = 0.1.
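A per-site mutation step consistent with this description might look as follows. The function name `mutate` and its signature are hypothetical; the only inputs taken from the text are the per-site rate μ = U/L and the rule that all point mutations are equally likely.

```python
import random

def mutate(genotype, U, alphabet, rng):
    """Mutate each site independently with probability mu = U / L.

    A mutated site changes to one of the other alleles in `alphabet`,
    chosen uniformly; insertions and deletions are not modelled.
    """
    mu = U / len(genotype)
    mutant = []
    for allele in genotype:
        if rng.random() < mu:
            allele = rng.choice([a for a in alphabet if a != allele])
        mutant.append(allele)
    return mutant

rng = random.Random(7)
binary = mutate([0] * 30, U=0.1, alphabet=(0, 1), rng=rng)
rna = mutate(list("GCAU" * 25), U=0.1, alphabet="ACGU", rng=rng)
print(sum(binary), "".join(rna))
```

Testing every site independently makes the number of mutations per genome binomial with mean U, so most offspring carry no new mutation when U = 0.1.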
Ancestor
Russian roulette model: We generate a random viable genotype with L = 30 and define it as the ancestor.
RNA folding model: We generate a random RNA sequence with L = 100 nucleotides and define its minimum free-energy secondary structure as the reference, provided it is viable (βi > α in Eq 2). We define this sequence as the ancestor.
Evolution
Weak Mutation simulations: Evolution is modelled as a “blind ant” random walk [18]. The population is represented by a single genotype. At the next time step, one of its n mutational neighbors is chosen at random. If the mutant is viable, the population “moves” to the mutant genotype; otherwise, the population remains at the current genotype for another time step. The genetic divergence between populations, k, can decrease as well as increase.
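One step of the blind ant walk on a binary landscape can be sketched as below. The names `blind_ant_step` and `is_viable` are hypothetical, and the viability function is supplied by the caller.

```python
import random

def blind_ant_step(genotype, is_viable, rng):
    """Propose one random single-site mutant; move only if it is viable."""
    site = rng.randrange(len(genotype))
    mutant = list(genotype)
    mutant[site] = 1 - mutant[site]
    return mutant if is_viable(mutant) else list(genotype)

# Toy landscape: genotypes carrying more than three derived alleles are holes.
def is_viable(genotype):
    return sum(genotype) <= 3

rng = random.Random(3)
population = [0] * 30
for _ in range(20):
    population = blind_ant_step(population, is_viable, rng)
print(sum(population))
```

Because a rejected proposal still consumes a time step, genotypes with few viable neighbors are not left faster than robust ones, which is the property that keeps the expected ν equal to p under Weak Mutation.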
Wright-Fisher simulations: Initially, the ancestor is cloned N times to create a population. We allow this population to evolve for as long as it takes to reach an approximate mutation-recombination-selection-drift equilibrium for mutational robustness and average sequence diversity (evaluated based on the average of 200 replicate populations). We then make an exact copy of the resulting population to obtain two identical ancestral populations. These populations are allowed to evolve under a Wright-Fisher model following a selection-recombination-mutation life cycle for as long as it takes to reach a predetermined level of genetic distance (measured using Jost’s D [33]).
Incompatible introgressions
Two viable genotypes, i and j, differ at k sites. The number of incompatible introgressions [31] (IIs) from genotype i to genotype j, IIij, was measured by introgressing each allele at a divergent site from genotype i to genotype j, one at a time, and counting the number of inviable introgressed genotypes (w = 0, see Eq 2). IIij is not necessarily equal to IIji. As populations diverged, we periodically estimated IIij in a particular direction (e.g., from population 1 to population 2; Fig 3: IIB) from 100 (or N when N < 100) randomly chosen pairs of individuals from the two populations, sampled without replacement.
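The counting procedure can be illustrated with the example in Fig 1B. In this sketch, lowercase alleles are encoded as 0 and uppercase alleles as 1; the function name `count_IIs` is hypothetical, and the set of holes contains only the two inviable genotypes identified in Fig 1B (the remaining holes of Fig 1C are not listed in the text, so they are omitted).

```python
def count_IIs(donor, recipient, is_viable):
    """Count single-allele introgressions from donor that make recipient inviable."""
    count = 0
    for site, (d, r) in enumerate(zip(donor, recipient)):
        if d != r:
            introgressed = list(recipient)
            introgressed[site] = d
            if not is_viable(introgressed):
                count += 1
    return count

holes = {(1, 0, 1, 1, 1),   # AbCDE: allele A in the background of population 2
         (0, 0, 0, 1, 1)}   # abcDE: allele c in the background of population 2

def is_viable(genotype):
    return tuple(genotype) not in holes

pop1 = [1, 1, 0, 0, 0]  # ABcde
pop2 = [0, 0, 1, 1, 1]  # abCDE
print(count_IIs(pop1, pop2, is_viable), count_IIs(pop2, pop1, is_viable))
```

The two directions give different counts in this example, which illustrates why IIij need not equal IIji.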
Rate of II accumulation
We fit the following linear regression model to each replicate simulation:

$$\mathrm{II} = a + b\,D \tag{3}$$

where a is the intercept, b is the slope, and D is the genetic distance between populations (k/L in Fig 2A or Jost’s D [33] in Figs 4A and 5A, etc). The II counts at the start of divergence (D = 0) were not considered in the regressions (see Eq 1). The rates of II accumulation summarized in Figs 2B, 4B and 5B, etc, were calculated by averaging estimates of b from individual replicate runs.
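The slope estimate can be reproduced with ordinary least squares. The sketch below assumes that Eq 3 is a simple linear regression of II on D with an intercept; the function name `fit_II_slope` and the use of ordinary least squares are assumptions, since the fitting procedure is not specified in this section.

```python
def fit_II_slope(D, II):
    """Least-squares fit of II = a + b*D, ignoring observations with D = 0."""
    points = [(d, y) for d, y in zip(D, II) if d > 0]
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_d
    return a, b

# Hypothetical data from one replicate run with L = 30.
D = [0.0, 0.1, 0.2, 0.3, 0.4]
II = [0.0, 1.1, 2.4, 4.2, 5.6]
a, b = fit_II_slope(D, II)
print(f"intercept = {a:.2f}, slope = {b:.2f}, b/L = {b / 30:.3f}")
```

Dividing b by L puts the slope on the same scale as 1 − ν, which is how the rates are compared with Eq 1 in the figures.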
(A) Incompatible introgressions (IIs) accumulated linearly with genetic distance in sexual populations of different sizes (N) evolving under the Russian roulette model. Genetic distance between divergent populations was measured by Jost’s D [33]. In all simulations, the proportion of viable genotypes over the entire holey fitness landscape was p = 0.5, the genome length was L = 30 sites, and the mutation rate was U = 0.1 per genome per generation; populations experienced random mating and free recombination. (B) IIs accumulated faster in smaller populations because they evolved lower genetic robustness, ν. We fit the linear regression model in Eq 3 to each replicate simulation when D > 0. Plotted values are mean values of b/L and ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Large populations diverged more slowly. Values show the rate of increase in D per generation. The gray line shows a power law with exponent −1.6. Plotted values in all panels are means of 200 replicate simulation runs at each N. Error bands in (A) and bars in (B) are 95% CIs (in (C) they are hidden by the points).
(A) IIs accumulated linearly with D in sexual populations of different sizes (N). In all simulations, populations evolved under the Russian roulette model (with p = 0.5 and L = 30) and experienced random mating and free recombination. U was changed such that NU = 10. (B) IIs accumulated faster in smaller populations because they evolved lower ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Smaller populations accumulated more segregating IIs and sequence diversity (defined as the within-population heterozygosity [42]). (D) Large populations diverged more slowly. The gray line shows a power law with exponent −1.9. Plotted values in all panels are means of 200 replicate simulation runs at each N. Error bands in (A) and bars in (B) and (C) are 95% CIs (in (D) they are hidden by the points). See Fig 4 for more details.
Segregating IIs
We also counted IIs within populations as they diverged (Fig 3: IIW). We used the same approach as for IIs between populations, except that the 100 (or N when N < 100) pairs of individuals were sampled randomly without replacement from the same population.
Genetic robustness
The mutational robustness of a genotype was measured as the proportion ν of its single mutations that did not render the genotype inviable (Fig 1C). The robustness of each replicate population was estimated as the average ν of 100 individuals sampled at random without replacement (or N individuals when N < 100).
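For a binary genotype, ν and its population-level estimate can be computed as follows. The function names are hypothetical; the sample size of 100 (or N when N < 100) follows the description above.

```python
import random

def robustness(genotype, is_viable):
    """Proportion of single-site mutants of a binary genotype that are viable."""
    L = len(genotype)
    viable = 0
    for site in range(L):
        mutant = list(genotype)
        mutant[site] = 1 - mutant[site]
        if is_viable(mutant):
            viable += 1
    return viable / L

def population_robustness(population, is_viable, rng, sample_size=100):
    """Mean nu of up to sample_size individuals sampled without replacement."""
    sample = rng.sample(population, min(sample_size, len(population)))
    return sum(robustness(g, is_viable) for g in sample) / len(sample)

holes = {(1, 0, 1, 1, 1), (0, 0, 0, 1, 1)}  # two inviable genotypes, for illustration

def is_viable(genotype):
    return tuple(genotype) not in holes

rng = random.Random(0)
population = [[0, 0, 1, 1, 1]] * 10
print(robustness(population[0], is_viable),
      population_robustness(population, is_viable, rng))
```

For the RNA model the same idea applies, except that each site has three alternative nucleotides, so a genotype has 3L single mutants instead of L.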
Reproductive isolation
The level of postzygotic reproductive isolation (RI) between two populations is defined as:
where wi is the fitness of a genotype resulting from recombination between the ith genotypes of each population.
Results
Genetic drift promotes the accumulation of genetic incompatibilities
We allowed sexual populations of different sizes—between 10 and 1000 individuals—to evolve under the Russian roulette model with p = 0.5, U = 0.1, and free recombination until they reached an approximate equilibrium in genetic robustness and sequence diversity. We then made a copy of each resulting population and allowed each pair of populations to diverge in allopatry. Both populations in each pair experienced the same selection regime (see Materials and methods)—a mutation-order speciation scenario [41]. Fig 4 shows the pattern of accumulation of IIs following the population split. Increasing population size caused the accumulation of IIs to slow down (Fig 4A). Fitting the linear regression model in Eq 3 to the data at each population size revealed a reduction of the slope b with population size (Fig 4B). For example, populations of N = 500 accumulated IIs at ∼75% of the rate of populations of N = 10 individuals.
In larger populations, genetic drift is expected to be weaker, allowing selection for genetic robustness to operate more efficiently (Fig 3: N → ν). Indeed, larger populations evolved higher genetic robustness, ν, before populations began diverging from each other (Fig 4B). The level of genetic robustness in diverging populations explained the variation in the rate of II accumulation in populations of different sizes: b/L ≈ 1 − ν (see Eq 1; Fig 4B, gray line).
Population size had another effect on the rate of speciation: larger populations diverged more slowly from each other (Fig 4C). The rate of increase in D per generation was approximately proportional to 1/N². Populations of N = 1000 individuals diverged even more slowly and plateaued at a genetic distance of D = 0.07 ± 0.016 (mean and 95% CI based on 100 simulations over 2 × 10⁵ generations). In the large population limit (N → ∞) both populations will be exactly identical and will never diverge (D = 0). As a result, any IIs between populations will also be IIs segregating within populations. Both effects of N on the rate of speciation act in the same direction.
The results summarized in Fig 4 do not disentangle the effects of genetic drift, N, and mutational supply, NU, because the mutation rate was constant (U = 0.1). Furthermore, high U is also expected to promote the evolution of high genetic robustness [22, 43] (Fig 3: U → ν). Fig 5 shows the results of similar evolutionary simulations where N was changed while keeping a constant NU = 10. The results show that N has an effect on the evolution of genetic robustness independent of mutational supply: increasing N while reducing U selects for higher ν, which leads to slower accumulation of IIs (Fig 5B). This effect appears to saturate when N ≈ 1000. We conclude that genetic drift promotes the accumulation of genetic incompatibilities.
The RNA folding model
One limitation of the Russian roulette model is that it is unrealistic in at least two ways: real fitness landscapes are not completely random and viable genotypes can display intrinsic differences in fitness. To address this limitation we have investigated evolution under a more complex genotype-phenotype map, in which genotypes are RNA sequences and phenotypes are the minimum free-energy secondary structures of those sequences. The fitness of a genotype varies quantitatively with the difference between its phenotype and an optimal phenotype (Eq 2; Fig 6, inset). Genotypes with phenotypes that differ too much from the optimum are inviable. Populations evolving under this model evolve realistic distributions of mutational effects (DFEs). For example, Fig 6 (main) shows the DFE of sexual populations of N = 100 individuals after 1,000 generations of evolution with a mutation rate of U = 0.1 and free recombination. The mean effect of a nonlethal mutation was . This is consistent with estimates from mutation accumulation experiments in a wide variety of organisms [44]. For example, the average mutational effect in 6 species of bacteriophage was
[45]. In the rest of the paper we investigate this model.
(Inset) Fitness landscape. Genotypes are RNA sequences of length L = 100. The fitness of a sequence is defined by the base-pair distance δ between its minimum free-energy secondary structure and a reference structure (Eq 2). A perfect structural match (δ = 0) yields the maximum fitness (w = 1). Fitness declines linearly with δ until a threshold is reached (δ = 12). Larger structural distances (δ > 12) are lethal (w = 0). (Main) The histogram shows the DFE based on the effects of 10⁶ individual single nucleotide substitutions. One hundred replicate populations of N = 100 individuals were allowed to evolve for 1,000 generations under U = 0.1, random mating, and free recombination. Each individual in each population was then subject to 100 random mutations. Mutational effects are defined as s = w′/w − 1, where w′ and w are the fitnesses of the mutant and unmutated genotypes, respectively. The majority of mutations (71.1%) were deleterious (s < 0); 12.3% were lethal (s = −1); 26.5% were neutral (s = 0); 2.4% were beneficial (s > 0). The largest beneficial effect was s = 0.429 but fewer than 0.1% of mutations had effects s > 0.2. The mean effect of a mutation was ; excluding lethals, it was
.
In Fig 7 we repeat the analysis in Fig 4 but with the RNA model. Broadly, the results are consistent with those from the Russian roulette model with two important differences. First, N has a stronger effect on the evolution of genetic robustness in the RNA model (Fig 7B). Genetic robustness at equilibrium was 240% higher at N = 500 than at N = 10 in the RNA model, compared to only 23% higher in the Russian roulette model. These differences in ν cause similarly dramatic differences in the rate of II accumulation (a 780% increase in b/L in the RNA model compared to a 33% increase in the Russian roulette model with the change from N = 500 to N = 10; Fig 7B). Second, the rate of II accumulation (b/L) in the RNA model is lower than the expectation from Eq 1 of 1 − ν. The likely reason is that mutational effects are correlated in similar genotypes in the RNA model, unlike the Russian roulette model. Thus, derived alleles that are viable in the genetic background in which they originated are more likely to also be viable in similar genetic backgrounds independently of genetic robustness.
(A) IIs accumulated approximately linearly with D in sexual populations of different sizes (N). In all simulations, populations evolved under the RNA folding model (with σ = 0.025 and L = 100) and experienced U = 0.1, random mating, and free recombination. (B) IIs accumulated faster in smaller populations because they evolved lower ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Large populations diverged more slowly. The gray line shows a power law with exponent −0.8. Plotted values in all panels are means of 200 replicate simulation runs at each N. Error bands in (A) and bars in (B) are 95% CIs (in (C) they are hidden by the points). See Fig 4 for more details. See S1 Fig for the results of simulations on the RNA model with σ = 0.
We tested whether the genetic incompatibilities that evolved in the RNA model generated postzygotic reproductive isolation (RI) by measuring the fitness of hybrids between diverged populations. Across population sizes, populations evolved RI in proportion to the number of IIs between them. For the data shown in Fig 7 at the end of divergence, adding an II between populations caused an increase in postzygotic RI of approximately 4.5%. The results from the RNA model confirm that genetic drift promotes speciation.
Recombination hinders the accumulation of genetic incompatibilities
Changing the recombination probability in populations of the same size (N = 100) allows us to isolate the causal role of recombination in determining the rate of II accumulation. Populations experiencing stronger recombination both evolved higher genetic robustness and accumulated IIs more slowly (Fig 3: r → ν; Fig 8A and 8B). For example, populations in which all matings experienced one crossover (r = 1) evolved 24% higher robustness and accumulated IIs at 55% of the rate of populations in which only 1% of matings experienced one crossover (r = 0.01). Increasing the amount of recombination from one crossover to free recombination caused a further increase in robustness of 5% and a further reduction in the rate of II accumulation of 18%.
(A) IIs accumulated approximately linearly with D in populations of N = 100 individuals experiencing different recombination probabilities. In all simulations, populations evolved under the RNA folding model (with σ = 0.025 and L = 100) and experienced U = 0.1 and random mating. The values of the recombination probability (r = 0.01, …1) indicate the expected proportion of matings resulting in a recombinant progeny containing one crossover event. Under free recombination, every progeny is a recombinant (r = 1) with an expected 49.5 crossover events. (B) IIs accumulated faster in populations experiencing low recombination probability because they evolved lower ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Populations experiencing low recombination probability accumulated more segregating IIs. (D) Populations experiencing high r diverged more slowly. Values show the rate of increase in D per generation. Plotted values in all panels are means of 200 replicate simulation runs at each r. Error bands in (A) and bars in (B)–(D) are 95% CIs (some are hidden by the points). See Fig 4 for more details. See S2 Fig for the results of simulations on the RNA model with σ = 0.
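The expected 49.5 crossover events under free recombination would follow if each of the L − 1 = 99 junctions between adjacent sites switches parental template independently with probability 1/2. The sketch below illustrates that reading. The function names and the per-site choice of parent are assumptions made for illustration; they are not taken from the simulation code used in this study.

```python
import random

def expected_crossovers(L, switch_prob=0.5):
    """Expected number of crossovers when each of the L - 1 junctions
    between adjacent sites switches parent with probability switch_prob."""
    return (L - 1) * switch_prob

def free_recombinant(parent1, parent2, rng):
    """Build one haploid recombinant by choosing the parent of each site
    independently (assumed scheme); return the offspring and the number
    of junctions at which the parental template switches."""
    assert len(parent1) == len(parent2)
    parents = (parent1, parent2)
    source = [rng.randrange(2) for _ in parent1]
    offspring = "".join(parents[s][i] for i, s in enumerate(source))
    crossovers = sum(a != b for a, b in zip(source, source[1:]))
    return offspring, crossovers

if __name__ == "__main__":
    rng = random.Random(1)
    L = 100
    p1, p2 = "A" * L, "B" * L
    counts = [free_recombinant(p1, p2, rng)[1] for _ in range(20000)]
    print(expected_crossovers(L))       # analytical expectation
    print(sum(counts) / len(counts))    # simulated mean, close to the expectation
```

Choosing the parent of each site independently is one simple way to make every junction a potential crossover point; other implementations of free recombination may differ in detail.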
The previous results indicate that recombination hinders the accumulation of genetic incompatibilities. But how does recombination do that? To begin to answer that question we consider the accumulation of IIs in the absence of recombination. In sufficiently large populations, genetic variation builds up. Because of epistatic interactions, some of these variants constitute within-population IIs (Fig 3: π → IIW). In the absence of recombination, however, these segregating genetic incompatibilities will only rarely be expressed (and, therefore, be exposed to natural selection) because they can only occur in the same individual through multiple mutations, which are rare. Therefore, cryptic IIs are expected to accumulate within asexual populations.
We hypothesize that the extent to which a mutation is involved in an II within a population is an important determinant of whether that mutation will cause an II between populations if it spreads in the population (Fig 3: IIW → IIB). To test this hypothesis we isolated from each asexual population (r = 0) the first mutation to rise to high frequency and cause an II among populations and the first mutation to rise to high frequency and not cause an II. We then went back to the generations in which these mutations arose, placed them in every individual present in the population at the time, and noted the proportion of these “backcrossed” individuals that remained viable (Fig 9A). As predicted, we found that the proportion of viable backcrossed individuals was much lower for mutations that caused IIs (0.75±0.03, mean and 95% CI; Fig 9B, r = 0 filled circle) than for mutations that did not (0.91±0.02; Fig 9B, r = 0 open circle). In other words, mutations that caused IIs among asexual populations were likely to have started out causing cryptic IIs within populations.
(A) Illustration of the experimental design used to generate data in (B). Two populations of N = 4 individuals diverge in allopatry over the holey fitness landscape shown in Fig 1C. After some time, population 2 (blue) fixes derived allele D. Introgressing D from population 2 to population 1 (arrow I) results in an inviable genotype (ABcDe), rendering 50% of introgressed sequences in population 1 inviable (gray box). Backcrossing D into every genotype present in population 2 (arrow II) when the D allele arose by mutation reveals an inviable genotype (abcDE), and renders 50% of the sequences in that population inviable too (gray box). Thus, D is involved in IIs both within and among populations. (B) We repeated simulations like those summarized in Fig 8 (but including an asexual treatment, r = 0) and isolated the first mutation to occur in a population, rise to a frequency of at least 50%, and be involved in an II with the other population. We then went back to the generation in which the mutation arose and backcrossed it into every individual present within its own population at that time. Panel (B) shows the proportion of viable backcrossed individuals made from populations experiencing different recombination probabilities (colored filled circles). As a control, we did the same calculation for the first mutation to rise to high frequency that does not cause an II among populations (open circles). Black and gray circles show the mean genetic robustness of the populations when the mutations arise and when they reach high frequency, respectively. Values are means of 200 replicate simulation runs at each recombination probability. Error bars are 95% CIs.
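The backcross measurement described above can be sketched in a few lines. This is a minimal illustration, assuming haploid genotypes written as strings and a viability test supplied by the caller. The two inviable genotypes are the ones named in panel (A); the composition of the toy population and all function names are hypothetical.

```python
def backcross_viability(population, locus, allele, is_viable):
    """Place `allele` at position `locus` in every genotype of `population`
    and return the proportion of the resulting genotypes that are viable."""
    backcrossed = [g[:locus] + allele + g[locus + 1:] for g in population]
    return sum(is_viable(g) for g in backcrossed) / len(backcrossed)

# Toy holey landscape: lowercase letters are ancestral alleles.
# The inviable genotypes are those named in panel (A).
INVIABLE = {"ABcDe", "abcDE"}

def is_viable(genotype):
    return genotype not in INVIABLE

# Hypothetical composition of population 2 (N = 4) when allele D arose.
population_2 = ["abcde", "abcde", "abcdE", "abcdE"]
D_LOCUS = 3  # 0-based index of the fourth locus

if __name__ == "__main__":
    # Two of the four backcrossed genotypes become abcDE, which is inviable.
    print(backcross_viability(population_2, D_LOCUS, "D", is_viable))
```

A low proportion of viable backcrossed genotypes indicates that the mutation was already involved in IIs within its own population when it arose.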
We propose that recombination slows down the accumulation of IIs between populations because it suppresses the accumulation of IIs within populations. We found that sexual populations with more genetic variation within populations displayed greater numbers of segregating IIs (Fig 5C). In the presence of recombination, segregating IIs are not cryptic; they are expressed in recombinant individuals. This suppresses the accumulation of IIs within populations in two ways. First, it causes selection against mutations involved in IIs within populations (Fig 3: r ⊣ IIW). Second, it generates recombination load which, in turn, selects for genetic robustness [19, 46–50]. Genetically robust individuals are less likely to express IIs (Fig 3: r → ν ⊣ IIW). We observe this suppression of segregating IIs by recombination in our simulations (Fig 8C) and believe the reduced supply of segregating IIs slows down the accumulation of IIs among populations (Fig 3: IIW → IIB).
We tested this hypothesis by conducting backcross simulations at different recombination rates (Fig 9B). As recombination rate and robustness increased, we observed a higher proportion of viable backcrossed individuals for mutations that caused IIs among populations (Fig 9B, orange filled circles). Thus, in the presence of recombination, even mutations that are ultimately destined to participate in genetic incompatibilities among populations are less likely to result in within-population genetic incompatibilities (Fig 3). That is because recombination selects for genotypes that are more compatible with the genetic backgrounds of other individuals in the population and, therefore, with mutations currently segregating in the population.
Drosophila species with higher levels of neutral polymorphism and higher recombination rate accumulate postzygotic RI more slowly
Our results lead to the prediction that the rate with which postzygotic RI accumulates with genetic distance—RI velocity [51]—should be negatively correlated with both population size and recombination rate. We tested these predictions using published estimates of postzygotic RI velocity in Drosophila species groups [51]. We estimated the level of neutral polymorphism at the Adh locus—a proxy for effective population size—within 38 species belonging to 8 of the species groups and found a significant negative correlation between RI velocity and the average level of neutral polymorphism in a species group (linear regression: R2 = 0.71, P = 0.008; Fig 10B). Correcting for phylogenetic autocorrelation only strengthened the effect (P = 0.002).
(A) Phylogenetic relationships between 8 Drosophila species groups [51] (ananassae, immigrans, melanogaster, montium, obscura, repleta, virilis, willistoni). (B) The rates with which postzygotic RI accumulates with genetic distance in the species groups shown in (A) were estimated by Rabosky and Matute [51] based on the proportion of sterile or inviable F1 offspring in interspecific crosses [52, 53]. Positive and negative values represent faster- and slower-than-average RI velocities. The levels of neutral polymorphism in individual species were measured by the synonymous nucleotide diversity (πs) at the Adh locus (see S1 Table). Values are the mean πs for the species or subspecies in each Drosophila species group. Multiple estimates for a single species were averaged before calculating the overall mean for the species group. The line shows a fit by linear ordinary least-squares regression of RI velocity against πs (OLS: slope and 95% CI = −105±67, R2 = 0.71, P = 0.008). The results of a phylogenetic generalized least-squares regression analysis assuming Brownian motion were almost identical (PGLS: R2 = 0.82, P = 0.002). (C) Residuals in RI velocity from the regression in (B) in the species groups with either shorter or longer total map lengths than average. The point areas in (B) and (C) are proportional to the mean total map length in cM in the species group (see S2 Table). Adding map length to the regression model did not improve fit in a statistically significant manner (multiple regression OLS: coefficient and 95% CI = −0.0016±0.0021, R2 = 0.84, P = 0.11; PGLS: R2 = 0.89, P = 0.13). None of the traits showed a significant phylogenetic signal based on either the K or λ statistics [54, 55] (see S3 Table).
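The two-level averaging of πs described in (B) can be sketched as follows. The record layout, the function name, and all numerical values are placeholders chosen for illustration; they are not the data in S1 Table.

```python
from collections import defaultdict
from statistics import mean

def group_mean_diversity(records):
    """records: iterable of (species_group, species, pi_s) tuples.
    Average repeated estimates within each species first, then average
    the per-species means within each species group."""
    per_species = defaultdict(list)
    for group, species, pi_s in records:
        per_species[(group, species)].append(pi_s)
    per_group = defaultdict(list)
    for (group, _species), values in per_species.items():
        per_group[group].append(mean(values))
    return {group: mean(values) for group, values in per_group.items()}

# Placeholder records with made-up values (NOT the data in S1 Table).
example = [
    ("group_x", "sp1", 0.010),
    ("group_x", "sp1", 0.020),  # second estimate for the same species
    ("group_x", "sp2", 0.030),
    ("group_y", "sp3", 0.005),
]

if __name__ == "__main__":
    print(group_mean_diversity(example))
```

Averaging within species first prevents a species with many published estimates from dominating the mean of its species group, which pooling all estimates directly would allow.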
We also estimated the genome-wide recombination rate of 15 species in the same 8 species groups and found a negative correlation between postzygotic RI velocity and the average recombination rate in a species group, after removing the effect of the level of neutral polymorphism (Fig 10C). Including recombination rate in the linear model increased the proportion of variance in RI velocity explained from R2 = 0.71 to 0.84. However, this improvement was not statistically significant (P = 0.11).
Differences in the level of neutral polymorphism of two species explain asymmetry in postzygotic RI between them
Another prediction from our work is that differences in the size of diverging populations will cause asymmetric accumulation of genetic incompatibilities: the expected number of IIs will be greater in introgressions from a large population to a small population than in the opposite direction. Asymmetric postzygotic RI has been observed between reciprocal crosses of many closely related species—a phenomenon termed Darwin’s corollary to Haldane’s rule [56]. We propose that this kind of asymmetry might be explained by differences in robustness to introgression between the recipient populations. Yukilevich [53] surveyed RI asymmetry between many Drosophila species. We were able to determine the level of neutral polymorphism within both species of 8 species pairs in which he found asymmetric postzygotic RI. The pattern of asymmetry in RI between these species pairs showed a statistically significant trend in the direction we predicted (linear regression through the origin: R2 = 0.52, P = 0.028; Fig 11).
Postzygotic RI asymmetry for the sp1/sp2 pair was estimated [53] by taking the RI score from a ♀ sp1 × ♂ sp2 cross and subtracting the RI score from the reciprocal cross. The x-axis shows the difference in the synonymous nucleotide diversity (πs1 − πs2) at the Adh locus (except the rec/sub pair for which we used the Adhr locus; see S4 Table). The colors of the points match the species groups shown in Fig 10A except that the quinaria group (sister group to immigrans) is represented by orange: montium group: auraria, quadraria, subauraria, triauraria; obscura group: bogotana, pseudoobscura; quinaria group: recens, subquinaria; repleta group: arizonae, mojavensis; willistoni group: equinoxialis, paulistorum, willistoni. The line shows a fit by linear ordinary least-squares regression through the origin (slope and 95% CI = 20.9±17.8, R2 = 0.52, P = 0.028).
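A regression through the origin is natural here because swapping the labels sp1 and sp2 reverses the sign of both variables, so a zero difference in πs should predict zero asymmetry. The sketch below shows the least-squares calculation for such a model. The function name is hypothetical, and the uncentered definition of R2 used here is one common convention for models without an intercept; the source does not state which definition was used.

```python
def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x with no intercept.
    Returns (slope, r_squared), where r_squared is the uncentered
    coefficient of determination, 1 - SS_residual / sum(y**2)
    (an assumed convention for no-intercept models)."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    slope = sxy / sxx
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, 1.0 - ss_res / syy

if __name__ == "__main__":
    # Placeholder values for illustration only (NOT the data in S4 Table).
    diff_pi_s = [-0.02, -0.01, 0.01, 0.03]
    ri_asymmetry = [-0.4, -0.1, 0.3, 0.5]
    print(fit_through_origin(diff_pi_s, ri_asymmetry))
```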
Discussion
Rosenzweig [57] noted that “no one denies that population size influences speciation rate. However, the direction of its effect is in doubt.” In his classic neutral theory of biodiversity, Hubbell [58] assumed that all species have the same speciation rate per capita and that, therefore, more abundant species will have higher speciation rates. Similarly, Marzluff and Dial [59] proposed that a broad geographic distribution—a correlate of population size—promotes speciation. In contrast, Stanley [60] hypothesized that species whose populations contract and fragment due to, for example, predation or environmental deterioration, tend to experience high rates of speciation.
Some attempts to study this issue using explicit theoretical models have reached the same conclusion as ours, that low abundance promotes speciation. However, the models they considered differ from ours in important aspects. Mayr [61] proposed that speciation often results from “genetic revolutions” precipitated by founder events. Others have developed similar models [62]. Most of them modeled ecological speciation and did not consider genetic incompatibilities as the basis for RI. Gavrilets and Hastings [63] did consider several DMI scenarios and showed that founder effect speciation is “plausible” under some of them, but did not consider the accumulation of multiple DMIs of arbitrary complexity.
Maya-Lastra and Eaton [64] modelled genetically variable populations evolving in the presence of multiple pairwise DMIs like those envisioned by Orr and, like us, observed that smaller populations accumulated more DMIs. They incorporated recombination in their model but did not investigate its effect. They assumed that mutations involved in DMIs were beneficial but that the DMIs were weak, unlike in our models.
Khatri and Goldstein showed that genetic incompatibilities accumulated more quickly in smaller populations in several models of gene regulation [65–67]. Like our models, their models included strong incompatibilities that create holes in the fitness landscape. Furthermore, the fitness of a viable genotype is a continuous function of the phenotype in their models, like in our RNA folding model. Khatri and Goldstein have argued that their results arise because smaller populations evolve a greater drift load and occupy regions of genotype space closer to the holes in the fitness landscape. We believe that the same mechanism operates in our RNA folding model. However, that is not the case in the Russian roulette model, indicating that additional mechanisms may be at play. One limitation of the Khatri and Goldstein studies is that they considered only the Weak Mutation regime and, therefore, were not able to evaluate the effect of recombination. Furthermore, under Weak Mutation, neither genetic drift nor recombination would have any effect on the rate of accumulation of incompatibilities in our models. Therefore, our results must also involve evolutionary mechanisms not captured in the models studied by Khatri and Goldstein. We believe the ingredient they missed is segregating variation at loci involved in genetic incompatibilities. Such variation has been known to exist in natural populations of several species [68, 69]. Corbett-Detig et al. [69] found evidence that multiple pairwise DMIs are currently segregating within natural populations of D. melanogaster. They surveyed a large panel of recombinant inbred lines (RILs) and found 22 incompatible pairs of alleles at unlinked loci in the RILs; of the 44 alleles, 27 were shared by two or more RILs, indicating that multiple DMIs are polymorphic within natural populations. They also found evidence for multiple DMIs in RIL panels in Arabidopsis and maize [69]. Corbett-Detig et al. [69] did not attempt to identify DMIs among linked loci or higher-order DMIs and therefore are likely to have underestimated the actual number of segregating DMIs in the RILs. These observations suggest that the conditions for the effects reported here may occur in nature.
Not all previous modeling studies agreed with ours. Gavrilets et al. [7] studied the effect of population size on the rate of speciation in a holey fitness landscape model using simulations. Individuals differing at a threshold number, K, of diallelic loci were considered incapable of producing viable offspring. They found that when selection was strong (low K) small population size promoted speciation; in contrast, when selection was weak (high K) large population size promoted speciation. Their model differs from ours in two important ways. First, it does not distinguish between pre- and postzygotic RI [7]. Two genotypes differing at more than K loci might simply be incapable of mating with each other. Second, it does not consider the possibility that pairs of genotypes at a certain genetic distance may display different levels of postzygotic RI, a crucial feature of our models.
Another contrary result was obtained by Orr and Orr [70]. They modeled a population of N individuals divided into d geographically isolated subpopulations of equal size (N/d). They assumed: (1) that the subpopulations evolve according to Orr’s [28] combinatorial model (2) in the Weak Mutation regime; (3) that all DMIs occur between pairs of alleles; (4) that there is a fixed probability, q, that a new substitution in one subpopulation is incompatible with each of the k − 1 divergent alleles in another subpopulation; and (5) that two subpopulations become different species once the cumulative number of DMIs between them reaches some threshold. Orr and Orr [70] calculated the expected number of substitutions required for speciation to occur. They found that, when the DMIs were caused by neutral substitutions, subpopulation size had no effect on this quantity. In contrast, when the DMIs were caused by beneficial substitutions, this quantity was inversely related to subpopulation size; in other words, larger subpopulation size promoted speciation. Both of these results contradict the findings of our paper. We believe that the discrepancy is explained by the fact that Orr and Orr [70] assumed both Weak Mutation and that the DMI probability q was constant over different levels of population subdivision. These assumptions preclude the possibility that q might evolve because of changes in genetic robustness as in our models.
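Assumptions (3)–(5) can be made concrete with a short calculation. If the j-th substitution is incompatible with each of the j − 1 earlier divergent alleles with fixed probability q, the expected number of pairwise DMIs after k substitutions is q[0 + 1 + … + (k − 1)] = qk(k − 1)/2. The sketch below treats k as the total number of substitutions separating two subpopulations, which is a simplification; the function names and the example values of q and of the threshold are hypothetical.

```python
def expected_dmis(k, q):
    """Expected number of pairwise DMIs after k substitutions when each new
    substitution is incompatible, with fixed probability q, with each of
    the earlier divergent alleles: q * k * (k - 1) / 2."""
    return q * k * (k - 1) / 2

def substitutions_to_threshold(threshold, q):
    """Smallest k at which the expected number of DMIs reaches `threshold`."""
    k = 1
    while expected_dmis(k, q) < threshold:
        k += 1
    return k

if __name__ == "__main__":
    q = 0.001  # hypothetical incompatibility probability
    for k in (10, 20, 40):
        print(k, expected_dmis(k, q))  # grows roughly with k squared
    print(substitutions_to_threshold(1, q))
```

Here q is a fixed parameter, so the number of substitutions needed to reach the threshold cannot depend on population size or recombination. That is the constraint identified above as precluding an evolving incompatibility probability.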
The results from our study lead to a clear prediction that postzygotic RI should build up more quickly in smaller populations. There have been multiple attempts to test this prediction experimentally by subjecting populations to founder events, sometimes repeatedly. In one of these studies, Ringo et al. [71] subjected 8 experimental populations of D. simulans to repeated bottlenecks of a single pair of individuals every 6 months. Between bottlenecks these “drift” lines were allowed to expand to large size. Another 8 populations were maintained at a larger size and subjected to a variety of artificial selection regimes. The drift lines experienced stronger inbreeding than the “selection” lines. An ancestral “base” population was kept at larger size than either treatment group throughout. Ringo et al. found that after 6 bottleneck/expansion cycles the drift lines evolved stronger postzygotic RI relative to the base population than did the selection lines (28% vs 4% reduction in hybrid fitness), in agreement with our prediction. However, the results from three similar experiments on other species of Drosophila were inconclusive [72–75].
There is also comparative evidence supporting our prediction. Huang et al. [76] tested for correlations between a measure of the level of neutral polymorphism within species—a proxy for effective population size, Ne—and speciation rates estimated from phylogenies in three genera of lichen-forming fungi. They found a significantly negative correlation in one genus, in agreement with our prediction, but no significant correlations in the other two genera. A larger phylogenetic study found significant negative correlations between speciation rates and a different proxy for Ne—geographic range size—in both birds and mammals [64]. These results, while encouraging, have the limitation that they have used overall speciation rate, which may not be driven primarily by postzygotic RI in all lineages. Furthermore, the estimates of speciation rate may themselves be problematic because many different temporal patterns of speciation and extinction rates can have equal probability for a given phylogeny [77, 78]. To address these limitations we tested for a correlation between the level of neutral polymorphism within species (πs) and the rate with which postzygotic RI accumulates with genetic distance in Drosophila [51]. We found a negative correlation (Fig 10B). One potential limitation of this analysis is that we averaged πs across species within a Drosophila species group and some species groups (e.g., the melanogaster group) show considerable variation in πs. We also found that differences in the level of neutral polymorphism of two species explained asymmetry in postzygotic reproductive isolation between them (Fig 11). Note, however, that these comparative correlations could have alternative explanations. For example, reductions in population size might be a consequence of speciation, rather than its cause [79].
As with population size, no one denies that recombination influences speciation rate. But, again, the direction of its effect is disputed. Butlin [80] argued that “speciation can be viewed as the evolution of restrictions on the freedom of recombination”. In general, low recombination is found to promote speciation in the presence of gene flow between diverging populations because it restricts introgression. For example, chromosomal rearrangements often suppress recombination along a large region of the genome. Navarro and Barton [81] showed that such chromosomal rearrangements facilitate the accumulation of DMIs in the presence of gene flow. Interestingly, genetic drift can help maintain genetic differentiation within inversions between populations coming into secondary contact [82]. However, recombination does not always hinder speciation. Bank et al. studied the effect of recombination on the maintenance of a DMI involving two loci in a haploid, with gene flow from a continental to an island population [83]. They showed that recombination can have opposing effects on the maintenance of the DMI depending on the fitnesses of the genotypes at the DMI loci. In some scenarios, the DMI was maintained more easily through selection against immigrants when recombination was low. But in other scenarios, a DMI was maintained more easily through selection against hybrids when recombination was high. Founder effect speciation models have also tended to conclude that high recombination should promote speciation [84]. In contrast to the results from earlier models, we found that recombination within populations can hinder speciation in the absence of gene flow. Our results indicate that recombination suppresses the accumulation of genetic incompatibilities within populations, which in turn slows down their accumulation between populations.
Our results lead to the prediction that recombination in large populations impedes speciation; such populations would maintain extensive genetic variation but even highly divergent genotypes would remain compatible with each other. This prediction is consistent with the observation that recombination between highly divergent genotypes is common in viruses, including influenza A [85, 86], HIV-1 [87, 88], hepatitis B [89] and hepatitis E [90] viruses, rhinoviruses [91, 92], coronaviruses [93–99], and geminiviruses [100]. Such recombination events have often contributed to the emergence of viral diseases. For example, the virus that caused the 1918 influenza pandemic is believed to have arisen when a human virus containing an H1 hemagglutinin acquired an N1 neuraminidase from an avian virus [86].
We also obtained limited comparative evidence for the prediction that recombination slows down the accumulation of postzygotic RI. We found a negative correlation (not statistically significant) between the genome-wide recombination rate and the rate with which postzygotic RI accumulates with genetic distance in Drosophila species groups [51], after removing the effect of the level of neutral polymorphism (Fig 10C).
We found that microevolution and macroevolution were coupled in our models. This coupling is also a feature of our result that genetic drift promotes speciation. In both cases, processes acting within populations to alter genetic robustness explain variation in speciation rates, contrary to Stanley’s [2] contention that speciation is largely random. The size of a population is also predicted to be inversely related to its probability of extinction [101–104]. Thus, if our results are correct, depending on the balance of the effects of population size on speciation and extinction, random genetic drift within species could be a driver of deterministic species selection [3], the process championed by Stanley [2].
We identified a crucial role of genetic robustness in the accumulation of genetic incompatibilities in our models. We showed that high recombination rate selected for genetic robustness in the RNA folding model. This result was obtained previously using simulations on models of RNA folding [20] and gene networks [19, 48, 50, 105], the Russian roulette model [21], and digital organisms [47]; it was also derived analytically in a modifier model [46]. We also showed that large population size selected for genetic robustness in both the RNA and Russian roulette models. This result is more surprising. Theory and simulations using a quasispecies model [106] and simulations using digital organisms [107] showed that large populations tended to evolve lower genetic robustness. In contrast, simulations using a gene network model showed that large populations evolved higher genetic robustness [105]. The difference may have stemmed from a trade-off between fitness and genetic robustness that was explicit in the quasispecies model [106] and emergent in the digital organism simulations [107], but did not occur in our models or in the gene network model [105]. The extent to which genetic robustness carries costs in real organisms remains an open question.
Environmental stochasticity selects directly for environmental robustness and indirectly for genetic robustness [22, 24]. Therefore, we predict that temporal variation in environmental conditions should hinder speciation. This prediction is consistent with the observation that speciation rates are higher in tropical lineages than in temperate lineages of amphibians [108], birds [109], and mammals [110]. Furthermore, Yukilevich [111] found that postzygotic RI (but not prezygotic RI) accumulated more than twice as quickly between tropical species pairs of Drosophila than between temperate and subtropical species pairs. However, many other hypotheses have been proposed to explain these patterns [112, 113].
A striking feature of our results is that the simulations in the RNA folding model resulted in approximately linear patterns of II accumulation. Fitting a nonlinear model of the form II = a + b(LD − 1)^c to the 9 sets of simulations summarized in Figs 7 and 8 yielded values of the exponent c between 1.19 and 1.47. An exploration of the RNA folding model without intrinsic fitness differences (σ = 0) in the Weak Mutation regime resulted in an exponent of c = 1.35 [31]. These values of c are mostly higher than those predicted under the Russian roulette model (c = 1). This is not surprising because the fitness landscape in the RNA folding model is not random. However, the observed values of c are also lower than predicted by the combinatorial model proposed by Orr [28], according to which the exponent should be c ≳ 2, a pattern he called “snowballing”. We believe that the mismatch is caused by the fact that a central assumption of Orr’s model is violated in the RNA folding model: that DMIs, once they have arisen, should persist indefinitely regardless of future evolutionary change [31].
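One way to estimate the exponent c in a model of this form is to profile over candidate values of c, because for any fixed c the parameters a and b follow from ordinary least squares. The sketch below writes k for the product LD. It is an illustrative approach rather than the fitting procedure used in this study, which is not described here; the function name and the grid of candidate exponents are assumptions.

```python
def fit_power_model(k_values, ii_values, c_grid=None):
    """Fit II = a + b * (k - 1) ** c by profiling over c: for each candidate
    exponent, a and b follow from ordinary least squares on x = (k - 1) ** c.
    Returns (a, b, c) for the candidate with the smallest residual sum of
    squares. Requires k >= 1 and at least two distinct k values."""
    if c_grid is None:
        c_grid = [0.5 + 0.01 * i for i in range(251)]  # 0.50 to 3.00
    n = len(k_values)
    y_mean = sum(ii_values) / n
    best = None
    for c in c_grid:
        x = [(k - 1) ** c for k in k_values]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, ii_values))
        b = sxy / sxx
        a = y_mean - b * x_mean
        sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, ii_values))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1], best[2], best[3]

if __name__ == "__main__":
    # Synthetic, noise-free data generated with a known exponent.
    ks = list(range(1, 41))
    iis = [0.2 + 0.05 * (k - 1) ** 1.3 for k in ks]
    print(fit_power_model(ks, iis))
```

An estimate of c near 1 corresponds to linear accumulation, whereas values near 2 or above correspond to the snowballing pattern.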
The extent to which our results can be generalized to real organisms requires further investigation. Our models make several strong assumptions, such as that strong epistatic interactions are common, that genomes are haploid, and that populations of constant size diverge in allopatry under the same selection regime (i.e., mutation-order speciation [41]). The effect of each of these assumptions can be investigated within the framework of our models. Such an exploration has the potential to serve as the foundation for a new theory of the coupling between microevolution and macroevolution.
Supporting information
S1 Fig. Genetic drift promotes the accumulation of genetic incompatibilities in the RNA folding model without intrinsic fitness differences between genotypes.
(A) IIs accumulated approximately linearly with D in sexual populations of different sizes (N). In all simulations, populations evolved under the RNA folding model (with σ = 0 and L = 100) and experienced U = 0.1, random mating, and free recombination. (B) IIs accumulated faster in smaller populations because they evolved lower ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Large populations diverged more slowly. The gray line shows a power law with exponent −0.5. Plotted values in all panels are means of 200 replicate simulation runs at each N. Error bands in (A) and bars in (B) are 95% CIs (in (C) they are hidden by the points). See Fig 7 for more details.
https://doi.org/10.1371/journal.pgen.1011126.s001
(PDF)
S2 Fig. Recombination hinders the accumulation of genetic incompatibilities in the RNA folding model without intrinsic fitness differences between genotypes.
(A) IIs accumulated approximately linearly with D in populations of N = 100 individuals experiencing different recombination probabilities. In all simulations, populations evolved under the RNA folding model (with σ = 0 and L = 100) and experienced U = 0.1 and random mating. (B) IIs accumulated faster in populations experiencing low recombination probability because they evolved lower ν. The gray line shows b/L = 1 − ν (see Eq 1). (C) Populations experiencing low recombination probability accumulated more segregating IIs. (D) Populations experiencing high r diverged more slowly. Values show the rate of increase in D per generation. Plotted values in all panels are means of 200 replicate simulation runs at each r. Error bands in (A) and bars in (B)–(D) are 95% CIs (some are hidden by the points). See Fig 8 for more details.
https://doi.org/10.1371/journal.pgen.1011126.s002
(PDF)
S3 Fig. Number of divergent alleles from one population that have not been tested by natural selection in the genetic background of another population.
(A) Example of scenario A. Two populations diverge in allopatry. Both populations are initially fixed for lowercase alleles (open circles) at four loci (abcd). Derived alleles are indicated by uppercase letters (closed circles). The donor population undergoes four substitutions, fixing the ABCD genotype. The recipient population does not undergo any substitutions, retaining the ancestral genotype abcd. Arrows indicate introgressions of divergent alleles from the donor population to the recipient population. Three introgressed genotypes have not been tested by natural selection (aBcd, abCd, and abcD; solid arrows) but one has (Abcd, dashed arrow). (B) Example of scenario B. Each population undergoes two substitutions, the recipient population fixing the AbCd genotype and the donor population fixing the aBcD genotype. Three introgressed genotypes have not been tested by natural selection (abCd, ABCd, and AbCD; solid arrows) but one has (Abcd, dashed arrow).
https://doi.org/10.1371/journal.pgen.1011126.s003
(PDF)
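The counting in the S3 Fig caption can be reproduced with a short sketch. It rests on two assumptions that the caption does not state outright: a genotype counts as "tested by natural selection" if it occurred at some point in the substitution history of either lineage, and substitutions fixed in alphabetical order within each lineage (A, B, C, D in scenario A; A then C in the recipient and B then D in the donor in scenario B). The function names are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List

ANCESTOR = "abcd"  # both populations start fixed for the lowercase alleles


def lineage_history(substitution_order: str, ancestor: str = ANCESTOR) -> List[str]:
    """List every genotype a lineage was fixed for, given the (assumed)
    order in which derived uppercase alleles were substituted."""
    genotype = list(ancestor)
    history = ["".join(genotype)]
    for allele in substitution_order:
        locus = ancestor.index(allele.lower())
        genotype[locus] = allele
        history.append("".join(genotype))
    return history


def classify_introgressions(donor_order: str, recipient_order: str) -> Dict[str, bool]:
    """Map each single-allele introgression (donor allele placed in the
    recipient background) to True if that genotype was already tested."""
    donor_history = lineage_history(donor_order)
    recipient_history = lineage_history(recipient_order)
    # Assumption: "tested" means the genotype appeared in either history.
    tested = set(donor_history) | set(recipient_history)
    donor, recipient = donor_history[-1], recipient_history[-1]
    result = {}
    for locus, (d_allele, r_allele) in enumerate(zip(donor, recipient)):
        if d_allele != r_allele:  # only divergent loci can be introgressed
            introgressed = recipient[:locus] + d_allele + recipient[locus + 1:]
            result[introgressed] = introgressed in tested
    return result


scenario_a = classify_introgressions(donor_order="ABCD", recipient_order="")
scenario_b = classify_introgressions(donor_order="BD", recipient_order="AC")

for name, outcome in (("A", scenario_a), ("B", scenario_b)):
    untested = sorted(g for g, was_tested in outcome.items() if not was_tested)
    print(f"Scenario {name}: {len(untested)} untested introgressions: {untested}")
```

Under these assumptions, both scenarios yield three untested introgressed genotypes and one tested genotype (Abcd), in agreement with the caption. A different substitution order would change which genotype counts as tested.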
S1 Text. Number of divergent alleles from one population that have not been tested by natural selection in the genetic background of another population.
https://doi.org/10.1371/journal.pgen.1011126.s004
(PDF)
S1 Table. Synonymous nucleotide diversity (πs) at the Adh locus for different species of Drosophila.
https://doi.org/10.1371/journal.pgen.1011126.s005
(PDF)
S2 Table. Total recombination map length for different species of Drosophila.
https://doi.org/10.1371/journal.pgen.1011126.s006
(PDF)
S3 Table. Phylogenetic signal for each trait analyzed in Fig 10.
https://doi.org/10.1371/journal.pgen.1011126.s007
(PDF)
S4 Table. Asymmetry in synonymous nucleotide diversity (πs) at the Adh locus and in postzygotic reproductive isolation (RI) for different species of Drosophila.
https://doi.org/10.1371/journal.pgen.1011126.s008
(PDF)
Acknowledgments
We thank D. Wiernasz, M. Servedio, and O. Martin for helpful discussions. We used the Opuntia and Sabine clusters from the Hewlett Packard Enterprise Data Science Institute at the University of Houston. We also acknowledge the advanced support provided by the Research Computing Data Core at the University of Houston.
References
- 1. Cooney CR, Thomas GH. Heterogeneous relationships between rates of speciation and body size evolution across vertebrate clades. Nat Ecol Evol. 2021;5(1):101–110. pmid:33106601
- 2. Stanley SM. A theory of evolution above the species level. Proc Natl Acad Sci U S A. 1975;72(2):646–650. pmid:1054846
- 3. Jablonski D. Species selection: theory and data. Annu Rev Ecol Evol Syst. 2008;39:501–524.
- 4. Harvey MG, Singhal S, Rabosky DL. Beyond reproductive isolation: demographic controls on the speciation process. Annu Rev Ecol Evol Syst. 2019;50(1):75–95.
- 5. Jablonski D, Roy K. Geographical range and speciation in fossil and living molluscs. Proc R Soc B. 2003;270(1513):401–406. pmid:12639320
- 6. Gavrilets S. Evolution and speciation on holey adaptive landscapes. Tr Ecol Evol. 1997;12(8):307–312. pmid:21238086
- 7. Gavrilets S, Li H, Vose MD. Rapid parapatric speciation on holey adaptive landscapes. Proc R Soc B. 1998;265(1405):1483–1489. pmid:9744103
- 8. Gavrilets S. A dynamical theory of speciation on holey adaptive landscapes. Am Nat. 1999;154(1):1–22. pmid:29587497
- 9. Dobzhansky T. Genetics and the Origin of Species. New York: Columbia Univ. Press; 1937.
- 10. Muller HJ. Isolating mechanisms, evolution and temperature. Biol Symp. 1942;6:71–125.
- 11. Presgraves DC. The molecular evolutionary basis of species formation. Nat Rev Genet. 2010;11(3):175–180. pmid:20051985
- 12. Rieseberg LH, Blackman BK. Speciation genes in plants. Ann Bot. 2010;106(3):439–455. pmid:20576737
- 13. Maheshwari S, Barbash DA. The genetics of hybrid incompatibilities. Annu Rev Genet. 2011;45(1):331–355. pmid:21910629
- 14. Lipman DJ, Wilbur WJ. Modelling neutral and selective evolution of protein folding. Proc R Soc B. 1991;245(1312):7–11. pmid:1682931
- 15. Schuster P, Fontana W, Stadler PF, Hofacker IL. From sequences to shapes and back: a case study in RNA secondary structures. Proc R Soc B. 1994;255(1344):279–284. pmid:7517565
- 16. Gavrilets S, Gravner J. Percolation on the fitness hypercube and the evolution of reproductive isolation. J Theor Biol. 1997;184(1):51–64. pmid:9039400
- 17. Gavrilets S. Fitness Landscapes and the Origin of Species. Princeton, N.J.: Princeton Univ. Press; 2004.
- 18. van Nimwegen E, Crutchfield JP, Huynen M. Neutral evolution of mutational robustness. Proc Natl Acad Sci U S A. 1999;96(17):9716–9720. pmid:10449760
- 19. Azevedo RBR, Lohaus R, Srinivasan S, Dang KK, Burch CL. Sexual reproduction selects for robustness and negative epistasis in artificial gene networks. Nature. 2006;440(7080):87–90. pmid:16511495
- 20. Szöllősi GJ, Derényi I. The effect of recombination on the neutral evolution of genetic robustness. Math Biosci. 2008;214(1-2):58–62. pmid:18490032
- 21. Klug A, Park SC, Krug J. Recombination and mutational robustness in neutral fitness landscapes. PLoS Comput Biol. 2019;15(8):e1006884. pmid:31415555
- 22. de Visser JAGM, Hermisson J, Wagner GP, Meyers LA, Bagheri-Chaichian H, Blanchard JL, et al. Perspective: Evolution and detection of genetic robustness. Evolution. 2003;57(9):1959–1972. pmid:14575319
- 23. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288(5):911–940. pmid:10329189
- 24. Ancel LW, Fontana W. Plasticity, evolvability, and modularity in RNA. J Exp Zool (Mol Dev Evol). 2000;288(3):242–283. pmid:11069142
- 25. Wagner A. Robustness and evolvability: a paradox resolved. Proc R Soc B. 2008;275(1630):91–100. pmid:17971325
- 26. Draghi JA, Parsons TL, Wagner GP, Plotkin JB. Mutational robustness can facilitate adaptation. Nature. 2010;463(7279):353–355. pmid:20090752
- 27. Draghi JA, Parsons TL, Plotkin JB. Epistasis increases the rate of conditionally neutral substitution in an adapting population. Genetics. 2011;187(4):1139–1152. pmid:21288876
- 28. Orr HA. The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics. 1995;139(4):1805–1813. pmid:7789779
- 29. Orr HA, Turelli M. The evolution of postzygotic isolation: accumulating Dobzhansky-Muller incompatibilities. Evolution. 2001;55(6):1085–1094. pmid:11475044
- 30. Livingstone K, Olofsson P, Cochran G, Dagilis A, MacPherson K, Seitz KA Jr. A stochastic model for the development of Bateson–Dobzhansky–Muller incompatibilities that incorporates protein interaction networks. Math Biosci. 2012;238(1):49–53. pmid:22465838
- 31. Kalirad A, Azevedo RBR. Spiraling complexity: a test of the snowball effect in a computational model of RNA folding. Genetics. 2017;206(1):377–388. pmid:28007889
- 32. Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA package 2.0. Algorithms Mol Biol. 2011;6:26. pmid:22115189
- 33. Jost L. GST and its relatives do not measure differentiation. Mol Ecol. 2008;17(18):4015–4026. pmid:19238703
- 34. Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Meth Ecol Evol. 2012;3(2):217–223.
- 35. Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, et al. geiger v2.0: an expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics. 2014;30(15):2216–2218. pmid:24728855
- 36. Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, et al. caper: comparative analyses of phylogenetics and evolution in R. R package version 1.0.1. 2018.
- 37. Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–362. pmid:32939066
- 38. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17:261–272. pmid:32015543
- 39. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–2830.
- 40. R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available from: https://www.R-project.org/.
- 41. Schluter D. Evidence for ecological speciation and its alternative. Science. 2009;323(5915):737–741. pmid:19197053
- 42. Nei M. Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci U S A. 1973;70(12):3321–3323. pmid:4519626
- 43. Wilke CO, Wang JL, Ofria C, Lenski RE, Adami C. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature. 2001;412(6844):331–333. pmid:11460163
- 44. Halligan DL, Keightley PD. Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol Syst. 2009;40(1):151–172.
- 45. Domingo-Calap P, Cuevas JM, Sanjuán R. The fitness effects of random mutations in single-stranded DNA and RNA bacteriophages. PLoS Genet. 2009;5(11):e1000742. pmid:19956760
- 46. Gardner A, Kalinka AT. Recombination and the evolution of mutational robustness. J Theor Biol. 2006;241(4):707–715. pmid:16487979
- 47. Misevic D, Ofria C, Lenski RE. Sexual reproduction reshapes the genetic architecture of digital organisms. Proc R Soc B. 2006;273(1585):457–464. pmid:16615213
- 48. Martin OC, Wagner A. Effects of recombination on complex regulatory circuits. Genetics. 2009;183:673–684. pmid:19652184
- 49. Kim KJ, Fernandes VM. Effects of ploidy and recombination on evolution of robustness in a model of the segment polarity network. PLoS Comput Biol. 2009;5(2):e1000296. pmid:19247428
- 50. Lohaus R, Burch CL, Azevedo RBR. Genetic architecture and the evolution of sex. J Hered. 2010;101(suppl. 1):S142–S157. pmid:20421324
- 51. Rabosky DL, Matute DR. Macroevolutionary speciation rates are decoupled from the evolution of intrinsic reproductive isolation in Drosophila and birds. Proc Natl Acad Sci U S A. 2013;110(38):15354–15359. pmid:24003144
- 52. Coyne JA, Orr HA. Patterns of speciation in Drosophila. Evolution. 1989;43(2):362–381. pmid:28568554
- 53. Yukilevich R. Asymmetrical patterns of speciation uniquely support reinforcement in Drosophila. Evolution. 2012;66(5):1430–1446. pmid:22519782
- 54. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999;401(6756):877–884. pmid:10553904
- 55. Blomberg SP, Garland T Jr, Ives AR. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57(4):717–745. pmid:12778543
- 56. Turelli M, Moyle LC. Asymmetric postmating isolation: Darwin’s corollary to Haldane’s rule. Genetics. 2007;176(2):1059–1088. pmid:17435235
- 57. Rosenzweig ML. Loss of speciation rate will impoverish future diversity. Proc Natl Acad Sci U S A. 2001;98(10):5404–5410. pmid:11344286
- 58. Hubbell SP. The Unified Neutral Theory of Biodiversity and Biogeography. Princeton, N.J.: Princeton Univ. Press; 2001.
- 59. Marzluff JM, Dial KP. Life history correlates of taxonomic diversity. Ecology. 1991;72(2):428–439.
- 60. Stanley SM. Population size, extinction, and speciation: the fission effect in Neogene Bivalvia. Paleobiology. 1986;12(1):89–110.
- 61. Mayr E. Change of genetic environment and evolution. In: Huxley J, Hardy AC, Ford EB, editors. Evolution as a Process. Allen & Unwin; 1954. p. 157–180.
- 62. Carson HL, Templeton AR. Genetic revolutions in relation to speciation phenomena: the founding of new populations. Annu Rev Ecol Syst. 1984;15:97–131.
- 63. Gavrilets S, Hastings A. Founder effect speciation: a theoretical reassessment. Am Nat. 1996;147(3):466–491.
- 64. Maya-Lastra CA, Eaton DAR. Genetic incompatibilities do not snowball in a demographic model of speciation. bioRxiv. 2021.
- 65. Khatri BS, Goldstein RA. Simple biophysical model predicts faster accumulation of hybrid incompatibilities in small populations under stabilizing selection. Genetics. 2015;201(4):1525–1537. pmid:26434721
- 66. Khatri BS, Goldstein RA. A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate. J Theor Biol. 2015;378:56–64. pmid:25936759
- 67. Khatri BS, Goldstein RA. Biophysics and population size constrains speciation in an evolutionary model of developmental system drift. PLoS Comput Biol. 2019;15(7):e1007177. pmid:31335870
- 68. Cutter AD. The polymorphic prelude to Bateson–Dobzhansky–Muller incompatibilities. Tr Ecol Evol. 2012;27(4):209–218. pmid:22154508
- 69. Corbett-Detig RB, Zhou J, Clark AG, Hartl DL, Ayroles JF. Genetic incompatibilities are widespread within species. Nature. 2013;504(7478):135–137. pmid:24196712
- 70. Orr HA, Orr LH. Waiting for speciation: the effect of population subdivision on the time to speciation. Evolution. 1996;50(5):1742–1749. pmid:28565607
- 71. Ringo J, Wood D, Rockwell R, Dowse H. An experiment testing two hypotheses of speciation. Am Nat. 1985;126(5):642–661.
- 72. Powell JR. The founder-flush speciation theory: an experimental approach. Evolution. 1978;32(3):465–474. pmid:28567948
- 73. Kilias G, Alahiotis SN, Pelecanos M. A multifactorial genetic investigation of speciation theory using Drosophila melanogaster. Evolution. 1980;34(4):730–737. pmid:28563991
- 74. Dodd DMB, Powell JR. Founder-flush speciation: an update of experimental results with Drosophila. Evolution. 1985;39(6):1388–1392. pmid:28564258
- 75. Galiana A, González-Candelas F, Moya A. Postmating isolation analysis in founder-flush experimental populations of Drosophila pseudoobscura. Evolution. 1996;50(2):941–944. pmid:28568951
- 76. Huang JP, Leavitt SD, Lumbsch HT. Testing the impact of effective population size on speciation rates—a negative correlation or lack thereof in lichenized fungi. Sci Rep. 2018;8:5729. pmid:29636516
- 77. Louca S, Pennell MW. Extant timetrees are consistent with a myriad of diversification histories. Nature. 2020;580(7804):502–505. pmid:32322065
- 78. Louca S, Pennell MW. Why extinction estimates from extant phylogenies are so often zero. Curr Biol. 2021;31(14):3168–3173. pmid:34019824
- 79. Glazier DS. Toward a predictive theory of speciation: the ecology of isolate selection. J Theor Biol. 1987;126(3):323–333.
- 80. Butlin RK. Recombination and speciation. Mol Ecol. 2005;14(9):2621–2635. pmid:16029465
- 81. Navarro A, Barton NH. Accumulating postzygotic isolation genes in parapatry: a new twist on chromosomal speciation. Evolution. 2003;57(3):447–459. pmid:12703935
- 82. Rafajlović M, Rambla J, Feder JL, Navarro A, Faria R. Inversions and genomic differentiation after secondary contact: when drift contributes to maintenance, not loss, of differentiation. Evolution. 2021;75(6):1288–1303. pmid:33844299
- 83. Bank C, Bürger R, Hermisson J. The limits to parapatric speciation: Dobzhansky–Muller incompatibilities in a continent–island model. Genetics. 2012;191(3):845–863. pmid:22542972
- 84. Templeton AR. The reality and importance of founder speciation in evolution. BioEssays. 2008;30(5):470–479. pmid:18404703
- 85. Lindstrom SE, Cox NJ, Klimov A. Genetic analysis of human H2N2 and early H3N2 influenza viruses, 1957-1972: evidence for genetic divergence and multiple reassortment events. Virology. 2004;328(1):101–119. pmid:15380362
- 86. Worobey M, Han GZ, Rambaut A. Genesis and pathogenesis of the 1918 pandemic H1N1 influenza A virus. Proc Natl Acad Sci U S A. 2014;111(22):8107–8112. pmid:24778238
- 87. Robertson DL, Sharp PM, McCutchan FE, Hahn BH. Recombination in HIV-1. Nature. 1995;374(6518):124–126. pmid:7877682
- 88. Robertson DL, Hahn BH, Sharp PM. Recombination in AIDS viruses. J Mol Evol. 1995;40(3):249–259. pmid:7723052
- 89. Simmonds P, Midgley S. Recombination in the genesis and evolution of hepatitis B virus genotypes. J Virol. 2005;79(24):15467–15476. pmid:16306618
- 90. van Cuyck H, Fan J, Robertson DL, Roques P. Evidence of recombination between divergent hepatitis E viruses. J Virol. 2005;79(14):9306–9314. pmid:15994825
- 91. Huang T, Wang W, Bessaud M, Ren P, Sheng J, Yan H, et al. Evidence of recombination and genetic diversity in human rhinoviruses in children with acute respiratory infection. PLoS ONE. 2009;4(7):e6355. pmid:19633719
- 92. Wisdom A, Kutkowska AE, Leitch ECM, Gaunt E, Templeton K, Harvala H, et al. Genetics, recombination and clinical features of human rhinovirus species C (HRV-C) infections; interactions of HRV-C with other respiratory viruses. PLoS ONE. 2009;4(12):e8518. pmid:20041158
- 93. Lau SKP, Feng Y, Chen H, Luk HKH, Yang WH, Li KSM, et al. Severe acute respiratory syndrome (SARS) coronavirus ORF8 protein is acquired from SARS-related coronavirus from greater horseshoe bats through recombination. J Virol. 2015;89(20):10532–10547. pmid:26269185
- 94. Sabir JSM, Lam TTY, Ahmed MMM, Li L, Shen Y, Abo-Aba SEM, et al. Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia. Science. 2016;351(6268):81–84. pmid:26678874
- 95. Dudas G, Rambaut A. MERS-CoV recombination: implications about the reservoir and potential for adaptation. Virus Evol. 2016;2(1):vev023. pmid:27774293
- 96. Tang X, Wu C, Li X, Song Y, Yao X, Wu X, et al. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev. 2020;7(6):1012–1023. pmid:34676127
- 97. Boni MF, Lemey P, Jiang X, Lam TTY, Perry BW, Castoe TA, et al. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol. 2020;5(11):1408–1417. pmid:32724171
- 98. Bobay LM, O’Donnell AC, Ochman H. Recombination events are concentrated in the spike protein region of Betacoronaviruses. PLoS Genet. 2020;16(12):e1009272. pmid:33332358
- 99. Zhou H, Ji J, Chen X, Bi Y, Li J, Wang Q, et al. Identification of novel bat coronaviruses sheds light on the evolutionary origins of SARS-CoV-2 and related viruses. Cell. 2021;184(17):4380–4391. pmid:34147139
- 100. Padidam M, Sawyer S, Fauquet CM. Possible emergence of new geminiviruses by frequent recombination. Virology. 1999;265(2):218–225. pmid:10600594
- 101. MacArthur RH, Wilson EO. The Theory of Island Biogeography. Princeton, N.J.: Princeton Univ. Press; 1967.
- 102. Richter-Dyn N, Goel NS. On the extinction of a colonizing species. Theor Popul Biol. 1972;3(4):406–433. pmid:4667096
- 103. Pimm SL, Jones HL, Diamond J. On the risk of extinction. Am Nat. 1988;132(6):757–785.
- 104. Lande R. Risks of population extinction from demographic and environmental stochasticity and random catastrophes. Am Nat. 1993;142(6):911–927. pmid:29519140
- 105. Whitlock AOB, Peck KM, Azevedo RBR, Burch CL. An evolving genetic architecture interacts with Hill–Robertson interference to determine the benefit of sex. Genetics. 2016;203(2):923–936. pmid:27098911
- 106. Krakauer DC, Plotkin JB. Redundancy, antiredundancy, and the robustness of genomes. Proc Natl Acad Sci U S A. 2002;99(3):1405–1409. pmid:11818563
- 107. Elena SF, Wilke CO, Ofria C, Lenski RE. Effects of population size and mutation rate on the evolution of mutational robustness. Evolution. 2007;61(3):666–674. pmid:17348929
- 108. Pyron RA, Wiens JJ. Large-scale phylogenetic analyses reveal the causes of high tropical amphibian diversity. Proc R Soc B. 2013;280(1770):20131622. pmid:24026818
- 109. Ricklefs RE. Estimating diversification rates from phylogenetic information. Tr Ecol Evol. 2007;22(11):601–610. pmid:17963995
- 110. Rolland J, Condamine FL, Jiguet F, Morlon H. Faster speciation and reduced extinction in the tropics contribute to the mammalian latitudinal diversity gradient. PLoS Biol. 2014;12(1):e1001775. pmid:24492316
- 111. Yukilevich R. Tropics accelerate the evolution of hybrid male sterility in Drosophila. Evolution. 2013;67(6):1805–1814. pmid:23730771
- 112. Mittelbach GG, Schemske DW, Cornell HV, Allen AP, Brown JM, Bush MB, et al. Evolution and the latitudinal diversity gradient: speciation, extinction and biogeography. Ecol Lett. 2007;10(4):315–331. pmid:17355570
- 113. Schluter D. Speciation, ecological opportunity, and latitude. Am Nat. 2016;187(1):1–18. pmid:26814593