
What Is Speciation?


Concepts and definitions of species have been debated by generations of biologists and remain controversial. Microbes pose a particular challenge because of their genetic diversity, asexual reproduction, and often promiscuous horizontal gene transfer (HGT). However, microbes also present an opportunity to study and understand speciation because of their rapid evolution, both in nature and in the lab, and small, easily sequenced genomes. Here, we review how microbial population genomics has enabled us to catch speciation “in the act” and how the results have challenged and enriched our concepts of species, with implications for all domains of life. We describe how recombination (including HGT and introgression) has shaped the genomes of nascent microbial, animal, and plant species and argue for a prominent role of natural selection in initiating and maintaining speciation. We ask how universal is the process of speciation across the tree of life, and what lessons can be drawn from microbes? Comparative genomics showing the extent of HGT in natural populations certainly jeopardizes the relevance of vertical descent (i.e., the species tree) in speciation. Nevertheless, we conclude that species do indeed exist as clusters of genetic and ecological similarity and that speciation is driven primarily by natural selection, regardless of the balance between horizontal and vertical descent.

How Many Species and How Much Speciation?

Conservatively assuming there are ~10⁷ different species on Earth, not counting most bacteria and archaea [1], and a single origin of life ~4 × 10⁹ years ago, this gives an average diversification rate of 0.0025 species per year, or one new species every 400 years. This estimate is very rough and does not account for extinction events or “bursts” of speciation, and it is likely a severe underestimate because microbes are undercounted. More impressive than the number of species is the number of intermediate forms (Darwin’s “doubtful cases” [2]), suggesting that speciation is a continuous process that happens all the time [3]. This apparent fluidity has led us and others to propose that most organisms can probably be placed somewhere along a “spectrum” of speciation [4,5]. Of course, speciation may not happen at all, or at least not go to completion. Here, we are less concerned with the number and exact definition of species and more with why speciation happens (or not) and the nature of the speciation process.
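The back-of-the-envelope arithmetic above can be reproduced in a few lines. This is only a sketch of the calculation as stated in the text: the two input figures come from the paragraph, while the function and variable names are ours, and the constant-rate assumption is exactly the simplification the text warns about.

```python
# Rough average diversification rate from the figures quoted in the text.
N_SPECIES = 1e7            # ~10^7 species (most bacteria and archaea not counted)
YEARS_SINCE_ORIGIN = 4e9   # ~4 x 10^9 years since a single origin of life


def average_rate(n_species: float, years: float) -> float:
    """Average number of new species per year, assuming a constant rate
    and ignoring extinction and bursts of speciation."""
    return n_species / years


rate = average_rate(N_SPECIES, YEARS_SINCE_ORIGIN)
years_per_species = 1 / rate

print(f"{rate:.4f} new species per year")
print(f"one new species every {years_per_species:.0f} years")
```

Because extinct lineages and undercounted microbes are left out of the numerator, the true average rate is expected to be higher than this figure.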

A Brief History of Species Thinking

Here, we consider species in the vernacular sense, as clusters of individuals that show ecological and genetic similarities. We tend to know them when we see them—although microbial species are more difficult to “see” than those of multicellular eukaryotes (referred to here as macrobes). Given that species evolve from common ancestors (an evolutionary and phylogenetic species definition, e.g., [6]), the big question is not so much what species are, but what evolutionary forces make them (and keep them) distinct?

Darwin emphasized the role of natural selection and competition in shaping species and keeping them in separate ecological niches. Dobzhansky [7] and Mayr [8] emphasized the importance of reproductive isolation in maintaining the genetic distinctness of species; this “biological species concept” (BSC) based on sexual isolation does not easily apply to asexually reproducing organisms, including most bacteria and archaea (Box 1). Simpson [9] suggested more generally that distinct species must have separate evolution, and Van Valen [10] argued that this separateness is mainly due to ecological distinctness, not to reproductive isolation. Throughout this article, while acknowledging that reproductive isolation also involves selection (e.g., negative selection against Dobzhansky–Muller incompatibilities [11]), we use the term “natural selection” or simply “selection” to mean differential selection in different ecological niches. We also refer generically to "gene flow" or "genetic exchange," whether it involves the exchange of different alleles of homologous genes (similar to meiotic sex [12]) or the acquisition of brand new genes by nonhomologous recombination.

Box 1. Glossary

Allopatric: a set of sampled isolates or genomes from different geographic areas, where barriers to migration and gene flow are significant.

Biological species concept (BSC): a species concept based on restricted gene flow, in which genes are exchanged by recombination within but not between species. In sexual species, this is equivalent to sexual or reproductive isolation. In asexually reproducing (clonal) species, a version of this concept could apply when there is more HGT within than between species.

Clonal frame: the portion of the genome transmitted by vertical (clonal) evolution, unimpacted by HGT. Mutations in the clonal frame should all fall parsimoniously on a single phylogenetic tree. The concept of clonal frame is related to, but not identical to, the concept of core genome, which is the portion of the genome that is present (or in practice, that can be aligned) in all of a given set of sequenced isolates or metagenomes. The core genome is not necessarily vertically inherited and is therefore not necessarily part of the clonal frame.

CRISPR: Clustered, regularly interspaced short palindromic repeats in the genome, which, along with associated protein-coding genes, confer many bacteria and archaea with a type of adaptive immunity to mobile genetic elements.

Darwinian Threshold: the transition from mostly horizontal to mostly vertical transmission of genetic material, allowing the possibility of a branching tree structure relating species.

Exaptation: the process in which DNA or genes originally selected for one function (or originally selectively neutral) are selected for a new and different function.

Gene flow: exchange of genes by homologous or nonhomologous recombination.

Gene-specific selective sweep: the process in which a selected gene or allele spreads in a population by recombination faster than by clonal expansion. The result is that the selected variant is present in more than a single clonal background, and diversity is not purged genome-wide when the selected gene reaches fixation.

Genetic drift: the tendency for units (mutations, genes, or individuals) to change in frequency because of random sampling in a population of finite size.

Genome-wide selective sweep: the process in which a selected gene or allele spreads in a population by clonal expansion of the genome that first acquired it. The result is that diversity is purged genome-wide, and the selected variant is linked in the same clonal frame as the rest of the genome.

Hologenome: the total set of genomes contained in a host and its symbionts (e.g., an animal's nuclear and mitochondrial genome, plus the genomes of its symbiotic microbiota).

Horizontal gene transfer (HGT): the incorporation of foreign DNA into a genome. Incorporation can be mediated by either homologous recombination or nonhomologous recombination of DNA that enters a cell via transformation, transduction, conjugation, or other mechanisms. In bacteria and archaea, all gene transfer is horizontal (i.e., always unidirectional from donor to recipient, rather than reciprocal). Horizontal transmission occurs within a generation, as opposed to vertical transmission of DNA from one generation to the next.

Homologous recombination: a mechanism of DNA integration requiring at least short tracts of identity between the genome and the foreign DNA, mediated by RecA and mismatch-repair machinery. The integrated DNA can result in single-nucleotide changes and, in some cases, addition or loss of hundreds to thousands of base pairs.

Hybridization: in sexual organisms, the process in which two individuals from distinct (but typically closely related) populations or species form viable progeny (hybrids) harboring a combination of both parental genomes.

Introgression (or introgressive hybridization): in sexual organisms, the process in which genes or portions of the genome are transferred from one population (or one species) to another by hybridization, followed by successive backcrosses with parental genomes.

Macrobe: a multicellular eukaryote.

Microbe: a microscopic single-celled bacterium, archaeon, or eukaryote.

Mobile genetic element: a piece of DNA that is frequently transferred horizontally, either within or between genomes, and often encodes its own replication and transfer (e.g., plasmids, phages, transposons, integrative conjugative elements).

Natural selection: differential survival and reproduction of units (mutations, genes, or individuals) from one generation to the next.

Negative frequency-dependent selection (NFDS): a type of natural selection that favors rare phenotypes in a population.

Niche: a specific set of ecological parameters (environments, resources, physical and chemical characteristics, biotic interactions, etc.) to which an organism is adapted. This does not necessarily imply (but does not exclude) physical separation between niches.

Nonhomologous recombination: integration of DNA with no homologous allele already present in the genome, often mediated by phage and integrative elements. This results in the acquisition of entirely new genes.

Population: a group of individuals sharing genetic and ecological similarity and coexisting in a sympatric setting.

Species: a group of genetically and ecologically similar individuals that may be named with a Linnean binomial to aid communication. Species are recognizable as distinct clusters, based on genetic similarity across the genome and differences from other species. In most cases, distinct genetic clusters imply distinct ecology between clusters, otherwise clusters will not form or persist. These genetic clusters can be large (encompassing a great deal of genetic diversity) or small, and may contain ecological diversity that may eventually drive speciation (separation of one cluster into two) or may not (gene ecology).

Sympatric: a set of sampled isolates or genomes from the same geographic area, where barriers to migration and gene flow are low or nonexistent.

Taxon: a group of biological entities (species, genera, classes, etc.) deriving from the same ancestor, defined by shared characteristics inherited from this ancestor.

Van Valen went on to speculate, “It may well be that Quercus macrocarpa in Quebec exchanges many more genes with local Q. bicolor than it does with Q. macrocarpa in Texas.” His idea—that gene exchanges (whether mediated by homologous or nonhomologous recombination) occur more frequently according to ecology and local geography than according to species boundaries—has been supported in genomic surveys of natural microbial populations. For example, we could simply replace some nouns in Van Valen’s quote to produce the following statement: Vibrio cholerae in the United States exchanges more genes with local V. metecus (a sister species) than it does with V. cholerae in Bangladesh [13]. Similar examples are found in animals such as Heliconius butterflies [14]. However, only a certain subset of genes is shared along geographic and/or ecological lines, while the rest of the genome evolves according to established (named) species boundaries. V. cholerae and V. metecus are therefore “good” species, recognizable as distinct genetic and ecological clusters despite exchanging genes for local adaptation. As we will see below, earlier stages of speciation are often characterized by the opposite genomic signature: only a subset of genes are diverged between species while the rest of the genome is freely recombined across species.

Van Valen also coined the term “multispecies”: a set of broadly sympatric species that exchange genes in nature. This term should resonate with microbial ecologists familiar with the famous trope, “Everything is everywhere [i.e., sympatric], but the environment selects” [15]. As the potential for global dispersal and widespread horizontal gene transfer (HGT) becomes increasingly apparent, it is not implausible to consider all bacteria, or even all life on Earth, as a sort of multispecies. Van Valen did not go quite so far, but did suggest that there could be taxa without species and that the family Enterobacteriaceae, for example, might constitute one such multispecies unit. We disagree that there are taxa without species. However, if a pair of putative species is discovered to form a single genetic cluster (for example, if unable to be distinguished in an assignment test such as BAPS [16] or STRUCTURE [17]), we should conclude that there is one species, rather than no species or multispecies. Our perspective implies that some species may contain much more genetic diversity than others and that a simple operational cutoff of percent DNA identity would not be appropriate for species delimitation.

Finally, Van Valen observed that “multispecies seem to occur less commonly among metazoans than elsewhere” and suggested that this could be due to increased complexity and precise mating systems in metazoa. This concept of speciation as a byproduct of biological complexity rather than ecology was explored and elaborated in Woese’s idea of “Darwinian Threshold” [18], referring to the transition from a precellular soup with rampant HGT to a mostly tree-like pattern of distinct species that undergo distinguishable speciation events. According to Woese, once the complex machinery of replication and protein translation had evolved, it became “locked in place” by coadaptation, and its individual components could not be easily horizontally transferred from cell to cell because they would be incompatible with divergent recipient cell machinery. Consistent with the logic of the complexity hypothesis [19], HGT is more common among genes that function at the periphery rather than the highly interconnected core of bacterial metabolic networks [19,20]. However, the cumulative impact of HGT on the tree of life is much greater than imagined by Woese. The tree of life has been criticized as “the tree of one percent” [21] because only about 1% of genes support this tree [22]. While HGT has not obscured all traces of vertical descent across the tree of life [23,24], much of any organism’s genome may not have crossed the Darwinian Threshold, and may never do so.

In contrast to Van Valen’s selection-driven ecological speciation (and Cohan’s subsequent ecotype models [25]), neutral speciation involves only genetic drift. A typical neutral scenario would be a population that becomes geographically separated, allowing the two sub-populations to diverge genetically, such that they become reproductively incompatible if and when they meet again. Speciation affected by neutral processes is expected to be more common in macrobes because of populations with strong biogeography (limited dispersal) and smaller effective population sizes that favor drift over natural selection. Lynch and Conery [26] even suggest that drift was the major factor leading to evolutionary diversification in macrobes, with the neutral accumulation of noncoding DNA leading to increasing genome expansion, allowing complex gene regulation and cell specialization and in turn leading to exaptation of ecological novelties.

According to the "everything everywhere" dogma, most microbes form populations large enough to accumulate mutations that could be beneficial in a broad range of environments and to migrate so efficiently that few genetic incompatibilities have a chance to fix via drift within populations. Speciation in the microbial world is therefore expected to involve little drift and geographical separation. However, drift plays an important role in the evolution of microbial symbionts and pathogens that undergo population bottlenecks during transmission from host to host [27]. Drift may therefore play a dominant role in the evolution of endosymbionts such as Buchnera [28], but this does not necessarily exclude a role of natural selection in their speciation. Some microbes also have strongly constrained geographic distributions. For example, thermophilic archaea diverge genetically with geographic separation [29,30]. Some yeasts also experience limited migration across continents [31,32] and population size fluctuations [33,34], both of which may contribute to the emergence of species. However, strong selection, for instance driven by domestication [31] or local climatic adaptation [35], can either reinforce or mitigate speciation in yeast. Hence, as in macrobes, speciation in microbes will be driven by a balance between drift and selection, with macrobes likely experiencing more drift because of smaller population sizes and limited dispersal.

More broadly, the species problem can be viewed as a specific instance of the “levels of selection” problem [36,37]: how do natural selection and drift act on units at different levels of organization (ranging from genes, to protein complexes, to cells, to populations, to communities) to yield cooperation and cohesiveness within units but boundaries between units? It also raises the question, what are species made of? The Neo-Darwinian perspective (resulting from the Modern Synthesis of Darwinism and Mendelian genetics [38,39]) is that species differ genetically across their whole genomes, and speciation is caused by "speciation genes": some combination of genes that cause reproductive isolation and/or adaptation to different ecological niches (Fig 1). Traditionally, populations of organisms have been viewed as the units undergoing speciation, with whole-genome isolation developing between them. However, the lack of support for a cleanly branching organismal phylogeny has suggested to some that we should think of speciation as applying only to parts of the genome, not the whole genome (the “genic view” of speciation [4]). In essence, different parts of the genome may speciate at different rates or not at all [40], such that variable sets of genes are the elements that truly speciate (Fig 1). In the genic view, speciation still occurs but is driven by natural selection on genes, while reproductive isolation can remain incomplete. Taken to an extreme, this becomes “gene ecology” [25,41], and speciation does not occur. Rather, a set of genes or alleles inhabits the ecological niches to which they are best adapted without driving isolation of the rest of the genome. For example, vancomycin resistance genes might inhabit the hospital niche, and otherwise identical strains of Staphylococcus aureus may differ only in the presence or absence of these genes [42]. We might not classify these strains as separate species, but with time, their ecological differences could be followed by genetic differentiation and speciation. Symbiotic microbes might also maintain species boundaries, leading to the concept of holobionts: species that are made of multiple genomes, including host and symbionts [43–45]. Holobiont concepts are still in their infancy [46], and the extent of their contribution to speciation will surely become clearer in the coming years. Thus, the populations we call species can vary widely in what fractions of their genomes and hologenomes are isolated and how they emerge and remain isolated.

Fig 1. Units of species and speciation.

The Neo-Darwinian view of the Modern Synthesis is that "speciation genes" are the units driving speciation across the genome. Alternatively, if gene sets (including consortia of genes like plasmids or other mobile genetic elements) are sufficiently decoupled from their host genomes, this will lead to "gene ecology," in which gene sets, not species, determine reproductive isolation and/or adapt to ecological niches. Speciation could also be maintained (or potentially driven) by microbial symbionts or by host genes that select for particular symbionts, resulting in hologenome species. All of these speciation mechanisms can potentially be driven by selection or drift, and the list of units and mechanisms (arrows) is not exhaustive.

Are Eukaryotes Fuzzy Like Bacteria?

Since Dobzhansky and Mayr, the prevailing dogma has been that bacteria are “messy” because they don't easily fit the BSC. Recent findings are challenging this dogma, showing that while species are indeed messy in bacteria, they can be almost as messy in eukaryotes [12]. In other words, bacteria may fit the BSC better than we had thought [5,47,48] and eukaryotes may fit it worse. Eukaryotic genomes are impacted by HGT from viruses, bacteria, and even other eukaryotes [49,50]. Mobile genetic elements make up about two-thirds of the human genome, and their origins are often due to HGT [51–53]. HGT in eukaryotes, even if rare, can be important in the gain of new functions and, potentially, in speciation. Even without invoking interdomain HGT, gene flow by sexual hybridization across eukaryotic species boundaries (introgression) can be strong enough to obscure species branching events in large regions of the genome. In some cases, introgressive gene flow can bring new traits to a species, potentially giving rise to new varieties or even new species [34]. For example, HGT among close (introgression) or distant species of fungi, and even between fungi and bacteria, together with chromosomal rearrangements, has substantially shuffled fungal genomes and contributed to the emergence of new phytopathogenic [54,55] and brewing species [33,56]. In other cases, introgression (usually between closely related species pairs) has the potential to merge two species into one (e.g., [57]). It can be difficult to distinguish whether introgression is leading to genome-wide species convergence or simply the exchange of a few loci in the genome. For example, two species of Campylobacter were proposed to be converging [58], but the convergence may be at a very early stage or may simply involve the exchange of a few environmentally adaptive genes [59].

Although species boundaries are generally considered less fuzzy in macrobes, gene transfers by introgression among related species were revealed by fuzzy phylogenetic signals in genomic regions containing genes involved in mimicry in Heliconius [60,61] and in altitude adaptation in humans [62]. Hybridization and introgression may occur among non-sister species as well as between sister species, especially during rapid adaptive radiations. For example, in Heliconius, the "melpomene-silvaniform" clade consists of around 15 species. Most of these are "good" species that co-occur over large sympatric regions and are somewhat interfertile with other members of the clade. However, hybrids and backcrosses across the entire group occur in the wild and in captivity, suggesting the possibility that a slow trickle of introgression may be constantly occurring among both close and distant relatives [63]. In mosquito species, only a small fraction of the genome, mainly on the X chromosome, has not crossed species boundaries [64]. Yet, these mosquito species still form clear and distinct genetic clusters, thus fitting the criteria of “fuzzy species,” as originally proposed for macrobes [65] and microbes [66]. This is not to say that all eukaryotes form fuzzy species, nor that all bacteria do; rather, fuzzy species may emerge across the entire tree of life, given the right regime of recombination (HGT or gene flow).

The Islands Debate

Most of the initial research and theory on speciation focused on plant and animal populations, with one of the major debates centered on the relative importance of sympatric and allopatric speciation. Under the BSC, allopatry (physical separation, e.g., by islands or mountain ranges) provides a simple mechanism of reproductive isolation (Fig 2). Sympatric speciation, in the absence of barriers to gene flow, was initially thought to be rare, but more and more examples are being found in eukaryotes, either involving hybrid speciation [67,68] or not [69–72].

Fig 2. Models of speciation under different regimes of selection and recombination.

In all models, a single population of chromosomes (circles) splits into two nascent species, distinguishable by sets of genetic differences. At each time point, the most frequent multilocus genotype is shown, but other chromosomes could be segregating in the population at lower frequencies. Different haplotypes (or clonal frames) are shown as black or white circles. The ancestral niche is shown in blue and a new niche in orange. Gene flow (recombination) between species is indicated by horizontal connections between branches. (A) In the simplest model of speciation with gene flow, a single mutation controlling sexual isolation (but not under selection) is the only divergent locus (yellow square), with other loci experiencing gene flow between incipient species. (B) Selection during speciation can produce a pattern of genetic diversity across the genome very similar to (A), but species are expected to be longer-lived. Mutations under selection at early and later stages of speciation are shown as orange stars. (C) Allopatric speciation with a population bottleneck and neutral divergence of species. As in (A), competitive exclusion should lead to the extinction of one species if they come back into contact. (D) Without gene flow, the mutation under selection between species (orange star) will purge diversity genome-wide as it sweeps through one population, resulting in genome-wide divergence from the other population.

Genomic comparisons of putative sympatric species pairs have revealed so-called genomic “islands of speciation,” parts of the genome that are highly divergent between species, while the rest of the genome is undifferentiated. Islands are thought to contain genes driving reproductive isolation [73]. As a result, islands are resistant to gene flow during speciation, while the rest of the genome is more likely to acquire genes across incipient species boundaries. The “speciation-with-gene-flow” model has been criticized as a potential artefact of a measure of genetic differentiation used to detect islands, and islands might appear because of lowered levels of polymorphism rather than as a result of any gene flow between species [74,75]. In the simplest model with gene flow but without selection, incipient species inhabit the same ecological niche (Fig 2A). As a result of competitive exclusion, one species will eventually go extinct [76] and speciation will fail. For speciation to succeed in the longer term, there should be at least some ecological differentiation between species, and islands should contain genes under divergent natural selection (Fig 2B).
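The criticism that islands can reflect lowered polymorphism rather than gene flow is easier to see with a toy calculation contrasting absolute and relative divergence. The sketch below is illustrative only: the text does not specify which differentiation measure was criticized, so the relative measure used here, (between − mean within) / between, is our assumption, and the short sequences are invented, not real data.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def pi_within(pop: list[str]) -> float:
    """Mean pairwise differences within one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def absolute_divergence(pop1: list[str], pop2: list[str]) -> float:
    """Mean pairwise differences between populations."""
    pairs = list(product(pop1, pop2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def relative_divergence(pop1: list[str], pop2: list[str]) -> float:
    """Assumed relative measure: (between - mean within) / between."""
    between = absolute_divergence(pop1, pop2)
    within = (pi_within(pop1) + pi_within(pop2)) / 2
    return (between - within) / between


# Hypothetical genomic window with polymorphism inside each population.
diverse = (["AAAA", "AATT"], ["TTAA", "TTTT"])
# Hypothetical window with the same absolute divergence but no
# within-population polymorphism (e.g., after a local sweep).
low_diversity = (["AAAA", "AAAA"], ["TTTA", "TTTA"])

for name, (p1, p2) in [("diverse", diverse), ("low diversity", low_diversity)]:
    print(name,
          "absolute:", absolute_divergence(p1, p2),
          "relative:", round(relative_divergence(p1, p2), 2))
```

In this toy case both windows have the same absolute divergence, yet the low-diversity window scores as far more differentiated on the relative measure, so it would stand out as an "island" without any difference in gene flow.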

Islands in Bacteria

Genomic regions akin to islands of speciation have also been described in natural microbial populations (reviewed in detail in [5]). Briefly, both Sulfolobus archaea [48] and Vibrio bacteria [47] have parts of their genomes that are strongly differentiated along ecological lines, whereas the rest of the genome remains undifferentiated and freely recombined between ecologically distinct strains. However, both Vibrio and Sulfolobus show a recent and increasing tendency for gene flow within rather than between ecological populations—a pattern reminiscent of the BSC. In Sulfolobus, the differentiated regions (defined as having high relative divergence) encompass approximately one-third of the genome, making them more analogous to continents than islands. In Vibrio, the islands occupy only about one percent of the genome and were defined as regions of high absolute divergence between ecological populations. The Vibrio islands were likely acquired from HGT from another Vibrio species, analogous to speciation by introgression in macrobes [60,77].

At first glance, these observations support some flavor of the speciation-with-gene-flow model for Vibrio because of its small islands of high absolute divergence (Fig 2A). For Sulfolobus, with its large continents of high relative divergence, distinguishing among models is more difficult. The two Sulfolobus populations could potentially have diverged in allopatry (e.g., in separate hot springs) before encountering each other and exchanging genes in the hot spring from which they were sampled (Fig 2C). However, the Sulfolobus populations had different growth dynamics in the lab, suggesting ecological differences and a role for natural selection in keeping them separate [48].

In the BSC, speciation is initiated by boundaries to gene flow, perhaps followed by divergent natural selection. In the genic view, speciation is initiated by natural selection on genes, and reduced gene flow is a by-product, not a driver [4]. In the Vibrio populations, the island genes do not directly encode gene flow boundaries but likely provide adaptations to different ecological niches [78], resulting in divergent natural selection. Therefore, ecological speciation [79] might apply: islands arise because of divergent natural selection during speciation (Fig 2B). In this model, gene flow boundaries emerge later—as a consequence of less frequent encounters between strains with different ecological niches—or not at all. If complete boundaries to gene flow take some time to emerge, we can think of gene sets rather than whole genomes as the units that inhabit ecological niches. If gene flow boundaries never emerge, speciation does not occur (i.e., we are left with one species, not two) and this corresponds to the gene ecology model.

Gene Sweeps Versus Genome-Wide Sweeps

With relatively high rates of recombination (r), individual genes will “sweep” to fixation in ecological niches to which they are adapted, and this will occur without affecting genetic diversity elsewhere in the genome. When rates of recombination are relatively low compared to selective coefficients (s) within niches, entire genomes will sweep to fixation before they can be shuffled by recombination. The s >> r regime is well described in the Stable Ecotype Model [25], which predicts that most of the genome will follow a single “clonal frame” phylogeny (Fig 2D).
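The contrast between the two regimes can be written as a simple comparison of r and s. This is a deliberately crude sketch: the function name, the numeric cutoff standing in for "much greater than," and the example values are all hypothetical, and the comparison ignores other forces (such as frequency-dependent selection) that can change the outcome.

```python
def sweep_regime(r: float, s: float, ratio_threshold: float = 10.0) -> str:
    """Classify the expected sweep type from recombination rate r and
    selective coefficient s.

    ratio_threshold is a hypothetical cutoff for "much greater than";
    the text gives no numeric boundary between regimes.
    """
    if r <= 0 or s <= 0:
        raise ValueError("r and s must be positive")
    if s / r >= ratio_threshold:
        # Selection outpaces recombination: the whole genome hitchhikes.
        return "genome-wide sweep"
    if r / s >= ratio_threshold:
        # Recombination unlinks the selected locus before clonal expansion.
        return "gene-specific sweep"
    return "intermediate"


# Illustrative values only.
print(sweep_regime(r=1e-6, s=1e-3))
print(sweep_regime(r=1e-1, s=1e-3))
print(sweep_regime(r=1e-3, s=2e-3))
```

Treating the boundary as a ratio rather than a fixed value keeps the sketch agnostic about units, but any real analysis would need a population-genetic model rather than a threshold.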

Gene-specific selective sweeps were initially thought to be unlikely because recombination rates in microbes are estimated to be low (r < 10⁻⁶ per locus per generation) relative to selection (s > 10⁻³) [25]. However, recent modeling work [80] has shown that gene sweeps can occur when r is either very high or, counterintuitively, when r is very low, but only in the presence of negative frequency-dependent selection (on other loci in the genome, in addition to positive selection on an ecologically adaptive locus). Such frequency-dependent selection, liable to be common in nature, might be imposed by viral (phage) predation of bacteria, providing a selective advantage to rare alleles of phage receptor genes, for example [81,82].

Additional sampling and sequencing from natural populations will be required to assess the prevalence of gene sweeps. One recent study described a “quasi-sexual” cyanobacterial population, in which virtually every gene in the genome was unlinked by recombination, with each sampled genome being a random combination of alleles [83]. Some of these alleles showed evidence of natural selection, suggesting the action of gene sweeps within a single cohesive population (i.e., gene ecology not leading to speciation).

Open Questions

These recent models [80] and empirical work [83] have made some headway in resolving the paradox of gene sweeps but also raise new questions. How common are gene sweeps relative to the genome-wide sweeps predicted by the Stable Ecotype Model? On what time scales do sweeps occur, and how does this affect speciation rates?

More generally, can all life on Earth, including microbes and macrobes, be viewed on the same universal speciation spectrum? Early stages on the spectrum involve natural selection and drift within a single population, in which diversity arises from mutation and/or recombination of both small [84] and large [85] pieces of both homologous and nonhomologous DNA. This genetic diversity can be neutral or selfish, consisting of mobile elements that could potentially (but not necessarily) be exapted for species-level adaptation. Later stages of speciation involve divergent natural selection and barriers to gene flow. The extent to which these barriers are ecological, behavioral, physical, or genetic remains an open research question. Evidence from comparative genomics has shown that purely genetic barriers such as CRISPR may provide effective barriers over short (within-species) time scales [86] but not over longer evolutionary time scales [87]. Therefore, gene flow barriers will always be leaky—in both microbes and macrobes.

Here, we have argued that selection, except in special cases of sustained allopatry, is almost certainly required for the long-term success of speciation. More examples will be needed to test its generality, but our model is as follows. Selection drives speciation and is followed by genome-wide divergence, due to reduced gene flow (in recombining populations) or mutational divergence (in clonal populations). If genome-wide divergence does not follow, speciation does not occur (or is stalled at a very early stage) and we are left with gene ecology. Just how much selection (on how many genes) and how much divergence across the genome is needed for speciation is an open question. Another important question is, for a given sample of organisms, what fraction of the genome is shaped by selection or drift within the individual, the species, or the multispecies [37]? In asking (and eventually answering) this question, we begin to appreciate that not only does speciation occur along a spectrum, but species can be placed within a spectrum of biological diversity, from the molecule to the biosphere.


References

  1. Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B. How Many Species Are There on Earth and in the Ocean? PLoS Biol. 2011;9: e1001127. pmid:21886479
  2. Darwin C. The Origin of Species. London: John Murray; 1859.
  3. Mallet J. Hybridization, ecological races and the nature of species: empirical evidence for the ease of speciation. Philos Trans R Soc Lond, B, Biol Sci. 2008;363: 2971–2986. pmid:18579473
  4. Wu C-I. The genic view of the process of speciation. J Evol Biol. 2001;14: 851–865.
  5. Shapiro BJ, Polz MF. Ordering microbial diversity into ecologically and genetically cohesive units. Trends Microbiol. 2014;22: 235–247. pmid:24630527
  6. Hennig W. Elementos de una Sistemática Filogenética (Translation of Grundzüge einer Theorie der phylogenetischen Systematik). Buenos Aires: Editorial Universitaria de Buenos Aires; 1968.
  7. Dobzhansky T. A Critique of the Species Concept in Biology. Philosophy of Science. 1935;2: 344–355.
  8. Mayr E. Systematics and the Origin of Species. New York: Columbia University Press; 1942.
  9. Simpson GG. Principles of Animal Taxonomy. New York: Columbia University Press; 1961.
  10. Van Valen L. Ecological species, multispecies, and oaks. Taxon. 1976;25: 233–239.
  11. Orr HA, Turelli M. The evolution of postzygotic isolation: accumulating Dobzhansky-Muller incompatibilities. Evolution. 2001;55: 1085–1094. pmid:11475044
  12. Mallet J, Besansky N, Hahn MW. How reticulated are species? BioEssays. 2016;38: 140–149. pmid:26709836
  13. Boucher Y, Cordero OX, Takemura A, Hunt DE, Schliep K, Bapteste E, et al. Local Mobile Gene Pools Rapidly Cross Species Boundaries To Create Endemicity within Global Vibrio cholerae Populations. mBio. 2011;2: e00335–10. pmid:21486909
  14. Martin SH, Dasmahapatra KK, Nadeau NJ, Salazar C, Walters JR, Simpson F, et al. Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Research. 2013;23: 1817–1828. pmid:24045163
  15. Baas-Becking LGM. Geobiologie of Inleiding Tot de Milieukunde. The Hague, Netherlands: W.P. Van Stockum & Zoon; 1934.
  16. Corander J, Waldmann P, Marttinen P, Sillanpää MJ. BAPS 2: enhanced possibilities for the analysis of genetic population structure. Bioinformatics. 2004;20: 2363–2369. pmid:15073024
  17. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007;7: 574–578. pmid:18784791
  18. Woese CR. On the evolution of cells. Proc Natl Acad Sci USA. 2002;99: 8742–8747. pmid:12077305
  19. Jain R, Rivera MC, Lake JA. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci USA. 1999;96: 3801–3806. pmid:10097118
  20. Pal C, Papp B, Lercher MJ. Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nature Genetics. 2005;37: 1372–1375. pmid:16311593
  21. Dagan T, Martin W. The tree of one percent. Genome Biol. 2006;7: 118. pmid:17081279
  22. Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science. 2006;311: 1283–1287. pmid:16513982
  23. Abby SS, Tannier E, Gouy M, Daubin V. Lateral gene transfer as a support for the tree of life. Proc Natl Acad Sci USA. 2012;109: 4962–4967. pmid:22416123
  24. Lassalle F, Muller D, Nesme X. Ecological speciation in bacteria: reverse ecology approaches reveal the adaptive part of bacterial cladogenesis. Research in Microbiology. 2015;166: 729–741. pmid:26192210
  25. Wiedenbeck J, Cohan FM. Origins of bacterial diversity through horizontal genetic transfer and adaptation to new ecological niches. FEMS Microbiology Reviews. 2011;35: 957–976. pmid:21711367
  26. Lynch M, Conery JS. The origins of genome complexity. Science. 2003;302: 1401–1404. pmid:14631042
  27. Achtman M, Wagner M. Microbial diversity and the genetic nature of microbial species. Nature Reviews Microbiology. 2008;6: 431–440. pmid:18461076
  28. Herbeck JT, Funk DJ, Degnan PH, Wernegreen JJ. A conservative test of genetic drift in the endosymbiotic bacterium Buchnera: slightly deleterious mutations in the chaperonin groEL. Genetics. 2003;165: 1651–1660. pmid:14704156
  29. Whitaker RJ, Grogan DW, Taylor JW. Geographic barriers isolate endemic populations of hyperthermophilic archaea. Science. 2003;301: 976–978. pmid:12881573
  30. Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ. Biogeography of the Sulfolobus islandicus pan-genome. Proc Natl Acad Sci USA. 2009;106: 8605–8610. pmid:19435847
  31. Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, et al. Population genomics of domestic and wild yeasts. Nature. 2009;458: 337–341. pmid:19212322
  32. Charron G, Leducq J-B, Landry CR. Chromosomal variation segregates within incipient species and correlates with reproductive isolation. Mol Ecol. 2014;23: 4362–4372. pmid:25039979
  33. Almeida P, Barbosa R, Zalar P, Imanishi Y, Shimizu K, Turchetti B, et al. A population genomics insight into the Mediterranean origins of wine yeast domestication. Mol Ecol. 2015;24: 5412–5427. pmid:26248006
  34. Leducq J-B, Nielly-Thibault L, Charron G, Eberlein C, Verta J-P, Samani P, et al. Speciation driven by hybridization and chromosomal plasticity in a wild yeast. Nature Microbiology. 2015;1: 15003.
  35. Leducq J-B, Charron G, Samani P, Dubé AK, Sylvester K, James B, et al. Local climatic adaptation in a widespread microorganism. Proceedings of the Royal Society Biological Sciences Series B. 2014;281: 20132472.
  36. Okasha S. Evolution and the Levels of Selection. Oxford: Oxford University Press; 2006.
  37. Brunet TDP, Doolittle WF. Multilevel Selection Theory and the Evolutionary Functions of Transposable Elements. Genome Biology and Evolution. 2015;7: 2445–2457. pmid:26253318
  38. Dobzhansky T. Genetics and the Origin of Species. New York: Columbia University Press; 1937.
  39. Huxley J. Evolution: The Modern Synthesis. Cambridge: MIT Press; 1942.
  40. Retchless AC, Lawrence JG. Phylogenetic incongruence arising from fragmented speciation in enteric bacteria. Proc Natl Acad Sci USA. 2010;107: 11453–11458. pmid:20534528
  41. Shapiro BJ. Signatures of natural selection and ecological differentiation in microbial genomes. Advances in Experimental Medicine and Biology. 2014;781: 339–359. pmid:24277308
  42. Courvalin P. Vancomycin resistance in gram-positive cocci. Clin Infect Dis. 2006;42 Suppl 1: S25–34. pmid:16323116
  43. Bapteste E, Lopez P, Bouchard F, Baquero F, McInerney JO, Burian RM. Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc Natl Acad Sci USA. 2012;109: 18266–18272.
  44. Brucker RM, Bordenstein SR. Speciation by symbiosis. Trends Ecol Evol. 2012;27: 443–451. pmid:22541872
  45. Bordenstein SR, Theis KR. Host Biology in Light of the Microbiome: Ten Principles of Holobionts and Hologenomes. PLoS Biol. 2015;13: e1002226. pmid:26284777
  46. Moran NA, Sloan DB. The Hologenome Concept: Helpful or Hollow? PLoS Biol. 2015;13: e1002311. pmid:26636661
  47. Shapiro BJ, Friedman J, Cordero OX, Preheim SP, Timberlake SC, Szabo G, et al. Population Genomics of Early Events in the Ecological Differentiation of Bacteria. Science. 2012;336: 48–51. pmid:22491847
  48. Cadillo-Quiroz H, Didelot X, Held NL, Herrera A, Darling A, Reno ML, et al. Patterns of Gene Flow Define Species of Thermophilic Archaea. PLoS Biol. 2012;10: e1001265. pmid:22363207
  49. Keeling PJ, Palmer JD. Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet. 2008;9: 605–618. pmid:18591983
  50. Soucy SM, Huang J, Gogarten JP. Horizontal gene transfer: building the web of life. Nat Rev Genet. 2015;16: 472–482. pmid:26184597
  51. Kidwell MG. Transposable elements and the evolution of genome size in eukaryotes. Genetica. 2002;115: 49–63. pmid:12188048
  52. Sela N, Kim E, Ast G. The role of transposable elements in the evolution of non-mammalian vertebrates and invertebrates. Genome Biol. 2010;11: R59. pmid:20525173
  53. de Koning APJ, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS Genet. 2011;7: e1002384. pmid:22144907
  54. Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP. A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome Biol. 2011;12: R45. pmid:21605470
  55. de Jonge R, van Esse HP, Maruthachalam K, Bolton MD, Santhanam P, Saber MK, et al. Tomato immune receptor Ve1 recognizes effector of multiple fungal pathogens uncovered by genome and RNA sequencing. Proc Natl Acad Sci USA. 2012;109: 5110–5115. pmid:22416119
  56. Libkind D, Hittinger CT, Valério E, Gonçalves C, Dover J, Johnston M, et al. Microbe domestication and the identification of the wild genetic stock of lager-brewing yeast. Proc Natl Acad Sci USA. 2011;108: 14539–14544. pmid:21873232
  57. Behm JE, Ives AR, Boughman JW. Breakdown in postmating isolation and the collapse of a species pair through hybridization. American Naturalist. 2010;175: 11–26. pmid:19916869
  58. Sheppard SK, McCarthy ND, Falush D, Maiden MCJ. Convergence of Campylobacter Species: Implications for Bacterial Evolution. Science. 2008;320: 237–239. pmid:18403712
  59. Caro-Quintero A, Rodriguez-Castano GP, Konstantinidis KT. Genomic Insights into the Convergence and Pathogenicity Factors of Campylobacter jejuni and Campylobacter coli Species. Journal of Bacteriology. 2009;191: 5824–5831. pmid:19617370
  60. Dasmahapatra KK, Walters JR, Briscoe AD, Davey JW, Whibley A, Nadeau NJ, et al. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012;487: 94–98. pmid:22722851
  61. Pardo-Diaz C, Salazar C, Baxter SW, Merot C, Figueiredo-Ready W, Joron M, et al. Adaptive Introgression across Species Boundaries in Heliconius Butterflies. PLoS Genet. 2012;8: e1002752. pmid:22737081
  62. Huerta-Sánchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512: 194–197. pmid:25043035
  63. Mallet J, Beltrán M, Neukirchen W, Linares M. Natural hybridization in heliconiine butterflies: the species boundary as a continuum. BMC Evolutionary Biology. 2007;7: 28. pmid:17319954
  64. Fontaine MC, Pease JB, Steele A, Waterhouse RM, Neafsey DE, Sharakhov IV, et al. Mosquito genomics. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science. 2015;347: 1258524. pmid:25431491
  65. Mallet J. A Species Definition for the Modern Synthesis. Trends Ecol Evol. 1995;10: 294–299. pmid:21237047
  66. Hanage WP, Fraser C, Spratt BG. Fuzzy species among recombinogenic bacteria. BMC Biology. 2005;3: 6. pmid:15752428
  67. Rieseberg LH, Van Fossen C, Desrochers AM. Hybrid Speciation Accompanied by Genomic Reorganization in Wild Sunflowers. Nature. 1995;375: 313–316.
  68. Lukhtanov VA, Shapoval NA, Anokhin BA, Saifitdinova AF, Kuznetsova VG. Homoploid hybrid speciation and genome evolution via chromosome sorting. Proc Biol Sci. 2015;282: 20150157. pmid:25925097
  69. Papadopulos AST, Kaye M, Devaux C, Hipperson H, Lighten J, Dunning LT, Hutton I, Baker WJ, Butlin RK, Savolainen V. Evaluation of genetic isolation within an island flora reveals unusually widespread local adaptation and supports sympatric speciation. Philosophical Transactions of the Royal Society B. 2014;369: 20130342.
  70. Barluenga M, Stölting KN, Salzburger W, Muschick M, Meyer A. Sympatric speciation in Nicaraguan crater lake cichlid fish. Nature. 2006;439: 719–723. pmid:16467837
  71. Seehausen O, Terai Y, Magalhaes IS, Carleton KL, Mrosso HDJ, Miyagi R, et al. Speciation through sensory drive in cichlid fish. Nature. 2008;455: 620–626. pmid:18833272
  72. Malinsky M, Challis RJ, Tyers AM, Schiffels S, Terai Y, Ngatunga BP, et al. Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake. Science. 2015;350: 1493–1498. pmid:26680190
  73. Turner T, Hahn M, Nuzhdin S. Genomic islands of speciation in Anopheles gambiae. PLoS Biol. 2005;3: 1572–1578.
  74. Noor M, Bennett SM. Islands of speciation or mirages in the desert? Examining the role of restricted recombination in maintaining species. Heredity. 2009;103: 439–444. pmid:19920849
  75. Cruickshank TE, Hahn MW. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol Ecol. 2014;23: 3133–3157. pmid:24845075
  76. Gause GF. The Struggle For Existence. Baltimore: Williams & Wilkins; 1934.
  77. Baack EJ, Rieseberg LH. A genomic view of introgression and hybrid speciation. Current Opinion in Genetics & Development. 2007;17: 513–518.
  78. Yawata Y, Cordero OX, Menolascina F, Hehemann JH, Polz MF, Stocker R. Competition-dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proc Natl Acad Sci USA. 2014;111: 5622–5627. pmid:24706766
  79. Schluter D. Evidence for ecological speciation and its alternative. Science. 2009;323: 737–741. pmid:19197053
  80. Takeuchi N, Cordero OX, Koonin EV, Kaneko K. Gene-specific selective sweeps in bacteria and archaea caused by negative frequency-dependent selection. BMC Biology. 2015;13: 20. pmid:25928466
  81. Cordero OX, Polz MF. Explaining microbial genomic diversity in light of evolutionary ecology. Nature Reviews Microbiology. 2014;12: 263–273. pmid:24590245
  82. Rodriguez-Valera F, Martin-Cuadrado A-B, Rodriguez-Brito B, Pašić L, Thingstad TF, Rohwer F, Mira A. Explaining microbial population genomics through phage predation. Nature Reviews Microbiology. 2009;7: 828–836. pmid:19834481
  83. Rosen MJ, Davison M, Bhaya D, Fisher DS. Fine-scale diversity and extensive recombination in a quasisexual bacterial population occupying a broad niche. Science. 2015;348: 1019–1023. pmid:26023139
  84. Overballe-Petersen S, Harms K, Orlando LAA, Mayar JVM, Rasmussen S, Dahl TW, et al. Bacterial natural transformation by highly fragmented and damaged DNA. Proc Natl Acad Sci USA. 2013;110: 19860–19865. pmid:24248361
  85. Naor A, Lapierre P, Mevarech M, Papke RT, Gophna U. Low Species Barriers in Halophilic Archaea and the Formation of Recombinant Hybrids. Curr Biol. 2012;22: 1444–1448. pmid:22748314
  86. Palmer KL, Gilmore MS. Multidrug-Resistant Enterococci Lack CRISPR-cas. mBio. 2010;1: e00227–10. pmid:21060735
  87. Gophna U, Kristensen DM, Wolf YI, Popa O, Drevet C, Koonin EV. No evidence of inhibition of horizontal gene transfer by CRISPR-Cas on evolutionary timescales. The ISME Journal. 2015;9: 2021–2027. pmid:25710183