Fig 1.
The Protracted Birth-Death (PBD) model of the speciation process [26, 27], as implemented in DELINEATE, models lineage splitting and the completion of speciation as separate events, so that “speciation” is an extended process.
For example, considering a phylogeny of population lineages inferred under the MSC [1, 28], the lineage splitting events correspond to the formation of new isolated population lineages (not species) through restrictions in gene flow in an ancestral population (e.g., V1). These lineages may themselves give rise to other population lineages (V2 through V9), or go extinct (X1 through X3). Population lineages develop into independent species at a fixed background rate, provided they are not otherwise lost (i.e., there is a delay between the initiation and completion of speciation). Changes in status from incipient to full or good species are marked by speciation completion events, shown by the blue bars. Under the PBD, a “species” is thus made up of one or more population lineages not separated from one another by a speciation completion event. In this example, five speciation completion events divide the seven extant populations into four species: {A, B}, {C}, {D, E}, and {F, G}.
Fig 2.
Overview of speciation-based delimitation using DELINEATE.
Starting with genomic data (a), a lineage tree (b) is inferred under the multispecies coalescent (MSC) model using any of a number of programs, such as BP&P [28] or *BEAST [30]. The inferred lineages, which are consistent with a Wright-Fisher population under the MSC model (i.e., cannot be divided any further), are (c) organized into sets of one or more species in DELINEATE, with each possible organization (referred to as a “partition”) representing a different hypothesis about species boundaries. Partitions can range from a single species (i.e., all lineages assigned to the same species) to as many species as there are lineages (i.e., there are no population lineages, only different species). The probability of each of the different partitions is calculated and reported by DELINEATE. The partition with the highest probability is the maximum likelihood estimate, but investigators also have at their disposal all the partitions in the 95% confidence interval if they wish to summarize support for particular results.
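To make the notion of a “partition” concrete, the following minimal sketch enumerates every way of grouping a small set of lineages into species. It is an illustration only, not DELINEATE's implementation: the function name `set_partitions` and the four lineage labels are assumptions introduced here, and no probabilities are computed.

```python
from typing import Iterator, List, Sequence


def set_partitions(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every grouping of `items` into non-empty, unordered blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Option 1: `first` forms a new species on its own.
        yield [[first]] + partial
        # Option 2: `first` joins each existing species in turn.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]


# Hypothetical lineage labels, chosen only for illustration.
lineages = ["A", "B", "C", "D"]
partitions = list(set_partitions(lineages))

# The two extremes described in the caption.
single_species = [p for p in partitions if len(p) == 1]
all_distinct = [p for p in partitions if len(p) == len(lineages)]

print(len(partitions))
print(single_species[0])
print(all_distinct[0])
```

The number of partitions grows very quickly with the number of lineages, which is one reason fixing the species identity of some lineages a priori reduces the set of hypotheses that must be considered.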
Fig 3.
Accuracy of species delimitation under different levels of species constraints and dataset sizes (i.e., number of lineages in the tree).
Simulations span differing speciation-completion rates (indicated by the color gradient, with darker colors representing lower rates and lighter colors representing higher rates). Even when the speciation-completion rate is inferred from the data, (a) recovery of the correct number of species is extremely reliable across a broad range of conditions, as shown by comparing the true number of species with the inferred number of species; each dot corresponds to the analysis of one replicate dataset. However, whether the (b) identity of species is accurately inferred depends upon the size of the dataset (i.e., number of lineages), the constraint level (i.e., the number of lineages with designations set a priori; e.g., “30/40” corresponds to a tree with 40 lineages, 30 of those with known species identities, and 10 lineages with inferred identities), and the particular speciation-completion rate under which the data were simulated (note that this rate was inferred during the analyses). Shown is the proportion of 100 replicates for each set of conditions in which the partition with the highest probability corresponded to the correct assignments of all species identities.
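The accuracy measure in panel (b) can be sketched as follows: a replicate counts as correct only if the highest-probability partition matches the true partition exactly, regardless of the order in which species or lineages are listed. This is a minimal sketch under stated assumptions; the helper names `canonical` and `proportion_correct` and the toy replicates are hypothetical and are not taken from the study's analysis scripts.

```python
from typing import FrozenSet, Iterable, List, Sequence

Partition = Iterable[Iterable[str]]


def canonical(partition: Partition) -> FrozenSet[FrozenSet[str]]:
    """Order-independent form of a partition, so equal groupings compare equal."""
    return frozenset(frozenset(block) for block in partition)


def proportion_correct(true_parts: Sequence[Partition],
                       best_parts: Sequence[Partition]) -> float:
    """Fraction of replicates whose top-ranked partition equals the true one."""
    if len(true_parts) != len(best_parts):
        raise ValueError("need one inferred partition per replicate")
    hits = sum(canonical(t) == canonical(b)
               for t, b in zip(true_parts, best_parts))
    return hits / len(true_parts)


# Toy data (hypothetical): four replicates, three recovered exactly.
truth: List[Partition] = [
    [["A", "B"], ["C"]],
    [["A"], ["B"], ["C"]],
    [["A", "B", "C"]],
    [["A"], ["B", "C"]],
]
inferred: List[Partition] = [
    [["C"], ["B", "A"]],      # same grouping, different listing order
    [["A"], ["B"], ["C"]],
    [["A", "B"], ["C"]],      # one species wrongly split: counts as incorrect
    [["B", "C"], ["A"]],
]

accuracy = proportion_correct(truth, inferred)
print(accuracy)
```

Requiring an exact match of the whole partition is a strict criterion: a single misassigned lineage makes the replicate count as incorrect, even when the number of species is right.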
Fig 4.
Accuracy of DELINEATE maximum likelihood estimates of speciation-completion rate compared with the true speciation-completion rate; the ribbon shows the minimum and maximum of the 95% confidence interval ranges for the various estimates under that true rate.
Note that the population isolation rate was fixed at 0.1, so the range of speciation-completion rates, from 0.001 to 0.1, spans rates from one hundred times slower than the population isolation rate to equal to it.
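The relationship between the two rates can be checked with exact arithmetic. This short sketch uses only the values stated in the note; the variable names are assumptions introduced for illustration.

```python
from fractions import Fraction

# Values taken from the note; Fraction avoids floating-point rounding.
isolation_rate = Fraction("0.1")
lowest_completion_rate = Fraction("0.001")
highest_completion_rate = Fraction("0.1")

# How many times slower speciation completion is than population isolation.
slowest = isolation_rate / lowest_completion_rate
fastest = isolation_rate / highest_completion_rate

print(slowest, fastest)
```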