Figure 1.
Evolutionary and genomic views of three genomes involving introgression.
Hybridization between species B and C results in individuals of species B whose genomes are mosaics, with some regions of "vertical" descent from B and others of introgressive descent from C. Walking along the genomes from left to right, one observes a series of local genealogies, and the local genealogy changes when a recombination breakpoint is crossed. (Here, the term local genealogy refers to the local tree describing the evolutionary history of a single site in the alignment.) Discordance among the local genealogies of unlinked loci (separated by recombination) that does not involve introgression is attributed to incomplete lineage sorting (ILS). Further, the walk enters regions of introgressive descent (II and IV), where the genealogies switch due to hybridization. The complexity of the model stems from the co-occurrence of ILS and introgression, and the need to tease them apart. Within the phylogenetic network of the species (leftmost), three possible local genealogies are shown: one that agrees with how the species split and diverged (red), one that reflects the introgression event (blue), and one that is a signature of ILS (brown).
Figure 2.
Local genealogies and parental species trees.
The set of genomes (a) has a reticulate evolutionary history: individuals in B have some genetic material from the common ancestor of B and A, and other genetic material from C (b). In particular, the "blue locus" in the genomes has
as its local genealogy and the "red locus" in the genomes has
as its local genealogy (c). Further, genealogy
for the blue site evolved within the parental species tree
, whereas genealogy
for the red locus evolved within the parental species tree
.
Figure 3.
The structure of the HMM (only states are shown) that PhyloNet-HMM builds for the simple scenario of one individual sampled per species in Fig. 2. The three states correspond to genomic regions whose evolution follows the parental tree
, and there is a state for each of the three possible local genealogies. The three
states correspond to genomic regions whose evolution follows the parental tree
, and there is a state for each of the three possible local genealogies.
is the start state. See text for emission and transition probabilities.
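The state space described above can be enumerated mechanically. The minimal sketch below assumes, for illustration only, that the two parental trees are ((A,B),C) and ((B,C),A), which matches the scenario of Fig. 2; the labels `T1` and `T2` are hypothetical stand-ins for symbols lost from this caption. It pairs each parental tree with each of the three rooted genealogies on A, B and C, giving six emitting states plus a separate start state. Emission and transition probabilities are not modeled here.

```python
from itertools import product

# Hypothetical labels; the caption's own symbols for the parental trees were lost.
PARENTAL_TREES = {
    "T1": "((A,B),C)",  # assumed: follows the species divergence (no introgression)
    "T2": "((B,C),A)",  # assumed: reflects B receiving genetic material from C
}
START = "start"  # separate start state; it is not paired with any genealogy


def rooted_triplets(taxa):
    """Return the three rooted topologies on three taxa, in Newick-like notation."""
    a, b, c = taxa
    return [f"(({a},{b}),{c})", f"(({a},{c}),{b})", f"(({b},{c}),{a})"]


def hmm_states(parental_trees, taxa):
    """One hidden state per (parental tree, local genealogy) pair."""
    return list(product(sorted(parental_trees), rooted_triplets(taxa)))


states = hmm_states(PARENTAL_TREES, ("A", "B", "C"))
for i, (tree, genealogy) in enumerate(states, start=1):
    print(i, tree, PARENTAL_TREES[tree], genealogy)
```

Each parental tree contributes one state per rooted triplet, so with one individual per species the count is 2 × 3 = 6 emitting states.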
Figure 4.
From a phylogenetic network to a MUL-tree.
Illustration of the conversion from a phylogenetic network to a MUL-tree, along with all allele mappings associated with the case in which single alleles ,
,
and
were sampled from each of the four species
,
,
and
, respectively.
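The conversion illustrated above can be sketched as follows, under stated assumptions: the network is a hypothetical four-species example (not the one drawn in the figure) with a single reticulation node `h`, encoded as a dictionary from each internal node to its children. Expanding the network from its root and copying every subtree reached along more than one path yields a MUL-tree. With one allele per species, an allele mapping assigns each allele to one copy of its species' leaf, so the number of mappings is the product of the copy counts.

```python
from itertools import product

# Hypothetical network: internal node -> children; "h" is a reticulation with parents x and y.
NETWORK = {
    "r": ["x", "y"],
    "x": ["A", "h"],
    "y": ["h", "z"],
    "z": ["C", "D"],
    "h": ["B"],
}


def to_mul_tree(network, node):
    """Expand from `node`, duplicating any subtree reachable along several paths."""
    children = network.get(node, [])
    if not children:
        return node  # leaf: species label
    expanded = tuple(to_mul_tree(network, child) for child in children)
    return expanded[0] if len(expanded) == 1 else expanded  # suppress unary nodes


def leaf_copies(tree, path=()):
    """Map each species label to the positions (paths) of its copies in the MUL-tree."""
    if isinstance(tree, str):
        return {tree: [path]}
    copies = {}
    for i, sub in enumerate(tree):
        for label, paths in leaf_copies(sub, path + (i,)).items():
            copies.setdefault(label, []).extend(paths)
    return copies


def allele_mappings(tree, alleles):
    """All ways to map each sampled allele (one per species) to a copy of its species."""
    copies = leaf_copies(tree)
    species = sorted(alleles)
    choices = [copies[s] for s in species]
    return [dict(zip((alleles[s] for s in species), combo)) for combo in product(*choices)]


mul_tree = to_mul_tree(NETWORK, "r")
mappings = allele_mappings(mul_tree, {"A": "a", "B": "b", "C": "c", "D": "d"})
print(mul_tree)
for m in mappings:
    print(m)
```

Here species B appears twice in the MUL-tree, so allele b has two possible placements and there are two allele mappings in total.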
Figure 5.
Model used for simulation of introgression.
Migration from population B to population A proceeds at rate , beginning at time
and ending at time
. Times
and
correspond to the split of populations A and B and the split of the outgroup population from the ancestral population of A and B, respectively.
Table 1.
Previously reported population genetic estimates upon which our simulation parameter settings were based.
Table 2.
Mouse samples and data sets.
Figure 6.
Comparison of the percentage of introgressed sites inferred by PhyloNet-HMM versus two lower bounds on simulated data sets.
The percentage of sites is the number of sites for which
, based on Eq. (2), is
, divided by the total number of sites in the simulated genomes (100,000). The lower bounds on the true percentage of introgressed sites are based on the frequency with which one of the two lineages from population A coalesced with lineages in population B between times
and
. (See Materials and Methods for additional discussion.) Six model conditions are shown, encompassing three migration rates and two migration times. A migration rate of zero corresponds to a pure isolation model, whereas a nonzero migration rate corresponds to an isolation-with-migration model. Standard error bars are shown, and the number of replicates for each model is
.
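The percentage reported for PhyloNet-HMM amounts to thresholding a per-site posterior probability and counting. The minimal sketch below assumes that posterior state probabilities from posterior decoding are already available, one row per site. The `threshold` argument is a hypothetical stand-in for the cutoff whose value was lost from this caption, the introgressed-state indices (3, 4, 5 in a six-state layout) are illustrative, and the four-site input is toy data rather than a simulated genome.

```python
# Hypothetical layout: states 0-2 non-introgressed, states 3-5 introgressed.
INTROGRESSED_STATES = [3, 4, 5]


def introgression_probability(row, introgressed_states):
    """Posterior probability that a site is in any introgressed state."""
    return sum(row[s] for s in introgressed_states)


def percent_introgressed(posterior, introgressed_states, threshold):
    """Percentage of sites whose introgression probability is at least `threshold`."""
    hits = sum(1 for row in posterior
               if introgression_probability(row, introgressed_states) >= threshold)
    return 100.0 * hits / len(posterior)


# Toy posterior rows (one per site, six states each); not real PhyloNet-HMM output.
POSTERIOR = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.25, 0.25],
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0],
    [0.0625, 0.0, 0.0, 0.75, 0.125, 0.0625],
]
print(percent_introgressed(POSTERIOR, INTROGRESSED_STATES, threshold=0.9))  # prints 50.0
```

On real data the denominator would be the number of sites in the simulated genome rather than four.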
Figure 7.
Empirical base frequencies inferred by PhyloNet-HMM on simulated data sets.
Panels (a) and (b) show model conditions with migration times and
, respectively, and different migration rates. Standard error bars are shown, and
.
Figure 8.
Empirical substitution rates inferred by PhyloNet-HMM on simulated data sets.
Otherwise, figure layout and description match Fig. 7.
Figure 9.
The phylogenetic network used in our analyses and the two parental trees.
The phylogenetic network (a) captures introgression from M. spretus to M. m. domesticus. The blue and red lines illustrate two possible gene genealogies, one involving no introgression (blue) and one involving introgression (red). The parental tree in (b) captures genomic regions with no introgression, while the parental tree in (c) captures genomic regions of introgressive descent.
Figure 10.
Introgression scans of chromosome 7 from the Mus musculus domesticus data set.
Results in panels (a) through (c) are based on posterior decoding (Eq. 2). Panel (a) gives the probability that PhyloNet-HMM is in one of the introgressed () states. Panel (b) shows the probability that PhyloNet-HMM is in an introgressed (
) state corresponding to a particular gene genealogy, where each gene genealogy is displayed in a separate row and pixel intensity varies from white to blue to represent probabilities from
to
. Panel (c) is identical to panel (b) except that non-introgressed (
) states are shown. Results in panels (d) through (f) are based upon a Viterbi-optimal trajectory. In panel (d), genomic regions are classified as having introgressed origin or not based on the hidden state that the Viterbi-optimal trajectory is in (either an
or
state, respectively). Panel (e) shows the rooted gene genealogy inferred for each locus classified as introgressed in panel (d). Each distinct rooted gene genealogy is represented using a distinct color and row. Panel (f) shows the rooted gene genealogy inferred for the remaining loci (those not classified as introgressed). Panel (g) shows loci sampled by the Mouse Diversity Array [36], which we used to genotype our samples. The dashed vertical line indicates the location of the Vkorc1 gene, which [2] showed to be a driver gene in an introgression event between M. m. domesticus and Mus spretus that led to the spread of rodenticide resistance in the wild. The grey bars indicate regions of missing data approximately 100 kb or longer.
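Panels (a) through (c) and panels (d) through (f) rest on two standard HMM decodings: posterior decoding via the forward-backward algorithm, and the Viterbi algorithm. The sketch below implements both textbook algorithms for a generic discrete HMM; it is an illustration, not PhyloNet-HMM's code. The two-state model (state 0 standing for a non-introgressed state, state 1 for an introgressed state) and all probabilities are invented toy values; PhyloNet-HMM's actual emission and transition probabilities are defined in the main text and are not reproduced here.

```python
import math


def forward_backward(init, trans, emit):
    """Posterior state probabilities per site via the scaled forward-backward algorithm.

    init[k]: start probability of state k; trans[j][k]: P(state k at t+1 | state j at t);
    emit[t][k]: probability of the observation at site t given state k.
    """
    n, K = len(emit), len(init)
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            a = [init[k] * emit[0][k] for k in range(K)]
        else:
            a = [sum(alpha[t - 1][j] * trans[j][k] for j in range(K)) * emit[t][k]
                 for k in range(K)]
        c = sum(a)  # per-site scaling factor prevents numerical underflow
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[k][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(K))
                   / scale[t + 1] for k in range(K)]
    return [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(n)]


def viterbi(init, trans, emit):
    """Most probable hidden-state path, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n, K = len(emit), len(init)
    score = [log(init[k]) + log(emit[0][k]) for k in range(K)]
    back = []
    for t in range(1, n):
        ptr, new = [], []
        for k in range(K):
            j_best = max(range(K), key=lambda j: score[j] + log(trans[j][k]))
            ptr.append(j_best)
            new.append(score[j_best] + log(trans[j_best][k]) + log(emit[t][k]))
        back.append(ptr)
        score = new
    path = [max(range(K), key=lambda k: score[k])]
    for ptr in reversed(back):  # trace back-pointers from the last site to the first
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model: state 0 = non-introgressed, state 1 = introgressed (hypothetical values).
INIT = [0.6, 0.4]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.9, 0.05], [0.9, 0.05], [0.05, 0.9], [0.05, 0.9], [0.9, 0.05]]
INTROGRESSED = [1]

post = forward_backward(INIT, TRANS, EMIT)
p_introgressed = [sum(row[k] for k in INTROGRESSED) for row in post]  # like panel (a)
path = viterbi(INIT, TRANS, EMIT)  # like the classification in panel (d)
print([round(p, 3) for p in p_introgressed])
print(["introgressed" if k in INTROGRESSED else "not introgressed" for k in path])
```

Posterior decoding yields a probability track per site, whereas Viterbi commits to a single best trajectory; the two can disagree at sites where the posterior is close to 0.5.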
Figure 11.
Introgression scans of chromosome 7 from the Mus musculus musculus data set.
Figure layout and description are otherwise identical to Fig. 10.