Leveraging shared ancestral variation to detect local introgression

Introgression is a common evolutionary phenomenon that results in shared genetic material across non-sister taxa. Existing statistical methods such as Patterson’s D statistic can detect introgression by measuring an excess of shared derived alleles between populations. The D statistic is effective to detect genome-wide patterns of introgression but can give spurious inferences of introgression when applied to local regions. We propose a new statistic, D+, that leverages both shared ancestral and derived alleles to infer local introgressed regions. Incorporating both shared derived and ancestral alleles increases the number of informative sites per region, improving our ability to identify local introgression. We use a coalescent framework to derive the expected value of this statistic as a function of different demographic parameters under an instantaneous admixture model and use coalescent simulations to compute the power and precision of D+. While the power of D and D+ is comparable, D+ has better precision than D. We apply D+ to empirical data from the 1000 Genome Project and Heliconius butterflies to infer local targets of introgression in humans and in butterflies.


Introduction
Analyses of both modern and ancient DNA have revealed that introgression is a common evolutionary process in the history of many species.Introgression has been found in swordtail fish [1], Heliconius butterflies [2,3], and from Neanderthals and Denisovans to modern-day non-African populations [4][5][6][7][8] as well as many other systems.These observations suggest that introgression is pervasive and thus determining its relative contribution to the evolution of a species is of evolutionary interest [9].Therefore, detecting and quantifying introgressed segments in the genome is necessary to begin measuring its biological importance.Introgression may introduce both adaptive and deleterious variation in the recipient population.For example, Tibetans inherited a beneficial haplotype at the EPAS1 gene from Denisovans through gene flow that facilitated high altitude adaptation to the hypoxic environment in the Tibetan plateau [10][11][12][13] which is an example of adaptive introgression-positive selection acting on introgressed variants [10,[14][15][16].Similarly, purifying selection has also acted on introgressed variation [17][18][19][20] to remove deleterious introgressed variants and under specific conditions can mimic signatures of adaptive introgression [18,21].
The most widely-used method to detect introgression using data from one or more individuals from each of four populations is the ABBA-BABA statistic, also known as Patterson's D statistic [4,5].This statistic has been used to detect introgression from Neanderthals and Denisovans into modern humans [4,22,23] as well as other systems.The D statistic uses species tree and gene tree discordances within a 4-population tree with two potential targets of introgression defined as population 1 (P 1 ) and population 2 (P 2 ); a donor population (P 3 ) as the source of gene flow to P 1 or P 2 , and an outgroup population (P 4 , see Fig 1A and 1B).The patterns of biallelic single nucleotide polymorphisms (SNP) generated by these gene trees (dotted lines in Fig 1A and 1B) provide information on the shared ancestry between lineages in each population.The D-statistic looks at patterns when the gene tree does not match the species/population tree, which can be due to chance through Incomplete Lineage Sorting (ILS) or gene flow from the donor population into P 1 or P 2 .While ILS will generate an equal number of discordant sites shared between P 3 and P 1 and P 3 and P 2 , introgression will result in an excess of shared sites between P 3 and either P 1 or P 2 .D is a measure of this excess number of shared derived alleles.
The D statistic was designed to detect genome-wide gene flow but has also been used to look for signals of gene flow in local regions of the genome.However, studies have found that D produces spurious inferences of gene flow when applied to areas of the genome with low nucleotide diversity [24,25].A previous study [25] partitioned butterfly genomes into small 5 kb windows and computed the D statistic in each window which showed that the D statistic becomes more unreliable when considering windows of low nucleotide diversity, because the variance of D is maximized in these windows.To improve inference of introgression in small windows [25] propose a new statistic, f d , that is a better estimator of the true introgression proportion.More recently [24] proposed to improve the D statistic by including the number of sites with an BBAA pattern-which is reduced in the presence of introgression-in the denominator of the D statistic.
In this study, we propose a new statistic, D + , to detect introgression in genomic windows.In addition to using the shared derived variation measured in the D statistic, D + also leverages shared ancestral variation between the donor population and the recipient population.Introgression introduces not only mutations that accrued in the donor population before the gene flow event, but also re-introduces ancestral alleles in the recipient population.Following [5], we derive the theoretical expectations for the D + statistic under a coalescent framework to study its properties as a function of the admixture proportion.We use simulations to measure its power, false positive rate and precision compared to the D statistic.We also measure its performance by applying it to humans and butterflies.We find that the D + statistic is more precise at detecting introgressed regions than the D statistic due to its lower false positive rate in small genomic regions, making it a useful statistic to identify local targets of introgression.

D + statistic
Patterson's D statistic uses species and gene tree discordance within a 4-population tree with two populations as potential targets of introgression, population 1 (P 1 ) and population 2 (P 2 ).Population 3 (P 3 ) is a source of gene flow to either P 1 or P 2 , and population 4 (P 4 ) serves as an outgroup (Fig 1).The patterns of biallelic single nucleotide polymorphisms (SNP) generated by the gene trees provide information on the shared ancestry between lineages in each population.Both the D and D + statistic look at site patterns yielded when the gene tree does not match the species tree.A mutation will convert an ancestral allele (A), determined by the allele present in the outgroup, into a derived allele (B).An ABBA site (Fig 1A ) describes a derived allele shared between P 3 and P 2 , while a BABA site occurs when a derived allele is shared between P 3 and P 1 .An ABBA or BABA site could arise due to incomplete lineage sorting (ILS) or gene flow.Under coalescent expectations, incomplete lineage sorting will generate equal numbers of gene trees with ABBA or BABA sites.An ABBA site can only be generated in a gene tree where P 3 and P 2 coalesce first before they find a common ancestor with P 1 .On the other hand, a BABA site only occurs on gene trees where P 1 and P 3 coalesce first before they find a common ancestor with P 2 .We expect an excess of ABBA sites when there is gene flow from P 3 to P 2 .
The D statistic measures an excess of ABBA or BABA sites [4,5].D is the normalized difference between ABBA and BABA sites, D ¼ . The D statistic assumes that the frequency of ABBA and BABA sites due to ILS is approximately equal.Therefore, an excess of shared derived sites between P 3 and P 2 , or ABBA sites, indicates gene flow from P 3 to P 2 as shown in Fig 1A .Conversely, an excess of BABA sites indicates gene flow from P 3 to P 1 .
We extend this idea by making use of the fact that introgressed regions are inherited in chunks that contain both shared derived alleles and ancestral alleles that are introduced into the recipient population.D + leverages the shared ancestral alleles between P 3 to P 2 to increase the amount of data about shared genetic variation in low nucleotide diversity regions.Sites where the ancestral allele is shared between P 3 and P 2 and the derived allele is only found in P 1 are BAAA sites (Fig 1B).In ABAA sites the ancestral allele is shared between P 3 and P 1 while P 2 has a derived allele.D + incorporates both shared derived alleles and ancestral alleles to strengthen our inferences of introgression.
When the sample size is bigger than one, we can write down the equation for D + as a function of the observed derived allele frequencies in populations P 1 , P 2 , P 3 or P 4 .If the frequency of the allele at site i for population j is pij , and we have L sites, While in this paper, we mostly focus on comparisons between D + and D, note that we could also define a statistic D ancestral that measures the excess of shared ancestral alleles between P 3 and P 2 in a similar manner that the D statistic measures an excess of shared derived alleles between P 3 and P 2 : D ancestral is normalized and ranges from -1 to 1, with D ancestral = 1 indicating gene flow from P 3 to P 2 and D ancestral = −1 indicating gene flow from P 3 to P 1 .D ancestral approximates zero under the null hypothesis of no gene flow.Durand et al. (2011) used a coalescent framework to derive the expectation of the D statistic under an instantaneous admixture model (IUA).The probability of getting an ABBA or BABA site is dependent on the mutation rate and the expected branch length of the branch where a mutation yields an ABBA site (T ABBA ) or the branch where a mutation yields a BABA site (T BABA ).The mutation rate μ is assumed to be constant.Therefore, the expected number of ABBA or BABA sites can be estimated by calculating the expectation of branch lengths of T ABBA and T BABA and multiplying by the mutation rate [5].Similarly, we can compute the probability of getting an ABAA or BAAA site (see S1 Appendix), and we derived the expected lengths of T BAAA and T ABAA following the same framework.The full derivation of the expectation of T BAAA and T ABAA is in S1 Appendix.We find that the analytical expectation of D + is As is true of ABBA and BABA sites, the expected number of BAAA and ABAA sites are equal when there is no gene flow.This is because, under no gene flow, we expect a similar amount of ancestral allele sharing between P 1 and P 3 and between P 2 and P 3 .In the case of the BAAA and ABAA sites, we expect a similar amount of BAAA and ABAA sites under no gene flow assuming the same mutation rate in P 1 and P 2 .As the admixture proportion from P 3 to P 2 increases, the number of BAAA sites exceeds the number of ABAA sites.The expected difference is a function of the admixture proportion f and the branch lengths of T P3 and T GF :

Simulations to verify theoretical results for n = 1
To verify the theoretical results under the demographic model depicted in Fig 2 , we calculate the expected branch lengths of T ABBA , T BABA , T BAAA and T ABAA and expectation of D and D + using mspms simulations (see Figs 3 and S4) for a range of admixture proportions (f).We ran 1,000,000 simulations of independent loci and averaged the branch lengths of T ABBA , T BABA , T BAAA and T ABAA from the Newick tree file output of each locus.The branches T ABBA , T BABA , T BAAA and T ABAA are the branches where a mutation would yield an ABBA, BABA, BAAA or ABAA site, illustrated in S1 Fig.We simulated small, independent loci with 250 sites per loci and a mutation rate of 10 −8 per bp per generation and no recombination.We also calculated D and D + from the number of ABBA, BABA, BAAA and ABAA sites per locus and averaged D and D + across all 1,000,000 loci.An example mspms simulation command for 1,000,000 independent loci with 1 sample taken from each of the 4 populations for an admixture proportion of 3% is: mspms 4 1000000 -t 0.1 -I 4 1 1 1 1 -es 0.1 2 0.97 -ej 0.1 5 3 -ej 0.25 1 2 -ej 0.5 2 3 -ej 20 3 4 -T.

Simulations to benchmark D + using n = 1 for all populations
To evaluate the precision and recall of D and D + we ran coalescent simulations using the software msprime [26].The simulations followed the model depicting the evolutionary history of modern humans (Fig 2).The African and Eurasian populations are P 1 and P 2 , respectively, and P 3 is the Neanderthal population.The African-Eurasian and Neanderthal divergence time T P3 was set 16,000 generations ago and the Eurasian and African divergence time T P2 was set 4,000 generations ago [16].The time of gene flow (T GF ) between Neanderthals and Eurasians was set 1,600 generations ago [16].We use an admixture proportion (f) of 3%.All simulations had a constant N e of 10,000, a mutation rate of 1.5*10 −8 per bp per generation and a recombination rate of 10 −8 per bp per generation following [16].We ran 100 simulations of 20 MB genomes with n = 1 for P 1 , P 2 and P 3 , and, in each run, we sampled a single haplotype to compute D + using Eq 1. D + can also be applied to populations with a sample size greater than 1.
We ran 100 msprime simulations with n = 200 genomes for P 1 and P 2 and n = 2 for P 3 and computed D + using derived allele frequencies (Eq 2).The full code for simulations can be found in a GitHub repository (https://github.com/LeslyLopezFang/Dplus).
To evaluate the performance of D + under different values for the admixture proportion, recombination rate and mutation rate under this demography, we ran 100 msprime simulations with the new parameter value for a 20 MB genome for n = 1 for P 1 , P 2 and P 3 under a model with no admixture and a model with admixture.We considered the following cases: f = 2%, 5% and 10%, a mutation rate of half the default mutation rate of 1.5*10 −8 per bp per generation, a mutation rate that was double the default mutation rate of 1.5*10 −8 per bp per generation, a recombination rate that was half the default recombination rate of 10 −8 per bp per generation, and a recombination rate that was double the default recombination rate of 10 −8 per bp per generation.

Calculating precision, recall and false positive rate in simulated human data
We ran msprime simulations described in the Methods section titled "Simulations to benchmark using n = 1 for all populations" using the parameters shown in When we sample a single lineage (n = 1) from each population, an introgressed window is defined as a window that has at least 10% of bases in the window overlapping with introgressed tracts from the chromosome sampled from P 2 .A 50 kb window would then have at least 5 kb bases that are introgressed from P 3 for the sampled chromosome from P 2 .Windows that have an overlap with introgressed tracts but that are less than 10% of the bases in the window are dropped.Most of the windows have an overlap of at least 50% of the window with introgressed tracts (S2 Fig) .When the sample size is more than 1, we have to redefine what is an introgressed window, and compute D and D + using our frequency-based definitions.
To compute recall, true positives are introgressed windows that are statistically significant, while the false negatives are introgressed windows that are not statistically significant.The false positives for the simulated data are windows that have no introgressed bases but are statistically significant.Precision measures the probability of a window truly being introgressed given that its D + value is statistically significant.Precision is the percentage of true positives out of the sum of true positives and false positives, or 1 -false discovery rate.Recall measures how many of the introgressed windows are statistically significant and is the percentage of true positives out of the sum of true positives and false negatives.Here, false positive rate is the percentage of false positives from windows without introgression (false positives and true negatives; see Figs 4 and 5).
We also compute recall when the sample size is bigger than one.However, in this case, it will be harder to detect windows with introgressed tracts at low frequency in the recipient population.Therefore, in simulations where we sample n = 200 chromosomes from P 2 , we redefine what is an introgressed window so that two conditions need to be true.First, a window needs to contain at least one introgressed tract that survives in at least 10% of the 200 chromosomes from P 2 .Second, the length of the tract (or the sum of the tract lengths if more than one tract in a window pass the first condition) is at least 10% of the window (which is 10% of 50 kb or 5kb).For an example see S3

Testing violations of a strict molecular clock
D + assumes a strict molecular clock such that all populations have the same mutation rate.When there is no gene flow, we can assume an equal number of ABBA and BABA sites, as well as an equal number of BAAA and ABAA sites.To test violations to this assumption we ran msprime simulations with n = 1 chromosome for P 1 , P 2 and P 3 where we increase the mutation rate of either P 1 or P 2 by increasing all of the divergence times T P2 , T P3 and T GF by T P2 in the model depicted in Fig 2 .For example, to double the mutation rate of P 1 , we increase all divergence times by T P2 and sample P 1 at time t = 0 and sample P 2 at T P2 .P 3 is sampled right after the modified T GF since P 3 is an archaic population.
We ran 100 msprime simulations using n = 1 chromosome for P 1 , P 2 and P 3 under the new divergence times with no introgression and with introgression (f = 3%).The performance of D + can be calculated using the null distribution with the new divergence times to assess statistical significance.Introgressed windows are windows where at least 10% of the bases in the 50 kb window are introgressed tracts for the haplotype from P 2 .We considered four cases: 1) P 1 with a mutation rate double the mutation rate of P 2 , 2) P 1 with a mutation rate ten times the mutation rate of P 2 , 3) P 2 with a mutation rate double the mutation rate of P 1 and 4) P 2 with a mutation rate ten times the mutation rate of P 1 .

Application of D + in modern-day humans
To evaluate the performance of D + at identifying introgressed regions in empirical data we apply D + to previously detected regions of Neanderthal introgression in modern-day humans.We assume that introgressed segments inferred in [7] using the Altai Neanderthal genome [27] are the true introgressed segments.From the 1000 Genomes Project [28] we used an individual from the YRI (Yoruba in Ibadan, Nigeria) population for P 1 and an individual from the GBR (British from England and Scotland) population for P 2 .P 3 is the Altai Neanderthal genome [27].The ancestral allele of each position was taken from the ancestral allele listed in the 1000 Genome Project.For the GBR individual we used a Neanderthal introgression map including all the haplotypes inferred to be Neanderthal with a probability > 90% in [7].We calculated D and D + in non-overlapping 50 kb windows using one autosomal chromosome of each individual from all three populations, discarding the first and last window of each chromosome.Note that here we assume that the phasing of the GBR individual is perfect, so we are able to compute two D + and two D values for each window in the genome.However, we use the same chromosome in the YRI individual to compute the two D + values for the GBR individual.Since phase is unavailable for the Neanderthal genome, we randomly sampled one of the Neanderthal alleles at each site.Each window had two D and D + values, one for each autosomal chromosome of the sampled GBR individual.
To find significance thresholds for the empirical data, we use all of the D and D + values from all of our windows (two per window) to build the empirical distributions for D and D + .If the maximum of the two D (or D + ) values in a window is in the top 2.5% of D and D + values for the empirical distribution, then it will be called statistically significant.To compute recall, we need to define a true introgressed window that will be called a true positive or false negative based on the empirical distribution of D and D + values.We defined a true introgressed window as a window with a set minimum percentage of bases that overlap with a Neanderthal introgressed segment (inferred in [7]).We used different values for the minimum percentage of bases needed to overlap with the Neanderthal segment for a window to be called introgressed.Using the empirical distribution, we call a window "introgressed" (or a true positive) when the maximum of its two D + values are statistically significant (i.e.values are in the top 2.5% of the distribution) and it has at least a pre-defined percentage of overlap with a Neanderthal introgressed segment (e.g. 5% to 100% in intervals of 5%, x-axis of Fig 6).Recall was then calculated by dividing the number of true positives by the total number of windows defined as true introgressed windows (based on having at least a pre-defined percentage of overlap with a Neanderthal introgressed segment).We assume that the introgression maps capture true positives or a subset of them; however, we cannot assume that regions not included in the introgression maps are true negatives.Therefore, we do not assess false positives or precision.
We also took computed recall mimicking the empirical approach in simulated data where we know what the ground truth is.Specifically, in our simulations we sampled one individual (n = 2 chromosomes) from P 1 , P 2 and P 3 and computed D and D + for each chromosome from the individual in P 2 , and also obtained the maximum D and D + value per window.In the simulated data, we only considered one value (10%) for the minimum percentage of overlap between a window and an introgressed tract.We computed precision and recall using all the D and D + values from all the windows as the null distribution.
Finally, we assess how recall is affected when phasing is unavailable.In this case, we use the same YRI, GBR and Neanderthal individuals and randomly sample a haplotype (n = 1 for all 3 populations) at every position to create a single haploid genome for each population.We partitioned the haploid genomes in non-overlapping 50 kb windows and compute D + values for each window.We do this experiment 100 times and compute recall for each experiment.

Application of D + in Heliconius butterflies
D was applied to Heliconius butterflies and found to have high variance in areas of low nucleotide diversity [25].To assess whether D + reduces variance in these areas of low nucleotide diversity we recreated Fig 3 from [25] using the same Heliconius genome data from [29].They show values of D as a function of nucleotide diversity π for P 2 (the recipient population) in non-overlapping regions of 5 kb.Only biallelic alleles were used.D was computed using derived allele frequencies and we also use the frequencies from the four populations to compute D + (Eq 2).
f d [25] and d f [24] were also computed for the 5 kb non-overlapping windows.f d was only applied to windows where D is positive.The equation for f d written in terms of derived allele frequencies with piD as the maximum of pi2 and pi3 is incorporates BBAA sites where only P 1 and P 2 share a derived allele.The equation for d f in terms of allele frequencies is Four samples were used, one each from H. melpomene aglaope (P 1 ), the recipient population H.m. amaryllis (P 2 ), the donor population H. timareta thelxinoe (P 3 ).The outgroup (P 4 ) consisted of a sample from species in the silvaniform clade including H. hecale, H. ethilla, H. paradalinus sergestus and H. pardalinus ssp.nov.The ancestral state of an allele was determined by the outgroup if the allele was fixed within the outgroup.Otherwise, it was the major allele of all four populations.The wing pattern loci HmB and HmYb are defined in [25].Code was adapted from [25] with details in GitHub repository (https://github.com/LeslyLopezFang/Dplus).
Similar to the method to find recall described in "Calculating precision, recall and false positive rate in simulated human data.",we calculate recall for D, D + , f d , d f and D ancestral .The windows that overlap the HmB and HmYb loci are considered introgressed windows.Recall here is the number of these introgressed windows that the introgression statistic identifies as introgressed.Each window has one value for each statistic, and we find the statistical thresholds for each statistic by finding the top 2.5% value of the distribution.

Theoretical results
The expectation for the values of D and D + is dependent on the lengths of the branches that produce each site pattern.T ABBA is the length of the branch starting from the time of the most recent common ancestor of P 2 and P 3 until that lineage coalesces with P 1 (which happens in the ancestral population P 123 under the instantaneous admixture model).The average length of the T ABBA branch increases with the migration rate (Fig 3 ).A mutation on this branch produces an ABBA site pattern.T BABA is then the length of the branch from the time of the most recent common ancestor of P 1 and P 3 until that lineage coalesces with P 2 .T BAAA and T ABAA are the external branches of P 1 and P 2 , respectively.When there is no gene flow, the average length of the external branches of P 1 or P 2 are equal.With gene flow between P 2 and P 3 , the external branch of P 1 will be longer than the external branch of P 2 ; therefore, the expectation of T BAAA increases with the admixture proportion f.The analytical and theoretical expectation of T ABBA , T BABA , T BAAA and T ABAA are shown in Fig 3 .The theoretical expectation of each branch takes into account all scenarios that could produce each site pattern, including gene flow and no gene flow (S1 Appendix).The simulated expected branch lengths approximate the theoretical expected branch lengths at all the admixture proportions (f) calculated.When there is no admixture, the number of ABBA sites is equal to the number of BABA sites as any sharing of derived alleles between P 3 and P 2 (or P 3 and P 1 ) is due to incomplete lineage sorting.In the case of ancestral sharing and under a model of no admixture, the number of BAAA sites and ABAA sites will be equal because we assume equal mutation rates in P 1 and P 2 .
For all values of migration between P 2 and P 3 , the expected branch lengths that can lead to a BAAA (T BAAA ) or a ABAA (T ABAA ) site are always greater than the expected branch lengths that can lead to an ABBA (T ABBA ) or BABA site (T BABA ).Therefore, if we assume a constant mutation rate, we expect to see more ABAA sites than BABA sites and more BAAA sites than ABBA sites.In Fig 3, assuming a constant mutation rate multiplied with the analytical and simulated expected branch lengths, there are 5-6 times more BAAA and ABAA sites than ABBA and BABA sites.
Interestingly, our theoretical results also show that even though the number of BAAA and ABAA is higher (than ABBA or BABA), the difference between T BAAA and T ABAA (T BAAA -T ABAA ) is equal to the difference (T ABBA -T BABA ).Therefore, for all admixture proportions between P 2 and P 3 , the expected difference of BAAA and ABAA sites (BAAA-ABAA) is equal to the expected difference of ABBA and BABA sites (ABBA-BABA).These observations suggest that leveraging ancestral shared variation can be informative about introgression and provides justification for defining D + which leverages both ancestral and derived allele sharing to maximize the number of informative sites used in a genomic window.This increase in informative sites can provide greater predictive accuracy for detecting local gene flow.D + can also be applied to sample sizes greater than one using derived allele frequencies with Eq 2. For simulations with n>1, we use admixture proportions greater than 3% (see Methods).We compute D + for simulations with admixture proportions of 10%, 20%, 30%, 40% and 50% and computed recall (S10A Fig) .Note that we have a different definition of what an introgressed window is, which we explained in the Methods section titled "Calculating precision, recall and false positive rate in simulated human data" and an example is provided in S3 Fig.
For a p-value of 0.05 D + has higher recall than D for all admixture proportions.The recall of D + increases as the admixture proportion increases.As two conditions need to be met to call a window introgressed (see Methods), we considered relaxing the first assumption involving the frequency of the introgressed tract in the recipient population (P 2 ).When we change the frequency of the introgressed tract(s) in P 2 , recall increases as the frequency of the tract increases (see S10B and S10C Fig).Furthermore, we ran additional simulations with a realistic admixture proportion of 3% and computed precision and recall for a different set of chromosome and window thresholds (see S1 Text section "Comparing performance of D and D+ for different chromosome and window thresholds").We find that for realistic p-values D+ will always have a higher precision and recall than D, which demonstrates that for a realistic admixture proportion, increased number of sampled chromosomes, and for all pairwise possibilities of chromosome and window thresholds D+ consistently outperforms D for detecting signals of introgression at a local scale (S25 and S26 Figs).

D + performs well under moderate violations of the molecular clock
Under a strict molecular clock, we expect the number of ABBA and BABA sites and the number of BAAA and ABAA sites to be equal under the null model with no gene flow.To assess the performance of D + when the mutation rate of P 1 and P 2 are not equal, we increased the mutation rate for P 1 and P 2 in comparison to each other by a factor of 2 and 10.
When P 1 has a higher mutation rate the amount of BAAA sites is greater than the amount of ABAA sites under no admixture (S11A Fig).This skews the distribution and average of D + towards 1, a signal of introgression from P 3 into P 2 .Similarly, when P 2 has a larger mutation rate, the amount of ABAA sites is greater than the amount of BAAA sites under the model of no admixture and makes the average of D + negative instead of 0, indicating gene flow from P 3 to P 1 (S11B Fig) .For a p-value of 0.05, precision of D + is 31.60%when the mutation rate of P 1 is double the mutation rate of P 2 and precision is 37.22% when the mutation rate of P 2 is double the mutation rate of P 1 (S12 Fig) .In a more extreme scenario, increasing the mutation rate of either P 1

PLOS GENETICS
or P 2 by a factor of ten decreases the precision in comparison to either P 1 or P 2 having double the mutation rate.At a p-value of 0.05, precision decreases to 23.95% when P 1 has a higher mutation rate than P 2 and precision decreases to 29.36% when P 2 has a higher mutation rate than P 1 (S12 Fig) .In general, these results suggest that precision is only mildly affected by the differences in mutation rate between P 1 and P 2 .
When P 2 is the population with double the mutation rate, recall of D + is 18.35%, and 23.77% when the mutation rate of P 1 is doubled (S12 Fig) .When P 1 is ten times the mutation rate of P 2 , recall is 16.32%, and 10.00% when P 2 is ten times the mutation rate of P 1 (S12 Fig) .It makes sense that recall is worse when the mutation rate is higher in P 2 since this increases the number of ABAA sites, so the difference (BAAA-ABAA) is smaller which means the signal of introgression is smaller.

D + identifies Neanderthal introgressed regions in modern-day humans
To investigate the behavior of D + in real data, we applied D + to modern-day humans [28] and an Altai Neanderthal [27] to find if signals of gene flow corresponded to previously identified Neanderthal introgressed regions.Unlike simulated data, in real human genomes we do not know the ground truth, and to compare the performance of D and D + , we assumed that the Neanderthal introgressed regions from [7] were the truth.We calculated D and D + windows for the two phased chromosomes of a single GBR individual from [28] to compute the recall of D and D + (Fig 6).Since the 50 kb windows will sometimes only partially contain an introgressed segment, we defined a window as introgressed if the window had a minimum percentage of bases overlapping with an introgressed segment (see Methods).Statistical significance was computed using the genome-wide distribution of D + values (or D values) as the null distribution.Recall is the number of these "true" introgressed windows that were called statistically significant over the total number of introgressed windows (see Methods).
Fig 6 shows that recall for D + was consistently better than D as a function of the minimum percentage of introgressed bases in a window.The recall decreases as the minimum overlap between an inferred introgressed segment and a window increases.This happens because the number of introgressed windows used to calculate recall decreases when we increase the amount of overlap to call a window introgressed.We note that for this analysis the D + value assigned for each window was the maximum of the two D + values for each of the two phased chromosomes in the GBR individual.We tested this method of choosing the maximum of D + values per window on simulated data (under the demographic history in Fig 2 ) and computed the precision and recall for this scenario (see S13 Fig) .We found that for a p-value of 0.05 the recall is 23.56% and precision is 44.69%.Taking the maximum of D + values per window did not affect the recall in comparison to the recall of D + when only one chromosome from P 2 is used to compute D + (see Fig 5).By comparison, the corresponding recall in the empirical data is around 59% (recall when the point on the x-axis is 10% in In the previous analysis, we assumed that the chromosomes of the GBR individual are perfectly phased.However, as the phasing is inferred, there could be phasing errors and/or often phased chromosomes may not be available.When phasing is unavailable, studies randomly sample a single allele to create a haploid genome (e.g. as in the Neanderthal genome).We tested how this works in both real and simulated data.We implemented this approach with the individuals we used for Fig 6 and compute recall 100 times and find that recall for a pvalue of 0.05 is on average 24.07%(see S14 Fig), similar to the recall in the simulated data, with recall ranging from 22.39-25.67%for all 100 runs.Therefore, we recommend that when the phase is not available or is uncertain, the user can randomly sample a chromosome from each individual as is often done in ancient DNA studies.

D + can detect introgression events in regions of low nucleotide diversity
One of the main reasons the D statistic is not useful for detecting introgression in small regions of the genome is that the variance of D is high in areas of low nucleotide diversity [25].To address this [25] proposed f d as an alternative approach to quantify and detect introgression in small genomic regions.The numerator of f d is in the same form as that of D; however, the denominator of f d replaces the derived allele frequency of P 2 and P 3 with the maximum derived allele frequency of P 2 and P 3 .This leads to f d having a lower variance in areas of low nucleotide diversity, thus reducing spurious results in comparison to D. Like f d , d f is also designed to quantify the admixture proportion of small genomic regions [24].The approach in d f is to incorporate BBAA sites as fewer sites with this pattern are expected when introgression occurs between P 2 and P 3 or between P 1 and P 3 .
Both f d , d f are estimates of the admixture proportion while D and D + are used to detect and not quantify introgression.To compare D + to f d and d f we used the same Heliconius genome data from [29].Heliconius butterflies have strong evidence for both genome-wide and adaptive introgression between species, including mimicry loci for wing patterns [14,29,30].We use these data to compute these statistics in windows as a function of nucleotide diversity, since the relationship between D and nucleotide diversity observed in [29] inspired the developments of new statistics to detect and quantify introgression in small windows of the genome.For the four populations, we use H. melpomene aglaope as P 1 , H. melpomene amaryllis as P 2 , H. timareta thelxinoe as P 3 and the H. hecale, H. ethilla, H. paradalinus sergestus and H. pardalinus ssp.nov.species in the silvaniform clade as the outgroup (P 4 ).We compute nucleotide diversity  ).Many of the windows that D + detects as introgressed correspond to the candidate regions for introgression that have been previously suggested in Heliconius butterflies and are associated with wing patterning (red and yellow points in Fig 7).In fact, D + detects approximately 68.4% of these candidate introgressed windows.In comparison, D detects 52.6%, d f detects 27.6% and f d detects 63.1%.We also computed D ancestral which only uses the ancestral shared patterns (ABAA and BAAA), and it detects 63.2% of the windows (S15 Fig), suggesting that using the ancestral site patterns alone is better behaved than the D statistic, which shows the utility of using ancestral shared variation.

Discussion
Multiple studies have found that introgression plays an important evolutionary role as it introduces new genetic variation in a population that can be targeted by natural selection; this is an accelerated process of accumulating new alleles compared to a de novo mutation process.Therefore, detecting which regions of the genome exhibit signatures of introgression is an important step to evaluate its relative contribution to evolution.To date, Patterson's D statistic is the most widely used metric for detection of introgression genome wide.While D works well at detecting introgression at the genome-wide scale, some studies have shown that D might not be the best choice to detect introgression in small regions of the genome.In this paper, we define a new statistic, D + , that leverages both sites with shared ancestral and sites with shared derived alleles to improve detection of introgression in small genomic windows.
First, we use coalescent theory to understand this statistic's theoretical properties and derive the expectation of D + as a function of gene flow.We show that the expected counts of BAAA sites and ABAA sites are equal under a model of no introgression.As the proportion of admixture increases, one of these two site patterns increases, implying that BAAA and ABAA sites are informative to detect introgression.Interestingly, our theoretical results also show that the expected difference in counts of BAAA and ABAA sites equals the expected difference of ABBA and BABA sites (  We also apply D + to detect Neanderthal introgression in a non-African individual.Unlike simulations, in real data we do not know the ground truth.Therefore, we evaluated D + by asking: if we assume the existing inferred maps [7] are the truth, how often do we call a window introgressed when it completely or partially contains an introgressed segment?Under these assumptions, we find that recall is around 59% (see Fig 6) which is similar to the simulation results under a complex demographic history (see S6 Fig) .Overall, our simulations and empirical data suggest that D + has statistical properties that make it more stable than D at detecting introgression in small genomic windows and provides an alternative method to detect introgression.
We also evaluated the performance of D + as a function of the admixture proportion, mutation rate and recombination rate, and for the values considered, our simulations show that the impact on recall or on precision is not too high (S7, S8 and S9 Figs).D + is also robust to realistic violations of the molecular clock under a human demography (S12 Fig) .We consider cases when the mutation rate differed (between P1 and P2) by a factor of two or ten.Differences in mutation rate can mimic signatures of introgression, and do affect the performance of the statistic (see S11 and S12 Figs).We find that a higher mutation rate in P 2 than P 1 would hinder the performance of D + more than a higher mutation rate in P 1 than P 2 because this produces more ABAA sites.We note, however, that in real data, it will be rare to observe differences in the mutation rate that are this extreme.
Another factor that could affect the performance of D + is differences in sequencing error in P1 vs. P2.To evaluate this, we simulated differences in sequencing errors across lineages, and they do not appear to have a large impact on the overall positive rate of D + (S16, S17 and S18 Figs; S1 Text).Thus, when D + is calculated on a window level we conclude that D + is robust to differential sequencing errors.Finally, we considered whether D + can distinguish incomplete lineage sorting (ILS) from introgression at the local level.Using the whole genome, we know that ILS leads to an equal number of BAAA and ABAA sites, so at the genome level these sites cancel out.Our simulation results show that D + is better than D at distinguishing ILS at the local level (see S19-S22 Figs; S1 Text).
There are other methods such as f d [25] and d f [24] that have been derived from Patterson's D to quantify the admixture proportion, f, in small genomic regions.f d leverages ABBA and BABA sites, d f leverages ABBA, BABA and BBAA sites, and D + leverages ABBA, BABA, BAAA and ABAA sites.To compare with these methods, we ran simulations following the demography depicted in Fig 2 and computed D, d f and D + and found that the performance of D + and d f are comparable (S10 Fig) .We also applied them to a Heliconius butterflies data set, and we found that similarly to f d and d f , the variance of D + is reduced in regions of low nucleotide diversity.This suggests that like f d and d f , D + will also not lead to a high number of false positives, especially in regions of low nucleotide diversity.Indeed, we find that many of the regions with a signal of introgression from windows contain previously identified candidate introgressed loci.All these statistics have both shared and distinct aspects in how they leverage genetic patterns, and future studies might focus on integration of these approaches to improve the detection and quantification of introgression.Specifically, probabilistic models that incorporate these site patterns as features might provide better inferences of introgression.We recognize that all these statistics have been benchmarked to detect or quantify introgression under very specific and simple demographic scenarios that may not closely reflect the true demographic histories of actual species or populations.Future studies that compare and contrast how different statistics that detect and quantify introgression [24,25,[31][32][33][34] behave under more complex demographic scenarios and under different evolutionary time scales will help characterize the behavior of these statistics and expand our understanding of the power and limitations of each method.
In summary, we have shown that ancestral shared variation between a donor and recipient population is influenced by the introgression proportion.Leveraging this ancestral sharing in the D statistic through D + can improve inferences of local introgression.D + be applied locally and on a genome-wide scale (S23 Fig) .Our results suggest that shared ancestral variation is informative for detecting introgression on both local and global scales, and might also be useful for deriving new estimators of the proportion of introgression that may help address how pervasive introgression is across the tree of life.Beyond their utility to detect introgression, there is evidence that archaic introgression may have re-introduced ancestral alleles with regulatory effects in humans [35], pointing to the importance of studying ancestral shared variation.We expect that more studies will reveal the effects and consequences of re-introducing ancestral variation, and that leveraging ancestral information may be informative on ghost admixture events from uncharacterized ghost populations [27].PLOS GENETICS

Fig 1 .
Fig 1. Species and gene trees depicting informative sites due to gene flow.(A) Shared derived allele between population 2 and population 3, or ABBA site, and (B) shared ancestral allele between population 2 and population 3, or BAAA site, due to gene flow from population 3 to population 2. The ancestral allele is denoted A and the derived allele is denoted B. T P4 is the time of divergence between population 4 and the ancestral population of population 1, population 2 and population 3. T P3 is the time of divergence between population 1 and the ancestral population of population 1 and population 2. T P2 is the time of divergence between population 1 and population 2. T GF denotes the time of gene flow from donor population to recipient population.https://doi.org/10.1371/journal.pgen.1010155.g001

Fig 2 .
Fig 2. Demographic model for msprime simulations.(P 1 ) and (P 2 ) are sister populations that are closely related to (P 3 ).P 1 and P 2 diverged at time T P2 (4,000 generations ago) and the ancestral population of P 1 and P 2 (P 12 ) diverged from P 3 at time T P3 (16,000 generations ago).There is gene flow from (P 3 ) to (P 2 ) at time T GF (1,600 generations ago) with an admixture proportion f = 3%.Divergence time of populations shown follow the demography of modern humans.https://doi.org/10.1371/journal.pgen.1010155.g002

Fig 3 .
Fig 3. Analytical and simulated expected branch lengths of T ABBA , T BABA , T BAAA and T ABAA .The analytical (lines) and simulated (dots) expected branch lengths of T ABBA , T BABA , T BAAA and T ABAA for different proportions of admixture f between P 3 and P 2 .The solutions to the analytical expectations match the simulated expectations.The branch length of T ABBA is the branch that would produce an ABBA site pattern.The expectation of T ABBA (E[T ABBA ]) can be used to calculate the expected number of ABBA sites.The same is true for T BABA , T BAAA , and T ABAA for their respective site patterns.With no admixture (f = 0) the expected branch lengths for ABBA and BABA sites are equal (E[T ABBA ] = E[T BABA ]), as are the expected branch lengths for BAAA and ABAA sites (E[T BAAA ] = E[T ABAA ]) because the number of ABBA sites equals BABA sites and the number of BAAA sites equals the number ABAA sites due to ILS.As the admixture proportion increases, the expectation of T ABBA and T ABBA increases due to excess ABBA and BAAA sites.The difference in T BAAA and T ABAA (T BAAA -T ABAA ) is equal to the difference in T ABBA and T BABA (T ABBA -T BABA ).https://doi.org/10.1371/journal.pgen.1010155.g003 Fig 2 without an instance of admixture at T GF to construct a null distribution for D and D + by sampling a genome from each population and computing D and D + in 50 kb non-overlapping windows.We take the significance threshold values for D and D + from their respective null distributions.For a p-value of 0.05, we get a signal of gene flow from P 3 to P 2 from the significance thresholds defined at the top 2.5% values from the null distribution of D and D + (see Fig 4).Undefined values (denominator divided by 0) of D or D + where no informative sites were present in the window were dropped.

Fig 4 .
Fig 4. Null distribution and false positive rate for D and D + in simulations with no gene flow.D and D + were calculated in 50 kb windows of 100 runs of a 20 MB simulated genome under a model with no admixture.(A) The average of the null distribution of D and D + is zero with a standard deviation of 0.74 for D and 0.26 for D + .The null distribution for D (red) is multi-modal at the tails with the tails (-1 and 1) accounting for 43.2% of the values of D. The null distribution of D + (blue) is centered around its average of zero.(B) False positive rates for D (red) and D + (blue) of null distribution.The p-value in the x-axis is used to set a significance threshold to get a false positive rate in the y-axis.D has a false positive rate of 43.2% with p-values less than 0.43.The false positive rate of D + is similar to the corresponding p-values up until p-value> = 0.94.https://doi.org/10.1371/journal.pgen.1010155.g004 Fig.In S10A Fig shows recall for D, D + and d f as a function of the proportion of introgression.In S10B Fig, the proportion of introgression is set to f = 10% and recall is computed as a function of the required tract-frequency in P 2 within a window (first condition necessary to define a window as introgressed).

Fig 5 .
Fig 5. Precision and recall of D and D + in simulations.The Precision-Recall of D and D + for simulations with an admixture proportion of 3. D (red) and D + (blue) were computed in non-overlapping 50 kb windows of 100 simulations of a 20 MB genome from each population with an admixture proportion of 3% (f = 0.03).(A) Precision and (B) recall are shown as a function of the p-value (0.01-1) used to get a significant threshold value of D and D + .https://doi.org/10.1371/journal.pgen.1010155.g005

Fig 6 .
Fig 6.Recall of D and D + in human data.The recall of D and D + in non-overlapping 50 kb windows.Windows overlap with Neandertal introgression maps [7] from 5% to 100%.The populations are as follows: P 1 : YRI, P 2 : GBR, P 3 : Altai Neandertal, P 4 : Ancestral Alleles.Data for humans from 1000 Genomes Project [28] and data for Altai Neandertal from [27].https://doi.org/10.1371/journal.pgen.1010155.g006 S8 and S9 Figs for a larger and smaller mutation rate and recombination rate.For a p-value of 0.05, increasing the mutation rate by a factor of two increases precision by approximately 4% and increases recall by approximately 2% (S8 Fig compared toFig 5).In contrast, when the mutation rate is decreased by a factor of two, the precision and recall drop by 4% and 7% respectively (S8 Fig).Decreasing the recombination rate decreases the number of windows that are introgressed.These windows contain more overlap with an introgressed segment since the introgressed segments are longer.By contrast, increasing the recombination rate by a factor of 2 almost doubled the number of windows that are introgressed; however, the introgressed segments within a window are shorter in length.When the recombination rate decreases by a factor of two, recall increases by 2% (S9B Fig) and precision increases by 7% (S9A Fig).When the recombination rate is doubled, recall decreased by 7% (S9E Fig) and precision increases by 2% (S9D Fig).For p-values of 0.01, 0.02, 0.03, 0.04 and 0.05, the false positive rate for a model with no gene flow remains relatively constant as the admixture proportion, mutation rate or recombination rate change (S7C, S7F, S7I, S8C, S8F, S9C and S9F Figs).

Fig 6 )
which is similar to the recall under a more complex human demographic model (S6 Fig).The complex human demographic model (shown in S5 Fig) includes a smaller effective population size for the Neanderthal population.
π, f d , d f , D and D + in non-overlapping 5 kb windows.Windows from the candidate introgressed loci responsible for the red wing pattern (HmB) and the yellow and white wing pattern (HmYb) are shown in red and yellow, respectively, in Fig 7. We find similar results as [25]; D has a high variance and a wide distribution in regions of low nucleotide diversity (Fig 7A).As nucleotide diversity increases the distribution of D narrows.f d reduces the high variance of values in areas of low nucleotide diversity (Fig 7B).d f also reduces variance with most of the d f values centered around zero, including windows with the HmB and HmYb loci (Fig 7C).D + has smaller variance with fewer outliers than D and similar variance to d f (Fig 7D).D + detects candidate regions of introgression, including windows not detected by D (see S1 Table Fig 3).However, in general there are more BAAA and ABAA sites than ABBA and BABA sites.D + is more conservative than D with a smaller expectation and variance than D in small genomic windows (Figs 4 and S1).As a result, D + has less false positives than D, likely because D + includes more informative sites (Fig 4).Therefore, D + also has better precision than D in simulated data under the Neanderthal admixture model presented in Fig 2 (Fig 5A) and under more realistic human demography models (S6 Fig).

Fig 7 .
Fig 7. Application of D, f^d , d f , and D + in Heliconius butterfly.(A) D, (B) f^d , (C) d f and (D) D + as a function of nucleotide diversity in P 2 in non-overlapping 5 kb windows.P 1 : H. melpomene aglaope, P 2 : H. melpomene amaryllis, P 3 : H. timareta thelxinoe, P 4 : H. hecale, H. ethilla, H. paradalinus sergestus and H. pardalinus ssp.nov.from the silvaniform clade.Red and yellow circles correspond to windows with introgressed loci HmB and HmYb, respectively.Methods follow Fig 3 from [25] with Helicionius genome data from [29].https://doi.org/10.1371/journal.pgen.1010155.g007 sequencing errors in both P 1 and P 2 (column D), where we simulated a genome size of 100 Mb and assumed a sequencing error rate of 1e-4.(TIFF) S19 Fig. Distribution of D values conditioned on coalescent histories.Based on 100,000 replicate simulations of unliked loci with a mutation rate of 1.5e-8 the distributions of D are shown for coalescent histories of ILS with no introgression (top row), and coalescent histories of ILS (middle row) vs introgression (bottom row) given and admixture proportion of 0.03 and the IUA demographic model described in the methods section for loci of size 10kb (column A), 20kb (column B), 30kb (column C), 40kb (column D), and 50kb (column E). (TIFF) S20 Fig. Quantile-Quantile plot corresponding to D value distributions conditioned on coalescent histories.Based on 100,000 replicate simulations of unliked loci with a mutation rate of 1.5e-8 the observed quantiles (y-axis), theoretical quantiles (x-axis), and the line of best fit with the associated coefficient of determination are shown for coalescent histories of ILS with no introgression (top row), and coalescent histories of ILS (middle row) vs introgression (bottom row) given and admixture proportion of 0.03 and the IUA demographic model described in the methods section for loci of size 10kb (column A), 20kb (column B), 30kb (column C), 40kb (column D), and 50kb (column E) to assess if the observed D distributions are normally distributed around mean 0 and scaled by the observed standard deviation of each respective distribution.(TIFF) S21 Fig. Distribution of D+ values conditioned on coalescent histories.Based on 100,000 replicate simulations of unliked loci with a mutation rate of 1.5e-8 the distributions of D+ are shown for coalescent histories of ILS with no introgression (top row), and coalescent histories of ILS (middle row) vs introgression (bottom row) given and admixture proportion of 0.03 and the IUA demographic model described in the methods section for loci of size 10kb (column A), 20kb (column B), 30kb (column C), 40kb (column D), and 50kb (column E). (TIFF) S22 Fig. Quantile-Quantile plot corresponding to D+ value distributions conditioned on coalescent histories.Based on 100,000 replicate simulations of unliked loci with a mutation rate of 1.5e-8 the observed quantiles (y-axis), theoretical quantiles (x-axis), and the line of best fit with the associated coefficient of determination are shown for coalescent histories of ILS with no introgression (top row), and coalescent histories of ILS (middle row) vs introgression (bottom row) given and admixture proportion of 0.03 and the IUA demographic model described in the methods section for loci of size 10kb (column A), 20kb (column B), 30kb (column C), 40kb (column D), and 50kb (column E) to assess if the observed D+ distributions are normally distributed around mean 0 and scaled by the observed standard deviation of each respective distribution.(TIFF) S23 Fig. Genome-wide power of D and D+ to detect introgression.Power of D (blue) D+ (green) to detect introgression from 100 replicate simulations with a genome size of 100 Mb for sample sizes of n = 1 (top row) and n = 100 (bottom row) monoploid genomes from P 1 and P 2 under the IUA model described in the methods (column A), and a realistic model of human demographic history described in Ragsdale and Gravel 2019 for the CEU (column B) and CHB (column C) populations.(TIFF)