Models of archaic admixture and recent history from two-locus statistics

We learn about population history and underlying evolutionary biology through patterns of genetic polymorphism. Many approaches to reconstruct evolutionary histories focus on a limited number of informative statistics describing distributions of allele frequencies or patterns of linkage disequilibrium. We show that many commonly used statistics are part of a broad family of two-locus moments whose expectation can be computed jointly and rapidly under a wide range of scenarios, including complex multi-population demographies with continuous migration and admixture events. A full inspection of these statistics reveals that widely used models of human history fail to predict simple patterns of linkage disequilibrium. To jointly capture the information contained in classical and novel statistics, we implemented a tractable likelihood-based inference framework for demographic history. Using this approach, we show that human evolutionary models that include archaic admixture in Africa, Asia, and Europe provide a much better description of patterns of genetic diversity across the human genome. We estimate that an unidentified, deeply diverged population admixed with modern humans within Africa both before and after the split of African and Eurasian populations, contributing 4 − 8% genetic ancestry to individuals in world-wide populations.


Introduction
The study of genetic diversity in human populations has shed light on the origins of our species and our spread across the globe. With the growing abundance of sequencing data from contemporary and ancient humans, coupled with archaeological evidence and detailed models of human demography, we continue to refine our understanding of our intricate history. Accurate demographic models also serve as a statistical foundation for the identification of loci under natural selection and the design of biomedical and association studies.
Whole-genome sequencing data are high dimensional and noisy. In order to make inferences of history and biology, we rely on summary statistics of variation across the entire genome and in many sequenced individuals. One such statistic that is commonly used for demographic inference is the distribution of SNP allele frequencies in one or more populations, called the sample or allele frequency spectrum (AFS) [1][2][3][4]. AFS-based inference has proven to be a powerful inference approach, yet it assumes independence between SNPs and therefore ignores information contained in correlations between neighboring linked loci, which is also referred to as linkage disequilibrium (LD).
While two-locus statistics have been extensively studied [13][14][15][16][17][18][19][20][21][22], most of this work focused on a single population at equilibrium demography, precluding their application to realistic demographic scenarios. Recently, approaches for computing the full two-locus sampling distribution for a single population with non-equilibrium demography were developed via the coalescent [8] or a numerical solution to the diffusion approximation [23], allowing for more robust inference of fine-scale recombination rates and single population demographic history. However, there remain significant limitations. Computing the full two-locus haplotype frequency spectrum is computationally expensive, hindering its application to inference problems that require a large number of function evaluations. Alternatively, computationally efficient low-order equations for specific LD statistics have been proposed [12,14], but these have seen limited application and only to single populations.
In this article, we show that the moment system of Hill and Robertson [14] can be expanded to compute a large family of one-and two-locus statistics with flexible recombination, population size history, and mutation models. Additionally, we show that the system can be extended to multiple populations with continuous migration and discrete admixture events, and that low order statistics can be accurately and efficiently computed for tens of populations with complex demography.
We use this moment system together with likelihood-based optimization to infer multipopulation demographic histories. We reexamine how well widely used models of human demographic history recover observed patterns of polymorphism, and find that these models underestimate LD among low frequency variants in each population, sometimes by a large amount. The inclusion of admixture from deeply diverged lineages in both Eurasian and African populations resolves these differences, and we infer an archaic lineage contributed * 6 − 8% genetic ancestry in two populations in Africa. By jointly modeling a wide range of summary statistics across human populations, we can reveal important aspects of our history that are hidden from traditional analyses using individual statistics.

Models and methods
To compute a large set of summary statistics for genetic data, we use mathematical properties of the Wright-Fisher model that are related to the look-down model of Donnelly and Kurtz [24,25]. To illustrate this process, we first build intuition through familiar equations from population genetics and then explain how these fit within a larger hierarchy of tractable models.
In this section, we therefore begin with evolution equations for heterozygosity and the frequency spectrum, then turn to recursions for low-order LD statistics and show that the classical Hill-Robertson [14] system for D 2 can be extended to arbitrary moments of D, multiple populations, and even the full sampling distribution of two-locus haplotypes. Mathematical details and expanded discussion for each result are given in S1 Appendix. Throughout this article, we assume that human populations can be described, approximately, by a finite number of randomly mating populations. We also assume an infinite-sites model in the main text and describe a reversible mutation model in Appendix S1.1.3.

Motivation: Single site statistics and the allele frequency spectrum
The most basic measure of diversity is expected heterozygosity E½H�, or the expected number of differences between two haploid copies of the genome. Given E½H� at time t, population size N(t) and mutation rate u, Wright [26] showed that enumerating all distinct ways to choose parents among two lineages leads to a recursion for E½H�, To leading order in 1/N and u, two copies of the genome are different if their parents were distinct (which has probability 1 À 1 2N ) and carried different alleles (which has probability E½H� t ), or if there was a mutation along one of two lineages (which has probability 2u).
Heterozygosity is a low-order statistic: we require only two copies of the genome to estimate E½H� genome-wide. More samples provide additional information that can be encoded in the sample AFS F n , the distribution of allele counts within a sample of size n. Specifically, F n (i) is the number (or proportion) of loci where the derived allele is observed in i copies out of n samples.
A standard forward approach to compute F n involves numerically solving the partial differential equation for the distribution of allele frequencies in the full population and then sampling from this distribution for the given sample size n (e.g. Gutenkunst et al. [2]). By enumerating mutation events and parental copying probabilities in a sample of size n, Jouganous et al. [3] showed that Eq 1 can be generalized to a recursion for {F n (i)} i=0,. . .,n (Fig 1). E½H� can be seen as a special case equal to F 2 (1), the i = 1 bin in the size n = 2 frequency spectrum. These recursions can also be derived as moment equations for the diffusion approximation [3,27,28].

Two-locus statistics
We will use this same intuition for the two-locus theory. First consider the model for two loci that each permit two alleles: alleles A/a at the left locus, and B/b at the right. There are four possible two-locus haplotypes, AB, Ab, aB, and ab, whose frequencies sum to 1 in the population. LD between two loci is measured as the covariance of their allele frequencies: Therefore D can also be interpreted as the probability of drawing two lineages from the population and observing one lineage of type AB and the other of type ab, minus the probability of observing the two cross types Ab and aB. As such, E½D� is a two-haplotype statistic, meaning we require just two haploid copies of the genome (or a single phased diploid genome) to estimate genome-wide E½D�, in the same way that the expected heterozygosity E½H� is a two-sample statistic of single-site variation.
Moment equations for D and D 2 . Enumerating possible copying, recombination, and mutation events for two lineages also leads to a well-known recursion for E½D� [13]. The possibility of sharing a common parent from the previous generation leads to the same 1 2N decay familiar from Eq 1. E½D� also decays due to recombination with rate proportional to the probability r of a recombination event between two loci in a given generation. Throughout, we assume r � 1, so that higher order terms may be ignored. For loosely linked or unlinked linked loci (r = 1/2), higher order terms must be considered [14].
To leading order in r, u, and 1 2N , we have Mutation doesn't contribute to E½D� because any mutation event is equally likely to contribute positively or negatively to the statistic. As a result, D is expected to be zero across the genome. Right: the corresponding two-locus statistics, including the Hill-Robertson system for E½D 2 �, rely on statistics of the same or lower order. Closed recursions can be found for any given E½D m �, leading to a sparse, linear system of ODEs. We denote However, the second moment E½D 2 � is positive. Hill and Robertson [14] found a recursion for a triplet of statistics including E½D 2 �, which we write as where p is the allele frequency of A, and q is the allele frequency of B. The recursion is where D and R are matrix operators for drift and recombination, respectively. To leading order in 1 2N and r, these take the form and R r ¼ r The three statistics in the Hill-Robertson system have a natural interpretation. E½D 2 � is the variance of D and has received plenty of attention over the years. The second statistic includes a term z = (1 − 2p)(1 − 2q) whose magnitude is largest when there are rare alleles at both loci, and which is positive when p and q both correspond to the minor allele (or both to the major allele). Thus E½Dð1 À 2pÞð1 À 2qÞ� ¼ E½Dz� measures positive covariance among low frequency variants. Fig 2A-2C shows that the decay of E½D 2 � and E½Dz� are sensitive to demographic history. Fig A2 in S1 Appendix shows how the bulk of the Dz statistic is contributed by pairs of variants where the rarest allele has frequency between 2 and 20%, while common variants comprises the bulk of D 2 .
E½p 2 � ¼ E½pð1 À pÞqð1 À qÞ� is the joint heterozygosity across pairs of SNPs. If we sample four haplotypes from the population, this is proportional to the probability that the first pair differ at the left locus, and the second pair differ at the right locus.
The applications in this article focus on generalizing the Hill-Robertson equations to multipopulation settings. However, we first outline generalizations to high-order moments and non-neutral evolution, leaving theoretical developments and simulations to the Appendix.
Generalizing to higher moments of D. The existence of tractable higher-order moment equations for one-locus statistics [3] suggests the existence of a similar high-order system for two-locus statistics. Higher moments of D provide additional information about the distribution of two-locus haplotypes. Appendix S1.1 shows that the Hill-Robertson system can be extended to compute any moment of D, and presents recursions for those systems of arbitrary order D m that closes under drift, recombination, and mutation.
This family of recursion equations takes a form similar to the D 2 system: the evolution of E½D m � requires E½D mÀ 1 z� and E½D mÀ 2 p 2 �, with each of those terms depending on additional terms of the same order and smaller orders (Fig 1). For any order m, Appendix S1.4.1 shows the system closes and forms a hierarchy of moment equations, in that the D m recursion contains the D m−2 system, which itself contains the D m−4 system, and so on (Fig A1 in S1 Appendix). Just as the Wright equation for heterozygosity generalizes naturally to equations for the more informative distribution of allele frequency [3], the Hill and Robertson equations for E½D� and E½D 2 � generalize to informative higher-order LD statistics.
Generalizing to arbitrary two-locus haplotype distribution. Given the analogy between the frequency spectrum and the Hill-Robertson equations, it is natural to study the connection between the moment equations for E½D n � and the evolution of the two-locus haplotype fre- While classical approaches for computing C n [18,20] were limited to neutrality and steadystate demography, recent coalescent and diffusion developments allow for C n to be computed under non-equilibrium demography and selection [8,23]. These approaches are computationally expensive and limited to one population, as C n has size ðnþ1Þðnþ2Þðnþ3Þ 6 , and the P-population distribution grows asymptotically as n 3P .
Generalizing the approach of Jouganous et al. [3], we can write a recursion equation on the entries of C n under drift, mutation, recombination, and selection at one or both loci (Appendix S1.3). As expected, this recursion does not close under selection: to find C n at time t + 1, we require C n+1 and C n+2 at time t. It also does not close under recombination, requiring a closure approximation. Using the same closure strategy for selection and recombination, however, we can approximate the entries of C n+1 and C n+2 as linear combinations of entries in C n and obtain a closed equation. This approach provides accurate approximation for moderate n under recombination and selection (Appendix S1.3.5) that represent a 10 to 100-fold speedup over the numerical PDE implementation in [23] (Table A1 in S1 Appendix). However, closure is inaccurate for small n.
By contrast to the full two-locus model, equations for moments of D close under recombination because the symmetric combination of haplotype frequencies that define D ensures the To illustrate the effect of admixture on LD curves, we consider two populations in isolation for 2N generations, followed by an admixture event where the focal population receives 1% of lineages from the diverged population. (E) E½D 2 � curves are largely unaffected by this low level of admixture. (F) However, E½Dz� is immediately and strongly elevated following admixture, and remains significantly elevated for prolonged time T (in units of 2N generations) since the admixture event.
https://doi.org/10.1371/journal.pgen.1008204.g002 cancellation of higher-order terms (Appendix S1.1.2). This makes the moments of D particularly suitable for rapid computation of low-order statistics over a large number of populations.
The Hill-Robertson system does not close, however, if one or both loci are under selection. Appendix S1.1.4 considers a model where one of the two loci is under additive selection. We derive recursion equations for terms in the E½D 2 � system and describe the moment hierarchy and a closure approximation, though we leave its development to future work. In the following we focus on neutral evolution.

Multiple populations
While a large body of work exists for computing expected LD in a single population, little progress has been made toward extending these models to multiple populations. Forward equations for the full two-locus sampling distribution become computationally intractable beyond just a single population, even with the moment-based approach described above. Here, we extend the Hill-Robertson system to any number of populations, allowing for population splits, admixture, and continuous migration.
Motivation: Heterozygosity across populations. To motivate our derivation of the multi-population Hill-Robertson system and provide intuition, we begin with a model for heterozygosity across populations with migration. With two populations we consider the crosspopulation heterozygosity, E½H 12 � ¼ E½p 1 ð1 À p 2 Þ� þ E½p 2 ð1 À p 1 Þ�, where p i and q i are allele frequencies at the left and right loci, respectively, in population i. This is the probability that two lineages, one drawn from each population, differ by state. At the time of split between populations 1 and 2, E½H 1 � ¼ E½H 2 � ¼ E½H 12 �. Because coalescence between lineages in different populations is unlikely, E½H 12 � is not directly affected by drift. In the absence of migration and under the infinite-sites assumption used here, this statistic increases linearly with the mutation rate over time (Fig A3 in S1 Appendix).
With migration, the evolution of E½H 12 � also depends on E½H 1 � and E½H 2 �. We define the migration rate m 12 to be the probability that a lineage in population 2 has its parent in population 1. Assuming m ij � 1, the probability that both lineages in E½H 12 � come from population 1 is m 12 (to leading order), in which case E½H 12 � tþ1 is equal to E½H 1 � t , and the probability that both come from population 2 is m 21 . Then to leading order in m ij , we have Similar intuition leads to recursions for E½H 1 � and E½H 2 � under migration, and this system easily extends to more than two populations.
The Hill-Robertson system with migration. We take the same approach to determine transition probabilities in the multi-population Hill-Robertson system. Suppose that at some time, a population splits into two populations. At the time of the split, expected two-locus statistics (D 2 , Dz, π 2 ) in each population are each equal to those in the parental population at the time of split (Appendix S1.2.1). Additionally, the covariance of D between the two populations, E½D 1 D 2 �, is initially equal to E½D 2 � in the parental population. In the absence of migration, Hill-Robertson statistics in each population evolve according to Eq 3, and With migration, additional moments are needed to obtain a closed system. These additional terms take the same general form as the original terms in the Hill-Robertson system, but include cross-population statistics, analogous to H 12 in the heterozygosity model with migration. Again using y to denote bases of Hill-Robertson moments, this basis is where P is the number of populations, and we slightly abuse notation so that D i D j stands in for all index permutations (D 2 1 , D 2 2 , and D 1 D 2 in the two-populations case). We derive transition probabilities under continuous migration in Appendix S1.2.2 leading to the closed recursion, where D, M, R, and U are sparse matrices for drift, migration, recombination and mutation that depend on the number of populations, population sizes N(t), and migration rates m. Admixture. Patterns of LD are sensitive to migration and admixture events, and low order LD statistics are commonly used to infer the parameters of admixture events [10,29]. A well-known result (e.g., example 2.7 in [30]) is that D in an admixed population can be nonzero even when D is zero in both parental populations if allele frequencies differ between the two parental populations. This is seen by enumerating all possible combinations of haplotype sampling when a fraction f of lineages were contributed by population 1, and 1 − f by population 2 (Appendix S1.2.3). More generally, immediately following the admixture event, the expectation E½D adm � in the admixed population is where δ = (p 1 − p 2 )(q 1 − q 2 ) [31].
To integrate the multi-population D 2 system after an admixture event, we require E½D 2 adm � and other second order terms in the basis (5) involving the admixed population. Using the same enumeration approach as for Eq 7, the expectation immediately following the admixture event is Each other required term can be found in a similar manner (Appendix S1.2.3). In this way, the set of moments may be expanded to include the admixed population and integrated forward in time using Eq 6.

Numerical implementation
We rescale time by 2N ref generations (N ref is an arbitrary reference population size, often the ancestral population size), so that the recursion can be approximated as a differential equation where ν are the relative population sizes at time Each matrix is sparse, and this equation can be solved efficiently using a standard Crank-Nicolson integration scheme. Our implementation allows users to define general models with standard demographic events (migrations, splits and mergers, size changes, etc.) similar to the interface familiar to @a@i and moments [2,3]. A single evaluation of the four-population model shown in Fig A4 in S1 Appendix can be computed in roughly 0.1 second. We packaged our method with moments [3] as moments.LD, a python module that computes expected statistics and performs likelihood-based inference from observed data (described below), available at bitbucket.org/ simongravel/moments. Validation. We validated our numerical implementation and estimation of statistics from simulated genomes using msprime [32]. Expectations for low-order statistics match closely with coalescent simulations. For example, Fig A4 in S1 Appendix shows the agreement for a four population model with non-constant demography, continuous migration, and an admixture event, for which we computed expectations using moments.LD that matched estimates from msprime. While approximating expectations from msprime required the time-consuming running and parsing of many simulations, expectations from moments.LD were computed in seconds on a personal computer.

Data and inference
Genotype data. Computing D using the standard definition requires phased haplotype data (Appendix S1.5). However, most currently available whole genome sequence data is unphased, so that we must rely on two-locus statistics based on observed genotype counts instead of haplotype counts. One could estimate haplotype statistics using the Weir [33] esti-matorD where n A is the count of A at the left locus, n B the count of B at the right locus, n d the number of diploid individuals in the sample, and {n AABB , n AABb , . . .} the counts of each observed genotype. However, the Weir estimator for D is biased. Fortunately, we can simply treat the Weir estimatorD as a statistic and obtain an unbiased prediction for its expectation (Appendix S1.7.3). Even though E½D n � can be estimated from 2n phased haplotypes, more samples are required to accurately estimate LD for a given pair of SNPs. However, as we are interested in genome-wide averages ofD and other LD statistics, even when individual estimates are noisy, by averaging over a very large number of pairs of SNPs we can accurately estimate LD from relatively few diploid genomes. 1000 Genome Project data. We computed statistics from intergenic data in the Phase 3 1000 Genomes Project data [34]. The non-coding regions of the 1000 Genomes data is low coverage, which can lead to significant underestimation of low frequency variant counts, which distorts the frequency spectrum and can lead to biases in AFS-based demographic inference [35]. However, low-order statistics in the Hill-Robertson system are robust to low coverage data in a large enough sample size (Fig A6 in S1 Appendix), so that low coverage data are well suited for inference from LD statistics (see also [12]).
To avoid possible confounding due to variable mutation rate across the genome, we calculated and compared statistics normalized by π 2 , the joint heterozygosity: s 2 d ¼ E½D 2 �=E½p 2 �, as in [12]. All figures showing s 2 d -type statistics are normalized using π 2 (YRI), the joint heterozygosity in the Yoruba from Ibidan, Nigeria (YRI). This normalization removes all dependence of the statistics on the overall mutation rate, so that estimates of split times and population sizes are calibrated by the recombination rate per generation instead of the mutation rate [23]. This is convenient given that genome-wide estimates of the recombination rate tend to be more consistent across experimental approaches than estimates of the mutation rate.
We considered all pairs of intergenic SNPs with 10 −5 � r � 2 × 10 −3 using the African-American recombination map estimated by Hinch et al. [36] using ancestry switch-points. The lower bound was chosen to further reduce the potential effect of short-range correlations of mutation rates, clustered mutations, experimental error, and low resolution of the recombination map at very short distances.
Likelihood-based inference on LD-curves. To compare observed LD statistics in the data to model predictions, and thus to evaluate the fit of the model to data, we used a likelihood approach. We binned pairs of SNPs based on the recombination distance separating them (Appendix S1.7.2). Bins were defined by bin edges {r 0 , r 1 , . . ., r n }, roughly logarithmically spaced. The model is defined by the set of demographic parameters Θ. We included the ancestral N ref as a parameter to be fit, which we also use to scale recombination bins as For a given recombination bin (ρ i , ρ i+1 ], we computed statistics and normalized by π 2 in one population (we used π 2 (YRI)), and denote this set of normalized statisticsv. We computed expectations for normalized statistics from the model, M i , and then estimated the likelihood as taking the probability of observing datav to be normally distributed with mean M and covariance matrix S (the normal distribution assumption is validated in Fig A5 in S1 Appendix).
We estimated S directly from the data by constructing bootstrap replicates from sampled subregions of the genome with replacement. This has the advantage of accounting for the covariance of statistics in our basis, as well as non-independence between distinct neighboring or overlapping pairs of SNPs. To compute the composite likelihood across ρ bins, we simply took the product of likelihoods over values of recombination bins indexed by i, so that To compute confidence intervals on parameters, we used the approach proposed by Coffman et al. [37], which adjusts uncertainty estimates to account for non-independence between recombination bins and neighboring pairs of SNPs.

Human expansion models underestimate LD between low frequency variants
The demographic model for human out-of-Africa (OOA) expansion proposed and inferred by Gutenkunst et al. [2] has been widely used for subsequent simulation studies, and parameter estimates have been refined as more data became available [3,35,38]. These models have typically been fit to the single-locus joint AFS, with Yoruba of Ibidan, Nigeria (YRI), Utah residents of Western European ancestry (CEU), and Han Chinese from Beijing (CHB) as representative panels. Gutenkunst et al. verified that the observed decay of r 2 was consistent with simulations under their inferred model. We first asked if the OOA model ( Fig 3A) is able to capture observed patterns of LD within and between these three populations. When fitting to all statistics in the multi-population basis, parameters diverged to infinite values, suggesting that the model is mis-specified. In particular, this model was unable to describe observed Dz statistics, with Dz-curves from the model drastically underestimating observations. We refit the OOA model without including Dz statistics, and we inferred best-fit parameters that generally align with estimates using the joint AFS (Table 1, left, and Fig 3). This model underestimated observed Dz in each population, especially in the YRI population (Fig 3D). Using AFS-inferred parameters from previous studies led to qualitatively similar results.
The Gutenkunst model is a vast oversimplification of human evolutionary history, so its failure to account for Dz is not all that surprising. However, given the good agreement of the model to both allele frequencies and r 2 decay [2], we did not expect such a large discrepancy. Having ruled out low coverage and spatial correlations in the mutation rate as explaining factors, our next hypothesis was a more complex demographic history. We generalized the Gutenkunst model with a number of additional parameters accounting for recent events, including size changes in the YRI population, recent mixture between populations, and substructure within each continental population. None of these modifications provided satisfactory fit to the data and some did not converge to biologically realistic parameters.

Inference of archaic admixture
E½Dz� is a measure of positive covariance between low-frequency alleles (Fig A2 in S1 Appendix). We therefore expect this statistic to be sensitive to the presence of rare, deep-coalescing lineages within the population, as those lineages will contribute haplotypes with a large number of tightly linked low frequency variants (see Discussion below).
Given prior genetic evidence for archaic admixture in Eurasia and Africa (reviewed in [39]), we proposed a model that includes two deeply diverged human branches, with one branch mixing with Eurasian ancestors beginning at the OOA event, and the second one mixing with the ancestors of the Yoruba population over a time period that could include the OOA event. In this scenario, this second branch could also contribute to Eurasians through admixture prior to the OOA event (Fig 4A). Many human lineages coexisted on the African continent, possibly until quite recently [40][41][42], and genetic evidence points to a history of  Table 1. Most statistics were accurately predicted by this model, including (B) the decays of E½D 2 � in each population, (C) the decay of the covariance of D between populations, and (E) the joint heterozygosity E½p 2 ðiÞ�. (D) However, E½D i ð1 À 2p i Þð1 À 2q i Þ� was fit poorly by this model, and we were unable to find a three-population model that recovered these observed statistics, including with additional periods of growth, recent admixture between modern human populations, or substructure within modern populations. Error bars represent bootstrapped 95% confidence intervals on the statistic estimate.
https://doi.org/10.1371/journal.pgen.1008204.g003 archaic admixture or deep structure across many modern African populations [43][44][45][46][47][48]. It is likely that modern humans have met and mixed with diverged lineages many times through history, rather than receiving just a single pulse of migrants [49,50]. We chose to model the mixing of archaic and modern human branches as continuous and symmetric [51], parameterizing the migration rate between these branches and the times that migration began and ended.
We considered two topologies for the archaic branches: 1) both branches split independently from that leading to modern humans ( Fig 4A and Table 1), and 2) one branch split from the modern human branch, which some time later split into the two populations (Fig A9  and Table A2 in S1 Appendix). Both models fit the data well with little statistical evidence to discriminate between these two models (Fig 4B-4E and Fig A8 in S1 Appendix). The difference in log-likelihood between the two models was ΔLL < 1, as opposed to ΔLL = 1,730 between models with and without archaic admixture. ΔLL between the best fit model with archaic admixture and the fully saturated model (using observations as expectations) was 767. Consistent among the inferred models was the age of the split between diverged and modern human branches within Africa at * 500 kya, though uncertainty remains with regard to the relationship between archaic human lineages in Africa and Eurasia. The sequencing of archaic genomes within Africa would clearly be helpful in resolving these topologies.
We inferred an archaic population to have contributed measurably to Eurasian populations. This branch (putatively Eurasian Neanderthal) split from the branch leading to modern Table 1. Inferred parameters for OOA models. Two models for the out-of-Africa expansion. We fit the commonly used 13-parameter model to the multi-population Hill-Robertson statistics (left). The best fit parameters shown here were fit to the set of statistics without the E½Dz� terms, because the inclusion of those terms led to runaway parameter behavior in the optimization. This is often a sign of model mis-specification. On the right, the same 13-parameter model is augmented by the inclusion of two deeply diverged branches, putatively Neanderthal and an unknown lineage within Africa. We inferred that these branches split from the branch leading to modern humans roughly 460 − 650 kya, and contributed migrants until quite recently (*19 kya). Times reported here assume a generation time of 29 years and are calibrated by the recombination (rather than mutation) rate. Confidence intervals were computed using the Godambe information matrix on bootstrap replicates of the data [37]. humans 470 − 650 thousand years ago (kya), which contributed 1.2 ± 0.6% ancestry in modern CEU and CHB populations after the out-of-Africa split. This range of divergence dates from our maximum-likelihood model overlaps with previous estimates of the time of divergence between Neanderthals and human populations, estimated at 550 − 765 kya [52]. The diverged African branch split from the ancestors of modern humans 460 − 540 kya and contributed to both the pre-OOA human branch and the lineage leading to YRI. This admixture began between 90 − 160 kya, well before the estimated split between Eurasian and the YRI lineages, so that this archaic branch also contributed to the ancestors of Eurasian populations. We estimated 4.7 − 9.2% ancestry contribution from this unknown population to YRI, and 1.9 − 6.6% contribution to CEU and CHB. We chose a separate population trio to validate our inference and compare levels of archaic admixture with different representative populations. This second trio consisted of the Luhya in Webuye, Kenya (LWK), Kinh in Ho Chi Minh City, Vietnam (KHV), and British in England and Scotland (GBR). We inferred the KHV and GBR populations to have experienced comparable levels of migration from the putatively Neanderthal branch. However, the LWK population exhibited lower levels of admixture (* 6%) in comparison to YRI, possibly suggesting population differences in archaic admixture events within the African continent (Table A3).

Multi-population two-locus diversity statistics
The application presented here relied on the four-haplotype statistics (D 2 , Dz, π 2 ). Studying these low-order multi-population statistics in a likelihood framework allowed us to infer a demographic model with archaic admixture, even without reference genomes from those Demographic events for the three modern human populations are parameterized as above, but we also include two branches with deep split from the ancestral population to modern humans. A putatively Neanderthal branch that remains isolated until the Eurasian split from YRI, and a deep branch within Africa that is allowed to be isolated for some time before continuously exchanging migrants with the common ancestral branch and the YRI branch. (B-E) This model fits the data much better than the model without archaic admixture, and especially for the Dz statistics (D). Fits to 35 more curves and statistics are shown in Fig A7 in S1 Appendix, and residuals are shown in Fig A8 in S1 Appendix. The migration rates inferred between the diverged African branch and YRI provides an estimate of * 7.5% contribution.
https://doi.org/10.1371/journal.pgen.1008204.g004 diverged populations. We have also shown that higher order statistics may be computed through this same framework. Extending higher order two-locus moment systems to multiple populations would potentially provide further information about demography, particularly for past encounters with archaic lineages.
Relation to other statistics. There are many approaches for computing expected statistics for diversity under a wide range of scenarios. Single-site statistics, which include expected heterozygosity and the AFS, may be computed efficiently using forward-or reverse-time approaches. Beyond the classical recursions for E½D� and E½D 2 � [12,14], two-locus statistics are difficult to compute for non-equilibrium, multi-population demographic models. Sved [53] proposed an IBD based recursion to compute E½r 2 � across subdivided populations, but its accuracy and interpretation remain debated [12].
The moments-based approach presented here generalizes the recursion for the single-site AFS presented in [3]. The moments system includes all heterozygosity statistics, so we recover expected F-statistics under arbitrary demography, which are commonly used to test for admixture [54][55][56]. Long-range patterns of elevated LD in putatively admixed populations are used to infer the timing of admixture events and relative contributions of parental populations [10,29]. These approaches rely on the recursion for E½D� after admixture events that is used here (Eqs 2 and 7). Thus the generalized Hill-Robertson system is sensitive to ancient admixture, but also captures statistics used to identify recent admixture history, with fewer assumptions about early history.
Plagnol and Wall [57,58] introduced a statistic, S � , specifically designed to scan for introgressed haplotypes without having sequence data from the diverged population. S � uses an adhoc score to identify SNPs that likely arose on haplotypes contributed from a deeply diverged population, and is estimated through simulation. These SNPs will tend to be rare and in high LD, and therefore also contribute to Dz (Fig 2D-2F). Thus even a small amount of archaic admixture will significantly elevate E½Dz� compared to that in an unadmixed population, and Dz itself could be used as an ad-hoc statistic similar to S � . Given its conceptual relationship to S � , it may not be so surprising that this previously overlooked statistic is particularly well suited for model-based inference of archaic admixture.
Caveats. Like many inference approaches in population genetics, we approximate human history using discrete, randomly mating populations with size and migration histories described by relatively few parameters. History is much more complex than this. Thus statistical uncertainties estimated using bootstrap analysis masks much larger, systematic errors due to model misspecification. In particular, some choices we made in modeling archaic admixture are certainly oversimplified, such as the assumption of symmetric and constant migration rates during the period of contact between archaic and modern humans.
Variability in fine-scale recombination rates between populations and over time contributes another source of systematic error. While large-scale recombination rates are generally better understood than the mutation rate in humans [for which current estimates vary over a factor of two [59]], recombination rates can vary at short distances. Spence and Song [60] showed that recombination maps are highly concordant across populations represented in the Thousand Genomes Project [34], although this correlation surely decreases at shorter distances. We filtered out pairs of mutations at very close distances (less than roughly 1kb) to reduce potential biases due to very fine scale variation. We therefore do not expect variation in recombination rate among human populations to explain the large differences in Dz compared to the Gutenkunst et al. model. However, the effect of population-specific recombination maps may play a role when considering finer-scale patterns and data from deeply diverged populations such as the Neanderthal. Finally, our model and inferences assumed that mutations are evolving neutrally. We chose to analyze SNPs in intergenic regions and excluded genic and intronic regions in an effort to reduce biases due to selection acting on mutations included in the analysis or nearby selected regions, although some intergenic regions are expected to be affected by selection or biased gene conversion. While outside the scope of this study, a more detailed characterization of the effects of linked selection on Hill-Robertson statistics is warranted.

Conclusion
We described an infinite hierarchy of multi-locus summaries of genomic diversity that are easy to compute under arbitrary, multi-population demographies. Some of these statistics are familiar, including expected heterozygosity, F-statistics, and LD decay, while others have been largely unexplored in multi-population models, such as the degree of LD between low frequency alleles (Dz) and the joint heterozygosity across sites and populations (π 2 ). The onepopulation Dz statistic, in particular, has an interesting history, as it has come up in early work as a mathematical stepping-stone on the way to computing D 2 [14], but was, to our knowledge, never used in data analysis. As it happens, this 'ghost' statistic provides a unique window into human history.
Using this set of summary statistics, we explored a commonly used model of human demographic history derived from single-site AFS and validated using LD decay curves. While many statistics under this model fit the data well, the model dramatically underestimates levels of LD among rare alleles. Modeling archaic admixture worldwide resolved this discrepancy. We recovered the signal of Neanderthal admixture in Eurasian populations, and found evidence for substantial and long-lasting admixture from a deeply diverged lineage in two African populations that is consistent with evidence from previous studies [46][47][48]57].
This model deserve a more thorough investigation, including data from ancient humans and additional contemporary African populations. We leave this to future work for three reasons. First, proposing a detailed multi-population model of evolution in Africa will require carefully incorporating anthropological and archaeological evidence, which is a substantial endeavor. Second, the inclusion of two-locus statistics from ancient genomes will require vetting possible biases associated with ancient DNA sequencing, although we see no problem with using two-locus statistics in modern populations jointly with one-locus statistics in ancient DNA.
Third, and more importantly, archaic admixture can hide in the blind spot of classical statistics, and widely used demographic models for simulating genomes underestimate LD between low frequency variants in populations around the globe, especially in Africa. This large bias affects neither the distribution of allele frequencies nor the amount of correlation measured by D 2 , but it may impact analyses aiming to identify disease variants based on overrepresentation of rare variants in specific genes or pathways. Thus both statistical and population geneticists would benefit from including archaic admixture into baseline models of human genomic diversity.
Supporting information S1 Appendix. Supporting material. In the Appendix, we provide detailed mathematical derivations, expanded discussions, supporting analyses, and supplemental figures and tables. (PDF)