Abstract
The ancestry of each locus of the genome can be estimated (local ancestry) based on sequencing or genotyping information together with reference panels of ancestral source populations. The lengths of those ancestry-specific genomic segments are commonly used to understand migration waves and admixture events. On short time scales, it is often of interest to determine whether an unadmixed ancestor from a specific population existed t generations ago. We built a hypothesis test to determine if an individual has an ancestor belonging to a target ancestral population t generations ago, based on the lengths of the ancestry-specific segments at an individual level. We applied this test to a data set that includes 20 Uruguayan admixed individuals to estimate, for each one, how many generations ago the most recent indigenous ancestor lived. As this method tests each individual separately, it is particularly suited to small sample sizes, such as our study or ancient genome samples.
Citation: Illanes G, Fariello MI, Spangenberg L, Mordecki E, Naya H (2022) Testing the existence of an unadmixed ancestor from a specific population t generations ago. PLoS ONE 17(8): e0271097. https://doi.org/10.1371/journal.pone.0271097
Editor: Gyaneshwer Chaubey, Banaras Hindu University, INDIA
Received: December 6, 2021; Accepted: June 23, 2022; Published: August 12, 2022
Copyright: © 2022 Illanes et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the data and code used in this work were uploaded to figshare. Data: https://figshare.com/articles/dataset/Testing_the_existence_of_an_unadmixed_ancestor_from_a_specific_population_t_generations_ago/20277795 Code: https://figshare.com/articles/software/Testing_the_existence_of_an_unadmixed_ancestor_from_a_specific_population_t_generations_ago_code_/20277813.
Funding: Gabriel Illanes acknowledges support of Agencia Nacional de Investigación e Innovación (ANII-Uruguay) and Comisión Académica de Posgrado (CAP-Udelar). The Urugenomes project was funded by BID (Banco Interamericano de Desarrollo) Proyecto ATN / KK-L4584-JR “Fortalecimiento de las capacidades técnicas y humanas para las exportaciones de servicios genómicos”. Additionally, Maria Ines Fariello and Lucia Spangenberg obtained partial support from the ANII-Uruguay FSDA 1 2017 1 143647 and Lucia Spangenberg and Hugo Naya are also supported by FOCEM (MERCOSUR Structural Convergence Fund).
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
The information that we have about our ancestors and their origins comes mostly from our families’ stories and, in the best cases, from reconstructed genealogies. With genomic information we can build more precise and reliable genealogies and even calculate the proportions of our ancestries. For instance, there can be a discordance between the self-declared ancestry and the ancestry estimated from a genome study, as shown in [1]. Even if the exact genealogy of an individual is known, it is very difficult to estimate the proportion of a particular ethnic group, since the amount of genetic material that is passed from one generation to the next is highly variable [2, 3]. As an example, according to Coop’s simulations [2], with high probability (close to 1) one inherits almost no genomic material from at least one ancestor who lived only 7 generations ago.
Genomic data has been used to study a great variety of human population characteristics, such as population structure [4], admixture events [5–7] and the estimation of ancestry proportions from an individual’s genome [8]. In particular, the challenge of estimating the local ancestry (ancestry-specific genomic segments) of an admixed individual, that is, determining the tracts in the genome corresponding to different ancestral populations (e.g. European, African and Native American), has been successfully addressed with different approaches [9–11]. The possibility of representing the genome by a disjoint series of tracts with different ancestries enables the application of mathematical modeling tools to retrieve interesting information regarding the history of an individual. For instance, given a particular pedigree one can model admixture events by stochastic processes, allowing the study of inference methods for admixture deconvolution and segregation of tracts in the pedigree. Assuming that the target tracts are rare, and hence unlikely to recombine with each other, an admixture tract-length distribution was derived in [5]. Furthermore, some model assumptions were relaxed in [7], modeling tracts that descended from multiple migrant ancestors under a simplified model (Markovian Wright–Fisher). Additionally, a dyadic interval-based stochastic process for generating admixture tracts was developed in [6].
Here, we have developed a hypothesis test to assess whether it is likely that one of the individual’s ancestors t generations ago was unadmixed (i.e. an individual whose whole genome has a single ancestry), given a fixed number t of generations and the lengths of the ancestry-specific tracts for every autosome.
We applied this test to a data set that includes the genomes of 20 Uruguayan individuals (ten descendants of the past local indigenous groups [1] and ten afro-descendants). According to historical records, most Uruguayan Amerindians were exterminated in 1831 [12]. Only some of them survived, and several women and children were taken as prisoners. As far as we know, no unadmixed indigenous individuals are living nowadays among the Uruguayan population. Our previous study has shown that there is non-negligible indigenous ancestry in this particular data set: indigenous percentages range from 7% to almost 40%. Also, mitochondrial DNA analysis shows indigenous haplogroups such as B or C. Admixture results together with admixture graphs show a genomic affinity with Amazonian and one Andean indigenous group [1].
In the current study we develop a new hypothesis test that provides deeper information about each individual’s history. Our motivation is to know whether descendants of the indigenous groups had ancestors that survived the genocide, i.e. had an unadmixed (“complete”) indigenous ancestor just 3 or 4 generations ago; or whether the admixture events occurred before the genocide, meaning that those few survivors were already admixed with the general population.
2 Methods
2.1 Definitions and notations
Let $a_0$ be an individual, and $a^t = (a^t_1, \dots, a^t_{2^t})$ the individual’s ancestors t generations ago. The individuals $a^t_{2i-1}$ and $a^t_{2i}$ mate, with offspring $a^{t-1}_i$, for all i and t. We will assume that a0 and all their ancestors up to generation t are admixed with respect to a family of ancestral populations $\Lambda = \{\lambda_1, \dots, \lambda_n\}$. Furthermore, for a given individual a0, let ci, i = 1, …, 22 denote their autosomes. Each ci consists of a chromosome pair $(c_i^1, c_i^2)$.
In this work, we will use the terms “segment” and “tract” of a chromosome in the following way.
Definition 2.1 (Segment of a chromosome). We will refer to a haplotype of a chromosome as a “segment” (it can contain genetic information related to different ancestral populations).
Definition 2.2 (Tract of a chromosome). We will refer to a segment with all its genetic information related to the same ancestral population $\lambda \in \Lambda$ as a “$\lambda$ ancestral tract”, or just “$\lambda$-tract”.
This distinction is arbitrary, but it is important to note if a given segment has all its genetic information related to the same ancestral population or not.
In this work, we will measure lengths of segments and tracts in Morgans, a standard unit of genetic distance that is well suited to our recombination model.
Definition 2.3 (Morgan). A “Morgan” is defined as the distance between chromosome positions for which the expected number of recombinations between homologous chromosomes in a single generation is 1.
In this work, we will assume that, for a given individual, every chromosome can be considered as a concatenation of tracts:
$$c = tr_1\, tr_2 \cdots tr_n,$$
where each $tr_j$ is a $\lambda$-tract, for some λ ∈ Λ. We will also assume that, for a given individual a0, we know the length (in Morgans) and ancestral population related to every tract $tr_j$, for every chromosome of a0.
Definition 2.4 ($\lambda$-complete individual). For a given λ ∈ Λ, we say that an individual a is $\lambda$-complete if all their chromosomes are $\lambda$-tracts.
2.2 The hypothesis test
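The objective of this work is, for a fixed generation t, to develop a hypothesis test to assess if at least one of the ancestors $a^t_1, \dots, a^t_{2^t}$ is $\lambda$-complete, for a given λ ∈ Λ.
Without loss of generality, let us focus on the two population case, $\Lambda = \{\lambda_1, \lambda_2\}$, and $\lambda = \lambda_1$. For a given t, we are interested in doing the following test,
$$H_0: \text{at least one of } a^t_1, \dots, a^t_{2^t} \text{ is } \lambda_1\text{-complete} \quad \text{vs.} \quad H_1: \text{none of } a^t_1, \dots, a^t_{2^t} \text{ is } \lambda_1\text{-complete}. \tag{1}$$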
One of our major problems is not having information about an individual’s ancestors t generations ago. If we would like to sample the $2^t$ ancestors of a0, t generations ago, we would not have enough information about a0, or about their ancestors, that we can use to fix a realistic distribution function on the space of all possibilities. Our strategy, then, is to focus on a particular case of H0, denoted $\tilde{H}_0$, where we can fix the ancestors’ pedigree:
$$\tilde{H}_0: a^t_1 \text{ is } \lambda_1\text{-complete and } a^t_2, \dots, a^t_{2^t} \text{ are } \lambda_2\text{-complete} \quad \text{vs.} \quad H_1. \tag{2}$$
Without loss of generality, if $a^t \in H_0$ or if $a^t \in \tilde{H}_0$, we assume that $a^t_1$ is the $\lambda_1$-complete ancestor of a0 (if not, reorder the family tree to make it so). While there are several distribution functions supported in H0 which could be used to sample $a^t$, there is only one possibility in $\tilde{H}_0$; that is, $a^t_1$ is a $\lambda_1$-complete ancestor, and $a^t_2, \dots, a^t_{2^t}$ are all $\lambda_2$-complete ancestors. The test 1 is a composite hypothesis test, whereas the test 2 is a simple hypothesis test. In the S1 File, we show that we can build statistics such that their p-value under H0 is always stochastically smaller than their p-value under $\tilde{H}_0$.
2.3 Mathematical model
We assume that chromosomes can be thought of as real intervals, instead of sequences of bases. This assumption aims to ease the computational burden (we will explore the need for simulations in subsection 2.5). Without it, we would have to handle a large number of very long vectors during the simulations, which would consume a lot of computational resources. With it, we model each chromosome as an interval, and simulate the Poisson process in the interval using the exponential distribution. This assumption is not a strong one, because the number of recombination points introduced during meiosis is much smaller than the total number of bases on each chromosome.
For a given ancestor and chromosome i, we consider the chromosome pair $(c_i^1, c_i^2)$. During meiosis, those chromosomes recombine to create an offspring chromosome as follows:
- Recombination points are introduced using a Poisson process with parameter Li (length of the chromosome in Morgans). Including the borders of the interval [0, Li], we obtain {x0 = 0, x1, …, xn, xn+1 = Li}.
- A parent chromosome is selected randomly. The segment tr1 = [0, x1] in the selected chromosome will be the first segment of the offspring chromosome.
- At the point x1, switch to the other chromosome, and concatenate the segment tr2 = [x1, x2].
- The process is repeated until the length of the offspring chromosome is Li.
This process is illustrated in Fig 1 (meiosis for complete chromosomes and meiosis for admixed chromosomes).
Top: recombination of two complete chromosomes during meiosis to create an admixed offspring. The first one is $\lambda_1$-complete (red) and the second one is $\lambda_2$-complete (blue). Bottom: two admixed chromosomes consisting of $\lambda_1$-tracts (red) and $\lambda_2$-tracts (blue) recombine during meiosis to create an offspring chromosome.
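The recombination steps listed above can be sketched in a few lines for the simplest case shown at the top of Fig 1, where one parental chromosome is $\lambda_1$-complete and the other is $\lambda_2$-complete. This is only an illustrative sketch, not the code used in this work: the function name `recombine_complete_pair`, the tuple layout `(start, end, label)` and the label strings are assumptions made for the example. Recombination points are drawn with exponential(1) gaps, which is equivalent to a Poisson process with an expected number of points equal to the chromosome length in Morgans.

```python
import random


def recombine_complete_pair(length_morgans, rng):
    """Offspring chromosome obtained from one lambda_1-complete and one
    lambda_2-complete parental chromosome (hypothetical helper).

    Returns a list of (start, end, label) tracts covering [0, length_morgans].
    """
    # Recombination points: Poisson process with rate 1 per Morgan,
    # simulated through exponential(1) gaps between consecutive points.
    points = []
    x = rng.expovariate(1.0)
    while x < length_morgans:
        points.append(x)
        x += rng.expovariate(1.0)
    bounds = [0.0] + points + [length_morgans]

    # Select the starting parental chromosome at random, then switch
    # to the other chromosome at every recombination point.
    label = rng.choice(["lambda_1", "lambda_2"])
    tracts = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        tracts.append((start, end, label))
        label = "lambda_2" if label == "lambda_1" else "lambda_1"
    return tracts


if __name__ == "__main__":
    rng = random.Random(2022)
    for tract in recombine_complete_pair(1.5, rng):
        print(tract)
```

Because both parental chromosomes are complete, every recombination point starts a new tract; for admixed parents (bottom of Fig 1) the copied pieces would additionally have to be intersected with the parental tracts.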
Alternatively, we could have chosen a Wright-Fisher model [5, 7] as our model. As it possesses the Markov property, it is easier to develop mathematical models and tests; however, it fails to capture some structures when we work at an individual level, with small values of t. One such example is when a $\lambda_1$-complete individual mates with a $\lambda_2$-complete individual: under the Diploid Wright-Fisher model, one of the offspring chromosomes will be $\lambda_1$-complete and the other one will be $\lambda_2$-complete; whereas in a Markovian Wright-Fisher model, both offspring chromosomes can be admixed, or even lose the genetic information of one of the parents. We conclude that Markovian Wright-Fisher models are only suitable when working with whole populations and large values for t.
2.4 Definition of the test statistic
Let us fix the parameter t of our hypothesis test; the objects we define in this section depend on t, but we will not index it to simplify the notation. Our objective is to define a test statistic for the test 2 that bounds the same test statistic for the test 1. Our strategy is to do that in two steps: first, we will construct a score for each chromosome pair; and second, we will combine all those scores into a test statistic in different manners.
2.4.1 Chromosome scores.
Definition 2.5 (Chromosome statistics). Given a chromosome pair, we will consider two possible statistics:
- $m_i^{max}$ is the maximum length among all $\lambda_1$-tracts in the chromosome pair,
- $m_i^{sum}$ is the maximum, over the two chromosomes of the pair, of the sum of lengths of all $\lambda_1$-tracts in the chromosome.
In order to ease the notation, we will denote the chosen statistic by mi, for i = 1, …, 22, unless the distinction is needed.
As the lengths of the chromosomes are all different, the mi are not comparable across chromosome pairs. Let Mi be the random variable from which mi is sampled, and define pi as:
$$p_i = P_{H_0}(M_i \le m_i), \tag{3}$$
the probability under H0 of observing a chromosome statistic Mi no larger than the observed mi. As a technical consideration, we will condition the probability pi on Mi > 0 to improve the performance of the hypothesis test (we refer to the supplementary section for further details).
Definition 2.6 (Chromosome scores). Let mi, for i = 1, …, 22 be a chromosome statistic. Denote Mi the random variable from which mi is sampled. For i = 1, …, 22, we define the chromosome score pi as
$$p_i = P_{H_0}(M_i \le m_i \mid M_i > 0).$$
It is important to note that $p_i$ depends on t.
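As an illustration of how a chromosome score could be estimated from simulations, the following minimal sketch computes the empirical conditional probability from a list of simulated values of Mi. The function name `chromosome_score` is hypothetical, and the use of a non-strict inequality (simulated value ≤ observed value) is an assumption of this sketch.

```python
def chromosome_score(m_obs, simulated):
    """Empirical estimate of P(M_i <= m_obs | M_i > 0).

    m_obs     -- observed chromosome statistic m_i (in Morgans)
    simulated -- simulated values of M_i under the null scenario
    """
    # Condition on M_i > 0 by discarding simulations in which the
    # target ancestry was completely lost on this chromosome.
    positive = [m for m in simulated if m > 0]
    if not positive:
        raise ValueError("no positive simulated statistic to condition on")
    return sum(1 for m in positive if m <= m_obs) / len(positive)


if __name__ == "__main__":
    sims = [0.0, 0.2, 0.5, 0.9, 1.0]
    print(chromosome_score(0.5, sims))
```

With this convention an observed statistic of 0 receives score 0, and an observed statistic equal to the largest simulated value receives score 1.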
The chromosome scores have similar distributions across all chromosomes, and thus we can compare and combine them. To see that, let us denote as Pi the random variable used to sample pi. Using the probability integral transform, and observing that $P(M_i = 0) = \omega_0$ and $P(M_i = L_i) = \omega_L$, we deduce the distribution function of the random variable Pi:
(4)
where $\overset{d}{=}$ denotes that the random variables are equal in distribution. We observe that $\omega_0^i \neq \omega_0^j$ and $\omega_L^i \neq \omega_L^j$ if i ≠ j, but we will avoid the chromosome indexation to simplify the notation. We conclude that Pi ≠ Pj if i ≠ j, but they have the same range, and they both behave as uniform distributions in the interval (0, 1). They only differ in the weight of their atoms (when M = 0 or M = L).
2.4.2 Combining the chromosome scores into a test statistic.
The distribution of the random variable Pi is given by Eq 4, and we observe that $P_i$ and $P_j$ have different distributions if i ≠ j. Assuming that the recombination spots are independent between chromosomes, all Mi are independent, and thus the Pi are independent.
The distribution function of Mi under H0 is unknown; hence, to compute the $p_i$ we simulate their distribution under $\tilde{H}_0$ using Monte Carlo. The probabilities we need to approximate are $P(M_i \le m_i)$, $\omega_0$ and $\omega_L$, which are enough to compute pi and approximate their theoretical distribution function.
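Subsequently, we propose two different ways of combining all chromosome scores pi into a test statistic whose p-value is easy to compute. Our first proposal is to define pmax as the maximum of all pi,
$$p_{max} = \max_{i = 1, \dots, 22} p_i. \tag{5}$$
If Fi is the distribution function of Pi, then, by independence, the distribution function of the random variable Pmax is
$$F_{max}(x) = P(P_{max} \le x) = \prod_{i=1}^{22} F_i(x), \tag{6}$$
and the final p-value is $F_{max}(p_{max})$.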
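A minimal sketch of this computation is given below. Instead of the theoretical distribution functions Fi, it assumes that a sample of simulated chromosome scores is available for each chromosome and uses their empirical distribution functions; the function name `pmax_pvalue` and the input layout are hypothetical.

```python
def pmax_pvalue(scores, simulated_scores):
    """p-value of the maximum-score statistic.

    scores           -- observed chromosome scores p_i, one per chromosome
    simulated_scores -- one list of simulated scores P_i per chromosome
    """
    p_max = max(scores)
    p_value = 1.0
    for sims in simulated_scores:
        # Empirical distribution function F_i evaluated at p_max.
        f_i = sum(1 for s in sims if s <= p_max) / len(sims)
        # Independence across chromosomes: the distribution function
        # of the maximum is the product of the F_i.
        p_value *= f_i
    return p_max, p_value


if __name__ == "__main__":
    observed = [0.2, 0.5]
    simulated = [[0.1, 0.4, 0.6, 0.9], [0.3, 0.5, 0.7, 1.0]]
    print(pmax_pvalue(observed, simulated))
```

The toy example uses two chromosomes for brevity; the test itself combines the 22 autosomes.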
The second idea is to consider psum, the sum of all chromosome scores,
$$p_{sum} = \sum_{i=1}^{22} p_i. \tag{7}$$
As the distribution functions Fi of the Pi are all different, it is very difficult to compute the theoretical distribution of $P_{sum}$. However, as we know Fi for all i, it is easy to approximate the p-value of the test using Monte Carlo simulations.
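The Monte Carlo approximation mentioned above could look as follows. As a simplifying assumption of this sketch, each Pi is resampled from a list of previously simulated scores rather than drawn from its theoretical distribution Fi; the function name `psum_pvalue` and the default values of `n_draws` and `seed` are hypothetical.

```python
import random


def psum_pvalue(scores, simulated_scores, n_draws=10000, seed=0):
    """Monte Carlo p-value of the sum-of-scores statistic.

    scores           -- observed chromosome scores p_i, one per chromosome
    simulated_scores -- one list of simulated scores P_i per chromosome
    """
    rng = random.Random(seed)
    p_sum = sum(scores)
    hits = 0
    for _ in range(n_draws):
        # Draw one score per chromosome, independently across chromosomes.
        draw = sum(rng.choice(sims) for sims in simulated_scores)
        if draw <= p_sum:
            hits += 1
    # Fraction of simulated sums that are no larger than the observed sum.
    return p_sum, hits / n_draws


if __name__ == "__main__":
    observed = [0.2, 0.5]
    simulated = [[0.1, 0.4, 0.6, 0.9], [0.3, 0.5, 0.7, 1.0]]
    print(psum_pvalue(observed, simulated, n_draws=1000))
```

Small values of the returned p-value indicate that the observed scores are, taken together, lower than expected under the simulated scenario.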
Considering that both pmax and psum can be constructed using both definitions of mi defined in 2.5, we propose four variants of the hypothesis test 2.
Definition 2.7. We define the following test statistics.
- pmm when we use $m_i^{max}$ and we consider the maximum of all pi.
- psm when we use $m_i^{sum}$ and we consider the maximum of all pi.
- pms when we use $m_i^{max}$ and we consider the sum of all pi.
- pss when we use $m_i^{sum}$ and we consider the sum of all pi.
As we are interested in the hypothesis test 1, we need the following theorem 2.1, which allows us to simulate under $\tilde{H}_0$ and use the results to bound the p-values of test 1.
Theorem 2.1. Let p be any of the four p-values defined for the hypothesis test 2 (pmm, psm, pms, pss). Then, CR = {p ≤ α} is a critical region for the test 1 with probability β ≤ α.
In other words, we can control the type I error of the hypothesis test 1 by controlling the type I error of the hypothesis test 2. The proof of theorem 2.1 is detailed in the S1 File.
Algorithm 1 shows a summary of the methodology we developed. Usually, one should consider all choices for chromosome statistic and test statistic. As long as they result in few tests (i.e. we avoid multiple testing issues), we can consider only the smallest p-value obtained, and reject the null hypothesis if any of the tests rejects.
Algorithm 1 Ancestry test algorithm for a given individual
1) Set an objective ancestral population and a number of generations t.
2) Choose a chromosome statistic; usually $m^{max}$ or $m^{sum}$. Compute all 22 statistics.
3) Using simulations, estimate the distribution functions for the chromosome statistics under $\tilde{H}_0$, and compute the chromosome scores.
4) Choose a way of combining all chromosome scores into a test statistic; usually the sum of scores psum, or the maximum of scores pmax.
5) Compute the distribution of the test statistic, and compute the test p-value p.
return p
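Step 3 of Algorithm 1 relies on simulating the chromosome statistics. The sketch below is one possible way to do this for a single chromosome under the scenario in which one ancestor t generations ago is complete for the target population and all other ancestors are complete for the other population; it is not the Julia implementation used in this work. All names (`poisson_points`, `meiosis`, `simulate_statistics`) and the representation of a chromosome as a list of `(start, end, is_target)` tracts are assumptions. Under this scenario, the chromosome transmitted along the line of the complete ancestor is paired with a chromosome without target ancestry at every generation, so t − 1 meioses are simulated.

```python
import random


def poisson_points(length, rng):
    """Recombination points on [0, length], rate 1 per Morgan."""
    points, x = [], rng.expovariate(1.0)
    while x < length:
        points.append(x)
        x += rng.expovariate(1.0)
    return points


def meiosis(chrom_a, chrom_b, length, rng):
    """Recombine two parental chromosomes given as ordered lists of
    (start, end, is_target) tracts; returns the offspring chromosome."""
    bounds = [0.0] + poisson_points(length, rng) + [length]
    parents = [chrom_a, chrom_b]
    current = rng.randrange(2)  # parental chromosome copied first
    child = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        for start, end, is_target in parents[current]:
            s, e = max(start, lo), min(end, hi)
            if s < e:
                if child and child[-1][2] == is_target:
                    # Merge adjacent pieces of the same ancestry.
                    child[-1] = (child[-1][0], e, is_target)
                else:
                    child.append((s, e, is_target))
        current = 1 - current  # switch chromosome at the breakpoint
    return child


def simulate_statistics(length, t, rng):
    """One draw of (M_max, M_sum) for a chromosome of the given length."""
    other = [(0.0, length, False)]
    chrom = [(0.0, length, True)]  # gamete of the complete ancestor
    for _ in range(t - 1):
        chrom = meiosis(chrom, other, length, rng)
    lengths = [e - s for s, e, is_target in chrom if is_target]
    return max(lengths, default=0.0), sum(lengths)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_statistics(1.5, 4, rng) for _ in range(10000)]
    omega_0 = sum(1 for m_max, _ in draws if m_max == 0.0) / len(draws)
    print("estimated atom at 0:", omega_0)
```

The usage example estimates the atom at 0 for a hypothetical chromosome of 1.5 Morgans and t = 4; repeating this for each autosome length gives the simulated samples needed to compute the chromosome scores.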
2.5 Theoretical computation of the distribution of a chromosome pair
The objective of this section is to show that the computation of the chromosome statistics distribution, under $\tilde{H}_0$, is a very difficult problem. The main reason is that our model for chromosome recombination is not a Markov process when we condition only on the genetic information of the parent chromosomes.
An important observation is that, under $\tilde{H}_0$ and for a given t, one of the chromosomes in each chromosome pair $(c_i^1, c_i^2)$ of a0 will be a $\lambda_2$-complete chromosome. Let us assume $c_i^2$ is the $\lambda_2$-complete chromosome. We only need to focus on the $\lambda_1$-tracts in $c_i^1$, and deduce the distribution function of the chosen chromosome pair statistic mi.
Let us start with t = 1. In this case, we have two chromosome pairs (one for each parent). One of the chromosome pairs is $\lambda_1$-complete, and the other one is $\lambda_2$-complete. The first chromosome pair recombines to create a $\lambda_1$-complete chromosome, and the other pair recombines to create a $\lambda_2$-complete chromosome (as expected). Thus, $c_i^1$ is a $\lambda_1$-complete chromosome, so we conclude that $\tilde{H}_0$ is false if neither of the chromosomes is $\lambda_1$-complete.
For t = 2, $c_i^1$ will be a recombination of a $\lambda_1$-complete chromosome and a $\lambda_2$-complete chromosome (as in Fig 1). We observe that the length of each tract is distributed as exp(1), and tracts alternate between $\lambda_1$ and $\lambda_2$. Let Ni be the number of $\lambda_1$-tracts in $c_i^1$. If we can compute the distribution of Ni, we will be able to compute the distribution of mi, whichever we choose. Let $l_1, \dots, l_{N_i}$ be the lengths of the $\lambda_1$-tracts in $c_i^1$, then the distribution functions for $M_i^{max}$ and $M_i^{sum}$ are
$$P(M_i^{max} \le x) = \sum_{n \ge 0} P(N_i = n)\, P\left(\max_{1 \le j \le n} l_j \le x \,\middle|\, N_i = n\right), \tag{8}$$
$$P(M_i^{sum} \le x) = \sum_{n \ge 0} P(N_i = n)\, P\left(\sum_{j=1}^{n} l_j \le x \,\middle|\, N_i = n\right). \tag{9}$$
We conclude that, for t = 2, the distributions can be computed, or at least approximated with precision. However, for t ≥ 3, it is not clear how to compute the distribution of the lengths of the $\lambda_1$-tracts. The problem is that, after we reach a recombination point in $c_i^1$, we cannot compute the exact probability of the next tract being $\lambda_1$ or $\lambda_2$, because it is not a Markov process. This means that we cannot compute the distribution of Ni, and cannot recover Eqs 8 and 9.
3 Results
3.1 Simulated results
Our only option is to simulate the distributions using Monte Carlo methods, which can be done fast under $\tilde{H}_0$. We use the R software for raw data manipulation, and the Julia software [13] to run the hypothesis tests. The data can be found in http://urugenomes.org/lovd/variants, and the R and Julia code can be found in https://github.com/gabriel-illanes/Ancestors_test.
3.1.1 Simulated distributions under $\tilde{H}_0$.
We compare the effect of increasing the number of generations t, and the effect of choosing as statistic the maximum length of $\lambda_1$-tracts ($M^{max}$) or the sum of lengths of $\lambda_1$-tracts ($M^{sum}$). In Fig 2 we show the simulations for the 11th chromosome, as its length is close to the mean length of the chromosomes.
Left: Histogram of 10000 simulations of $M_i$ for generations t = 2, 4, 6 under $\tilde{H}_0$. We can observe that the atom in 0 becomes larger and, in general, the distributions of $M_i$ become stochastically smaller as t increases. Right: Histogram of 10000 simulations of $M^{max}$ and $M^{sum}$ for generation t = 2 under $\tilde{H}_0$. The distribution of $M^{max}$ is stochastically smaller than the distribution of $M^{sum}$; and for t = 2, the density of $M^{sum}$ is symmetrical with respect to L/2.
As expected, the statistics decrease as t increases; for t = 6 we already observe a very large value of $\omega_0$. Also, we verify that Mmax is stochastically smaller than Msum, as the sum of lengths will always be at least as large as the maximum length.
To verify Eq 4 and validate that we can estimate the distribution function of Pi using only the atoms ω0 and ωL, we first simulate 10000 values of $M_i$. From Eq 4, we observe that we can estimate the distribution function of $P_i$ using only the estimated values of ω0 and ωL. A more naive and inefficient method would be to simulate a new set of 10000 values of $M_i$, obtain a vector of chromosome scores $p_i$ and use them to create the empirical cumulative distribution function (Fig 3).
In blue, the estimation is done using a new set of 10000 simulations. In red, we use the estimated values of the atoms ω0 and ωL from the original simulations.
3.1.2 Power of the test.
We have four possible variants of the hypothesis test statistic: for each chromosome, compute either the maximum length of the $\lambda_1$-tracts or the sum of the lengths of the $\lambda_1$-tracts, and then combine them into a global statistic either as the maximum of all pi or the sum of all pi. We assess the power of the hypothesis test in different scenarios, each one of them being a particular case of the alternative hypothesis H1, based on 1000 Monte Carlo simulations for each one.
The first scenario is, for a given t, that the ancestors $a^t_1, \dots, a^t_{2^{t-1}}$ have, on average, $2/2^t$ $\lambda_1$ genetic information, whereas ancestors $a^t_{2^{t-1}+1}, \dots, a^t_{2^t}$ have no $\lambda_1$ genetic information. In other words, we take twice the $\lambda_1$ genetic information of a $\lambda_1$-complete ancestor, and spread it across the first $2^{t-1}$ ancestors (the ancestors from one parent’s side).
This scenario aims to study the impact of the structure of the $\lambda_1$-tracts, as we expect to have more $\lambda_1$ genetic information than in the $\tilde{H}_0$ scenario, but resulting from multiple small tracts, rather than several larger ones. We expect that considering the length of the maximum $\lambda_1$-tract as chromosome pair statistic (either pmm or pms) should yield more power for the hypothesis test. The best results are obtained when we consider pmm, as expected (Table 1).
The second scenario aims to study what happens when some of the $\lambda_1$-chromosomes are replaced with $\lambda_2$-chromosomes. More precisely, we start from the $\tilde{H}_0$ scenario, take the $\lambda_1$-complete ancestor, and replace half of their chromosomes (on average) with $\lambda_2$-chromosomes.
As we expect to obtain half of the genetic information compared to the $\tilde{H}_0$ scenario, considering the sum of all the chromosome pair scores as test statistic (either pms or pss) should yield more power for the hypothesis test. The best results are obtained when we consider pms and pss, as expected (Table 2).
The third scenario aims to understand what happens when we do the hypothesis test for generation t, but the true scenario is for generation t + 1 (the first $\lambda_1$-complete ancestor can be found t + 1 generations ago). A priori, it is not clear which of the methods will work best, as we expect smaller $\lambda_1$-tracts, and less $\lambda_1$ genetic information than in the $\tilde{H}_0$ scenario for generation t.
The best results are obtained when we consider pms and pss (Table 3). We could conclude that the impact of obtaining less genetic information is larger than that of the length reduction of the $\lambda_1$-tracts.
From these results, several remarks can be made:
- The most impactful decision, under the studied scenarios, is how to combine the chromosome scores to obtain a test p-value, rather than how to obtain a chromosome score.
- For almost every scenario and choice of method, we obtained a test power greater than 0.05, which is the rejection probability expected under the null hypothesis.
- The difficulty of the problem increases rapidly when increasing t. When t > 4, we cannot expect to obtain reliable results. When t = 7, the (simulated) probability of all the $\lambda_1$ genetic information disappearing (genetic drift) is greater than 0.05, so we would obtain power 0 for any test with level α = 0.05.
3.2 Empirical results and discussion
We applied the hypothesis test to a real data set that originated in the context of the Urugenomes project. In this project, 10 Uruguayan individuals of known Amerindian ancestry (probably Charrúas) were analyzed; these are the same 10 individuals that were studied in [1]. The inclusion criterion for the study was to have at least one indigenous great-grandfather or great-great-grandfather, according to social anthropological studies, family records and genealogies. Additionally, 10 individuals of known African ancestry, who did not know of any Amerindian ancestry, were included. According to historical records, after the first Europeans (Spanish and Portuguese) came to the country, Africans were brought as slaves. Recent results of the Urugenomes project (urugenomes.org) show that these African descendants are also admixed with Amerindians (manuscript in preparation), so they were included in the present study.
Whole genome sequencing of these 20 individuals was done using NGS. Variants were determined and results were phased (haplotype constructions) using 1000Genomes project as reference panel [14]. For this study, we kept only 363578 genome-wide variants, which correspond to the genotyping array positions used in [15] to study Native American populations. Phased variants were used to construct ancestry specific haplotypes (local ancestry estimations) using RFMix [10]. For this, a reference panel was used that contained complete individuals of European, African and Amerindian ancestries. As a result the data set contained ancestry specific segments of different lengths for each individual. For the purpose of the present work, only indigenous ancestry segments were considered, while the other two ancestries (African and European) were masked out of the data. In summary, the data set is represented by a matrix of 40 haplotypes (corresponding to 20 individuals) and 363578 variants, where information is kept only within indigenous haplotypes and the rest were set to missing data. Out of the matrix, the length distribution of indigenous segments in each haplotype was determined, which is the starting point of the proposed algorithm. For each individual, we obtained their chromosome statistics (either the length of maximum indigenous tract, or the sum of lengths of all indigenous tracts), and undertook all four variants of the hypothesis test. We tested, for t = 2, …, 5, whether the individuals might have had at least a complete Amerindian ancestor t generations ago. Fig 4 presents the obtained p-values for each individual, method and value of t.
Estimated values of pmm (top left), psm (top right), pms (bottom left), and pss (bottom right), using 10000 iterations, for every individual and every t = 2, …, 5. Given a choice of statistic, an individual, and fixed t, we will reject the null hypothesis if the corresponding p-value is below 0.05 (shown in the horizontal line in all graphics). For a unified criterion across all statistics, if one of them rejects the null hypothesis, that is enough statistical evidence to reject H0 for the given individual and t.
We observe that the choice with the largest impact is how the chromosome scores are combined, as we observe larger p-values when we consider the sum of all chromosome scores as the test statistic. This can be interpreted as observing more indigenous genetic information than expected under $\tilde{H}_0$, but with no chromosomes having very large scores. This behavior is similar to the one observed in the first simulated scenario. That being said, in general, rejecting the null hypothesis for at least one of the statistics should be enough statistical evidence to conclude that the individual does not have any complete Amerindian ancestor t generations ago, especially considering the low power of these tests. In other words, in order to reject the null hypothesis for a given t and a given individual, we should focus on the smallest p-value across all statistics.
Considering the test results for the pmm and psm statistics, we observe that the test agrees with the expected biological results. Individuals 12, 14, 15, 16, 19 and 20 do not reject the null hypothesis of the presence of a complete Amerindian ancestor t = 3 generations ago (they could have had a complete Amerindian ancestor 3 generations ago), whereas individuals 11, 13, 17 and 18 reject the null hypothesis for t = 3 (there is statistical evidence that these individuals did not have a complete Amerindian ancestor 3 generations ago). When considering the test results for pms and pss, we observe larger p-values for every test, across both individuals and generations, and thus we do not focus on them. Consistently, individuals 12, 14, 15, 16, 19 and 20 have the largest indigenous ancestry among the 10 individuals with Amerindian ancestry, as calculated by genomic approaches.
Regarding the individuals that declared African ancestry (individuals 1 to 10), we observe that they have lower p-values, on average, compared to individuals that declared Native American ancestry (individuals 11 to 20). It is important to note that individuals 7 and 9 do not reject the null hypothesis for t = 3, but this does not contradict the individuals’ declared ancestry (as they could have had at least one complete Native American ancestor 3 generations ago, as well as several complete African ancestors). Another interesting thing to note is that individual 5 rejects the null hypothesis for t = 5; it is possible that the members of the individual’s family tree do not originate from South America.
References
- 1. Spangenberg L, Fariello MI, Arce D, Illanes G, Greif G, Shin JY, et al. Indigenous ancestry and admixture in the Uruguayan population. Frontiers in Genetics. 2021. pmid:34630523
- 2. Coop G. How much of your genome do you inherit from a particular ancestor?; 2013. https://gcbias.org/2013/11/04/how-much-of-your-genome-do-you-inherit-from-a-particular-ancestor.
- 3. Caballero M, Seidman DN, Qiao Y, et al. Crossover interference and sex-specific genetic maps shape identical by descent sharing in close relatives. PLoS Genetics. 2019. pmid:31860654
- 4. Guan Y. Detecting structure of haplotypes and local ancestry. Genetics. 2014;196(3):625–642. pmid:24388880
- 5. Pool JE, Nielsen R. Inference of historical changes in migration rate from the lengths of migrant tracts. Genetics. 2009;181(2):711–719. pmid:19087958
- 6. Liang M, Nielsen R. The lengths of admixture tracts. Genetics. 2014;197(3):953–967. pmid:24770332
- 7. Gravel S. Population genetics models of local ancestry. Genetics. 2012;191(2):607–619. pmid:22491189
- 8. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009;19(9):1655–1664. pmid:19648217
- 9. Pritchard JK, Stephens M, Donnelly P. Inference of Population Structure Using Multilocus Genotype Data. Genetics. 2000;155:945–959.
- 10. Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. The American Journal of Human Genetics. 2013;93(2):278–288. pmid:23910464
- 11. Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genetics. 2012;8:e1002453. pmid:22291602
- 12. Acosta y Lara EF. Salsipuedes 1831: los lugares. Universidad de la República, Dirección General de Extensión Universitaria; 1985.
- 13. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM review. 2017;59(1):65–98.
- 14. Delaneau O, Howie B, Cox A, Zagury JF, Marchini J. Haplotype estimation using sequence reads. American Journal of Human Genetics. 2013;93:696–787.
- 15. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing Native American Population History. Nature. 2012;488(7411):370–374. pmid:22801491