Figure 1.
Geographic location of the 12 populations studied.
Blue-green dots represent Western Pygmy (WPYG) populations, maroon dots represent Eastern Pygmy (EPYG) populations, and yellow dots represent agricultural (AGR) populations. 1. Bakola from Cameroon, 2. Baka from Gabon, 3. Baka from Cameroon, 4. Biaka from the Central African Republic, 5. Mbuti from the Democratic Republic of Congo, 6. Twa from northern Rwanda, 7. Twa from southern Rwanda, 8. Yoruba from Nigeria, 9. Ngumba from Cameroon, 10. Akele from Gabon, 11. Chagga from Tanzania, 12. Mozambicans from Mozambique.
Figure 2.
Estimated structure of populations of African farmers and Pygmy hunter–gatherers, based on autosomal and X-linked regions.
Individuals are represented as thin vertical lines partitioned into segments corresponding to their membership in the genetic clusters indicated by the colors. G. Baka and C. Baka stand for Gabonese and Cameroonian Baka, and N. Twa and S. Twa stand for Twa Pygmies from northern and southern Rwanda, respectively. (A) Estimated structure of the entire population dataset, which includes all individuals except those displaying cryptic relatedness. K, the prior number of groups, varied from 2 (upper chart) to 5 (lower chart). For the models in which K was at least 5, the STRUCTURE program detected no additional cluster. The likelihood of the data was maximal at K = 4 (the mean ln[likelihood] values for K = 2, 3, 4, and 5 were −16606, −16563, −16277, and −16290, respectively). (B) Estimated structure of the “filtered population dataset.” We excluded from this dataset those individuals whose proportion of ancestry in another population group was higher than 20% at K = 4, the most probable value of K. This filtering procedure excluded 92 individuals: 15 Bakola, 2 C. Baka, 2 G. Baka, 4 Biaka, 1 Mbuti, and 21 Twa Pygmies, as well as 4 Yoruba, 5 Ngumba, 5 Akele, 12 Chagga, and 21 Mozambican farmers.
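The choice of K and the 20% ancestry filter described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `keep_mask`, the mapping of clusters to population groups, and the rule of summing cluster memberships within a group are assumptions made here for illustration. Only the ln[likelihood] values and the 20% threshold come from the legend.

```python
import numpy as np

# Mean ln[likelihood] per K, as reported in the legend.
mean_lnl = {2: -16606, 3: -16563, 4: -16277, 5: -16290}
best_k = max(mean_lnl, key=mean_lnl.get)  # K with the highest mean ln[likelihood]

def keep_mask(q, cluster_group, individual_group, threshold=0.20):
    """Return True for individuals retained by the ancestry filter.

    q                : (n_individuals, K) membership proportions per cluster
    cluster_group    : (K,) group label assigned to each cluster (assumed mapping)
    individual_group : (n_individuals,) group label of each individual
    threshold        : maximum tolerated ancestry in any other group
    """
    q = np.asarray(q, dtype=float)
    cluster_group = np.asarray(cluster_group)
    individual_group = np.asarray(individual_group)
    groups = np.unique(cluster_group)
    # Sum memberships over the clusters of each group: (n_individuals, n_groups).
    group_q = np.stack(
        [q[:, cluster_group == g].sum(axis=1) for g in groups], axis=1
    )
    # Ignore each individual's own group, then test the largest foreign share.
    own = individual_group[:, None] == groups[None, :]
    foreign = np.where(own, 0.0, group_q)
    return foreign.max(axis=1) <= threshold

# Toy membership matrix (invented values, for illustration only).
q = [[0.90, 0.05, 0.03, 0.02],
     [0.60, 0.30, 0.05, 0.05],
     [0.10, 0.10, 0.40, 0.40]]
mask = keep_mask(q, ["WPYG", "EPYG", "AGR", "AGR"], ["WPYG", "WPYG", "AGR"])
```

In this toy example the second individual carries 30% ancestry in a foreign group and is the only one excluded.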
Figure 3.
Site frequency spectra of the WPYG, EPYG, and AGR populations for the 20 autosomal regions, using the filtered population dataset.
Gray histograms represent the expected site frequency spectra (SFS) of a constant-sized panmictic population with the same number of individuals as observed in the three population groups.
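For reference, the expectation for a constant-sized panmictic population under the standard neutral model is that the number of sites with i derived copies is proportional to 1/i. The Python sketch below computes these expected proportions. The function names are hypothetical. Whether the plotted spectra are folded or unfolded is not stated in the legend, so both forms are shown. Taking the number of chromosomes as twice the number of individuals is an assumption that holds for autosomal regions in diploids.

```python
import numpy as np

def expected_sfs_proportions(n_chromosomes):
    """Expected unfolded SFS proportions for a constant-sized panmictic
    population: the class with i derived copies has weight 1/i."""
    i = np.arange(1, n_chromosomes)
    weights = 1.0 / i
    return weights / weights.sum()

def fold_sfs(unfolded):
    """Fold an unfolded SFS by minor-allele count."""
    n = len(unfolded) + 1  # number of chromosomes
    folded = []
    for j in range(1, n // 2 + 1):
        value = unfolded[j - 1]
        if j != n - j:  # avoid double-counting the middle class when n is even
            value += unfolded[n - j - 1]
        folded.append(value)
    return np.array(folded)

# Example with 4 chromosomes (2 diploid individuals).
unfolded = expected_sfs_proportions(4)
folded = fold_sfs(unfolded)
```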
Table 1.
Mean diversity indices and neutrality tests across the 24 independent genomic regions sequenced in the filtered population dataset of Western Pygmies (WPYG), Eastern Pygmies (EPYG), and African farmers (AGR).
Figure 4.
Different models simulating the demographic regime of the WPYG and EPYG groups and the mean proportion of small distances (Ψ0.5) obtained in comparisons with simulated statistics.
Times are in generations. Tbot and Sbot are the time and strength of the bottleneck, respectively. Trec and Srec are the time and strength of the population-size recovery, respectively. Modeling details and the prior distributions of parameters are given in Table S8. To calculate the mean Ψ0.5 for a given model and set of parameters, we drew 100 sets of 10,000 simulations from the 100,000 simulations of the model, calculated Ψ0.5 for each set, and reported the mean Ψ0.5 across sets. The model with one bottleneck (Tbot: 100–1000 generations, Sbot = 5) and one recovery (Trec = Tbot − 5 generations, Srec: 0.2–0.5) generated, for the WPYG group, the maximum Ψ0.5 in 76% of cases when compared with all models, and in 96% of cases when compared with only constant population-size models. For the EPYG group, the model with one bottleneck (Tbot: 10–100 generations, Sbot: 10–20) generated the maximum Ψ0.5 in 28% of cases when compared with all models, and in 100% of cases when compared with only constant population-size models.
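The resampling scheme for the mean Ψ0.5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. Ψ0.5 is assumed here to be the proportion of simulations whose distance to the observed statistics falls below 0.5. Each set is assumed to be drawn without replacement. The function name `mean_psi` is hypothetical.

```python
import numpy as np

def mean_psi(distances, threshold=0.5, n_sets=100, set_size=10_000, seed=0):
    """Mean proportion of small distances across resampled sets.

    distances : distances between simulated and observed statistics,
                one per simulation (e.g. 100,000 values)
    threshold : a distance counts as "small" if it is below this value
                (assumed meaning of the 0.5 subscript)
    """
    rng = np.random.default_rng(seed)
    distances = np.asarray(distances, dtype=float)
    psi = np.empty(n_sets)
    for s in range(n_sets):
        # Draw one set of simulations without replacement (assumption).
        subset = rng.choice(distances, size=set_size, replace=False)
        psi[s] = np.mean(subset < threshold)
    return psi.mean()

# Toy check on synthetic distances (not real simulation output).
all_small = mean_psi(np.full(1000, 0.1), n_sets=5, set_size=100)
all_large = mean_psi(np.full(1000, 0.9), n_sets=5, set_size=100)
```

Comparing models then amounts to checking, set by set, which model yields the largest Ψ0.5.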
Figure 5.
Four possible models explaining the branching history of African farmers, Western Pygmies, and Eastern Pygmies.
Arrows indicate symmetric gene flow.
Figure 6.
Prior and approximated posterior distributions of the IM model and IM parameters under the best-fit A-WE model.
Black lines represent prior distributions and gray histograms represent approximated posterior distributions obtained by the ABC method [37], except for model choice, for which the posterior distribution was estimated based on the proportions of small distances generated by each model (see Materials and Methods). Divergence times Tdiv are expressed in years and migration rates m in proportion of migrants per generation. The prior and approximated posterior distributions of the IM model and IM parameters under the best-fit A-WE model were obtained using the filtered population dataset. Those obtained using the composite population dataset are reported in Figure S3. Of note, the posterior distributions obtained with the composite population dataset were generally more narrowly peaked than those obtained with the filtered population dataset.
Table 2.
Estimates, confidence intervals, and accuracy of estimations of population separation times and levels of gene flow between WPYG, EPYG, and AGR groups, under the most probable A-WE model.