Dissecting the Within-Africa Ancestry of Populations of African Descent in the Americas

Background
The ancestry of African-descended Americans is known to be drawn from three distinct populations: African, European, and Native American. While many studies consider this continental admixture, few account for the genetically distinct sources of ancestry within Africa – the continent with the highest genetic variation. Here, we dissect the within-Africa genetic ancestry of various populations of the Americas self-identified as having primarily African ancestry using uniparentally inherited mitochondrial DNA.

Methods and Principal Findings
We first confirmed that our results obtained using uniparentally-derived group admixture estimates are correlated with the average autosomal-derived individual admixture estimates (hence are relevant to genomic ancestry) by assessing continental admixture using both types of markers (mtDNA and Y-chromosome vs. ancestry informative markers). We then focused on the within-Africa maternal ancestry, mining our comprehensive database of published mtDNA variation (∼5800 individuals from 143 African populations) that helped us thoroughly dissect the African mtDNA pool. Using this well-defined African mtDNA variation, we quantified the relative contributions of maternal genetic ancestry from multiple W/WC/SW/SE (West to South East) African populations to the different pools of today's African-descended Americans of North and South America and the Caribbean.

Conclusions
Our analysis revealed that both continental admixture and within-Africa admixture may be critical to achieving an adequate understanding of the ancestry of African-descended Americans. While continental ancestry reflects gender-specific admixture processes influenced by different socio-historical practices in the Americas, the within-Africa maternal ancestry reflects the diverse colonial histories of the slave trade.
We have confirmed that there is a genetic thread connecting Africa and the Americas, in which each colonial system supplied its colonies in the Americas with slaves from African colonies it controlled or that were available to it at the time. This historical connection is reflected in different relative contributions from populations of W/WC/SW/SE Africa to geographically distinct Africa-derived populations of the Americas, adding to the complexity of genomic ancestry in groups ostensibly united by the same demographic label.

List of publications and populations included in the database of mtDNA HVS I/II sequences.

Text S1: Detailed description of mtDNA database, grouping of African ethnic groups, the historical context for African/American populations, and admixture analysis using ADMIX.

Gonzáles 2006 [22] (includes Rando 1998

Figure 1 and Figure S1 assessed by: 1) typing and analyzing using the STRUCTURE program ("type+structure"), 2) literature mining of the mtDNA-, NRY-, and AIMs-based estimates ("published"), or 3) assembling the raw mtDNA/NRY data that were subjected to admixture analysis ("admix") using our comprehensive database for defining the parental populations (if multiple references were used, samples were first pooled and then analyzed as a single group). Values are reported as relative contribution from African, W Eurasian (mtDNA/NRY) or European (AIMs), and Native American (mtDNA/NRY) or Native American/SE Asian populations (AIMs*, except in [51] reporting Native American ancestry only) ± SE (which is defined as sampling error for "admix", standard error for "published" (if reported), and pseudo-standard error for "type+structure", where pSE=[1/2 ( PI)]/1.645). For those cases where we estimated mtDNA/NRY-based admixture using "admix", we had to define the ancestral African, European and Native American/Asian parental populations. The details of which populations were included in the mtDNA analysis can be found in Text S1. For NRY, we simply used all populations from Table S2 designated European/African/Asian in column "Continent" and Native in column "Admixed". The details of the "type+structure" methods, including the description of parental populations, are listed in the Methods section under "Autosomal AIMs".

* Samples from Salas 2002 are mainly from Mozambique but also include samples from bordering neighbors.
** Semitic, Cushitic, Berber, and Chadic language groups all belong to the Afro-Asiatic macro-language family.
*** Bold-italic font indicates a group designation; Kanuri were combined with overlapping samples from Cameroon with East origin.
† Bainouk (1), Baiote (6)

sub-language groups can be grouped into 3 clusters: 1) Chadic + Nilo-Saharan + non-Bantoid Volta-Congo speakers, 2) Mande + Atlantic North and South, 3) Berber and Semitic speakers (with "among group variation" V_A contributing to 2.3% of total variation, and "among populations within groups" V_B contributing to 0.31% (P=0.01) of total variation, both P<0.05).

The relative contribution of African regions to the admixed populations of the Americas and Africa, analyzed using ADMIX and reported as a mean ± SD (for details on admixed populations see Text S1). Founding African populations include SW/WC Bantu, and SE and W/WC Africa subdivided according to geography. W/WC Africa was also subdivided according to language and combination of ethnicity/language/geography*. Estimates that did not fit our criteria (2SDs >0) yet were high and consistent enough to be considered as guidance are indicated in grey (the high SD was due to the combination of small sizes of both the admixed population and the source population), and the alternative relative contribution is indicated by an asterisk.
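
The two numeric conventions used in the captions above can be illustrated with a short Python sketch. It rests on assumptions that are not confirmed by the text: "PI" is read here as the full width of a probability interval (upper bound minus lower bound; the qualifier before "PI" appears to be missing from the caption), and the "2SDs >0" criterion is read as mean − 2·SD > 0. The constant 1.645 is the standard normal quantile that bounds a two-sided 90% interval. The function names are hypothetical and the input values are toy numbers, not results from this study.

```python
# Sketch of the pseudo-standard error and the 2SD acceptance criterion.
# ASSUMPTIONS (not confirmed by the source): "PI" is the full width of a
# probability interval, and "2SDs > 0" means mean - 2*SD > 0.

Z_90 = 1.645  # standard normal quantile bounding a two-sided 90% interval


def pseudo_standard_error(pi_lower, pi_upper, z=Z_90):
    """Return pSE = [1/2 (PI)] / z, with PI taken as the interval width."""
    if pi_upper < pi_lower:
        raise ValueError("upper bound must not be below lower bound")
    return 0.5 * (pi_upper - pi_lower) / z


def passes_two_sd_criterion(mean, sd):
    """Return True when the estimate stays above zero after subtracting 2 SD."""
    return mean - 2.0 * sd > 0.0


if __name__ == "__main__":
    # Toy values for illustration only.
    print(round(pseudo_standard_error(0.60, 0.80), 4))
    print(passes_two_sd_criterion(0.30, 0.10))
    print(passes_two_sd_criterion(0.15, 0.10))
```

Under this reading, an estimate that fails the second check would correspond to the entries shown in grey in the table.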

Table S6 (continued)
*Note: For details on "W/WC-language" and "W/WC-Ethnic groups 1" see text in Figures S3 and S4, respectively. For "W/WC-Ethnic groups 2" the groups are defined as follows (details in Figure S5 and Text S1):

Text S1: Detailed description of mtDNA database, grouping of African ethnic groups, the historical context for African/American populations, and admixture analysis using ADMIX.

1) Detailed description of mtDNA database
The added when the sequencing primers ended before this position but the authors checked for L2 status using RFLP. The variation in the database was then collapsed into 429 distinct haplotypes that were defined while considering the variation within the database.

Figure S4). Second, we combined those ethnicities that belong to the same language group, collapsing the total data into 26 distinct groups of populations (File S5). Using the same SAMOVA grouping method and after excluding 4 outlying groups of small size, the remaining 22 groups can be reduced to 9 clusters, which need to be collapsed further into 6 for the purpose of admixture analysis by combining small and similar populations (9 groups: V_A = 2.43%, V_B* = 0.14%; 6 groups: V_A = 2.28%, V_B = 0.46%, where genetically similar groups with n<90 were grouped; Figure S5). A notable difference was found in the Fulbe people of WC and W, who were grouped together in the previous analysis. Also, two new groups were separated from the West Niger-Congo cluster: Loko-Mende of Sierra Leone and Lebou-Wolof-Serer of Senegal.
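
The collapsing of database variation into distinct haplotypes mentioned in this section can be sketched as follows. This is a simplified illustration, not the procedure used for the database: it treats two samples as the same haplotype only when their sets of variants match exactly, whereas the text says haplotypes were defined while considering the variation within the database. The function name, the sample identifiers, and the variant labels are hypothetical toy values.

```python
# Sketch: collapse per-sample variant lists into distinct haplotypes.
# ASSUMPTION: exact-match grouping, which is simpler than the rule the
# text describes; all data below are toy values for illustration.
from collections import Counter


def collapse_haplotypes(samples):
    """Count how many samples share each distinct set of variants.

    samples: iterable of (sample_id, iterable_of_variant_labels)
    returns: Counter mapping frozenset(variants) -> number of samples
    """
    counts = Counter()
    for _sample_id, variants in samples:
        # frozenset ignores the order in which variants were listed
        counts[frozenset(variants)] += 1
    return counts


toy_samples = [
    ("s1", ["16223T", "16278T"]),
    ("s2", ["16278T", "16223T"]),  # same variants as s1, different order
    ("s3", ["16223T"]),
]
haplotypes = collapse_haplotypes(toy_samples)

if __name__ == "__main__":
    for variants, n in haplotypes.items():
        print(sorted(variants), n)
```

A frozenset key makes the grouping independent of the order in which variants are reported, which matters when sequences are pooled from many publications.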

2) Cluster separation of African variation based
As discussed below, these groupings provide us with higher resolution in admixture analysis for some of the admixed populations.

Gabon, and Angola, and 3) only Mozambique (as the more probable source of variation) in SE Africa was considered (also based on the fact that the calculated contribution from Kenya was found to be minimal).