
New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

  • María Cerezo,

    Contributed equally to this work with: María Cerezo, Antonio Salas

    Affiliation Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Medicina Legal, Facultade de Medicina, Universidade de Santiago de Compostela, CIBERER, Galicia, Spain

  • Viktor Černý,

    Affiliation Archaeogenetics Laboratory, Institute of Archaeology of the Academy of Sciences of the Czech Republic, Prague, The Czech Republic

  • Ángel Carracedo,

    Affiliation Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Medicina Legal, Facultade de Medicina, Universidade de Santiago de Compostela, CIBERER, Galicia, Spain

  • Antonio Salas

    Contributed equally to this work with: María Cerezo, Antonio Salas

    Affiliation Unidade de Xenética, Departamento de Anatomía Patolóxica e Ciencias Forenses, Instituto de Medicina Legal, Facultade de Medicina, Universidade de Santiago de Compostela, CIBERER, Galicia, Spain



Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. Both from an archaeological and a genetic point of view, this region has been interpreted to be the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert.

Methodology/Principal Findings

Samples from twelve ethnic groups from the Chad Basin (n = 542) have been high-throughput genotyped for 230 coding region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed for much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity has been observed in the Chad Basin mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth which was not always correlated to their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups.


Compared to sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in their hypervariable segments I (HVS-I), but not their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic populations showed a higher Mediterranean influence, signaled mainly by sub-lineages of M1, R0, U6, and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance has not identified these differences. The present study indicates that analysis of mtSNPs at high resolution could be a fast and extensive approach for screening variation in population studies where labor-intensive techniques such as entire genome sequencing remain unfeasible.


The African Sahel together with a more southerly localized zone of savannah forms a clearly distinguishable biome. Also known as the Sudan or Macro-Sudan belt, this region displays some common linguistic features across current linguistic families [1]. The Sudan belt lacks higher mountains or other geographic barriers to migration and in genetics has been interpreted as a bidirectional corridor of human migrations [2], [3], [4]. From an ecological point of view this zone contains both high grasses in the north, and more or less dispersed shrubs and trees in the south, and comprises the natural surroundings known to humans in Africa from their early beginnings. The Sudan belt also experiences annual cycles of wet and dry seasons allowing for the coexistence of two populations with different lifestyles: nomadic pastoralists and sedentary farmers.

Approximately in the middle of the Sudan belt is the Lake Chad Basin, with Lake Chad in its imaginary centre. The Lake Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed [5]. Due to Pleistocene climatic oscillations Lake Chad has often changed in size and shape. For example, in the early Holocene, Lake Mega-Chad was formed covering a maximum surface area of 350,000 km². Such a giant lake, the largest in Africa at the time, with a wealth of food resources, undoubtedly attracted long-term human settlements. Lake Chad reached its current size of about 20,000 km² approximately 3,000 years ago, around the time that the Sahara grew to its present size and became nearly impenetrable for humans [6]. Historically, inhabitants of the Chad Basin constantly migrated around Lake Chad in synchrony with the receding shorelines of the lake. It seems probable that in its present form Lake Chad acted as a final destination for two population movements in the Sudan belt: one from the West, represented by the Fulani, and the other from the East, represented by the Arabs.

Variation in mitochondrial DNA (mtDNA) has proven useful for the interpretation of historical and contemporary demographic events around the world, and in particular for reconstructing the evolution and origin of human populations. Most population studies carried out in Africa have been based on analysis of the control region of mtDNA, sometimes complemented with analysis of hg diagnostic coding region sites [7], [8], [9], [10], [11], [12], [13]. Although a number of complete African mtDNA genomes have been obtained and deposited in GenBank or other databases, most of these studies were focused on a phylogenetic rather than a demographic perspective [14], [15], [16].

On the other hand, variation in mtDNA is commonly analyzed using standard sequencing procedures targeting the first and/or the second hypervariable regions (HVS-I/II) or the whole control region (including HVS-I/II). Analysis of coding region mtDNA SNPs using minisequencing is another common approach [17], [18], [19], [20], [21]. However, this technique is inadequate for genotyping large numbers of SNPs. Recently, Cerezo et al. [22] reported a novel MassARRAY SNP genotyping system for genotyping a large number of SNPs located in the coding region of mtDNA using MALDI-TOF. In this work, we present the first population application of this technique, genotyping 542 samples from 12 different ethnic groups of the Lake Chad Basin.

Materials and Methods

Ethics Statement

Oral informed consent was required for the samples, and all of them were anonymized. The study was approved by the Ethical Committee of the University of Santiago de Compostela. The study also conforms to the Spanish Law for Biomedical Research (Law 14/2007 of 3 July).

Population samples

We analyzed 542 individuals from 12 different ethnic populations sampled around the Lake Chad Basin. Most of these samples (n = 441; 80%) were previously reported for the HVS-I segment and selected Restriction Fragment Length Polymorphisms (RFLPs) in Černý et al. [3]. Information about ethnic affiliation and the rationale for sample collection is provided in [3] (see also Table 1). Briefly, we have analyzed the following population samples: Hide (n = 47), Kotoko (n = 62), Mafa (n = 57), Masa (n = 41), Buduma (n = 30), Chad Arabs (n = 27), Shuwa Arabs (n = 39), Fali (n = 40), Bongor Fulani (n = 50), Tcheboua Fulani (n = 40), Kanembu (n = 50) and Kanuri (n = 59). DNA extracts were then submitted to the laboratory of Santiago de Compostela, where the genotyping was carried out.

DNA extraction of new samples was carried out as described in Černý et al. [3]. All samples analyzed in the present study were previously subjected to whole genome amplification (WGA) using the Genomiphi v.2 kit (GE Healthcare Life Sciences; Uppsala, Sweden) according to the manufacturer's protocol. Only 1 µl of the original extracted DNA (at least 10 ng/µl) was used for WGA. The WGA product was subsequently diluted 1∶16 in water and then directly used for MALDI-TOF MS genotyping.

MALDI-TOF MS mtSNP genotyping

All samples were genotyped for a set of 230 mtSNPs using the technology described in Cerezo et al. [22]. Sixty of the samples (11%) were already genotyped for the whole set of mtSNPs [22] (Table 2).

The two phylogenetic trees in Figures 1 and 2 of Cerezo et al. [22] indicate all of the mtSNPs genotyped in the present study, including diagnostic control region variants.

Figure 1. Map of the Lake Chad Basin showing frequencies of the main African hgs in the different ethnic groups analyzed.

Figure 2. Phylogeny of African hgs at a medium level of phylogenetic resolution and (below branches) counts of these hgs for the different ethnic groups.

Bottom of the figure: population labels have in brackets the sample sizes; numbers below branches indicate the hg relative frequencies in each population group and in the total sample size (row “Total”); therefore, each row sums to 1. The counts for the maximum level of resolution are provided in Table S2; the full phylogenetic tree for the SNPs considered in the present study is provided in Cerezo et al. [22]. All positions in the tree refer to the revised Cambridge Reference Sequence (rCRS; [49]); all positions are transitions unless a letter indicates a transversion. Underlined positions are parallel mutations within this tree, while “!” indicates a back mutation. A deletion is indicated as “del”, while “+” indicates an insertion.

Assessment of the genotyping quality was carried out by replicating some samples from the Chad Basin in different runs, plus six good-quality DNA samples from the CEPH (Centre d'Etude du Polymorphisme Humain), namely NA10830, NA10831, NA10860, NA10861, NA11984, and NA12147, which were used as positive controls. Table S1 summarizes information on call rates per mtSNP. We have not detected genotyping inconsistencies among 7,436 replicated genotypes. For 77% of the mtSNPs the calling rate was above 90%, which can be considered quite acceptable if we take into account that all the samples were collected several years ago (some of the buccal swabs are now more than 10 years old). Some mtSNPs, however, virtually failed (e.g. 5128G; 5746A) or yielded poor results (650C; 9545G). More information is available in Table S1.
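The per-SNP call rate summarized above can be illustrated with a minimal sketch. The data layout (a dictionary from SNP label to one call per sample, with `None` for a failed call), the function names, and the toy values are all assumptions made for illustration; none of them come from Table S1.

```python
from typing import Dict, List, Optional


def call_rates(genotypes: Dict[str, List[Optional[str]]]) -> Dict[str, float]:
    """Return, for each mtSNP, the fraction of samples with a successful call.

    `genotypes` maps a SNP label to one call per sample; None marks a failure.
    """
    rates = {}
    for snp, calls in genotypes.items():
        called = sum(1 for c in calls if c is not None)
        rates[snp] = called / len(calls) if calls else 0.0
    return rates


def fraction_above(rates: Dict[str, float], threshold: float = 0.90) -> float:
    """Fraction of SNPs whose call rate is strictly above `threshold`."""
    if not rates:
        return 0.0
    return sum(1 for r in rates.values() if r > threshold) / len(rates)


# Hypothetical toy data: three SNPs typed in four samples.
toy = {
    "snp_A": ["G", "G", "A", "G"],     # all calls succeeded
    "snp_B": ["C", None, "C", "T"],    # one failed call
    "snp_C": [None, None, None, None], # SNP that failed entirely
}
rates = call_rates(toy)
print(rates)
print(fraction_above(rates))
```

The same two numbers (per-SNP rate and the share of SNPs above a threshold) are what the 77% and 90% figures in the text describe.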

Given that the majority of the failed mtSNPs did not occur at the final branches, there were no problems assigning lineages to their maximum known level of phylogenetic resolution.

Standard sequencing analysis

All phylogenetic inconsistencies observed using MALDI-TOF MS were automatically sequenced using the protocol described in Álvarez-Iglesias et al. [20] and as indicated in Cerezo et al. [22]. New samples were also sequenced for the control region (see Table S2).


African phylogeny and nomenclature are very complex and have been elaborated during the last decade based on the control region and complete genome sequencing efforts [8], [9], [13], [14], [15], [23]. All of these phylogenetic efforts have been compiled in the Phylotree project (mtDNA tree Build 11; 7 Feb 2011) [24].

Statistical analysis

DnaSP v.5 [25] was used to compute diversity indices, including nucleotide and haplotype diversity and the average number of nucleotide differences. Arlequin [26] was used to conduct analysis of molecular variance (AMOVA); the significance of the covariance components associated with different levels of genetic structure was tested using a non-parametric permutation procedure [26].
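As a rough illustration of the three indices named above, the following sketch computes haplotype diversity, the average number of pairwise nucleotide differences, and nucleotide diversity from aligned sequences. It assumes the standard sample-size-corrected estimators and equal-length sequences without gaps or missing data; it is not a reimplementation of DnaSP, and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def diversity_indices(seqs: List[str]) -> Tuple[float, float, float]:
    """Return (haplotype diversity h, mean pairwise differences k, nucleotide diversity pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and non-empty")

    # h = n/(n-1) * (1 - sum of squared haplotype frequencies)
    counts = Counter(seqs)
    h = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))

    # k = average number of differing sites over all pairs of sequences
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    k = total_diffs / (n * (n - 1) / 2)

    # pi = k scaled by the number of sites compared
    pi = k / length
    return h, k, pi


# Hypothetical toy alignment of four sequences, four sites each.
h, k, pi = diversity_indices(["ACGT", "ACGT", "ACGA", "TCGA"])
print(h, k, pi)
```

Because pi is k divided by a constant number of sites, the two indices are proportional for a fixed segment, which is why they track each other closely.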

Principal Component Analysis was carried out on population hg frequencies using R.
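To make the idea of PCA on haplogroup frequencies concrete, here is a minimal sketch that extracts only the first principal component from a population-by-haplogroup frequency matrix. It uses power iteration on the covariance matrix as a dependency-free stand-in; the text does not say which R routine or scaling was used, so this should not be read as the procedure actually applied. The function name and the toy frequencies are hypothetical.

```python
import math
from typing import List, Tuple


def first_principal_component(
    freqs: List[List[float]], iters: int = 500
) -> Tuple[List[float], float]:
    """Return (scores of each population on PC1, fraction of variance explained)."""
    n, p = len(freqs), len(freqs[0])
    # Center each haplogroup column on its mean across populations.
    means = [sum(row[j] for row in freqs) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in freqs]
    # Sample covariance matrix between haplogroups (p x p).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration from a fixed, non-uniform start vector (deterministic).
    v = [1.0 + 0.1 * j for j in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_var = sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    explained = eigenvalue / total_var if total_var else 0.0
    return scores, explained


# Hypothetical frequencies: rows are populations, columns are haplogroups.
toy_freqs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
scores, explained = first_principal_component(toy_freqs)
print(scores, explained)
```

Populations with similar haplogroup profiles receive similar scores, which is what places them close together in a PC plot; the sign of a component is arbitrary.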

Lamarc (Likelihood Analysis with Metropolis Algorithm using Random Coalescence) [27] was used to estimate: (i) θ, which for females is expected to equal 2Neµ for neutral mutations in mtDNA (where µ is the neutral mutation rate per generation and Ne the effective population size of females); (ii) population growth, g = -ln(θt/θpresent day)/t, which relates θ at a time t>0 in the past to θ at the present (t = 0); and (iii) migration rate, defined as M = m/µ (where m is the immigration rate per generation), between the 12 ethnic groups used in the present study, using information from mtSNPs. Estimates were obtained for three independent replicates using a Bayesian framework. The jModelTest v.0.1.1 [28] software (with default heating and burn-in parameters) was used to obtain the base frequencies, mutation parameters and the best mutation model (according to the Akaike information criterion), which (for our data) was the general time reversible (GTR) model. Estimates from jModelTest were obtained with three replicates, using 10 initial chains (sampling interval of 20 and burn-in period of 1000) and two final chains (sampling interval of 20 and burn-in period of 1000). The transition:transversion ratio was set to 30.
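The three parameters can be written compactly as follows. This is a sketch that assumes the exponential growth model usually associated with coalescent growth estimates; the symbols \theta_0 (present-day θ) and \theta_t (θ at time t in the past) are shorthand introduced here.

```latex
\theta = 2 N_e \mu, \qquad
\theta_t = \theta_0 \, e^{-g t}
\;\Longrightarrow\;
g = -\frac{1}{t}\,\ln\!\left(\frac{\theta_t}{\theta_0}\right), \qquad
M = \frac{m}{\mu}
```

Under this reading, g > 0 means θ was smaller in the past (growth) and g < 0 means it was larger in the past (decline).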

Some caveats should be considered with regard to the demographic estimates obtained. The mtDNA is in reality a single locus and therefore all the values should be taken with caution: “For estimation of Theta and migration rate it is possible to get results with one region but they will improve markedly with more; doubling the number of regions nearly doubles the available information. Estimation of growth rate is very poor with less than 3 unlinked regions and particularly benefits from having more.” On the other hand, sample sizes for some of the groups are relatively low and therefore the impact on the different estimates is unpredictable. Last but not least, the neighboring source populations for the Chad Basin are not represented in the mathematical model (e.g. East, North, and West Africa); therefore, we are forced to assume a simplistic model where every lineage originates in one of the 12 ethnic groups considered.

For all of these computations and in order to account for missing data, failed SNPs were imputed according to known phylogeny. Given the fact that there exists a robust mtDNA worldwide phylogeny based on entire mtDNA genomes (>8,700), the phylogenetic-based approach for imputation seems more reliable than those based on e.g. metrics for linkage disequilibrium [29].
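The idea of phylogeny-based imputation can be sketched as follows: once a sample is assigned to a haplogroup, each failed SNP is filled with the allele expected along the path from the root to that haplogroup. The tiny tree, the positions, the alleles, and the haplogroup labels below are entirely hypothetical and stand in for the real mtDNA phylogeny.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy tree: haplogroup -> (parent, {position: derived allele}).
TREE: Dict[str, Tuple[Optional[str], Dict[int, str]]] = {
    "root": (None, {}),
    "X": ("root", {100: "A"}),
    "X1": ("X", {200: "T"}),
    "X1a": ("X1", {300: "G"}),
}
# Hypothetical ancestral (root) allele at each typed position.
ANCESTRAL: Dict[int, str] = {100: "G", 200: "C", 300: "A"}


def expected_profile(hg: str) -> Dict[int, str]:
    """Alleles expected for a haplogroup, applying mutations from root to tip."""
    path = []
    node: Optional[str] = hg
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    profile = dict(ANCESTRAL)
    for name in reversed(path):  # root first, so later branches override earlier ones
        profile.update(TREE[name][1])
    return profile


def impute(observed: Dict[int, Optional[str]], hg: str) -> Dict[int, str]:
    """Keep successful calls and replace failed ones (None) with the expected allele."""
    expected = expected_profile(hg)
    return {
        pos: (call if call is not None else expected[pos])
        for pos, call in observed.items()
    }


# A sample assigned to X1a whose SNP at position 200 failed.
filled = impute({100: "A", 200: None, 300: "G"}, "X1a")
print(filled)
```

Applying mutations in root-to-tip order means a back mutation on a descendant branch would override the ancestral change, which mirrors how the tree encodes such events.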


Genetic diversity in the Chad Basin populations

Several diversity indices have been computed for the 12 ethnic groups analyzed in this study. These indices have been obtained both individually for HVS-I and mtSNPs, and in combination for HVS-I plus mtSNPs (Table 2). With few exceptions, haplotype diversity yielded slightly higher values for HVS-I than for mtSNPs whereas nucleotide diversity was approximately twice as large for the mtSNPs as for HVS-I. Diversity values are very heterogeneous among the 12 population samples analyzed. The Hide sample shows high values of diversity independently of the mtDNA segment analyzed, whereas the opposite pattern was observed in the two Fulani samples. In general, the four nomadic populations included in this study have lower diversity values than the sedentary populations (Table 2).

Some minor differences may be related to language family: the Niger-Congo family has lower diversity values than the other two groups (Table 2), although this probably reflects the low diversity characterizing the two nomadic Fulani groups. No differences were observed in diversity values between populations located in the north (the Shuwa and Chad Arabs, Kanuri, Buduma, Kanembu, and Kotoko) versus those located in the south (the Bongor and Tcheboua Fulani, Hide, Mafa, and Masa; see map in Figure 1).

The diversity values obtained for the combined HVS-I plus mtSNPs are approximately an average of the values obtained for the two segments individually for nucleotide diversity and the average number of nucleotide differences, but, as expected, are slightly higher for haplotype diversity in most of the groups.

Phylogeography of the populations in the Chad Basin

The graphs in Figure 1 show the distribution of main African hgs in the Chad Basin. This broad hg classification clearly indicates substantial heterogeneity in the region. For instance, the Kotoko, Masa, Kanuri, and Mafa have frequencies of L3 haplotypes above 60%, in contrast with frequencies of only 30% for the Kanembu, Buduma and Chad Arabs. The Shuwa Arabs and both Fulani populations (Tcheboua and Bongor) do not have L0 haplotypes, which are approximately 11% in the Kotoko and 13% in the Hide. L2 is above 40% in the Buduma and Kanembu, but only 20% or less in the Masa, Kanuri, Fali, and both of the Fulani populations. Haplogroup M1 is present only in both Arab populations and the Buduma. Percentages of non-Sub-Saharan lineages vary also among ethnic groups (included in the category “Others” in Figure 1).

Figure 2 shows a medium resolution phylogeny indicating hg frequencies in the 12 groups genotyped. The most common sub-lineages in all the Chad Basin populations are L2a, L3b, L3f, and L3e.

Haplogroup frequencies measured to the maximum obtainable resolution for the mtDNAs genotyped in this study are shown in Table S3. The data indicate that the sub-hgs L3b1a and L3e5 are the most common lineages in the Chad Basin (both accounting for >17% of the total sample). These two lineages are present in nearly all of the populations included in this study with the exception of the Chad Arabs (both L3b1a and L3e5) and the Buduma (L3e5). The unusually high frequency of L3e5 in the Chad Basin could be explained by a local expansion; however, because nearly all of the population samples analyzed carry L3e5 mtDNAs, it is more likely that this event occurred before the ethnogenesis of the region. At this level of resolution, it is remarkable that some sub-lineages are frequent in some populations but nearly absent in the rest. For instance, L1b1a appears in the two Fulani groups with frequencies greater than 18%, but with significantly lower frequencies in the other groups (below 6% in the Kanuri, and less than 3% in the other populations). L3b1b is also observed with high frequency in the Tcheboua Fulani (15%), but only appears with low frequency in the Bongor Fulani (4%) and is absent in the rest of the populations. Chad Arabs account for all the R0a mtDNAs in the Chad Basin (19%), in agreement with the high frequency of R0a reported in the Arabian Peninsula [30] and especially in its Southern tip and Socotra island [31]. The whole genome of two of these samples was recently reported [32] and classified within the widespread subclade R0a2f characterized by a substitution at position 8251. No mtDNAs of Eurasian ancestry have been observed in samples from the Fali, Kanembu, Mafa, and Masa. The typical North African lineages (M1 and U6; 2% in the total sample) are mainly observed in the two Arab groups. Most of the M1 lineages likely come from the Mediterranean instead of East Africa.
For instance, four Shuwa Arabs belong to M1a1 and another falls within the sub-branch M1a1a (as indicated by a transition at position 14182 and a reversion at position 16249); the distribution of M1a1 is mainly Mediterranean. The Chad Arab sample #AC92 and the two Buduma #Bu87 and #Bu89 belong to the M1a3 branch, which has a predominantly Mediterranean distribution. Finally, two other Buduma samples belong to M1a3, also mainly of a Mediterranean distribution [23]. The three mtDNAs belonging to U6 were found in one Shuwa Arab, one Kanuri and one Mafa, with the one in the Kanuri belonging to U6a5, again a Mediterranean branch. More interestingly, the U6b lineage found in the Mafa is of the so-called “Canarian Branch”, indicating that perhaps the Chad Basin could have participated in the demographic wave that originally carried hg U6 towards the Canary Islands from East Africa.

Analysis of molecular variance in the Chad Basin

Analyses of molecular variance (AMOVA) were carried out on the 12 Chad Basin populations analyzed in this study using the following grouping schemes: all of the populations individually, populations grouped by language family, and populations grouped according to their locations in the North or the South of the Chad Basin (Table 3). Most of the genetic variation (∼96%) was found to occur within populations, whereas variation between populations accounted for only 4%. These values were virtually the same independently of the grouping scheme. Genetic variation among these groups is therefore below the inter-population differentiation reported to exist on the African continent (∼12%; see [3]). The level of molecular resolution does not seem to be an influencing factor in the apportioning of genetic variance in the Chad Basin (Table 3), although mtSNPs do seem to contribute a subtle increment to the genetic variation.

Table 3. Apportioning of genetic variance considering different genomic regions (HVS-I, mtSNPs, and both in combination) and groups (populations, language families and geography).
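The apportioning of variance within and among populations can be illustrated with a minimal single-level AMOVA sketch. It assumes the usual formulation in which the number of pairwise differences between two haplotypes is used as the squared distance; it ignores the extra "among groups" level and the permutation test, and it is not the Arlequin implementation. Population names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def n_diffs(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs: List[str]) -> float:
    """Sum of squared deviations: summed pairwise distances divided by sample size."""
    if len(seqs) < 2:
        return 0.0
    return sum(n_diffs(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def amova_one_level(pops: Dict[str, List[str]]) -> Dict[str, float]:
    """Variance components among and within populations, plus Phi_ST."""
    sizes = [len(s) for s in pops.values()]
    n_total, n_pops = sum(sizes), len(pops)
    if n_pops < 2 or n_total <= n_pops:
        raise ValueError("need >= 2 populations and more samples than populations")
    pooled = [s for seqs in pops.values() for s in seqs]
    ssd_within = sum(ssd(seqs) for seqs in pops.values())
    ssd_among = ssd(pooled) - ssd_within
    var_within = ssd_within / (n_total - n_pops)
    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_pops - 1)
    var_among = (ssd_among / (n_pops - 1) - var_within) / n0
    total = var_among + var_within
    phi_st = var_among / total if total else 0.0
    return {"var_among": var_among, "var_within": var_within, "phi_st": phi_st}


# Hypothetical extreme case: two populations with no shared haplotypes
# and no variation inside either population.
result = amova_one_level({"P1": ["AAAA", "AAAA"], "P2": ["TTTT", "TTTT"]})
print(result)
```

In real data the within-population component dominates, as in the ∼96% reported in the text; with very small samples the among-population estimate can even be negative.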

Principal Component Analysis of the Chad Basin populations

PCA was carried out based on hg frequencies to the maximum level of resolution. The first three components account for a total of ∼40% of the variation and show notable divergence between the different ethnic groups from the Chad Basin. Thus, the first principal component (PC1), which accounts for 15% of the variation, places the Mafa and Kanuri on one side of the plot and the two Fulani populations at the opposite pole. PC2 (13%) also places the Mafa on one side of the plot and the Kanembu at the other extreme. PC3 (12%) again displays the Mafa at one pole, opposed to the Kotoko. No single feature makes the Mafa different from the other ethnic groups; rather, the difference reflects the cumulative effect of several differences in hg frequencies (some more pronounced than others, e.g. the high frequency of hgs L2b2, L3d1d, and L3e2). In addition to the two Fulani samples, the two Arab ones are also proximal in the plot, indicating a close maternal phylogenetic relationship. The most distinctive features of the two Arab populations compared to the other Chad populations are the presence of non-sub-Saharan lineages, such as R0a or M1 hgs. In agreement with the analysis of the different genetic diversity metrics, it is interesting to note that the nomadic populations are more tightly grouped in the scatter plot than the sedentary ones (Figure 3), mirroring their more reduced genetic diversity.

Figure 3. PC plot of ethnic relationships based on hg frequencies.

Percentage values in brackets refer to the amount of variation accounted by the first three principal components (PC1, PC2, and PC3). Codes for populations are as indicated in Table 1. Nomadic populations are plotted in blue while sedentary ones are plotted in red.

Population growth, effective population size and migration rates

We further estimated the population mutation parameter, θ, which in conjunction with the average mtDNA mutation rate was used to infer the effective population sizes of the different Chad Basin populations assuming a neutral model of molecular evolution (Table 4). For an average entire genome mutation rate of 1.655 × 10⁻⁸ base substitutions per nucleotide per year [33], female effective population size ranges from 359,200 in Masa to 5,423,000 in Buduma (Table 4). Curiously, three out of the four nomadic populations (Chad Arabs, and Bongor and Tcheboua Fulani) showed negative growth rates, while others have positive values, such as the Buduma and the nomadic Shuwa Arabs (Table 4).
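The conversion from θ to female effective size, and the meaning of a negative growth rate, can be sketched as below. The sketch assumes θ = 2Neµ with θ and µ expressed per site in matching time units, and an exponential growth model; the text does not state how the per-year rate was rescaled, so the numeric inputs here are hypothetical and are not meant to reproduce Table 4.

```python
import math


def effective_size(theta: float, mu: float) -> float:
    """Female effective population size from theta = 2 * Ne * mu."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (2.0 * mu)


def theta_in_past(theta_now: float, g: float, t: float) -> float:
    """Theta at time t before the present under exponential growth rate g."""
    return theta_now * math.exp(-g * t)


def growth_rate(theta_now: float, theta_past: float, t: float) -> float:
    """Growth rate implied by theta now and theta at time t > 0 in the past."""
    if t <= 0 or theta_now <= 0 or theta_past <= 0:
        raise ValueError("t and both theta values must be positive")
    return -math.log(theta_past / theta_now) / t


# Hypothetical inputs chosen only to show the arithmetic.
ne = effective_size(theta=0.02, mu=1e-8)
past_if_declining = theta_in_past(theta_now=0.02, g=-5.0, t=0.1)
print(ne, past_if_declining)
```

With a negative g, the computed past θ is larger than the present one, which is the sense in which the nomadic groups are described as showing negative growth.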

Table 4. Inter-population migration rates, population growth, and effective population sizes for the different ethnic groups from the Chad Basin.

No unique features explain the observed migration rate values (although not all should be considered reliable estimates; see Materials and Methods and Table 4). For instance, the highest migration rate was obtained for Kanuri into Masa (Table 4); from a phylogeographic point of view these two populations share the highest number of sub-clades (Table S3), and they are displayed together for PC3 (and to a lesser extent for PC2) in the PCA. Bongor Fulani have the highest frequencies for hgs L1b1a and L3b1a, which could explain their influence in the Fali.


The values of the diversity indices computed for the HVS-I and the mtSNPs show clear-cut differences, mirroring the fact that haplotype diversity is enriched in the HVS-I segment by the presence of rare (or private) variants, whereas the agglomeration of identical sequences into different hgs (possibly suffering from bias in mtSNPs selection) enriches the nucleotide diversity. Therefore, values computed using these different mtDNA segments summarize different aspects of the molecular diversity in populations. In fact, diversity values of HVS-I and mtSNPs for the 12 different ethnic groups analyzed moderately correlate for the haplotype diversity (h, r2 = 0.90), but very poorly correlate for the nucleotide diversity (π, r2 = 0.42). Thus, one can speculate different demographic scenarios for each population according to their differential diversity values. For instance, nomadic or semi-nomadic populations tend to experience loss of diversity by genetic drift (assuming moderate admixture with those populations they meet sporadically), reducing both haplotype and nucleotide diversities. Provided that the gene flow is negligible for a given effective population size, the HVS-I region of more ancestral nomadic populations would be expected to retain more haplotype diversity (due to the presence of more rare variants) than younger nomadic groups. Thus, although the (semi-)nomadic populations (the Chad Arabs, Shuwa Arabs, Bongor Fulani, and Tcheboua Fulani) all have very low nucleotide and haplotype diversity values compared with sedentary populations, the two Arab populations retained higher values of haplotype diversity in the HVS-I segment than the Fulani (this signal is not as clear for nucleotide diversity in the HVS-I segment, possibly due to the low inter-population differences observed for all of the population groups analyzed). This hypothesis is compatible with the fact that nomadic populations as a whole have lower diversity than sedentary groups (Table 2). 
Finally, values for the average number of pairwise differences and nucleotide diversity are highly correlated, as expected given that both indices are based on the same principles.
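The r2 values quoted in this discussion are squared correlations between two sets of per-population diversity values. A minimal sketch of that calculation follows; the function name and the input numbers are hypothetical and are not taken from Table 2.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy * sxy / (sxx * syy)


# Hypothetical per-population values for two marker sets.
hvs1_values = [1.0, 2.0, 3.0, 4.0]
mtsnp_values = [1.0, -1.0, 1.0, -1.0]
print(r_squared(hvs1_values, mtsnp_values))
```

A value near 1 means the two marker sets rank the populations almost identically; a low value means they capture different aspects of diversity.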

Demographic inferences carried out only using summarizing indices (such as nucleotide and haplotype diversities) have to be considered with caution because in reality each human population defined only by ethno-linguistic criteria is composed of an amalgamation of genetic lineages of different ages and origins, and therefore, none has a simple past.

Haplogroup patterns vary substantially among the different ethnic groups studied. In some, hg composition seems to correlate well with historical documentation and their known demographic past. Thus, the presence of R0a only in Chad Arabs is expected given the high frequency of this hg in Southern Arabia [32]. In addition, the M1 sub-lineages observed in our samples have a mainly Mediterranean distribution, and are exclusively found in the two Arab populations and the Buduma (also located in the northern Chad Basin). The spread of this hg to the African Sahel (and possibly further into the Chad Basin) might have been mediated by the Tuareg nomads [34]. Also, the presence of L1c sub-lineages in the Hide (Cameroon; Chad Basin) compared with the rest of the populations indicates close contact of this population with Central African populations (including Pygmy populations), where this lineage is found with high frequency [7], [8], [16].

The few Eurasian profiles observed in the Chad Basin did not cluster in any particular ethnic group. Their control region segments are not informative from a phylogeographic point of view, and these sequences are broadly distributed around Eurasia. The only exception is the U5b1 HVS-I profile T16189C C16192T C16270T C16320T that is detected mainly in Africa [35] and is a perfect match with a sample from Spain.

With the exception of a few population studies based on complete genomes [36], [37] or coding region segments [38], [39], [40], most of the genotyping studies carried out to date were based on control region sequences [8] and/or mtSNPs at a low to moderate level of hg definition [18], [19], [20], [21], [41], [42]. Some other studies [15], [43], [44], [45] focused on phylogenetic issues by genotyping selected branches of the mtDNA tree, but did not consider the population as a whole. The mtSNPs genotyped in this study were designed to identify mtDNAs of African ancestry to the maximum level of molecular resolution provided by known phylogeny based on complete genome sequences. In theory, the 230 mtSNPs should be able to discriminate among 147 different terminal branches of the Sub-Saharan phylogeny (L-hgs), along with dozens of intermediate hgs, and other African non-L branches (such as sub-hgs of U6 or M1). Moreover, the mtSNPs allow for a more rigorous classification of mtDNAs into hgs due to the (average) low mutation rate characterizing these SNPs compared with the mutation rate in the control region [33].

Analysis of mtSNPs in combination with sequencing information (control region) has provided new insights regarding features of the Chad Basin populations [3], and opens new perspectives for pan-African phylogenetic studies as well as for the reconstruction of the patterns of the Trans-Atlantic slave trade into America:

  1. Since the mtSNPs used in the present study were designed to detect major and minor branches of the phylogeny, while the control region variation analyzed previously [3] accounts for both (unbiased) common and rare variants, the evolutionary histories told by both sets of markers are different. Thus, as dissected in the present study, different styles of life (nomadic versus sedentary populations) can leave different signatures in the two sets of markers.
  2. We have estimated for the first time different demographic parameters of the Chad Basin populations, including Ne, population growth, and migration rates. Nomadic populations show signals of negative growth (as also indirectly indicated by diversity metrics), which do not always coincide with higher Ne; in fact, both parameters are just moderately correlated (r2 = 0.59). However, larger sample sizes would be required in order to yield solid figures for all these parameters.
  3. Analysis of mtSNPs has revealed new phylogeographic features in the Chad populations not discussed previously [3]. Given that this is the first study analyzing a pan-African mtSNP hg panel at the population level (with the only exception of a test sample analyzed previously [22]), it is still not possible to make inferences concerning the gene flow of neighboring source populations to the different Chad ethnic groups; however, the present study provides a high resolution hg map for future African studies.
  4. It has been previously demonstrated [7], [9], [46] that only broad patterns of variability can be established in Africa with the current mtDNA data (basically HVS-I segments); therefore, tracking ‘African-American’ lineages to particular African regions might be fraught with problems due to the low level of genetic resolution. Analysis of mtSNPs, as undertaken in the Chad Basin, could help to achieve new insights into the patterns of Atlantic slave trade.
  5. Analyzing mtSNPs to a high level of hg resolution in populations allows a better selection of mtDNAs for further entire genome sequencing or for the design of multiplex mtSNP panels of interest in population, medical, and forensic genetics [20], [21], [47], [48].

In conclusion, given that genotyping mtSNPs is straightforward compared with the intense effort demanded by sequencing complete genomes, the present study opens the door to more ambitious pan-African studies that would improve our knowledge of mtDNA phylogeography on this continent.

Supporting Information

Table S1.

mtSNP calling rates and number of phylogenetic inconsistencies in the global dataset. Mutation hits in Soares et al. [33] are also given for the phylogenetic inconsistencies.


Table S2.

mtSNP genotypes and control region data for the 542 genotyped individuals from the Chad Basin region.


Table S3.

Haplogroup frequencies in all the population samples analyzed at the maximum level of resolution provided by the control region and the mtSNPs.



Acknowledgments

We thank María Torres for assistance with the genotyping and sample management. MALDI-TOF genotyping was carried out at the CEGEN (Centro Nacional de Genotipado) node in Santiago de Compostela (Spain).

Author Contributions

Conceived and designed the experiments: MC AS. Performed the experiments: MC AS. Analyzed the data: MC AS. Contributed reagents/materials/analysis tools: VC AC AS. Wrote the paper: MC AS.

References


  1. Güldemann T (2008) The Macro-Sudan belt: towards identifying a linguistic area in northern sub-Saharan Africa. In: Heine B, Nurse D, editors. Cambridge: Cambridge University Press.
  2. Bereir RE, Hassan HY, Salih NA, Underhill PA, Cavalli-Sforza LL, et al. (2007) Co-introgression of Y-chromosome haplogroups and the sickle cell gene across Africa's Sahel. Eur J Hum Genet 15: 1183–1185.
  3. Černý V, Salas A, Hájek M, Žaloudková M, Brdička R (2007) A bidirectional corridor in the Sahel-Sudan belt and the distinctive features of the Chad Basin populations: a history revealed by the mitochondrial DNA genome. Ann Hum Genet 71: 433–452.
  4. Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, et al. (2009) The genetic structure and history of Africans and African Americans. Science 324: 1035–1044.
  5. Batello C, Marzot M, Touré AH, Kenmore PE (2004) The future is an ancient lake: traditional knowledge, biodiversity, and genetic resources for food and agriculture in Lake Chad Basin ecosystems. Food and Agriculture Organization of the United Nations, and FAO Inter-Departmental Working Group on Biological Diversity for Food and Agriculture.
  6. Kropelin S, Verschuren D, Lezine AM, Eggermont H, Cocquyt C, et al. (2008) Climate-driven ecosystem succession in the Sahara: the past 6000 years. Science 320: 765–768.
  7. Salas A, Carracedo Á, Richards M, Macaulay V (2005) Charting the Ancestry of African Americans. Am J Hum Genet 77: 676–680.
  8. Salas A, Richards M, De la Fé T, Lareu MV, Sobrino B, et al. (2002) The making of the African mtDNA landscape. Am J Hum Genet 71: 1082–1111.
  9. Salas A, Richards M, Lareu MV, Scozzari R, Coppa A, et al. (2004) The African diaspora: mitochondrial DNA and the Atlantic slave trade. Am J Hum Genet 74: 454–465.
  10. Richards M, Macaulay V, Hill C, Carracedo Á, Salas A (2004) The archaeogenetics of the dispersals of the Bantu-speaking peoples. In: Jones M, editor. Studies in honour of Colin Renfrew. Cambridge: McDonald Institute for Archaeological Research. pp. 1363–1349.
  11. Beleza S, Gusmão L, Amorim A, Carracedo Á, Salas A (2005) The genetic legacy of western Bantu migrations. Hum Genet 117: 366–375.
  12. Plaza S, Salas A, Calafell F, Corte-Real F, Bertranpetit J, et al. (2004) Insights into the western Bantu dispersal: mtDNA lineage analysis in Angola. Hum Genet 115: 439–447.
  13. Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, et al. (2004) Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet 75: 752–770.
  14. Torroni A, Achilli A, Macaulay V, Richards M, Bandelt H-J (2006) Harvesting the fruit of the human mtDNA tree. Trends Genet 22: 339–345.
  15. Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, et al. (2008) The dawn of human matrilineal diversity. Am J Hum Genet 82: 1130–1140.
  16. Quintana-Murci L, Quach H, Harmant C, Luca F, Massonnet B, et al. (2008) Maternal traces of deep common ancestry and asymmetric gene flow between Pygmy hunter-gatherers and Bantu-speaking farmers. Proc Natl Acad Sci U S A 105: 1596–1601.
  17. Coble M (2004) The identification of single nucleotide polymorphisms in the entire mitochondrial genome to increase the forensic discrimination of common HV1/HV2 types in the Caucasian population. Washington: The George Washington University. 206 p.
  18. Álvarez-Iglesias V, Barros F, Carracedo Á, Salas A (2008) Minisequencing mitochondrial DNA pathogenic mutations. BMC Med Genet 9: 26.
  19. Álvarez-Iglesias V, Jaime JC, Carracedo Á, Salas A (2007) Coding region mitochondrial DNA SNPs: targeting East Asian and Native American haplogroups. Forensic Sci Int Genet 1: 44–55.
  20. Álvarez-Iglesias V, Mosquera-Miguel A, Cerezo M, Quintáns B, Zarrabeitia MT, et al. (2009) New population and phylogenetic features of the internal variation within mitochondrial DNA macro-haplogroup R0. PLoS ONE 4: e5112.
  21. Quintáns B, Álvarez-Iglesias V, Salas A, Phillips C, Lareu MV, et al. (2004) Typing of mitochondrial DNA coding region SNPs of forensic and anthropological interest using SNaPshot minisequencing. Forensic Sci Int 140: 251–257.
  22. Cerezo M, Černý V, Carracedo Á, Salas A (2009) Applications of MALDI-TOF MS to large-scale human mtDNA population-based studies. Electrophoresis 30: 3665–3673.
  23. Olivieri A, Achilli A, Pala M, Battaglia V, Fornarino S, et al. (2006) The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science 314: 1767–1770.
  24. van Oven M, Kayser M (2009) Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat 30: E386–394.
  25. Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452.
  26. Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131: 479–491.
  27. Kuhner MK (2006) LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters. Bioinformatics 22: 768–770.
  28. Posada D (2008) jModelTest: phylogenetic model averaging. Mol Biol Evol 25: 1253–1256.
  29. Elson JL, Majamaa K, Howell N, Chinnery PF (2007) Associating mitochondrial DNA variation with complex traits. Am J Hum Genet 80: 378–382; author reply 382-373.
  30. Černý V, Mulligan CJ, Rídl J, Žaloudková M, Edens CM, et al. (2008) Regional differences in the distribution of the sub-Saharan, West Eurasian, and South Asian mtDNA lineages in Yemen. Am J Phys Anthropol 136: 128–137.
  31. Černý V, Pereira L, Kujanová M, Vašíková A, Hájek M, et al. (2009) Out of Arabia-the settlement of island Soqotra as revealed by mitochondrial and Y chromosome genetic diversity. Am J Phys Anthropol 138: 439–447.
  32. Černý V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, et al. (2010) Internal diversification of mitochondrial haplogroup R0a reveals post-Last Glacial Maximum demographic expansions in South Arabia. Mol Biol Evol 28: 71–78.
  33. Soares P, Ermini L, Thomson N, Mormina M, Rito T, et al. (2009) Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet 84: 740–759.
  34. Pereira L, Černý V, Cerezo M, Silva NM, Hájek M, et al. (2010) Linking the sub-Saharan and West Eurasian gene pools: maternal and paternal heritage of the Tuareg nomads from the African Sahel. Eur J Hum Genet 18: 915–923.
  35. Rando JC, Pinto F, González AM, Hernández M, Larruga JM, et al. (1998) Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, near-eastern, and sub-Saharan populations. Ann Hum Genet 62(Pt 6): 531–550.
  36. Gonder MK, Mortensen HM, Reed FA, de Sousa A, Tishkoff SA (2007) Whole-mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol 24: 757–768.
  37. Tanaka M, Cabrera VM, González AM, Larruga JM, Takeyasu T, et al. (2004) Mitochondrial genome variation in eastern Asia and the peopling of Japan. Genome Res 14: 1832–1850.
  38. Herrnstadt C, Elson JL, Fahy E, Preston G, Turnbull DM, et al. (2002) Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences from the major African, Asian, and European haplogroups. Am J Hum Genet 70: 1152–1171.
  39. Kivisild T, Shen P, Wall DP, Do B, Sung R, et al. (2006) The role of selection in the evolution of human mitochondrial genomes. Genetics 172: 373–387.
  40. Finnilä S, Lehtonen MS, Majamaa K (2001) Phylogenetic network for European mtDNA. Am J Hum Genet 68: 1475–1484.
  41. Coble MD, Vallone PM, Just RS, Diegoli TM, Smith BC, et al. (2006) Effective strategies for forensic analysis in the mitochondrial DNA coding region. Int J Legal Med 120: 27–32.
  42. Brandstätter A, Salas A, Niederstätter H, Gassner C, Carracedo Á, et al. (2006) Dissection of mitochondrial superhaplogroup H using coding region SNPs. Electrophoresis 27: 2541–2550.
  43. Achilli A, Perego UA, Bravi CM, Coble MD, Kong Q-P, et al. (2008) The phylogeny of the four pan-American MtDNA haplogroups: implications for evolutionary and disease studies. PLoS ONE 3: e1764.
  44. Pala M, Achilli A, Olivieri A, Kashani BH, Perego UA, et al. (2009) Mitochondrial haplogroup U5b3: a distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians. Am J Hum Genet 84: 814–821.
  45. Perego UA, Achilli A, Angerhofer N, Accetturo M, Pala M, et al. (2009) Distinctive Paleo-Indian migration routes from Beringia marked by two rare mtDNA haplogroups. Curr Biol 19: 1–8.
  46. Salas A, Torroni A, Richards M, Quintana-Murci L, Hill C, et al. (2004) The phylogeography of mitochondrial DNA haplogroup L3g in Africa and the Atlantic slave trade. Am J Hum Genet 75: 524–526.
  47. Salas A, Amigo J (2010) A reduced number of mtSNPs saturates mitochondrial DNA haplotype diversity of worldwide population groups. PLoS ONE 5: e10218.
  48. Mosquera-Miguel A, Álvarez-Iglesias V, Lareu MV, Carracedo Á, Salas A (2009) Testing the performance of mtSNP minisequencing in forensic samples. Forensic Sci Int Genet 3: 261–264.
  49. Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, et al. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23: 147.