Deep Rooting In-Situ Expansion of mtDNA Haplogroup R8 in South Asia


Background

The phylogeny of the indigenous Indian-specific mitochondrial DNA (mtDNA) haplogroups has been determined and refined in previous reports. As with mtDNA superhaplogroups M and N, a profusion of reports is also available for superhaplogroup R. However, there is a dearth of information on its South Asian subhaplogroups, including R8. Therefore, we sought to assess the genealogy and pre-historic expansion of haplogroup R8, which is considered one of the autochthonous lineages of South Asia.

Methodology/Principal Findings

Upon screening the mtDNA of 5,836 individuals belonging to 104 distinct ethnic populations of the Indian subcontinent, we found 54 individuals with the HVS-I motif that defines the R8 haplogroup. Complete mtDNA sequencing of these 54 individuals revealed two deep-rooted subclades: R8a and R8b. Furthermore, these subclades split into several fine subclades. An isofrequency contour map detected the highest frequency of R8 in the state of Orissa. Spearman's rank correlation analysis suggests significant correlation of R8 occurrence with geography.

Conclusions/Significance

The coalescent ages of the newly characterized subclades of R8, R8a (15.4±7.2 Kya) and R8b (25.7±10.2 Kya), indicate that the initial maternal colonization of this haplogroup occurred during the middle and upper Paleolithic period, roughly 40 to 45 Kya. These results suggest that the southern part of Orissa, currently inhabited by Munda speakers, is likely the origin of these autochthonous, deep-rooted maternal haplogroups. Our high-resolution study on the genesis of the R8 haplogroup provides ample evidence of its deep-rooted ancestry among the Orissa (Austro-Asiatic) tribes.

Introduction

India is a melting pot of multi-lingual populations with a unique complex genome diversity [1]. The linguistic diversity prevalent among Indian populations is associated with the presence of four linguistic families: Dravidian (DR), Indo-European (IE), Austro-Asiatic (AA) and Tibeto-Burman (TB) [1]. Of these four groups, AA tribes are considered to be the first settlers of the Indian subcontinent, representing about 30 endogamous tribal populations [2]. The AA linguistic family is traditionally divided into two basic subfamilies: Mon-Khmer and Mundari [3]. Among these two subfamilies, Mundari speakers, the traditional hunter-gatherers, are exclusively found in the Indian subcontinent [3][4]. Because Mundari populations are considered to be the earliest inhabitants of the Indian subcontinent, their migration during demic expansion of the agriculturalists in the Neolithic era, as has been suggested for Mon-Khmer speaking Nicobarese [5], appears doubtful.

Numerous studies employing evolutionary-informative markers have demonstrated the origin of various linguistic populations in India [4][13]. The phylogeny of Indian mitochondrial DNA (mtDNA) is characterized predominantly by several indigenous haplogroups dispersed exclusively throughout the Indian subcontinent and partially by West Eurasian lineages [8][13]. The autochthonous mtDNA haplogroups in Indian populations include U2a-c, R5-R8, R30, R31, N1d and N5 in haplogroup N, and M2-M6 and M30-50 in haplogroup M [6][13]. Of the two founder haplogroups, M and N, the former is more prevalent than the latter in the Indian populace [8], [11]. An extrapolation of studies on the N haplogroup led to the discovery of R and several other haplogroups such as U2a-c, R5-R8, R30, R31, N1d and N5 [3], [8][9]. Haplogroup R is estimated to have first appeared in India ∼65 Kya and is the third most frequent haplogroup after M and N, encompassing 11% of the total haplogroups in India [8][9]. Significantly, haplogroups R6 and R7 are more frequent among AA speakers than among other linguistic groups [3].

Though numerous studies have been carried out on the phylogenetic characterization of haplogroup R, there is a dearth of research on its subhaplogroups. To the best of our knowledge, only eight complete mtDNA sequences of haplogroup R8 are available in the database [3], [9]. Therefore, we aim to more accurately trace the genealogy and pre-historic expansion of haplogroup R8 into the Indian subcontinent.

Results

We analyzed a total of 5,836 samples from 104 populations across the Indian subcontinent (Figure 1) and identified 54 samples containing haplogroup R8 (Figures 2 and 3). The R8 haplogroup is defined by the sites 13215-9449-7759-3384-2755 in the coding region and a single site (195) in the control region. HVS-I motifs of Indian populations that were previously assigned to West Eurasian haplogroup H when matched with the revised Cambridge Reference Sequence (rCRS) [6], [13] are now redefined as haplogroup R8. The topology of the previously characterized R8 samples A165, A190 and S4 [9] and of the recently classified samples Ko74, CoB41, Ko30, Ko37 and Lam10 [3] deviates significantly from that of our samples. A190 [9] grouped with our samples of Panika, Mudiraj, Dommari and Sugali, whereas S4 grouped with Lam10.

Upon complete sequencing of the 54 samples, we identified 9 novel sub-haplogroups of haplogroup R8. The coalescent age for haplogroup R8 is 41.7±7.3 Kya, whereas those for the two subclades R8a and R8b are 15.4±7.2 Kya and 25.7±10.2 Kya, respectively (Figure 2). R8a is characterized by the motif 709-5510-13782, whereas R8b is characterized by 16390-15326-13194-12007-6485-456. One of the most diverse subclades of R8a, R8a1a1, is separated from the other R8a subclades by a transition at position 8646. Both R8a and R8b split into various fine branches (Figure 2). Most R8 subclades are present predominantly in members of the AA language family.

The spatial autocorrelation analysis revealed that the highest frequency of this haplogroup occurred towards East India, especially within Orissa (12%) (Figure 4), whereas low frequencies occurred in the Gujarat (1.8%), Madhya Pradesh (0.53%), Uttar Pradesh (0.22%), Andhra Pradesh (0.9%), Chhattisgarh (6%), Jharkhand (1.04%) and Tamil Nadu (0.18%) populations (Figure 4). Spearman's rank correlation analysis demonstrated a significant correlation of R8 haplogroup frequency with latitude and longitude, with r = −0.398 and 0.241 (p<0.05), respectively.

Figure 1. Map of India presenting the total number of samples screened from different states.

Figure 2. Phylogenetic tree of 54 complete mtDNA sequences.

Mutations are scored relative to the rCRS [16]. Additional sequences were taken from the literature and are referred to by the symbols CS#P and CS#C [3], [9]. Suffixes A, C, G and T indicate transversions; “d” denotes a deletion and a plus sign (+) denotes an insertion; recurrent mutations are underlined. Variation at position 16519 is not shown because this site is hypervariable.

Figure 3. The median joining network of 54 mtDNAs belonging to haplogroup R8.

Circle sizes are proportional to the number of mtDNAs with that haplotype.

Figure 4. Isofrequency map of mtDNA haplogroup R8.

The color gradient demonstrates the frequency in % whereas the points describe the region taken in the study.

HVS-I sequences of the individuals carrying the R8 haplogroup, who belonged to 30 different ethnic populations, were used to estimate intra-population diversity. The diversity indices and neutrality test values are presented in Table 1. Tajima's D and Fu's Fs were significantly negative in 18 and 26 populations, respectively (Table 1). Most of the populations showed similar sequence diversity values, ranging from 0.8995 (0.051) in Malayan to 0.9940 (0.009) in Kanwar. Orissa populations showed relatively higher values than other populations: Savara 0.9810 (0.022), Bhumia 0.9708 (0.027), Gadaba 0.9667 (0.035), Dhurva 0.9619 (0.039) and Bonda 0.9631 (0.023). A similar trend was also observed in the mean number of pairwise differences: Savara 6.561 (3.23), Bhumia 6.269 (3.11), Gadaba 4.808 (2.47), Dhurva 4.933 (2.54) and Bonda 4.837 (2.43).
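For readers unfamiliar with these indices, the following is a minimal Python sketch of how sequence (haplotype) diversity and the mean number of pairwise differences can be computed from aligned sequences. It assumes the standard unbiased Nei estimator; the exact Arlequin settings used in the study are not stated here, and the toy sequences and function names are hypothetical, not data or code from the study.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Unbiased Nei estimator: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


# Hypothetical aligned toy sequences (illustration only).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
diversity = haplotype_diversity(toy)
mean_diff = mean_pairwise_differences(toy)
print(round(diversity, 4), round(mean_diff, 4))
```

Values close to 1, such as those reported for the Orissa populations, indicate that almost every sampled individual carries a distinct HVS-I haplotype.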

Table 1. Diversity Indices and Neutrality Tests for the Populations with R8 haplogroup, based on HVS-I sequences.

We carried out principal component analysis (PCA) to explore the affinities among the populations possessing haplogroup R8, based on the frequency distributions. The PCA plot identified close affinities among the Orissa tribes belonging to the Austro-Asiatic linguistic family (Figure 5). Combined, PC1 and PC2 accounted for 63.70% of the variance in the data.

Figure 5. Principal component (PC) analysis of 30 populations from the Indian subcontinent in which R8 is present.

This plot accounts for 63.70% of the genetic variation. Population codes: GO = gond, LM = lambadi, PA = panika, MU = munda, SU = sugali, BR = bharia, KR = korku, DU = dusadh, MA = Malayan, KN = kanwar, RK = rajkoya, CH = chenchu, VA = valmiki, GD = gadaba, SA = savara, BO = bonda, DH = dhurva, BH = bhumia, DO = dommari, RD = reddy, MR = muttarasui, KY = koya, KT = kotwalia, MD = mudiraj, TD = tadvi, PD = padmashali, JU = juang, PR = pardhan, TH = thoti, SO = sonr.

Discussion

High-resolution analysis of the R8 haplogroup in a total of 5,836 HVS-I (16000–16400) and 54 complete mtDNA sequences characterized two subclades: R8a and R8b. We have further refined these subclades into several subhaplogroups (R8a1, R8a1a1, R8a1a2, R8a1a3, R8a1b, R8a2, R8b, R8b1 and R8b2) based on 38 novel R8 sequences.

The comparatively high frequency of R8 in Orissa populations, especially among the AA-speaking Mundari tribes, strongly suggests that this haplogroup might have originated among the maternal ancestors of the contemporary AA speakers of the region. To substantiate this hypothesis, we estimated the coalescence time and corroborated it with archaeological evidence. The time to the most recent common ancestor (TMRCA) of R8 (41.7±7.3 Kya) and of its subclades R8a (15.4±7.2 Kya) and R8b (25.7±10.2 Kya) reveals the ancient demographic history of this haplogroup (Figure 2).

This haplogroup (R8) is also present at low frequency among the Dravidian and Indo-European speaking families, which can be explained by a language shift or local admixture with the AA-speaking family. Interestingly, this haplogroup was not found in any of the Tibeto-Burman populations analyzed in the present study.

A contour map of the R8 haplogroup revealed its distribution in different geographical regions (Figure 4). It is evident from the map that this haplogroup is concentrated in Orissa, Gujarat, Chhattisgarh and Jharkhand, with the highest frequency in Orissa (12%). Spearman's rank correlation analysis demonstrated a significant correlation of R8 haplogroup frequency with latitude and longitude (p<0.05), which is strong evidence for a relation between genes and geography in this group.

The significant negative values obtained from neutrality tests support the hypothesis of population growth. The PCA plot (Figure 5) found close affinities among the Orissa (AA tribe) population, perhaps due to the high frequency and influence of the R8 haplogroup.

This high-resolution study on the origin of the R8 haplogroup provides abundant evidence of its deep-rooted ancestry among the Orissa (AA) tribes. The TMRCA estimates revealed that the initial maternal colonization of this haplogroup occurred during the mid-to-late Paleolithic period, roughly 40 to 45 Kya. The significant relation between genes and geography is supported by the spatial analysis of this haplogroup. Moreover, the absence of haplogroup R8 and its subhaplogroups among the Tibeto-Burman speaking populations studied implies that socio-cultural practices existing among the populations are the principal factor for genetic demarcation. Thus, the phylogeographic reconstruction of 54 complete mitochondrial sequences containing haplogroup R8 furnished a better understanding of this partially characterized haplogroup. Our high-resolution analysis again shows the value of detailed coding-region information for the proper classification of a sample, especially in the case of the South Asian haplogroups, which contain several deep-rooted lineages sharing identical coding region mutations with the exception of the HVS-I [14][15].

Materials and Methods

Ethics Statement

All DNA samples analyzed in the present study were derived from blood samples collected with informed written consent according to protocols approved by the Institutional Ethical Committee of CCMB, Hyderabad.

The samples used in this study were obtained from the DNA bank of CCMB. We screened a total of 5,836 individuals belonging to 104 ethnic populations from 17 states of India (see Figure 1; Supplementary information Table S1), initially for HVS-I (16000 to 16400) and then for nucleotide position 3384. Among the 5,836 mtDNAs screened, 54 were found to contain the basal mutations 13215-9449-7759-3384-2755, which define haplogroup R8. Twenty-four sets of primers were used in sequencing the complete mtDNA. Sequencing of PCR amplicons was performed using the BigDye terminator cycle sequencing kit and an ABI 3730XL DNA analyzer (Applied Biosystems, Foster City, USA). The sequences were edited and assembled using AutoAssembler (version 1.4) software (Applied Biosystems, Foster City, USA) to obtain a consensus sequence. These sequences were aligned with the rCRS and the mutations were noted [16].
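The motif-based assignment described above can be illustrated with a short Python sketch. It uses only the diagnostic positions reported in this paper (the R8 basal coding-region motif and the R8a and R8b motifs given in the Results); the function name and the example sample are hypothetical. It also simplifies by matching positions only, whereas a real classifier would check the derived allele at each position.

```python
# Diagnostic positions as reported in the text (relative to the rCRS).
R8_CODING_MOTIF = {2755, 3384, 7759, 9449, 13215}
R8A_MOTIF = {709, 5510, 13782}
R8B_MOTIF = {456, 6485, 12007, 13194, 15326, 16390}


def classify_r8(variant_positions):
    """Assign a sample to R8, R8a or R8b from its set of variant positions.

    Simplifying assumption: a position counts as matched whenever the sample
    differs from the rCRS there; allele states are not checked.
    """
    positions = set(variant_positions)
    if not R8_CODING_MOTIF <= positions:
        return "not R8"
    if R8A_MOTIF <= positions:
        return "R8a"
    if R8B_MOTIF <= positions:
        return "R8b"
    return "R8"


# Hypothetical sample carrying the basal motif plus the R8a motif.
sample = [195, 709, 2755, 3384, 5510, 7759, 9449, 13215, 13782]
print(classify_r8(sample))
```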

NETWORK (version 4.5) software was used for phylogenetic reconstruction [17]. The phylogeny obtained was reconfirmed by means of a neighbor-joining tree (1000×bootstrapped) [18], using MEGA (version 4.0) software [19]. We followed the nomenclature system of Richards et al. [20] for reconstructing the phylogenetic tree of haplogroup R8. The isofrequency map for haplogroup R8 was constructed using the Kriging method [21] in the Surfer (version 8.0) program (Golden Software Inc., Golden, Colorado). Spearman's rank correlation coefficients between mtDNA haplogroup frequency and latitude and longitude were calculated in StatistiXL (version 1.8) software (StatistiXL, Nedlands, Western Australia), with a p-value<0.05 considered statistically significant.
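As an illustration of the correlation statistic, the following Python sketch computes Spearman's rank correlation coefficient as the Pearson correlation of average ranks, which handles ties. It is not the StatistiXL implementation and does not compute a p-value; the frequency and latitude values are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical haplogroup frequencies (%) and latitudes (degrees).
frequency = [12.0, 6.0, 1.8, 1.0, 0.2]
latitude = [19.0, 21.5, 22.0, 23.5, 27.0]
print(spearman_rho(frequency, latitude))
```

A negative coefficient for latitude, as reported in the Results, means that the haplogroup frequency tends to decrease towards the north.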

Principal Component (PC) analysis of R5-R8, R30 and R31 lineages in different Indian populations was performed using SPSS (version 11) software (SPSS Inc., Chicago, IL, USA) with mtDNA haplogroup frequencies as the input vector. Coalescence time was calculated using sequence positions between nucleotides 577 and 16023, considering one base substitution per 5,140 years and excluding insertions and deletions [22]. The standard deviation of the rho (σ) estimate was calculated based on Saillard et al. [23]. Descriptive statistical indices and neutrality tests (Tajima's D, Fu's Fs) for HVS-I sequences were calculated using Arlequin (version 2.0) software [24]. Complete mtDNA genome sequences generated in this study were submitted to GenBank (accession numbers FJ467940–FJ467993).
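The coalescence calculation can be sketched in Python as follows. The sketch assumes the usual form of the rho statistic, in which ρ is the mean number of substitutions from the root to the sampled sequences, and the variance formula commonly attributed to Saillard et al. [23], σ² = Σ(nᵢ²·lᵢ)/n², where lᵢ is the number of substitutions on branch i and nᵢ the number of sampled descendants of that branch. The toy tree and function names are hypothetical. At one substitution per 5,140 years, the reported ages of 41.7, 25.7 and 15.4 Kya correspond to ρ values of roughly 8.1, 5.0 and 3.0.

```python
from math import sqrt

YEARS_PER_SUBSTITUTION = 5140  # coding-region rate quoted in the text [22]


def rho_and_sigma(branches, n_samples):
    """Compute rho and its standard deviation for a rooted tree.

    branches: iterable of (substitutions_on_branch, sampled_descendants).
    n_samples: total number of sampled sequences in the clade.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    branches = list(branches)
    rho = sum(length * desc for length, desc in branches) / n_samples
    sigma = sqrt(sum(length * desc * desc for length, desc in branches)) / n_samples
    return rho, sigma


def coalescence_age(branches, n_samples, rate=YEARS_PER_SUBSTITUTION):
    """Convert rho and sigma into an age and standard deviation in years."""
    rho, sigma = rho_and_sigma(branches, n_samples)
    return rho * rate, sigma * rate


# Hypothetical tree with four samples: one internal branch (2 substitutions,
# 2 descendants) and four terminal branches.
toy_branches = [(2, 2), (1, 1), (1, 1), (1, 1), (3, 1)]
age, sd = coalescence_age(toy_branches, n_samples=4)
print(round(age), round(sd))
```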

Supporting Information

Table S1.

List of the caste and tribal populations studied.

(0.03 MB XLS)

Acknowledgments

We thank all voluntary donors for providing blood samples, and all students and institutions that contributed to sample collection.

Author Contributions

Conceived and designed the experiments: KT LS. Performed the experiments: AN VS VKS AGR. Analyzed the data: AN VS VKS ME. Contributed reagents/materials/analysis tools: LS. Wrote the paper: KT AN VS ME. Sample collection: PKP SS SR MD NV.

References

  1. Gadgil M, Joshi N, Manoharan S, Patil S, Prasad UVS (1998) Peopling of India. In: Balasubramanian D, Rao NA, editors. The Human Heritage. Hyderabad: Hyderabad University Press. pp. 100–129.
  2. Kumar V, Reddy BM (2003) Status of Austro-Asiatic groups in the peopling of India: An exploratory study based on the available prehistoric, linguistic and biological evidences. J Biosci 28: 507–522.
  3. Chaubey G, Karmin M, Metspalu E, Metspalu M, Selvi-Rani D, et al. (2008) Phylogeography of mtDNA haplogroup R7 in the Indian peninsula. BMC Evol Biol 8: 227.
  4. Kumar V, Reddy ANS, Babu JP, Rao TN, Langstieh BT, et al. (2007) Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations. BMC Evol Biol 7: 47.
  5. Thangaraj K, Sridhar V, Kivisild T, Reddy AG, Chaubey G, et al. (2005) Different population histories of the Mundari- and Mon-Khmer-speaking Austro-Asiatic tribes inferred from the mtDNA 9-bp deletion/insertion polymorphism in Indian populations. Hum Genet 116: 507–517.
  6. Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin SM, et al. (2003) Mitochondrial DNA analysis reveals diverse histories of tribal populations from India. Eur J Hum Genet 11: 253–264.
  7. Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, et al. (2003) The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet 72: 313–332.
  8. Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, et al. (2004) Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet 5: 26.
  9. Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, et al. (2004) Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of Indian subcontinent. Am J Hum Genet 75: 966–978.
  10. Reddy BM, Langstieh BT, Kumar V, Nagaraja T, Reddy ANS, et al. (2007) Austro-Asiatic tribes of Northeast India provide hitherto missing genetic link between South and Southeast Asia. PLoS One 2: e1141.
  11. Thangaraj K, Chaubey G, Singh VK, Vanniarajan A, Thanseem I, et al. (2006) In situ origin of deep rooting lineages of mitochondrial Macrohaplogroup ‘M’ in India. BMC Genomics 7: 151.
  12. Rajkumar R, Banerjee J, Gunturi HB, Trivedi R, Kashyap VK (2005) Phylogeny and antiquity of M macrohaplogroup inferred from complete mtDNA sequence of Indian specific lineages. BMC Evol Biol 5: 26.
  13. Kivisild T, Bamshad MJ, Kaldma K, Metspalu M, Metspalu E, et al. (1999) Deep common ancestry of Indian and western-Eurasian mitochondrial DNA lineages. Curr Biol 9: 1331–1334.
  14. Thangaraj K, Chaubey G, Kivisild T, Reddy AG, Singh VK, et al. (2006) Response to Comment on “Reconstructing the Origin of Andaman Islanders”. Science 311: 470b.
  15. Barik SS, Sahani R, Prasad BVR, Endicott P, Metspalu M, et al. (2008) Detailed mtDNA genotypes permit a reassessment of the settlement and population structure of the Andaman Islands. Am J Phys Anthropol 136: 19–27.
  16. Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, et al. (1999) Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23: 147.
  17. Bandelt H-J, Forster P, Rohl A (1999) Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 16: 37–48.
  18. Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425.
  19. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596–1599.
  20. Richards MB, Macaulay VA, Bandelt HJ, Sykes BC (1998) Phylogeography of mitochondrial DNA in western Europe. Ann Hum Genet 62: 241–260.
  21. Delfiner P (1976) Linear estimation of non-stationary spatial phenomena. In: Guarasio M, David M, Haijbegts C, Dordrecht , Reidel , editors. Advanced geostatistics in the mining industry. Austria: pp. 49–68.
  22. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, et al. (2003) Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci USA 100: 171–176.
  23. Saillard J, Forster P, Lynnerup N, Bandelt HJ, Nørby S (2000) mtDNA variation among Greenland Eskimos: The edge of the Beringian expansion. Am J Hum Genet 67: 718–726.
  24. Schneider S, Roessli D, Excoffier L (2000) Arlequin ver. 2.000: Software for population genetics data analysis. Genetics and Biometry Laboratory, University of Geneva, Geneva, Switzerland.