Janet S. Ziegle is an employee of Applied Biosystems. Ajay K. Royyuru, Laxmi Parida and Daniel E. Platt are employees of IBM. Asif Javed and Pandikumar Swamikrishnan, both members of the Genographic Consortium, are also employees of IBM. There is no patenting or profit making to be declared. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.
Conceived and designed the experiments: VJK AKR RSW RMP. Performed the experiments: GA VJK VSA AS KSA JSZ RMP. Analyzed the data: GA VJK DFSH LP CTS DEP RMP. Contributed reagents/materials/analysis tools: DFSH JSZ LP DEP. Wrote the paper: GA DFSH CR TGS CTS DEP RMP. Field work, sample identification and collection of samples and demographic data: GA VJK VSA AS KSA KTG KV MN MJ RMP.
¶ Consortium members are listed in Acknowledgements.
Previous studies that pooled Indian populations from a wide variety of geographical locations have reached contradictory conclusions about the processes that established the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions, we took advantage of the fact that both the Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10–30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4–6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.
Contemporary Indian populations exhibit a high cultural, morphological, and linguistic diversity, as well as some of the highest genetic diversities among continental populations after Africa.
The distribution of deep-rooted Indian-specific Y-chromosomal and mitochondrial lineages suggests an initial settlement of modern humans in the subcontinent from the early out-of-Africa migration.
The lack of consensus among previous studies may reflect difficulties arising from the conflicting relationships between genetics and the socio-cultural factors used to pool truly endogamous groups into broader categories, such as a tribe-caste dichotomy or caste-rank hierarchy, sometimes grouping together Indian populations sampled from a wide variety of geographical locations.
Here, we attempted to apply this strategy to unravel the population structure and genetic history of the southernmost state of India, Tamil Nadu (TN), which is well known for its rigid caste system.
In the present study, we examined the Y-chromosomal lineages of 1,680 individuals sampled from 12 tribal and 19 non-tribal well-defined endogamous populations. We first investigated whether tribal and non-tribal groups shared a common genetic heritage and characterized the proportion of putatively autochthonous and non-autochthonous Indian Y-chromosomal haplogroups. It is important to note that the total sample size used here is higher than those in other studies covering the entire Indian subcontinent. Further, the detailed anthropological annotation of endogamous populations sampled from a restricted region within India, together with the paleoclimatic, archeological and historical regional background, were all important for reducing the confounding relationships among socio-cultural factors. This general approach allowed us to infer important genetic signals and the finer details of the population demographic histories. We therefore sought to determine which classification, whether based on the Varna system (rank status, tribe-caste dichotomy), on socio-cultural factors (reflecting subsistence, traditional customs and native language), or on geography, better indicated true endogamous groups by exhibiting higher between-population differences and lower within-population variation. Since both the Y chromosome and caste designation are paternally inherited, we further explored whether any of these genetic differences could be attributed to the historical evidence for the establishment of the Hindu Varna system. In contrast, we found that the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation correlated better with archeological evidence and the demographic processes of Neolithic agricultural expansions into the region.
Tamil Nadu, the land of Tamils (Tamil has the most ancient literary tradition of all Dravidian languages), is the southeasternmost province of India, measuring 130,058 km² with a population of 62,405,679 (2001 Indian Census:
The majority of tribal populations are located in the mountains of the Western Ghats. The color codes are: Red – Hill Tribe Foragers (HTF); Turquoise – Hill Tribe Cremating (HTC); Green – Hill Tribe Kannada (HTK); Grey – Schedule Castes (SC); Pink – Dry-Land Farmers (DLF); Deep Blue – Artisan and Warriors (AW) and Yellow – Brahmin related (BRH). Population abbreviations are as shown in
While many previous Indian population studies aimed to elucidate the main processes involved in the genesis of the social stratification by pooling populations into broad classifications such as caste-tribe dichotomy and social hierarchy
Major Group | Code | Population Name | Linguistic Family | Native Language | Social Rank | Mode of Subsistence | Code | Sampled District | Coordinates | Sample Size (n) | Census |
HTF-Hill Tribe Foragers | PNY | Paniya | DR | Tamil/Malayalam | Tribe | Foragers/Cultivators | PNY | Nilgiris | 10.6055 ; 77.4056 | 72 | 9,121 |
HTF-Hill Tribe Foragers | PLN | Paliyan | DR | Tamil | Tribe | Honey Gatherers | PLN | Theni | 9.671 ; 77.2472 | 95 | 3,052 |
HTF-Hill Tribe Foragers | PLY | Pulayar | DR | Tamil/Malayalam | Tribe | Foragers | PLY | Coimbatore | 10.3514 ; 76.9068 | 63 | 8,406 |
HTF-Hill Tribe Foragers | IRL | Irula | DR | Tamil | Tribe | Foragers | IRL | Nilgiris | 10.6138 ; 77.4056 | 80 | 155,606 |
HTF-Hill Tribe Foragers | KDR | Kadar | DR | Tamil | Tribe | Foragers | KDR | Coimbatore | 10.2808 ; 76.9639 | 28 | 568 |
HTC-Hill Tribe Cremating | KNK | Kanikaran | DR | Malayalam | Tribe | Foragers/Shifting Cultivation | KNK | Tirunelveli | 9.0952 ; 77.3203 | 17 | 3,136 |
HTC-Hill Tribe Cremating | THD | Thoda | DR | Toda | Tribe | Domestication | THD | Nilgiris | 11.1721 ; 77.029 | 26 | 1,560 |
HTC-Hill Tribe Cremating | KOT | Kota | DR | Tamil | Tribe | Domestication/Metallurgy | KOT | Nilgiris | 11.1469 ; 76.9713 | 62 | 1,140 |
HTK-Hill Tribe Kannada | BTK | Betta Kurumba | DR | Kannada | Tribe | Honey Gatherers | BTK | Nilgiris | 11.6623 ; 76.5278 | 17 | 34,747 |
HTK-Hill Tribe Kannada | KTK | Kattunaickan | DR | Kannada | Tribe | Foragers | KTK | Nilgiris | 11.6124 ; 76.9349 | 46 | 45,227 |
HTK-Hill Tribe Kannada | KMB | Kurumba | DR | Kannada | Tribe | Honey Gatherers | KMB | Nilgiris | 11.7766 ; 76.9754 | 35 | 5,498 |
HTK-Hill Tribe Kannada | MKB | Mullukurumba | DR | Kannada | Tribe | Foragers | MKB | Nilgiris | 11.7081 ; 77.1066 | 29 | 4,354 |
SC-Schedule Caste | PRN | Parayar NTN | DR | Tamil | Low | Agriculture Labourers | PRN | N.Arcot | 12.4194 ; 79.1179 | 52 | 1,860,519 |
SC-Schedule Caste | PRY | Parayar | DR | Tamil | Low | Agriculture Labourers | PRY | Madurai | 9.9392 ; 78.2544 | 24 | 1,117,197 |
SC-Schedule Caste | PLR | Pallar | DR | Tamil | Low | Agriculture Labourers | PLR | Tirunelveli | 10.0183 ; 78.0292 | 51 | 2,272,265 |
SC-Schedule Caste | PRV | Paravar | DR | Tamil | Low | Coastal Fishermen | PRV | Trichendur | 8.9904 ; 78.1978 | 27 | 2,035 |
DLF-Dry Land Farmers | YDV | Yadhava | DR | Tamil | Middle | DLF/Cattle keepers | YDV | Madurai | 9.8705 ; 78.1316 | 107 | 760,041 |
DLF-Dry Land Farmers | VNR | Vanniyar | DR | Tamil | Middle | DLF | VNR | Erode | 12.187 ; 78.837 | 21 | 760,041 |
DLF-Dry Land Farmers | VNN | Vanniyar NTN | DR | Tamil | Middle | DLF | VNN | N.Arcot | 12.3596 ; 79.2876 | 96 | 760,041 |
DLF-Dry Land Farmers | NDT | Nadar TNV | DR | Tamil | Middle | DLF/Toddy Tapping | NDT | Tirunelveli | 8.7659 ; 77.4824 | 59 | 603,189 |
DLF-Dry Land Farmers | NDC | Nadar Cape | DR | Tamil | Middle | DLF/Toddy Tapping | NDC | Kanyakumari | 8.1717 ; 77.6037 | 98 | 603,189 |
DLF-Dry Land Farmers | PLK | Piramalai Kallar | DR | Tamil | Middle | DLF | PLK | Madurai | 9.6733 ; 77.7706 | 53 | 260,000 |
DLF-Dry Land Farmers | MRV | Maravar | DR | Tamil | Middle | DLF | MRV | Ramnad | 9.3365 ; 78.8015 | 80 | 423,012 |
AW-Artisan&Warriors | VLR | Valayar | DR | Tamil | Low | Net Weavers/Hunter Gatherers | VLR | Madurai | 9.7465 ; 78.335 | 95 | 300,000 |
AW-Artisan&Warriors | TML | Tamil Jains | DR | Tamil | Middle | Weavers of Mats/Wet Land Agriculture | TML | N.Arcot | 12.1719 ; 79.0377 | 100 | 100,000 |
AW-Artisan&Warriors | EZV | Ezhava | DR | Tamil | Middle | Warriors/Toddy Tapping | EZV | Kanyakumari | 8.1554 ; 77.4322 | 95 | 300,000 |
AW-Artisan&Warriors | MKV | Mukkuvar | DR | Tamil | Low | Fishnet Weaving/Fishing | MKV | Kanyakumari | 8.2144 ; 77.2772 | 17 | 100,000 |
BRH-Brahmins | SRT | Sourashtra | IE | Saurashtri | Middle | Wet Land Agriculture/Weavers | SRT | Madurai | 9.8777 ; 77.9301 | 40 | 87,149 |
BRH-Brahmins | BHC | Brahacharanam | IE | Sanskrit | High | Wet Land Agriculture/Priests | BHC | Tirunelveli | 8.525 ; 77.4361 | 21 | 494,721 |
BRH-Brahmins | IGR | Iyengar | IE | Sanskrit | High | Wet Land Agriculture/Priests | IGR | Madurai | 8.6117 ; 77.6522 | 11 | 494,721 |
BRH-Brahmins | VDM | Vadama | IE | Sanskrit | High | Wet Land Agriculture/Priests | VDM | Tirunelveli | 8.5854 ; 77.7261 | 63 | 494,721 |
- 2001 Census, Government of India,
- 1981 Indian Census.
- 1931 Indian Census.
- Estimated census size.
- 1901 Indian Census.
- All Brahmin-related castes in Tamil Nadu,
- No information available.
- Population code used in PCA & MDS plots,
- Sanskrit is the language of scriptures and ceremonies, but populations quickly adopted local cultures and languages.
- Lower, Middle & Higher social ranks are self-perceived/assigned classifications.
- Approximate coordinates.
NTN (North Tamil Nadu), TNV (Tirunelveli).
DR (Dravidian), IE (Indo-European).
DNAs were extracted from blood or mouth-wash samples using standard methods
The software ARLEQUIN 3.11
We considered the problem of how to quantify the significance of the difference between specific population group structures. AMOVA's resampling scheme compares individual group structures to the whole ensemble of randomly varied assignments of populations to groups, as well as of samples to populations. This tests the hypothesis that a specific group structure represents the organization of the genetics among populations better than would be expected by chance. In our case, we had the different problem of testing whether one group structure was significantly better than another. Here the assignments were already determined, and both are likely already better than expected by chance. The question we tested was therefore whether variation in data randomly drawn from a population could, by chance, have produced sufficient variation in the AMOVA results to account for the differences between the specific group assignments being compared. Hence we resampled the STR haplotypes with replacement, modeled by a multinomial distribution, and computed the median and 95% CIs of the results using R, version 2.9.1. We tested up to 5,000 resamples and found that 500 were sufficient to give reasonable accuracy on the median and confidence interval estimates. We therefore resampled each configuration only 500 times.
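The resampling idea can be illustrated with a minimal Python sketch (the analysis itself was run in R). Everything named here is hypothetical: `toy_haplotypes` is made-up data, and `haplotype_diversity` is only a stand-in statistic, since recomputing AMOVA on each resample is beyond a short example. Drawing n haplotypes with replacement from the observed sample is equivalent to a multinomial draw of haplotype counts.

```python
import random
import statistics
from collections import Counter


def haplotype_diversity(haplotypes):
    """Stand-in statistic (NOT AMOVA): unbiased haplotype diversity."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def bootstrap_median_ci(haplotypes, statistic, n_resamples=500, seed=1):
    """Resample with replacement; return median and 95% percentile bounds."""
    rng = random.Random(seed)
    n = len(haplotypes)
    values = sorted(
        statistic(rng.choices(haplotypes, k=n)) for _ in range(n_resamples)
    )
    lower = values[int(0.025 * (n_resamples - 1))]
    upper = values[int(0.975 * (n_resamples - 1))]
    return statistics.median(values), lower, upper


# Hypothetical three-locus STR haplotypes, for illustration only.
toy_haplotypes = [(14, 12, 28)] * 6 + [(15, 12, 28)] * 3 + [(14, 13, 29)]
median, lower, upper = bootstrap_median_ci(toy_haplotypes, haplotype_diversity)
print(f"median={median:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

To compare two group structures in this spirit, one would compute the statistic of interest for each structure on the same resamples and check whether the resulting intervals overlap.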
The phylogenetic relationships among Y-STR haplotypes drawn from individual haplogroups were estimated with the reduced-median (RM) network algorithm in the program Network 4.5.0
Coalescence methods, as implemented in BATWING
In addition, BATWING admixture validation tests
Besides assuming no gene flow, BATWING presupposes that the population samples are random. As a result, using BATWING to analyze the histories of individual HGs drawn from populations yields dramatically different estimates of coalescence times, times of expansion, and other population parameters because, as mentioned in the admixture modeling, BATWING is more sensitive to admixture than in-migration. Thus, BATWING may be applied to individual HGs to extract information about specific in-migration events. Further, HGs that tend to correlate strongly with overall population estimates are likely to be more representative of their common ancestral gene pool. These results may be expected in that selection of the modal population trees will tend to preserve configurations where the most common of the shared lineages comprise the strongest signals contributing to the likelihood function. Therefore, selection of modal trees acts as a filter that tends to exclude immigrating contributions, although it will be heavily influenced by inter-population migration.
In these BATWING estimates, mutation rate priors were those previously proposed
A total of 21 Y chromosome HGs were identified in the study populations (
POPULATIONS | N | C-M130 | E-M96 | F-M89 | G-M201 | H-M69 | H1-M52 | H1a-M197 | H2-Apt | J-M304 | J2-M172 | J2a1-M47 | J2a3-M68 | K-M9 | L1-M27 | L3-M357 | O-M175 | P-M45 | Q-M242 | R-M207 | R1a1-M17 | R2-M124 | Nei Gene Diversity (SD) |
HTF-Hill Tribe Foragers |
Paniya | 72 | 15.28 | 0.00 | 75.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.39 | 1.39 | 0.00 | 0.00 | 0.00 | 1.39 | 1.39 | 2.78 | 1.39 | 0.418 (0.067) |
Paliyan | 95 | 10.53 | 0.00 | 55.79 | 2.11 | 2.11 | 11.58 | 0.00 | 0.00 | 0.00 | 9.47 | 0.00 | 0.00 | 0.00 | 2.11 | 3.16 | 0.00 | 0.00 | 0.00 | 0.00 | 3.16 | 0.00 | 0.659 (0.049) |
Pulayar | 63 | 1.59 | 0.00 | 57.14 | 0.00 | 6.35 | 11.11 | 0.00 | 0.00 | 0.00 | 15.87 | 0.00 | 0.00 | 0.00 | 1.59 | 1.59 | 0.00 | 1.59 | 3.17 | 0.00 | 0.00 | 0.00 | 0.640 (0.060) |
Irula | 80 | 6.25 | 0.00 | 36.25 | 0.00 | 18.75 | 7.50 | 0.00 | 8.75 | 0.00 | 1.25 | 0.00 | 0.00 | 0.00 | 16.25 | 0.00 | 0.00 | 0.00 | 1.25 | 1.25 | 0.00 | 2.50 | 0.799 (0.028) |
Kadar | 28 | 10.71 | 0.00 | 28.57 | 0.00 | 0.00 | 32.14 | 0.00 | 0.00 | 0.00 | 28.57 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.749 (0.032) |
HTC-Hill Tribe Cremating |
Kanikaran | 17 | 0.00 | 0.00 | 11.76 | 5.88 | 0.00 | 29.41 | 0.00 | 0.00 | 0.00 | 5.88 | 0.00 | 0.00 | 0.00 | 23.53 | 0.00 | 0.00 | 0.00 | 5.88 | 5.88 | 5.88 | 5.88 | 0.875 (0.058) |
Thoda | 26 | 7.69 | 0.00 | 3.85 | 0.00 | 0.00 | 11.54 | 0.00 | 0.00 | 0.00 | 7.69 | 0.00 | 38.46 | 0.00 | 7.69 | 3.85 | 3.85 | 3.85 | 0.00 | 0.00 | 0.00 | 11.54 | 0.834 (0.061) |
Kota | 62 | 0.00 | 0.00 | 8.06 | 0.00 | 1.61 | 30.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.45 | 1.61 | 0.00 | 0.00 | 0.00 | 0.00 | 4.84 | 4.84 | 22.58 | 19.35 | 0.815 (0.026) |
HTK-Hill Tribe Kannada |
Betta Kurumba | 17 | 0.00 | 0.00 | 58.82 | 0.00 | 0.00 | 11.76 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.65 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.88 | 5.88 | 0.640 (0.116) |
Kattunaickan | 46 | 2.17 | 0.00 | 21.74 | 0.00 | 17.39 | 41.30 | 0.00 | 0.00 | 2.17 | 4.35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.17 | 4.35 | 0.00 | 4.35 | 0.761 (0.044) |
Kurumba | 35 | 2.86 | 0.00 | 11.43 | 0.00 | 2.86 | 65.71 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.86 | 5.71 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 2.86 | 5.71 | 0.561 (0.096) |
Mullukurumba | 29 | 0.00 | 0.00 | 20.69 | 0.00 | 0.00 | 34.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.45 | 24.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.24 | 0.776 (0.036) |
SC-Schedule Caste |
Parayar NTN | 52 | 7.69 | 0.00 | 3.85 | 1.92 | 3.85 | 34.62 | 0.00 | 0.00 | 0.00 | 17.31 | 0.00 | 0.00 | 0.00 | 9.62 | 1.92 | 1.92 | 0.00 | 1.92 | 1.92 | 3.85 | 9.62 | 0.836 (0.037) |
Parayar | 24 | 4.17 | 0.00 | 0.00 | 8.33 | 0.00 | 20.83 | 0.00 | 0.00 | 4.17 | 12.50 | 0.00 | 0.00 | 0.00 | 12.50 | 4.17 | 4.17 | 0.00 | 0.00 | 8.33 | 12.50 | 8.33 | 0.920 (0.029) |
Pallar | 51 | 1.96 | 0.00 | 5.88 | 7.84 | 5.88 | 11.76 | 0.00 | 1.96 | 0.00 | 13.73 | 0.00 | 1.96 | 0.00 | 15.69 | 5.88 | 0.00 | 0.00 | 1.96 | 1.96 | 13.73 | 9.80 | 0.914 (0.015) |
Paravar | 27 | 0.00 | 0.00 | 3.70 | 0.00 | 0.00 | 14.81 | 0.00 | 0.00 | 0.00 | 37.04 | 0.00 | 0.00 | 0.00 | 18.52 | 0.00 | 7.41 | 0.00 | 0.00 | 3.70 | 3.70 | 11.11 | 0.815 (0.052) |
DLF-Dry Land Farmers |
Yadhava | 107 | 2.80 | 0.00 | 5.61 | 1.87 | 3.74 | 19.63 | 0.00 | 0.00 | 0.00 | 16.82 | 0.00 | 0.00 | 1.87 | 20.56 | 0.00 | 0.93 | 0.00 | 0.00 | 0.93 | 14.95 | 10.28 | 0.860 (0.013) |
Vanniyar | 21 | 0.00 | 0.00 | 9.52 | 4.76 | 0.00 | 4.76 | 0.00 | 0.00 | 0.00 | 14.29 | 0.00 | 9.52 | 0.00 | 28.57 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 14.29 | 14.29 | 0.876 (0.043) |
Vanniyar NTN | 96 | 7.29 | 1.04 | 8.33 | 3.13 | 3.13 | 13.54 | 0.00 | 3.13 | 0.00 | 7.29 | 0.00 | 2.08 | 0.00 | 23.96 | 2.08 | 2.08 | 0.00 | 0.00 | 2.08 | 11.46 | 9.38 | 0.889 (0.016) |
Nadar TNV | 59 | 0.00 | 0.00 | 8.47 | 8.47 | 11.86 | 15.25 | 0.00 | 1.69 | 0.00 | 5.08 | 0.00 | 0.00 | 0.00 | 28.81 | 0.00 | 0.00 | 0.00 | 3.39 | 0.00 | 6.78 | 10.17 | 0.861 (0.025) |
Nadar Cape | 98 | 4.08 | 4.08 | 5.10 | 9.18 | 7.14 | 7.14 | 0.00 | 1.02 | 0.00 | 9.18 | 0.00 | 1.02 | 0.00 | 23.47 | 0.00 | 1.02 | 1.02 | 9.18 | 1.02 | 12.24 | 4.08 | 0.895 (0.015) |
Piramalai Kallar | 53 | 9.43 | 0.00 | 5.66 | 3.77 | 3.77 | 16.98 | 0.00 | 1.89 | 0.00 | 1.89 | 0.00 | 0.00 | 1.89 | 47.17 | 1.89 | 0.00 | 0.00 | 0.00 | 0.00 | 1.89 | 3.77 | 0.745 (0.055) |
Maravar | 80 | 0.00 | 0.00 | 3.75 | 8.75 | 5.00 | 10.00 | 1.25 | 1.25 | 0.00 | 13.75 | 0.00 | 0.00 | 3.75 | 10.00 | 0.00 | 1.25 | 0.00 | 2.50 | 7.50 | 16.25 | 15.00 | 0.904 (0.011) |
AW-Artisan&Warriors |
Valayar | 95 | 6.32 | 0.00 | 12.63 | 2.11 | 8.42 | 10.53 | 0.00 | 1.05 | 0.00 | 8.42 | 0.00 | 0.00 | 0.00 | 8.42 | 2.11 | 0.00 | 2.11 | 1.05 | 1.05 | 20.00 | 15.79 | 0.890 (0.012) |
Tamil Jains | 100 | 4.00 | 0.00 | 2.00 | 2.00 | 3.00 | 22.00 | 0.00 | 3.00 | 0.00 | 11.00 | 0.00 | 0.00 | 1.00 | 9.00 | 2.00 | 2.00 | 0.00 | 1.00 | 0.00 | 18.00 | 20.00 | 0.862 (0.015) |
Ezhava | 95 | 0.00 | 0.00 | 2.11 | 3.16 | 5.26 | 25.26 | 0.00 | 0.00 | 0.00 | 12.63 | 1.05 | 0.00 | 0.00 | 20.00 | 1.05 | 0.00 | 0.00 | 0.00 | 0.00 | 24.21 | 5.26 | 0.823 (0.017) |
Mukkuvar | 17 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 17.65 | 0.00 | 11.76 | 0.00 | 17.65 | 0.00 | 0.00 | 0.00 | 5.88 | 0.00 | 0.00 | 0.00 | 0.00 | 11.76 | 11.76 | 23.53 | 0.890 (0.040) |
BRH-Brahmins |
Sourashtra | 40 | 7.50 | 0.00 | 0.00 | 0.00 | 0.00 | 25.00 | 0.00 | 0.00 | 0.00 | 2.50 | 0.00 | 0.00 | 0.00 | 20.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 40.00 | 5.00 | 0.747 (0.041) |
Brahacharanam | 21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.52 | 0.00 | 9.52 | 0.00 | 14.29 | 0.00 | 0.00 | 0.00 | 4.76 | 0.00 | 0.00 | 4.76 | 0.00 | 19.05 | 33.33 | 4.76 | 0.848 (0.054) |
Iyengar | 11 | 0.00 | 0.00 | 0.00 | 27.27 | 0.00 | 9.09 | 0.00 | 0.00 | 0.00 | 18.18 | 0.00 | 0.00 | 9.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 36.36 | 0.00 | 0.818 (0.083) |
Vadama | 63 | 3.17 | 0.00 | 1.59 | 4.76 | 0.00 | 7.94 | 0.00 | 3.17 | 0.00 | 4.76 | 0.00 | 0.00 | 1.59 | 14.29 | 1.59 | 3.17 | 0.00 | 0.00 | 6.35 | 47.62 | 0.00 | 0.746 (0.052) |
31 populations TOTAL | 1680 | 4.40 | 0.30 | 16.25 | 3.10 | 4.70 | 17.38 | 0.06 | 1.49 | 0.12 | 9.35 | 0.06 | 1.19 | 0.77 | 13.99 | 1.13 | 0.83 | 0.36 | 1.55 | 2.02 | 12.74 | 8.21 | 0.886 (0.003) |
SD (Standard Deviation).
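As a hedged illustration of the diversity column, the sketch below computes the standard unbiased estimator of Nei's gene diversity, h = n/(n−1) · (1 − Σ pᵢ²), from haplogroup counts. The Paniya counts are back-calculated here from the tabulated percentages and N = 72 (an assumption of this example, not data reported as counts), and whether the authors' software used exactly this estimator is likewise assumed.

```python
def nei_gene_diversity(counts):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Paniya haplogroup counts inferred from the percentages (N = 72):
# 15.28% -> 11, 75.00% -> 54, five haplogroups at 1.39% -> 1 each, 2.78% -> 2.
paniya_counts = [11, 54, 1, 1, 1, 1, 1, 2]
print(round(nei_gene_diversity(paniya_counts), 3))  # 0.418
```

The result agrees with the tabulated Paniya value of 0.418, which suggests the column is a haplogroup-level diversity.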
The geographical origins of many of these HGs are still debated. However, the high frequencies and haplotype variances of HGs H-M69, F*-M89, R1a1-M17, L1-M27, R2-M124 and C5-M356 within India have been interpreted as evidence of an autochthonous origin of these lineages during the late Pleistocene (10–30 Kya), while the lower frequencies within the subcontinent of J2-M172, E-M96, G-M201 and L3-M357 are viewed as reflecting probable gene flow introduced by West Eurasian Holocene migrations in the last 10 Kya.
AMOVA using both HGs and STR distances (R
Populations Grouping | No. of groups | Among groups (Fct), SNPs | Among groups (Fct), STRs | Among populations within groups (Fsc), SNPs | Among populations within groups (Fsc), STRs | Within populations (Fst), SNPs | Within populations (Fst), STRs |
All 31 populations | 1 | | | | | 0.103 | 0.093 |
Geography | 9 | 0.025 | 0.035 | 0.083 | 0.063 | 0.106 | 0.096 |
7 Major Population Groups (MPG) | 7 | 0.082 | 0.065 | 0.036 | 0.040 | 0.114 | 0.102 |
HTF excluded | 6 | 0.035 | 0.026 | 0.027 | 0.034 | 0.061 | 0.060 |
BRH excluded | 6 | 0.077 | 0.059 | 0.037 | 0.042 | 0.111 | 0.099 |
HTK excluded | 6 | 0.082 | 0.062 | 0.031 | 0.039 | 0.111 | 0.099 |
Caste vs Tribe | 2 | 0.075 | 0.062 | 0.069 | 0.065 | 0.139 | 0.124 |
TR-UP-MID-LOW | 4 | 0.057 | 0.047 | 0.065 | 0.063 | 0.119 | 0.107 |
HTF-HTK-HTC | 3 | 0.110 | 0.095 | 0.081 | 0.079 | 0.182 | 0.167 |
UP-MID-LOW | 3 | 0.019 | 0.015 | 0.024 | 0.030 | 0.042 | 0.044 |
SC-DLF-AW-BRH | 4 | 0.023 | 0.015 | 0.017 | 0.026 | 0.039 | 0.041 |
SC-DLF-AW | 3 | 0.009 | 0.004 | 0.016 | 0.027 | 0.025 | 0.031 |
Not significant,
TR (Tribes), HTF (Hill Tribe Foragers), BRH (Brahmins), HTK (Hill Tribe Kannada speakers), SC (Schedule Castes), DLF (Dry Land Farmers), AW (Artisan & Warriors).
HG, MID, LOW – High, Middle and Low caste-rank hierarchy as described in
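A quick consistency check on the AMOVA table: in a hierarchical AMOVA the three fixation indices are linked by (1 − Fst) = (1 − Fsc)(1 − Fct). The sketch below applies that identity to a few rows copied from the table; the function and dictionary names are hypothetical, and small mismatches are expected because the published values are rounded to three decimals.

```python
def fst_from_components(fct, fsc):
    """Hierarchical AMOVA identity: (1 - Fst) = (1 - Fsc) * (1 - Fct)."""
    return 1.0 - (1.0 - fsc) * (1.0 - fct)


# (Fct, Fsc, reported Fst) taken from the table above.
reported = {
    "Geography, SNPs": (0.025, 0.083, 0.106),
    "Geography, STRs": (0.035, 0.063, 0.096),
    "7 MPG, SNPs": (0.082, 0.036, 0.114),
    "Caste vs Tribe, SNPs": (0.075, 0.069, 0.139),
    "HTF-HTK-HTC, SNPs": (0.110, 0.081, 0.182),
}

for label, (fct, fsc, fst) in reported.items():
    implied = fst_from_components(fct, fsc)
    print(f"{label}: implied Fst = {implied:.3f}, reported Fst = {fst:.3f}")
```

Agreement to within rounding also confirms how the flattened columns pair up: the first two values in each row are Fct, the next two Fsc, and the last two Fst.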
Endogamous populations were grouped based on geography, tribe-caste dichotomy, caste-rank hierarchy, and socio-cultural features mainly reflecting subsistence (7 Major Population Groups, MPG). The maximal genetic variation among groups (
To determine if the number of groups taken into consideration had a significant impact on the
The PCA and MDS analyses of HG frequencies and R
(A) PCA plot based on HG frequencies. The two dimensions display 36% of the total variance. The contribution of the first four HGs is superimposed as grey component loading vectors: the HTF populations clustered in the direction of the F-M89 vector, HTK in the H1-M52 vector, BRH in the R1a1-M17 vector, while the HG L1-M27 is less significant in discriminating populations. (B) MDS plot based on 17 microsatellite loci
Interestingly, the same tribal groups showed greater genetic similarities to other Dravidian tribes from the southern states of Andhra Pradesh and Orissa, and TN BRH clustered with IE speaking populations from multiple regions, when the present data set was compared with 97 populations from India and neighboring regions by PCA (
Fisher exact tests indicated that various HGs were significantly predominant in one or another MPG (
In addition, Fisher exact tests were used to determine the probability of observing multiple populations within an MPG sharing the same over- or under-represented HGs by chance (e.g., random demic assimilation into a MPG from already differentiated endogamous populations) or because of the systemic inheritance of ancestral lineages among the constituting populations of MPGs. Our results rejected the hypothesis that random processes could have caused the significant over-representation of F*-M89 in HTF+HTK populations (p<0.0001), L1-M27 in DLF populations (p<0.001), H1-M52 in HTK populations (p<0.0001), and R1a1-M17 in BRH populations (
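For readers who want to see what such a test looks like, here is a minimal one-sided Fisher exact test written with the standard library only. The counts are back-calculated from the haplogroup frequency table (R1a1-M17 in 57 of 135 BRH chromosomes, and in roughly 214 of all 1,680), so they are approximations introduced for this example; the exact contingency tables and sidedness used by the authors are not specified in this section.

```python
from math import comb


def fisher_exact_greater(k, group_n, total_k, total_n):
    """One-sided Fisher exact p-value, P(X >= k), under the hypergeometric
    null: group_n chromosomes drawn from total_n, of which total_k carry
    the haplogroup."""
    denom = comb(total_n, group_n)
    upper = min(group_n, total_k)
    tail = sum(
        comb(total_k, i) * comb(total_n - total_k, group_n - i)
        for i in range(k, upper + 1)
    )
    return tail / denom


# Illustrative counts inferred from rounded percentages (assumption).
p_value = fisher_exact_greater(k=57, group_n=135, total_k=214, total_n=1680)
print(f"one-sided p = {p_value:.3g}")
```

With about 17 carriers expected among 135 BRH chromosomes under the null, observing 57 yields a p-value far below 0.0001, in line with the over-representation described above.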
RM networks were constructed to evaluate HG diversification within TN populations. Here, low-reticulated networks with branches showing segregation by population were expected if strong founder effects had shaped variation in paternal lineages, particularly in the HGs overrepresented in MPGs. By contrast, reticulated networks exhibiting shared STR haplotypes between populations from different MPGs would indicate that contemporary populations were derived from descendants drawn from differing sources carrying disparate and diverse STR haplotypes, suggesting potential admixture among populations. Long branches with multiple unoccupied steps (internodes) connecting constituent haplotypes would suggest strong genetic drift or possibly sporadic intrusion from a genetically distinct source.
F*-M89 was the only HG showing clear population-specific clusters (Paniya, Paliyan and Irula of HTF) suggesting long-term isolation (
The network depicts clear isolated evolution among HTF populations with a few shared haplotypes between Kurumba (HTK) and Irula (HTF) populations. Circles are colored based on the 7 Major Population Groups as shown in
Tribes are generally considered as the descendants of the early settlers of India and, therefore, better depict the autochthonous genetic composition of India than non-tribal populations
Haplogroup | All MPG | HTF | HTK | HTC | SC | DLF | AW | BRH | |
|
Var (SE) | 0.801 (0.176) | 0.805 (0.220) | 0.682 (0.207) | 0.885 (0.181) | 0.474 (0.202) | 0.394 (0.076) | ||
Age (SD) | 29,029 (6,387) | 29,156 (7,987) | 24,723 (7,518) | 32,057 (6,571) | 17,175 (7,308) | 14,280 (2,752) | |||
|
Var (SE) | 0.810 (0.142) | 0.687 (0.126) | 0.674 (0.136) | 0.525 (0.207) | 0.704 (0.204) | 0.851 (0.194) | 0.773 (0.158) | |
Age (SD) | 29,345 (5,137) | 24,895 (4,560) | 24,418 (4,946) | 19,017 (7,515) | 25,504 (7,410) | 30,827 (7,021) | 28,026 (5,721) | ||
|
Var (SE) | 0.829 (0.182) | 0.939 (0.318) | 0.536 (0.124) | 1.048 (0.317) | 0.820 (0.267) | |||
Age (SD) | 30,037 (6,602) | 34,009 (11,531) | 19,413 (4,495) | 37,957 (11,488) | 29,696 (9,660) | ||||
|
Var (SE) | 1.327 (0.591) | 0.608 (0.226) | 0.550 (0.227) | 1.456 (0.521) | 0.906 (0.376) | 1.182 (0.372) | ||
Age (SD) | 48,073 (21,408) | 22,048 (8,177) | 20,641 (8,224) | 52,749 (18,888) | 32,822 (13,629) | 42,817 (13,479) | |||
|
Var (SE) | 0.413 (0.078) | 0.342 (0.096) | 0.294 (0.080) | 0.203 (0.039) | 0.27 (0.063) | 0.508 (0.113) | 0.508 (0.108) | 0.593 (0.122) |
Age (SD) | 14,961 (2,814) | 12,390 (3,475) | 10,652 (2,905) | 7,343 (1,423) | 9,782 (2,301) | 18,411 (4,090) | 18,397 (3,921) | 21,483 (4,432) | |
|
Var (SE) | 0.594 (0.106) | 0.328 (0.176) | 0.441 (0.113) | 0.480 (0.226) | 0.672 (0.206) | |||
Age (SD) | 21,524 (3,825) | 11,874 (6,382) | 15,964 (4,107) | 17,405 (8,172) | 24,332 (7,475) | ||||
|
Var (SE) | 0.734 (0.101) | 0.420 (0.102) | 0.717 (0.131) | 0.687 (0.119) | 0.998 (0.136) | 0.762 (0.172) | ||
Age (SD) | 26,598 (3,654) | 15,205 (3,706) | 25,979 (4,748) | 24,898 (4,321) | 36,176 (4,946) | 27,605 (6,244) | |||
|
Var (SE) | 0.289 (0.109) | 0.266 (0.140) | 0.229 (0.114) | |||||
Age (SD) | 10,461 (3,937) | 9,629 (5,069) | 8,312 (4,119) | ||||||
|
Var (SE) | 0.414 (0.095) | 0.354 (0.124) | 0.218 (0.104) | 0.309 (0.117) | 0.464 (0.130) | 0.420 (0.099) | 0.416 (0.097) | 0.458 (0.132) |
Age (SD) | 15,007 (3,460) | 12,812 (4,483) | 7,890 (3,755) | 11,189 (4,242) | 16,811 (4,710) | 15,236 (3,585) | 15,090 (3,531) | 16,601 (4,781) | |
|
Var (SE) | 0.220 (0.056) | 0.348 (0.153) | 0.176 (0.062) | 0.182 (0.071) | ||||
Age (SD) | 7,982 (2,021) | 12,610 (5,542) | 6,394 (2,252) | 6,607 (2,585) | |||||
|
Var (SE) | 0.972 (0.183) | 0.730 (0.191) | 0.582 (0.126) | 1.254 (0.396) | 0.985 (0.204) | |||
Age (SD) | 35,203 (6,633) | 26,463 (6,921) | 21,099 (4,558) | 45,444 (14,351) | 35,691 (7,382) | ||||
|
Var (SE) | 0.413 (0.060) | 0.335 (0.073) | 0.387 (0.088) | 0.500 (0.135) | 0.456 (0.074) | 0.365 (0.047) | 0.369 (0.062) | |
Age (SD) | 14,974 (2,169) | 12,148 (2,653) | 14,006 (3,200) | 18,124 (4,878) | 16,510 (2,684) | 13,229 (1,721) | 13,387 (2,261) | ||
|
Var (SE) | 0.652 (0.111) | 1.048 (0.237) | 0.328 (0.137) | 0.584 (0.121) | 0.597 (0.115) | 0.642 (0.171) | ||
Age (SD) | 23,638 (4,023) | 37,960 (8,588) | 11,880 (4,975) | 21,164 (4,401) | 21,622 (4,182) | 23,246 (6,211) |
Var (Variance), SE (Standard Error), SD (Standard Deviation).
Haplogroup age estimates are given in years; groups with fewer than 5 STRs (samples) were excluded from calculations. Non-tribal groups (castes) displayed the oldest age estimates for most of the Y chromosome haplogroups.
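The relation between the Var and Age rows can be sketched as age ≈ (variance / mutation rate) × generation time. The mutation rate and generation length below are assumptions made for this example (an effective rate of 6.9 × 10⁻⁴ per locus per 25-year generation); this section does not state the values used, although these reproduce the tabulated ages to within rounding.

```python
def age_from_variance(variance, mu_per_generation=6.9e-4, generation_years=25):
    """Convert mean STR variance into an age in years.

    Both defaults are ASSUMED for illustration, not taken from the text.
    """
    generations = variance / mu_per_generation
    return generations * generation_years


# Variances copied from two cells of the table above.
for var, reported_age in [(0.801, 29029), (0.394, 14280)]:
    print(f"Var {var}: ~{age_from_variance(var):,.0f} years "
          f"(table: {reported_age:,})")
```

The same conversion applied to a standard error gives the corresponding SD of the age, which is why the two rows scale together.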
We configured several BATWING runs using different subsets of data to estimate the dates of population differentiation and explore the different demographic processes and affinities among the MPGs and their constituent populations. The first set of BATWING runs analyzed haplotypes from all HGs among all of the MPGs to investigate whether tribal and non-tribal MPGs have an independent origin or instead descended from a common ancestral gene pool. If tribal and non-tribal groups have independent origins, then it would be expected that population tree bifurcations marking the differentiation of these two groupings would exhibit very old divergence time estimates and non-overlapping confidence intervals (CIs).
BATWING estimates suggest that all population groups started to diverge 7.1 Kya (95% CI: 5.5–9.2 Kya), with limited admixture among them for the last 3.0 Kya (2.3–4.3 Kya), the youngest divergence time estimate. The modal tree shows two differentiated nodes with clearly overlapping estimates of the splits: a first node including one of the tribal groups (HTC) together with all the non-tribal MPGs (castes) with a divergence time of 6.2 Kya (4.7–8.0 Kya), while the second node embraces the HTF and HTK tribal groups with an estimated divergence between them of 4.9 Kya (3.6–7.1 Kya).
The second set of BATWING runs included only haplotypes from one of the most common HGs among MPGs. In this regard, we would like to emphasize that BATWING results using haplotypes from only one HG cannot be interpreted as population divergence times, but rather reflect the demographic histories of the specific paternal lineage among populations. Also, deviations from population estimates among the different runs could reflect in-migrations (gene flow) involving a particular HG rather than multiple paternal lineages obtained from assimilation from a common ancestral gene pool. For these reasons, we explored whether the paternal lineages for each HG originated from the MPG that exhibits the highest frequency of this HG as a way to identify sources and recipients of these Y-chromosomes. In addition, similar splitting patterns obtained for the different HG trees could be interpreted as demonstrating that the paternal lineages entered into the general gene pool from the same demographic event. BATWING constructed clear modal trees for three HGs (F*-M89, L1-M27 and H1-M52) but not for the others (R1a1-M17, H-M69, J2-M172 and R2-M124). The three modal trees (
Finally, a third set of BATWING runs was performed using all HGs from individual populations within selected MPGs to test whether the grouping of these populations could have affected BATWING estimates of population divergence and phylogenetic relationships (
The study populations from Tamil Nadu were characterized by an overwhelming proportion of Y-chromosomal lineages that likely originated within India, suggesting a low genetic influence from western Eurasian migrations in the last 10 Kya. Although non-tribal groups exhibited a slightly higher proportion of non-autochthonous lineages than tribal populations, the common paternal lineages shared by TN populations are likely drawn from the same ancestral genetic pool that emerged in the late Pleistocene and early Holocene. We also noted that the current modes of subsistence have shaped the genetic structure of TN groups, with non-tribal populations being more genetically homogeneous than tribal populations, likely due to differential levels of genetic isolation among them. Coalescence methods, employed to identify specific and distinctive periods when genetic differentiation among populations occurred, indicated a time scale of ∼6,000 years. We discuss below whether the timing of the male genetic differentiation of the populations fits better with archeological and historical records for the implementation of the Hindu Varna system or with agricultural expansions in the TN region.
Previous studies of Indian populations have grouped and analyzed the genetic data in the light of the Hindu Varna system
Literary works from the Sangam period (300 BCE to 300 CE) describe a heterogeneous society that predates incorporation of already established populations into the Hindu Varna system
The present study shows that the MPG classification reflects the genetic structure of the TN populations slightly better than other models, and that both tribal and non-tribal populations possess predominantly autochthonous lineages derived from a common gene pool established during the Late Pleistocene and Early Holocene. The distribution of over- and under-represented HGs suggests that populations within MPGs tend to share common genetic backgrounds. Using BATWING analysis, we estimate that social stratification for both tribal and non-tribal MPGs began between 6 Kya and 4 Kya, and detectable admixture between them has not occurred over the past 3 Kya, thereby allowing them to retain their genetic identity through cultural endogamy.
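As an illustration of how over- and under-represented HGs can be flagged within groups, a minimal sketch follows. It assumes adjusted standardized residuals computed from a group-by-haplogroup contingency table, which may differ from the procedure actually used in this study. The function names, the 1.96 threshold, and the toy counts and labels are hypothetical and are not data from this work.

```python
from math import sqrt


def adjusted_residuals(counts):
    """Adjusted standardized residuals for a group-by-haplogroup table.

    counts: dict mapping group name -> dict mapping haplogroup -> observed count.
    Returns a dict keyed by (group, haplogroup).
    """
    groups = list(counts)
    hgs = sorted({hg for row in counts.values() for hg in row})
    row_tot = {g: sum(counts[g].get(h, 0) for h in hgs) for g in groups}
    col_tot = {h: sum(counts[g].get(h, 0) for g in groups) for h in hgs}
    n = sum(row_tot.values())
    residuals = {}
    for g in groups:
        for h in hgs:
            # Expected count under independence of group and haplogroup.
            expected = row_tot[g] * col_tot[h] / n
            # Variance term used for the adjusted (Haberman) residual.
            var = expected * (1 - row_tot[g] / n) * (1 - col_tot[h] / n)
            observed = counts[g].get(h, 0)
            residuals[(g, h)] = (observed - expected) / sqrt(var) if var > 0 else 0.0
    return residuals


def flag_representation(residuals, threshold=1.96):
    """Label each cell 'over', 'under', or 'neutral' (threshold is an assumption)."""
    flags = {}
    for key, r in residuals.items():
        if r > threshold:
            flags[key] = "over"
        elif r < -threshold:
            flags[key] = "under"
        else:
            flags[key] = "neutral"
    return flags


# Hypothetical toy counts, for illustration only.
toy_counts = {
    "group_A": {"HG_x": 30, "HG_y": 10},
    "group_B": {"HG_x": 10, "HG_y": 30},
}
toy_flags = flag_representation(adjusted_residuals(toy_counts))
```

In this toy table, HG_x is flagged as over-represented in group_A and under-represented in group_B; with real data, a correction for multiple testing would normally be applied before interpreting individual cells.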
Both the overall Y-chromosomal HG distribution and the divergence estimates for tribal and non-tribal groups are consistent with the archaeological dates and the demographic processes involved in the expansion of agriculture in South Asia. The South Deccan region near southern Karnataka and southwest Andhra Pradesh contains the earliest evidence for an integrated agro-pastoral system in South India, and likely acted as an agricultural center and source of dispersion around 5 Kya
In addition to this moving frontier, broader and more static agricultural frontier zones could also have arisen at later stages. In such zones, stable and growing farming populations interacted with local foragers and created new cultural traditions, with some potential inter-marriage and assimilation through trade taking place. Southern Tamil Nadu and the Kerala zone represent one such agricultural frontier zone that has persisted to the present after local foragers began to adopt cultivation based on agricultural sedentism around 3 Kya
The overall Y-chromosomal landscape of TN suggests a complex process of agricultural expansion, which can be explained in terms of the formation of moving and static frontiers since 6 Kya, followed by migrations structured by habitat and occupation. However, because gene flow and differential assimilation of incoming migrations could alter the estimated divergence dates, these dates should be treated with caution. Our BATWING simulations and others from a previous study
Although previous genetic studies have already drawn some of the conclusions presented here
Thus, the sampling and analytical approach employed here suggests that detailed local genetic studies within India could give new insights into the relative influences of past demographic events, in relation to other socio-cultural and economic factors, on the population structure observed across India today. Nevertheless, it cannot be assumed that the same demographic processes or socio-cultural factors affected Indian populations from different regions in a similar manner. Whether corresponding Y-chromosomal genetic patterns can also be detected in other tribal and non-tribal populations within the South Deccan, or in other Indian regions that have already been identified as centers of agricultural expansion, is an open question that future studies could address using the methods presented here. Finally, it would also be important to investigate the relative impact of the processes described here on diversity patterns in other genomic regions by studying mtDNA and autosomal variation.
The authors gratefully acknowledge all participants from Tamil Nadu, whose collaboration made this study possible. We thank all the field work assistants who helped us with sampling in various expeditions. We thank Prof. N. Sukumaran and Dr. D. Ramesh for their help in sampling logistics at Tirunelveli and north Tamil Nadu, respectively. We thank Chella Software, Madurai, for developing and providing the “Input” programs for the Arlequin and Network software. We also thank Prof. Francesc Calafell, the late Prof. V. Sudarsen and Dr. Sumathi for helpful discussions, Dr. Peter Forster for kindly providing the Network Publisher software and Mrs. Mathuram for secretarial assistance at the Madurai Genographic Center.
Christina J. Adler (University of Adelaide, South Australia, Australia), Elena Balanovska (Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia), Oleg Balanovsky (Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia), Jaume Bertranpetit (Universitat Pompeu Fabra, Barcelona, Spain), Andrew C. Clarke (University of Otago, Dunedin, New Zealand), David Comas (Universitat Pompeu Fabra, Barcelona, Spain), Alan Cooper (University of Adelaide, South Australia, Australia), Clio S. I. Der Sarkissian (University of Adelaide, South Australia, Australia), Matthew C. Dulik (University of Pennsylvania, Philadelphia, Pennsylvania, United States), Jill B. Gaieski (University of Pennsylvania, Philadelphia, Pennsylvania, United States), Wolfgang Haak (University of Adelaide, South Australia, Australia), Marc Haber (Lebanese American University, Chouran, Beirut, Lebanon), Angela Hobbs (National Health Laboratory Service, Johannesburg, South Africa), Asif Javed (IBM, Yorktown Heights, New York, United States), Li Jin (Fudan University, Shanghai, China), Matthew E. Kaplan (University of Arizona, Tucson, Arizona, United States), Shilin Li (Fudan University, Shanghai, China), Begoña Martínez-Cruz (Universitat Pompeu Fabra, Barcelona, Spain), Elizabeth A. Matisoo-Smith (University of Otago, Dunedin, New Zealand), Marta Melé (Universitat Pompeu Fabra, Barcelona, Spain), Nirav C. Merchant (University of Arizona, Tucson, Arizona, United States), R. John Mitchell (La Trobe University, Melbourne, Victoria, Australia), Amanda C. Owings (University of Pennsylvania, Philadelphia, Pennsylvania, United States), Lluis Quintana-Murci (Institut Pasteur, Paris, France), Daniela R. Lacerda (Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil), Fabrício R. Santos (Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil), Himla Soodyall (National Health Laboratory Service, Johannesburg, South Africa), Pandikumar Swamikrishnan (IBM, Somers, New York, United States), Pedro Paulo Vieira (Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil), Miguel G. Vilar (University of Pennsylvania, Philadelphia, Pennsylvania, United States), Pierre A. Zalloua (Lebanese American University, Chouran, Beirut, Lebanon).