Polymorphic Alu Insertion/Deletion in Different Caste and Tribal Populations from South India

Seven human-specific Alu markers were studied in 574 unrelated individuals from 10 endogamous groups and 2 hill tribes of Tamil Nadu and Kerala states. DNA was isolated, amplified by PCR-SSP, and subjected to agarose gel electrophoresis, and genotypes were assigned for the various Alu loci. Average heterozygosity among caste populations was in the range of 0.292–0.468. Among tribes, the average heterozygosity was higher for Paliyan (0.3759) than for Kani (0.2915). Frequency differences were prominent in all loci studied except Alu CD4. For Alu CD4, the frequency was 0.0363 in Yadavas, a traditional pastoral and herd-maintaining population, and 0.2439 in Narikuravars, a nomadic gypsy population. The overall genetic difference (Gst) among the 12 populations (castes and tribes) studied was 3.6%, which corresponds to the Gst value of 3.6% recorded earlier for Western Asian populations. Thus, our study confirms the genetic similarities between West Asian populations and South Indian castes and tribes and supports the proposed large-scale coastal migrations from Africa into India through West Asia. However, the average genetic difference (Gst) of the Kani and Paliyan tribes together with other South Indian tribes studied earlier was 8.3%. The average Gst of combined South and North Indian tribes (CSNIT) was 9.5%. The neighbor-joining tree showed close proximity of the Kani and Paliyan tribal groups to two other South Indian tribes studied earlier, Toda and Irula of the Nilgiri hills. Further, the analysis revealed the affinities among populations and confirmed the presence of North and South India specific lineages. Our findings document that South Indian tribes are more diverse (micro-differentiated), predominantly due to isolation, than the endogamous population groups of South India.
Thus, our study firmly establishes the genetic relationship of South Indian castes and tribes and supports the proposed large-scale ancestral migrations from Africa, particularly into South India through the West Asian corridor.


Introduction
India has served as an important corridor for human migration and evolution. A small group of modern humans ventured 'out of Africa' through the southern coastal route to colonize the Middle East, India, Southeast Asia, Australia and subsequently other parts of the globe [1,2]. The Indian populations are stratified as tribes and castes. India has approximately 4,635 populations, of which 532 are tribes, including 72 primitive tribes (36 of them hunter-gatherers) [3]. The tribal groups constitute about 8% of the total Indian population [4]. The Tamil Nadu population can be divided based on migrational history, genetics and anthropology [5]. According to the 2011 census, the state has a total population of 72.14 million, including a tribal population of 7.2 lakh (0.72 million). The majority of the population groups of Tamil Nadu belong to the Proto-Australoid ethnicity. The Indo-Aryan people of northern India were considered to be members of the White race, whereas the southern Indian people were considered a biologically distinct Indo-Dravidian race, variously known by anthropologists as Veddoid, Indigenous Australians or Palaeo-Indid. Thus, the people of India are a blend of Whites, Central and East Asians and Indigenous Australians (Aboriginal peoples). Reich et al. reported that an 'ancestral North Indian (ANI)' population shared 30-70% similarity with populations of the Middle East, Central Asia and Europe, whereas an 'ancestral South Indian (ASI)' population has no relation with any population outside India [6].
Genomic variation among individuals can help in understanding the evolutionary and migrational course of populations. Genetic variations and/or polymorphisms at loci that code for expressed proteins are commonly deleterious and are therefore often negatively selected and eliminated. On the other hand, allelic polymorphisms, especially in the non-coding regions of the human genome, are expected to be evolutionarily neutral. Of late, several insertion/deletion polymorphisms have been discovered in the human genome. Alu sequences are thought to be ancestrally derived from the 7SL RNA gene and to have been mobilized through an RNA polymerase III-derived transcript by a process called retroposition [7,8]. Alu insertion polymorphisms are used to identify patterns of human genetic diversity and history and have been applied to race determination, gender identification, personal identification and paternity testing. Alu elements are a family of SINEs (short interspersed elements), named for the presence of an AluI recognition site in the sequence [9]. The human genome contains about 1,100,000 Alu repeats, which represent ~11% of nuclear DNA [10]. They are often located in non-coding regions (intergenic spacers and introns) [11,12]. Alu insertions are about 300 bp in length, dimeric in structure, and carry a 3' oligo(dA)-rich tail and short flanking repeats [13,14,15]. Alu insertion polymorphisms have important applications in phylogenetic analyses of human populations [16,17,18,19]. To determine the genetic differentiation among populations, Gst (a measure of interpopulation variability), Ht (a measure of genetic variability in the total population) and Hs (a measure of intrapopulation genetic variability) were determined for each polymorphic locus. A number of polymorphic Alu loci have previously been studied in many Indian populations [20,21,22,23]. However, studies on South Indian castes and tribes are meager [24,25,26,27,28].
The present study is an attempt to analyze seven polymorphic autosomal Alu loci (Alu ACE, Alu TPA25, Alu FXIIIB, Alu APO, Alu D1, Alu PV92 and Alu CD4) among castes and tribes of the state of Tamil Nadu, South India.

Materials and Methods
Population Samples and Autosomal Markers
Blood samples (5 ml) were collected from 574 unrelated volunteers from twelve different population groups from South India. The populations selected for the present study include Pallan, Nair (Kerala), Namboothiri (Kerala), Kani, Vanniyar, Paliyar, Narikuravar, Sourashtra, Iyer, Vettuva Gounder, Kallar and Yadava. They belong to different geographical locations in the states of Tamil Nadu and Kerala. The sample size, location of sampling and anthropological information are given in Fig 1. Ethnographic notes on the studied populations are listed in S1 File. Institutional ethical clearance was obtained from the Madurai Kamaraj University Ethical and Review Board Committee (ERC), and informed written consent, along with demographic details such as age, gender and family history of major illness or disease, if any, was obtained from all individuals who participated in the study.
The DNA samples were extracted from peripheral blood lymphocytes using a standard salting-out procedure [29]. Each DNA sample was amplified by polymerase chain reaction (PCR) using locus-specific primers for the insertion-deletion polymorphism of seven Alu elements (Alu ACE, Alu TPA25, Alu FXIIIB, Alu CD4, Alu APO, Alu D1, Alu PV92). The protocols for these markers have been described elsewhere [20,21,22,30] (S1 Table). Amplified PCR products were run on agarose gels and visualized under UV light.

Statistical analysis
Allele frequencies were calculated by direct counting for each population. Heterozygosities at individual loci and the overall average heterozygosity were calculated from the allele frequencies for each population. Hardy-Weinberg equilibrium was tested using a χ² goodness-of-fit test, with Bonferroni's correction for multiple comparisons. The dendrograms were constructed by the neighbor-joining (NJ) method [31]. Principal Component Analysis (PCA) was performed on the raw allele frequency data to examine distances between populations, using NTSYS (Numerical Taxonomy and Multivariate Analysis System).
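The per-locus calculations described above can be sketched as follows. This is a minimal illustration and not the authors' software: the function names and the genotype counts are hypothetical, heterozygosity is taken as the expected value 2pq without a small-sample correction, and the Bonferroni threshold assumes one test per locus per population (7 × 12 = 84 tests).

```python
from math import erfc, sqrt


def allele_frequencies(n_ins_ins, n_ins_del, n_del_del):
    """Insertion (+) and deletion (-) allele frequencies by direct gene counting."""
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)
    return p, 1.0 - p


def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus: 1 - p^2 - q^2 = 2pq."""
    return 2.0 * p * (1.0 - p)


def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions (1 df)."""
    n = n_ins_ins + n_ins_del + n_del_del
    p, q = allele_frequencies(n_ins_ins, n_ins_del, n_del_del)
    observed = (n_ins_ins, n_ins_del, n_del_del)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df, via erfc.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (+/+, +/-, -/-) for one locus; not study data.
counts = (20, 20, 10)
p_ins, p_del = allele_frequencies(*counts)
het = expected_heterozygosity(p_ins)
chi2, p_value = hwe_chi_square(*counts)

# Bonferroni-corrected threshold, assuming 7 loci x 12 populations = 84 tests.
alpha_corrected = 0.05 / (7 * 12)
print(p_ins, het, chi2, p_value, p_value < alpha_corrected)
```

The average heterozygosity of a population would then be the mean of `expected_heterozygosity` over its seven loci.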

Allele Frequencies in Tribal Populations
The allele frequencies and heterozygosities for the insertion (+) and/or deletion (-) alleles in the two tribal populations, Kani and Paliyar (Nilgiri hills, Western Ghats, Tamil Nadu), are presented in Table 1. Many of the loci studied revealed a higher level of heterozygosity in the two tribal populations. Out of the 7 loci studied, heterozygosity was similar for loci ACE (0.

Genomic Diversity between Tribal Populations
To assess genetic differentiation among tribal populations, Gst values (a measure of interpopulation variability) were determined for each Alu locus. The gene diversity analysis was based on polymorphism data for eleven tribal populations: two from the present study (Kani and Paliyar), five South Indian Tribes (SITs), namely Badaga, Irula, Kota, Kurumba and Toda [24], and four North Indian Tribes (NITs), namely Lodha, Munda, Santal and Tipperah [25], from previous studies (Table 2). The total genomic diversity (Ht) in all the populations studied was high except for the CD4 locus. The Gst value ranges from 5.7% (minimum) for CD4 to 10.4% (maximum) for APO in SITs and from 3.9% for D1 to 16.8% for APO in NITs. Thus, the range of genetic variability is broader for NITs than for SITs when the two groups are considered separately. However, for combined South and North Indian tribes (CSNITs), the Gst value ranges from 6.0% for CD4 to 16.5% for APO. When all the loci are jointly considered, the overall Gst is 8.3% for SITs, 7.3% for NITs and 9.5% for CSNITs (Table 2).

Allele frequencies in Caste Populations
The allele frequencies for the insertion (+) and deletion (-) alleles at loci Alu ACE, TPA25, FXIIIB, APO, D1, PV92 and CD4 in the 12 populations (10 castes and 2 tribes) are presented in Table 3. Alu CD4 exhibits a low level of polymorphism in many of the populations studied. All the populations showed very high levels of polymorphism for all Alu loci except CD4. Among the 12 population groups studied, the average heterozygosity over the seven loci ranged from 0.291 (Kani) to 0.468 (Yadava).

Genomic Diversity between Populations
Gst values for each polymorphic locus were determined among populations, and the results are presented separately for each locus and also for all loci taken together (Table 4). The total genomic diversity (Ht) among the populations was quite high. The Ht value ranged from 0.255 (CD4) to 0.499 (D1). When all loci are jointly considered, the overall genetic differentiation among populations (Gst) was 3.6%.
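The relationship between Ht, Hs and Gst can be illustrated with a short sketch. The text does not state the estimator used, so the code below assumes Nei's gene diversity formulation, Gst = (Ht − Hs)/Ht, with populations weighted equally and no sample-size correction. The function names and insertion frequencies are hypothetical and are not taken from Tables 2 or 4.

```python
def gene_diversity(p):
    """Gene diversity at a biallelic locus with insertion frequency p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


def gst_for_locus(insertion_freqs):
    """Return (Ht, Hs, Gst) for one locus from per-population insertion frequencies."""
    k = len(insertion_freqs)
    # Hs: mean within-population gene diversity.
    hs = sum(gene_diversity(p) for p in insertion_freqs) / k
    # Ht: gene diversity of the pooled population (unweighted mean frequency).
    ht = gene_diversity(sum(insertion_freqs) / k)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return ht, hs, gst


def gst_overall(freqs_by_locus):
    """Multi-locus Gst: summed (Ht - Hs) over loci divided by summed Ht."""
    per_locus = [gst_for_locus(freqs) for freqs in freqs_by_locus]
    total_ht = sum(ht for ht, _, _ in per_locus)
    total_dst = sum(ht - hs for ht, hs, _ in per_locus)
    return total_dst / total_ht if total_ht > 0 else 0.0


# Hypothetical insertion frequencies in four populations at two loci.
locus_a = [0.2, 0.4, 0.6, 0.8]   # populations differ, so Gst > 0
locus_b = [0.5, 0.5, 0.5, 0.5]   # identical frequencies, so Gst = 0
ht_a, hs_a, gst_a = gst_for_locus(locus_a)
overall = gst_overall([locus_a, locus_b])
print(ht_a, hs_a, gst_a, overall)
```

Under these assumptions a Gst of 3.6% means that 3.6% of the total gene diversity is attributable to differences between populations, with the remainder found within populations.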

Genetic Affinities among Tribal Populations
The genetic affinities among eleven tribal groups, namely the 2 tribal populations of the present study (Kani and Paliyar) and the 9 tribal populations studied previously [24,25], were reconstructed using the neighbor-joining (NJ) method (Fig 2). The resulting tree revealed that the Dravidian-speaking South Indian tribes Kurumba and Kota exhibit close genetic affinities. The Tibeto-Burman speaking Tipperah stand out as a unique genetic entity, while Santal and Munda of Central India showed close genomic affinities. The South Indian Toda and Irula formed a clearly distinct cluster, as is evident from the dendrogram, which also includes another South Indian tribal group, Kani. The Paliyar overlap with the Kani cluster, while another South Indian tribe, Badaga, overlaps with the North Indian Austro-Asiatic speaking Santal and Munda tribes. These different levels of clustering of North and South Indian tribal groups are highly interesting and support different levels of admixture arising from the historical and migrational histories of the populations of India. Thus, the genetic diversity was higher among tribes than among castes when all the loci are jointly considered. Our findings therefore support the view that the tribes have long been isolated from the caste groups in the Indian subcontinent.

Genetic Affinities among Endogamous Caste Populations
The phylogenetic relationship of the 10 caste and 2 tribal population groups studied in the present work is presented in Fig 3. The 12 population groups from South India grouped into 7 clusters: (i) Kallar and Pallan cluster; (ii) Yadava and Sourashtra cluster; (iii) Vanniyar, Kani, Paliyar and Vettuva Gounder cluster; (iv) Nair cluster; (v) Namboothiri cluster; (vi) Iyer cluster; and (vii) Narikuravar cluster. The genetic relationships of these Tamil Nadu castes and tribes were compared with other Indian populations using the polymorphism data on seven Alu insertion markers [21,23,32]. The NJ tree of 48 populations, including 12 populations from the present study and 36 populations from previous studies (S2 Table), is presented in Fig 4. A PCA plot for the 48 caste and tribal Indian populations was also constructed. PC1 and PC2 accounted for 33.26% and 21.66%, respectively, of the total variance in allele frequencies at the seven polymorphic Alu insertion loci (Fig 5).
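The share of variance attributed to PC1 and PC2 can be illustrated with a small, dependency-free sketch. The analysis reported here was run in NTSYS; the code below is not that implementation. It assumes PCA on the covariance matrix of allele frequencies (rows are populations, columns are loci) and uses power iteration with deflation, which is adequate only when the leading eigenvalues are well separated. All function names and frequencies are hypothetical.

```python
def covariance_matrix(rows):
    """Column-centered data and the sample covariance matrix of the columns."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    return cov, centered


def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    m = len(matrix)
    vec = [1.0 / (k + 1) for k in range(m)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0.0:
            return 0.0, vec  # no remaining variance along any direction
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
    return sum(vec[i] * mv[i] for i in range(m)), vec


def pca_two_components(rows):
    """Scores on PC1/PC2 and the fraction of total variance each explains."""
    cov, centered = covariance_matrix(rows)
    m = len(cov)
    total = sum(cov[i][i] for i in range(m))
    lam1, v1 = dominant_eigenpair(cov)
    # Deflation: remove PC1 so the next power iteration finds PC2.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(m)]
                for i in range(m)]
    lam2, v2 = dominant_eigenpair(deflated)
    scores = [(sum(c[j] * v1[j] for j in range(m)),
               sum(c[j] * v2[j] for j in range(m))) for c in centered]
    return scores, (lam1 / total, lam2 / total)


# Hypothetical insertion frequencies: 4 populations (rows) x 2 loci (columns).
freqs = [[0.2, 0.4], [0.6, 0.4], [0.2, 0.6], [0.6, 0.6]]
scores, explained = pca_two_components(freqs)
print(explained)
```

Plotting the two score columns against each other gives the kind of population scatter shown in a PCA plot; a production analysis would normally use an established linear algebra routine instead of hand-written power iteration.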

Discussion
Of late, studies of Alu polymorphisms have gained importance in delineating the genetic structure of human populations because new alleles are not generated at these loci and no selection pressure acts upon them. The present work was carried out to explore the genetic variation at a number of polymorphic Alu loci among tribal and caste populations from the state of Tamil Nadu, South India (between 77° and 80° E longitude and 8° and 13° N latitude). The allele frequencies and heterozygosity values observed in the present study are comparable, with minor deviations, to those of other South Indian endogamous populations studied previously. The overall average heterozygosity of all the loci analyzed ranges between 0.292 and 0.468. The lowest level of heterozygosity in the present study was observed in Kani (0.292), a primitive tribe inhabiting the Western Ghats (altitude: 2,695 m above sea level), Tamil Nadu. A classical study was undertaken by Watkins et al. [33] to elucidate the extent of genetic differentiation of Indian populations and to trace their ancestry. That study placed South Indian caste and tribal populations between European and East Asian populations. Further, it documented a relatively high between-group differentiation among Indian tribal groups, probably attributable to geographic and reproductive isolation ('taboo' or 'totem') and subsequent drift [34]. Previously, the Kani tribe was documented to have a smaller number of HLA alleles (immune response genes) [35]. A high level of homozygosity at a number of HLA loci could be the reason for the reduced polymorphism. Thus, the observed low levels of heterozygosity at Alu and HLA loci are attributed to isolation from other populations for considerably long periods of time and to leading a tribal life in hill regions. The heterozygosity value of Kani (0.292) was very close to that of Vysyas (0.299) reported previously by Watkins et al. [33].
Nonetheless, the caste populations showed a high level of heterozygosity, ranging between 0.325 and 0.469. The average heterozygosity of South Indian caste populations was similar to the values (0.351-0.449) observed for different population groups of India [20,24,25,26,33].
The observed average genetic difference (Gst) among the populations of the present study was 3.6%. Stoneking et al. [20] reported Gst values of 8.8% for Africans, 5.8% for South East Indians, 3.6% for Western Asians, 1.1% for Europeans and 0.1% for Australian and New Guinea populations. In the present study, the average Gst value (3.6%) was much higher than that of Europeans and Australians and lower than that of the Southeast Indians and Africans and, rather surprisingly, it matches exactly with that of Western Asians (3.6%). A similar Gst value (3.4%) was documented for Tamil Nadu caste populations studied earlier for loci such as mtNUC, Alu ACE, Alu APO, Alu FXIIIB, Alu D1, Alu CD4, Alu PLAT, Alu TPA25 and Alu PV92 [32] and for Western Indian populations (3.6%) studied previously [20]. Thus, our study strongly confirms the genetic similarities between West Asian and Tamil Nadu populations studied earlier and supports the proposed large-scale coastal migrations of African populations into India through the West Asian corridor. Watkins et al. [33] reported a Gst of 2.4% for 12 Indian populations. A previous report on castes and tribes of Andhra Pradesh and South India gave an average Gst value of 4.8% [27]. However, the Gst of North Indian populations, which supposedly originated from Indo-Europeans, was observed to be 6.8% [25]. Further, Vishwanathan et al. [24] documented a Gst value of 6.7% for South Indian tribal populations. However, in our analysis, the Gst value of 7 South Indian tribes (SITs) was 8.3%. Further, the average Gst among CSNITs, namely the North Indian Lodha, Munda, Santal (West Bengal) and Tipperah (Tripura) [25], the South Indian Badaga, Kota, Kurumba, Irula and Toda [24], and Kani and Paliyar (present study), based on different Alu markers was 9.5%.
Thus, the higher Gst values observed for North and South Indian tribes in the present and previous studies confirm the presence of different genetic elements in Indian caste and tribal populations.
Phylogenetic analysis as depicted in Fig 2 revealed that the Badaga, a South Indian tribe inhabiting the Nilgiri hills (Western Ghats), overlap genetically with the North Indian Austro-Asiatic speaking groups Santal and Munda. This is highly interesting and striking. Previously published data sets have pointed out that the Badagas might have migrated from Central or Eastern Europe. A Y-chromosome DNA marker (NRY) based study reported that the Badaga tribe carries the broader R1a and R1a1 haplogroups. The R1a1 haplogroup has spread among people from regions of Central Europe, East Europe, Scandinavia and Punjab. These and other findings reiterate that the Badaga tribe of the Nilgiri hills of South India might have originated from Eurasia. The tribal groups presently studied, Kani and Paliyar, are linguistically similar but live in distant geographical locations in the Western Ghats of South India. The phylogenetic analyses revealed that these two tribes overlap genetically with two other South Indian tribes, Toda and Irula. In the present study, the Paliyar tribe forms a separate cluster and overlaps with the North Indian Bagdi, who are of Indo-Aryan ancestry. The Kani tribe also overlaps with the North Indian middle class Agharia and the South Indian Narikuravars (an economically low and nomadic group), two populations that possess Indo-Aryan ancestry. The Narikuravars speak Vagriboli, a western Indian language (regions of Gujarat, Rajasthan and Maharashtra) that belongs to the Indo-Aryan linguistic family. Alu marker based studies have documented that North Indians are genetically highly diverse populations with variations scattered between individuals. With a glut of human migratory episodes and admixtures, the paternal genealogy of North Indians has revealed the genetic footprints and legacy of the Indo-Aryan speaking populations.
Principal Components Analysis provides an alternative method for examining interpopulation relationships. In the present study, the endogamous caste populations showed close relationships and/or proximity in the PCA as well as in the NJ phylogenetic tree. The results suggest that these populations might have a common ancestry. It is highly interesting to note that an upper class population, the Iyers of Madurai, overlaps genetically with UP-Brahmin. Balakrishnan et al. [36] reported that the Iyers of Madurai, anthropologically western brachycephalic Armenoids with HLA similarities to many South East Asians, originated either from the Eurasian steppes or from Central Asia. Interestingly, the middle class Vettuva Gounder overlap genetically with Iyers. Further, the Sourashtrans, an Indo-European language speaking population group that migrated from the Gujarat region of West India to Tamil Nadu, overlap genetically with the upper class Brahmins of West Bengal. The upper class Brahmins of Kerala, the Namboothiris, form a separate cluster as outer elements. Nairs overlap and cluster with North Indian endogamous groups and not with the Namboothiris of Kerala. Previous studies on Nairs have reported that they are more similar to Western European populations. Interestingly, the Piramalai Kallar population from Madurai (Tamil Nadu state, South India) forms a separate cluster as an outer branch of the NJ tree. One previous study [37] reported that the homeland of the Piramalai Kallars was somewhere in the Middle East. It is possible that they came in the first 'Out-of-Africa' migration to India, moved further and settled in South India (particularly in the Madurai region). Anthropologically, Piramalai Kallars belong to the Major Group-II (non-Brahmin low rank), thought to be of paleo-Mediterranean origin [38,39].
The clustering of castes such as Vanniyar, Pallan, Iyer and Kallar at two different points in the dendrogram/phylogenetic tree could be due to differences in the strategies of sampling and/or genotyping methodologies adopted by various research groups. These issues need to be addressed in a future multicentric study.
The Indian subcontinent has witnessed massive gene flow from varied ethnic sources over historical periods. This gene flow could have occurred prior to the subdivision of the population into largely endogamous caste groups. Thus, it has been suggested that, after the migration of modern humans from Asia, there were many rapid population explosion(s) following an initial period of isolation. To conclude, the studied Indian populations revealed higher heterozygosities as compared to African populations. The present study thus concludes that the endogamous populations of South India show the amalgamation of various populations arriving from different directions and geographical locations through admixture and miscegenation. Thus, the Alu polymorphism-based affinities of the South Indian populations (castes and tribes) form a potential genetic dataset for mapping population migrations, histories and genetic similarities and for gene-disease linkage analysis in a country known for the practice of strict endogamy and a high prevalence of infectious diseases. Our study thus provides (i) evidence of North/South differences in the frequencies of Alu alleles and (ii) evidence of affinities of South Indian endogamous castes and tribes with Middle East and West Asian populations. These observations thus confirm the well-established notion of the peopling of South India by coastal migrations of humans from Africa.
Supporting Information
S1 File. Ethnographic Notes on the Samples Studied. (DOCX)
S1