A Microsatellite Guided Insight into the Genetic Status of Adi, an Isolated Hunting-Gathering Tribe of Northeast India

Tibeto-Burman populations of India provide an insight into the peopling of India and aid in understanding their genetic relationship with populations of East, South and Southeast Asia. The study investigates the genetic status of one such Tibeto-Burman group, the Adi of Arunachal Pradesh, based on 15 autosomal microsatellite markers. Further, based on 9 common microsatellite loci, the study examines the genetic relationship of Adi with 16 other Tibeto-Burman speaking populations of India and 28 neighboring populations of East and Southeast Asia. Overall, the results support the recent formation of the Adi sub-tribes from a putative ancestral group and reveal that geographic contiguity is a major factor influencing the genetic affinity among the Tibeto-Burman populations of India.


Introduction
Northeast India has long been a hotspot for population geneticists owing to its unique and strategic geographic location and the presence of linguistically, culturally and demographically diverse populations practicing varied occupations (from hunting-gathering to settled agriculture) [1][2][3]. The relative geophysical isolation of the region (flanked by the Eastern Himalayas to the north and the Bay of Bengal to the south) has limited external gene flow, so these diverse populations retain a unique population structure that is expected to be reflected in their gene pools.
This region exhibits linguistic diversity (represented by the Tibeto-Burman, Austro-Asiatic and Indo-European language families), which can be attributed to diverse socio-cultural influences, extensive population interactions and a putatively long history of migrations into the region [4][5]. Tibeto-Burman speaking populations predominate in the region; they constitute about 2% of the total Indian population [6] and represent a significant component of the biological diversity involved in the peopling of India. They exhibit vast diversity with respect to culture, language, subsistence economy and population structure variables such as size, growth, distribution, marriage patterns and degree of endogamy [2][7][8]. These populations are significant for understanding the peopling of India and the relationships among the regional populations, as well as the relationship of these populations with the neighboring East/Southeast Asian groups to whom they are morphologically, ethnohistorically and linguistically affiliated [2][4][5][7][8]. In view of their importance, many researchers have used classical and molecular genetic markers to address various population genetic issues pertaining to these regional groups. These studies were, however, sporadic and restricted to only a few regional populations [9][10][11][12][13][14][15][16][17][18][19][20]. In particular, the Tibeto-Burman speaking populations inhabiting the easternmost tip of northeast India, Arunachal Pradesh (which shares international borders with Bhutan, Tibet and Myanmar), have hardly been studied, and population genetic studies of this region remain scarce [21][22][23][24]. Nevertheless, Arunachal Pradesh is important from a population genetic perspective, as the region has experienced cultural contacts and population interactions through multiple waves of migration from the adjoining regions during different periods [25].
Arunachal Pradesh (situated between latitudes 26°30′N and 29°30′N and longitudes 91°30′E and 97°30′E) is home to 26 major Tibeto-Burman speaking tribes and 110 sub-tribes and minor tribes [25], the majority of which claim descent from the Tibetan region at different time periods (as evident from the available ethno-historical accounts and folklore traditions). One of the largest tribes of the region is the Adi, a collective tribe distributed in the temperate and sub-tropical regions of the West Siang, East Siang, Upper Siang, Upper Subansiri and Dibang Valley districts of central Arunachal Pradesh [25][26]. They share physical features with East Asian populations and speak Adi dialects, which belong to the North-Assam branch of the Tibeto-Burman sub-linguistic family [27]. Their ethno-history suggests an origin in the southern regions of Tibet (China) and traces the migration and settlement of their ancestors (the 'Tani' group) to different periods between about the 5th and 7th centuries AD [26][28][29][30]. There are about 12 sub-tribes of Adi, categorized under two

Results
The Adi tribe, a Tibeto-Burman speaking population of northeast India, is important for understanding the genetic affinity among the Tibeto-Burman speaking tribes of India and the neighboring populations of East/Southeast Asia. This section presents the genetic affinity and diversity among the Adi sub-groups, their differentiation and their substructuring, as well as the genetic relationship of Adi with neighboring Tibeto-Burman populations of north and northeast India and with the linguistically divergent populations of East/Southeast Asia.
AMOVA results, presented in Table 3, reveal that without any grouping of the populations (single-group analysis), 2.38% of the variation is attributable to differences among populations, while 11.6% results from differences among individuals within populations. The corresponding FST value of 0.02379 indicates a low degree of genetic differentiation among the studied groups, which might be attributed to the recent formation of the different factions from a common ancestral group. Two factors that might have played a key role in the genetic differentiation of Adi are fission due to inter-tribal conflicts and the relative geographic isolation of the resulting contemporary Adi sub-tribes [28][29][30]. AMOVA analyses were therefore performed to assess the relative influence of both factors on the genetic differentiation of Adi (Table 3). Grouping the populations by geophysical location (FSC: 0.02885) or by ethno-history (FSC: 0.02743) did not reveal any significant differences among the groups. In both cases, the variation among populations within groups was around 2.8% and the variation among individuals within populations was around 11.6%, as in the single-group analysis. The variation within individuals was around 86% at all levels of analysis.
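The fixation indices reported by AMOVA are ratios of the estimated variance components. The Python sketch below assumes the four-level decomposition used when populations are grouped (among groups, among populations within groups, among individuals within populations, within individuals) and shows how FCT, FSC, FST and FIS follow from those components. The function name `fixation_indices` is hypothetical, and the example reuses the single-group percentages reported above, taking the within-individual share as the remainder (100 - 2.38 - 11.6 = 86.02%) rather than as a separately reported value.

```python
def fixation_indices(va, vb, vc, vd):
    """Fixation indices from AMOVA variance components (hypothetical helper).

    va: among groups; vb: among populations within groups;
    vc: among individuals within populations; vd: within individuals.
    Percentages of variation work too, because the ratios are scale-free.
    """
    total = va + vb + vc + vd
    return {
        "F_CT": va / total,              # differentiation among groups
        "F_SC": vb / (vb + vc + vd),     # among populations within groups
        "F_ST": (va + vb) / total,       # among populations overall
        "F_IS": vc / (vc + vd),          # among individuals within populations
    }


# Single-group analysis: there is no group level, so va = 0 and F_SC equals F_ST.
single = fixation_indices(0.0, 2.38, 11.6, 86.02)
print(round(single["F_ST"], 4))  # 0.0238
```

With a single group the among-population share of variation is itself the FST estimate, which is why 2.38% and 0.02379 agree up to rounding.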
To understand the extent of substructuring among the Adi sub-tribes, we performed a STRUCTURE analysis with different values of K. The simulation summary for K = 2 and K = 3, including the logarithm of the estimated probability of the data (Ln Prob), the proportion of membership of each pre-defined population in each of the two or three clusters and the corresponding α values, is given in Table 4. The pattern of substructuring among the studied subpopulations is depicted in Figure 2.
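For each pre-defined population, STRUCTURE reports the proportion of membership in each cluster as the average of the individual membership coefficients (q). A minimal Python sketch of that averaging is given below; the function name and the q-values are hypothetical and are not taken from Table 4. When no real structure is present, memberships tend to be spread roughly evenly across clusters (close to 1/K).

```python
def membership_by_population(labels, q_matrix):
    """Average individual cluster memberships within each pre-defined population.

    labels: population label of each individual (hypothetical input layout).
    q_matrix: one row per individual with K membership values summing to 1.
    """
    k = len(q_matrix[0])
    sums, counts = {}, {}
    for label, q in zip(labels, q_matrix):
        if len(q) != k:
            raise ValueError("every individual needs K membership values")
        acc = sums.setdefault(label, [0.0] * k)
        for j, value in enumerate(q):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    # Divide each running sum by the number of individuals in that population.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Hypothetical K = 2 output for four individuals from two populations.
labels = ["PopA", "PopA", "PopB", "PopB"]
q = [[0.6, 0.4], [0.4, 0.6], [0.5, 0.5], [0.7, 0.3]]
print(membership_by_population(labels, q))
```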

Genetic affinity among the studied populations
The pattern of clustering and the genetic affinity among the six Adi sub-tribes are shown in the DA-NJ phylogenetic tree (supplementary Figure S1) and the PCA plot (Figure 3). Four of the studied populations (Panggi, Komkar, Padam and Pasi-Lower) form a single close cluster, while the remaining two, Adi Pasi-Upper and Adi Minyong, separate from the others. The PCA plot shows a similar pattern of clustering, substantiating the dendrogram, with the exception of Adi Pasi-Lower, which lies distant from the Panggi-Komkar-Padam cluster.
Genetic relationship of Adi with other populations
a) Tibeto-Burman speaking populations of India. The DA-NJ phylogenetic tree depicting the genetic relationship of the Adi subpopulations with the sixteen neighboring Tibeto-Burman speaking populations of north and northeast India is shown in supplementary Figure S2, and the corresponding PCA plot is depicted in Figure 4. Overall, geographically proximate Tibeto-Burman populations tend to cluster together. The phylogeny exhibits three distinct major clusters. Cluster I consists of two sub-clusters, where the first, the Ladakh-Sikkim sub-cluster, includes 4 populations from Ladakh

Discussion
The Adi tribe comprises several sub-tribes that have been settled in relative geophysical isolation for several generations. They exhibit socio-cultural as well as linguistic diversity, coupled with wide variation in subsistence pattern (ranging from hunting-gathering to settled agriculture). Only a few sporadic biological studies have been carried out among some sub-tribes of Adi [18][19][21][22][23][24][32][33][34][35][36], and several of the isolated sub-tribes are yet to be investigated. This is perhaps the first comprehensive molecular genetic study of the genetic affinity and diversity among the sub-tribes of Adi and of their relationship with other Tibeto-Burman tribes of India and the populations of East and Southeast Asia. The allele frequency variation at 15 STR loci reveals the underlying microsatellite diversity among the studied sub-tribes of Adi. The extent of deviation of the studied loci from HWE differs among the Adi sub-tribes, which is indicative of their unique population structure. For instance, Pasi-Upper and Panggi deviate from HWE at the largest number of loci (6 and 4, respectively), whereas Minyong deviates at only one locus and Padam at none. The deviations in Pasi-Upper and Panggi are probably due to their small population size and relative isolation in the remote hilly regions of Upper Siang. In contrast, the low deviation in Minyong (one locus) and its absence in Padam might result from their comparatively large population sizes, distributed over the plains of East Siang near the urban area.
Table 2. Pair-wise comparison of studied populations, at the analyzed loci, to investigate the extent of population differentiation. Significant values are in bold.
The lowest average heterozygosity (~74%), observed among Panggi, might be explained by their small size, strategic location and preferential marriage practices among clans that restrict external gene flow [37]. Strikingly, the highest average heterozygosity (~78%) was observed among Pasi-Upper, in spite of their small population size and relative isolation. The exact test of population differentiation for the 15 STR loci also shows wide diversity among the Adi sub-tribes. Adi Padam differed significantly from the other subpopulations at the fewest loci (about 7), which might be because the other sub-tribes were formed from the larger Padam tribe, one of the earliest settlers of the region; however, this requires further validation.
According to the folklore tradition of the Adi, their sub-groups formed through a fission-fusion process driven by inter-tribal warfare in the recent past. This ethno-historical account is supported by the low average GST value (2.34%) among Adi, which indicates a low degree of genetic differentiation among the sub-groups. This low differentiation is further substantiated by the clustering pattern in the PCA plot and the phylogeny, where all the sub-tribes form a single close cluster.
The close clustering of the Komkar and Panggi sub-tribes in the PCA plot and the phylogenetic tree is consistent with their geographic proximity, and the clustering of Padam with this group further supports the ethno-historical account that the Komkar and Panggi sub-groups were formed from the larger Padam group [26][29][30]. The separation of Pasi-Upper and Pasi-Lower in the phylogeny, despite their common ancestry, could be a consequence of the migration of a few close kin-groups from the ancestral Adi Pasi population in the Upper Siang district to the plains of the East Siang district. Subsequent isolation has changed marriage patterns, leading to higher endogamy among the Adi Pasi-Upper as against inter-tribal marriages and low endogamy among the Adi Pasi-Lower. AMOVA results also show low genetic differentiation among the sub-tribes of Adi. The low FSC values (about 2.7% to 2.9%), irrespective of ethno-historical or geographical grouping of the populations, suggest that the sub-populations formed recently and that ethno-history and geography had little influence on the overall genetic makeup of the populations. Thus, in spite of their socio-cultural, geographic and linguistic diversity, the Adi sub-groups remain weakly differentiated genetically. However, this observation should be treated as tentative, since increasing the number of samples and microsatellite loci might contradict it. The STRUCTURE analysis also supports the AMOVA findings, with no clear substructuring among the Adi subpopulations (for both the K = 2 and K = 3 runs). Overall, the low average GST values, the close clustering in the PCA plot and phylogeny, the low FST and FSC values from AMOVA and the absence of discrete substructuring among the Adi sub-tribes support their recent formation from a common ancestral group.
The common migration history of these populations and the Manipur tribes. This further confirms the preliminary results of our earlier microsatellite study on Adi Pasi-Lower and other Tibeto-Burman populations of India [24].
Including populations from East and Southeast Asia in the phylogenetic analyses reveals that the Luoba ethnic group of Tibet clusters with the Adi groups of Arunachal Pradesh. According to the Ethnologue, Luoba Tibetan (Boga'er Luoba), classified under the North-Assam branch of the Tibeto-Burman sub-linguistic family, is alternatively referred to as Adi/Abor and is thought to derive from the 'Tani' group, the putative ancestral population of Adi. The Luoba live on the southern fringes of the central Tibetan region, adjacent to the Upper Siang district of Arunachal Pradesh. The clustering of Luoba with Adi further supports the ethno-historical accounts of their putative common origin.
Together, the fifty populations (the six Adi sub-tribes, 16 other Indian Tibeto-Burman populations and 28 populations from East and Southeast Asian countries) show an interesting pattern of clustering. Some Tibeto-Burman populations of India (e.g. the Adi tribes of Arunachal Pradesh, populations from Ladakh, Mizoram and Sikkim, and the Garo of Meghalaya) cluster with Tibetan populations from Tibet and China, whereas others (e.g. the Drokpa and Balti of Ladakh and the Bhutia of Sikkim) cluster with Southeast Asian populations. The morphologically similar populations, irrespective of their linguistic affiliation, appear to cluster according to their geography and the ethno-historical accounts of their migration.
Overall, Adi and other Tibeto-Burman speaking populations of India are regionally well differentiated and exhibit genetic affinity with the neighboring populations of East/Southeast Asia, consistent with their shared ethno-history. However, a clearer picture may emerge from the analysis of a larger number of informative genetic markers and of uniparental markers such as mitochondrial DNA and the Y chromosome.
To understand the genetic relationships between the Adi sub-tribes and other neighboring Tibeto-Burman speaking populations, the autosomal STR data generated for Adi were compared with the published allele frequency data (for the nine common loci) of sixteen other Tibeto-Burman speaking populations from north (Ladakh) and northeast (Mizoram, Manipur, Sikkim, Nagaland and Meghalaya) India [39][40][41][42][43][44]. In addition, because Tibeto-Burman speakers of the Indian subcontinent share physical features with East and Southeast Asian populations, we examined the genetic status of the Tibeto-Burman speakers of India (including Adi) among the linguistically diverse but physically similar populations of East/Southeast Asia. We therefore compared the populations of north and northeast India with those of East/Southeast Asia, based on the available allele frequency data for the nine common STR loci. Because genotype data were unavailable for the reference populations, these analyses were necessarily restricted to allele frequency data. Details of all the studied populations, including sample size, ethnic and linguistic affiliation, geographical location, subsistence pattern and literature source, are given in supplementary Table S1.
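Because only allele frequencies were available for the reference populations, distances had to be computed from frequencies over the loci typed in both datasets. The Python sketch below illustrates this for the DA distance of Nei et al., DA = 1 - (1/r) Σ_j Σ_i sqrt(x_ij y_ij), where r is the number of shared loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations. The function name, the dictionary layout and the allele frequencies in the example are hypothetical, not the published data.

```python
import math


def da_distance(freqs_x, freqs_y):
    """DA distance of Nei et al. computed over the loci shared by two populations.

    freqs_x, freqs_y: {locus: {allele: frequency}} (hypothetical layout).
    Loci typed in only one population are ignored.
    """
    common = sorted(set(freqs_x) & set(freqs_y))
    if not common:
        raise ValueError("the two populations share no loci")
    similarity = 0.0
    for locus in common:
        px, py = freqs_x[locus], freqs_y[locus]
        # Alleles missing from either population contribute zero to the sum.
        similarity += sum(math.sqrt(px[a] * py[a]) for a in set(px) & set(py))
    return 1.0 - similarity / len(common)


# Hypothetical frequencies; D2S1338 is typed only in pop_x, so it is skipped.
pop_x = {"D5S818": {"11": 0.5, "12": 0.5}, "FGA": {"22": 1.0},
         "D2S1338": {"20": 0.5, "21": 0.5}}
pop_y = {"D5S818": {"11": 0.5, "12": 0.5}, "FGA": {"22": 0.25, "23": 0.75}}
print(da_distance(pop_x, pop_y))  # 0.25
```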

DNA isolation and microsatellite typing
High molecular weight DNA was isolated from the collected blood samples of the Adi sub-tribes using the standard phenol/chloroform method [69]. One to 10 ng of each individual's DNA template was amplified for fifteen tetranucleotide repeat loci (D5S818, FGA, D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX and D18S51) on a GeneAmp PCR 9700 thermal cycler (Applied Biosystems, Foster City) using the AmpFlSTR® Identifiler® kit (Applied Biosystems, Foster City) according to the manufacturer's instructions. The amplified products of the Pasi and Minyong sub-tribes were separated on a 4% polyacrylamide gel using the ABI Prism 377 automated DNA sequencer (Applied Biosystems), whereas the amplified fragments of the Panggi, Komkar and Padam sub-tribes were separated and detected using the ABI Prism® 3100-Avant Genetic Analyzer (Applied Biosystems, Foster City). The resulting data were analyzed using GeneScan™ Analysis Software (Version 3.7), and alleles were designated with Genotyper™ DNA Fragment Analysis Software (Version 3.7) (Applied Biosystems, Foster City). The laboratory experiments were carried out following all the necessary quality control measures.

Statistical Analyses
The allele frequencies of the 15 STR loci were calculated from the genotype data of the Adi sub-tribes using the DNATYPE software [70]. The observed heterozygosity (h) at each locus and the probability of homozygosity (P) were estimated to evaluate the extent and magnitude of genetic diversity among the sub-groups of Adi. A likelihood ratio (LR) test and an exact test (ET) were also performed to test each locus for deviation from Hardy-Weinberg expectations (HWE) [71][72]. Based on the allele frequencies of the 15 autosomal STR loci in the six Adi populations, the locus-wise genetic diversity (GST) [73][74] and the population-wise average heterozygosity were estimated to assess the degree of genetic differentiation and the within-population heterogeneity, respectively. A locus-wise exact test of population differentiation was performed in Arlequin 3.01 to analyze the extent of genetic diversity at the studied loci among the six Adi sub-tribes [75].
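Nei's GST expresses how much of the total heterozygosity HT of the pooled population is not explained by the mean within-population heterozygosity HS: GST = (HT - HS)/HT. The Python sketch below computes expected heterozygosity (1 - Σp², with an optional 2n/(2n - 1) small-sample correction for n diploid individuals) and the GST of a single locus, weighting populations equally. It is an illustration under those assumptions, not the DNATYPE or Arlequin implementation, and the function names are hypothetical.

```python
def expected_heterozygosity(freqs, n_individuals=None):
    """Expected heterozygosity 1 - sum(p^2) at one locus.

    freqs: {allele: frequency}. If n_individuals (diploid) is given, apply
    Nei's small-sample correction factor 2n / (2n - 1).
    """
    h = 1.0 - sum(p * p for p in freqs.values())
    if n_individuals is not None:
        h *= 2 * n_individuals / (2 * n_individuals - 1)
    return h


def gst(pop_freqs):
    """Nei's GST = (HT - HS) / HT for one locus, populations weighted equally."""
    k = len(pop_freqs)
    hs = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    alleles = set().union(*pop_freqs)
    # Pooled frequencies are the unweighted means across populations.
    pooled = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    ht = expected_heterozygosity(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical two-allele locus typed in two populations.
print(round(gst([{"11": 0.6, "12": 0.4}, {"11": 0.4, "12": 0.6}]), 4))  # 0.04
```

A multilocus value can then be summarized across the 15 loci, for example as the mean of the locus-wise estimates.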
To understand the genetic relatedness among the six studied Adi populations, pair-wise genetic distances were computed with the software DISPAN [77] using the modified Cavalli-Sforza distance (DA) and the standard genetic distance (DST) of Nei et al. [76]. Two kinds of phylogenetic tree, the unweighted pair group method with arithmetic mean (UPGMA) tree and the neighbor-joining (NJ) tree, were then constructed in conventional rectangular form from the DA and DST distances, also using DISPAN. To check the reliability and consistency of the clustering pattern of the dendrograms, 1,000 and 10,000 bootstrap replications were performed separately. To further explore the topology of the trees, including the positions and lengths of the branches, the branching patterns and the cluster formation, radial forms of the trees were also constructed with the phylogenetic software MEGA v3.1 [78]. Because the DA distance is the most efficient measure for obtaining correct phylogenetic trees under various evolutionary conditions and is least affected by small sample size [79], and because the UPGMA and NJ phylogenies depict similar patterns of relationship among the populations, our discussion is based only on the DA-NJ trees.
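The neighbor-joining method of Saitou and Nei repeatedly joins the pair of nodes that minimizes Q(i, j) = (n - 2) d(i, j) - R_i - R_j, where R_i is the sum of distances from node i, and replaces that pair with a new internal node. The plain-Python sketch below follows this procedure and returns an unrooted tree as a Newick string. It is not the DISPAN or MEGA code, the function name is hypothetical, and bootstrap resampling of loci is left out. The example uses a made-up additive distance matrix, for which NJ recovers the tree that generated it.

```python
def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree as a Newick string (hypothetical helper).

    names: taxon labels; dist: symmetric list-of-lists distance matrix.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to the joined pair.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        # Rebuild the matrix: surviving nodes first, new internal node last.
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Resolve the final three nodes around a single central node.
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    lengths = ((d01 + d02 - d12) / 2, (d01 + d12 - d02) / 2, (d02 + d12 - d01) / 2)
    return "(" + ",".join(f"{x}:{b:.4f}" for x, b in zip(nodes, lengths)) + ");"


# Made-up additive distances generated by the tree ((A:1,B:2):1,(C:3,D:4)).
taxa = ["A", "B", "C", "D"]
matrix = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
print(neighbor_joining(taxa, matrix))
# (C:3.0000,D:4.0000,(A:1.0000,B:2.0000):1.0000);
```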
To characterize the clustering trends exhibited by the studied populations, the data dimensionality was reduced by Principal Component Analysis (PCA), a covariance analysis between factors. The analysis was performed on the DA distance matrix of the six Adi sub-groups using SPSS software (Version 11.0, Chicago, IL, USA). The PCA plot complements the dendrogram: especially when the bootstrap values of the dendrogram are low, similar clustering in both the PCA plot and the dendrogram indicates that the results are consistent.
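A PCA-like ordination can also be derived directly from a distance matrix by principal coordinate analysis (classical multidimensional scaling), which double-centers the squared distances and scales the leading eigenvectors by the square roots of their eigenvalues. This is a swapped-in technique shown for illustration only; it is not necessarily what SPSS computed from the DA matrix. The sketch assumes NumPy is available, and the function name is hypothetical.

```python
import numpy as np


def principal_coordinates(dist, n_axes=2):
    """Classical MDS (principal coordinates) of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]   # take the largest ones first
    vals = np.clip(eigvals[order], 0.0, None)    # ignore negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical distances among three populations lying on one axis.
coords = principal_coordinates([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
print(coords[:, 0])
```

For distances that are exactly Euclidean, as in this example, the first axis reproduces the pairwise distances up to a sign flip.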
To investigate the genetic variation within and among the sub-populations of Adi, Analysis of Molecular Variance (AMOVA) was performed using Arlequin 3.01 [75], and the significance of the AMOVA values was estimated using 10,000 permutations. Three levels of analysis were performed. At the first level, the six Adi sub-groups [Pasi-Upper, Pasi-Lower, Minyong, Panggi, Komkar and Padam] were treated as a 'single group'. At the second level, the six Adi populations were categorized into 'two groups' based on their geophysical locations
To obtain a clearer insight into the substructuring among the Adi sub-tribes, a model-based clustering method, as implemented in the Structure 2.1 program [80], was applied to the genotype data of unlinked markers. Each run used 100,000 MCMC iterations after a burn-in of 20,000 iterations. Simulations were run for values of K from 1 to 5 under the admixture model with correlated allele frequencies among populations, and each run was repeated several times to ensure consistency of the results.
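The AMOVA significance values described above come from permutation testing: individuals are reshuffled among populations, the statistic is recomputed, and the p-value is the share of permuted values at least as large as the observed one. The Python sketch below shows that logic for a single locus, using Nei's GST as a simplified stand-in statistic. Arlequin's AMOVA instead permutes at each hierarchical level and uses its own variance-component statistics, so this is only an illustration of the principle; all function names are hypothetical.

```python
import random
from collections import Counter


def _heterozygosity(alleles):
    """Expected heterozygosity 1 - sum(p^2) from a list of allele labels."""
    total = len(alleles)
    return 1.0 - sum((c / total) ** 2 for c in Counter(alleles).values())


def gst_from_alleles(groups):
    """Nei's GST for one locus from per-population allele lists (equal weights)."""
    k = len(groups)
    hs = sum(_heterozygosity(g) for g in groups) / k
    alleles = set().union(*groups)
    mean = {a: sum(g.count(a) / len(g) for g in groups) / k for a in alleles}
    ht = 1.0 - sum(p * p for p in mean.values())
    return 0.0 if ht == 0 else (ht - hs) / ht


def _alleles(populations):
    """Flatten diploid genotypes into one allele list per population."""
    return [[a for genotype in pop for a in genotype] for pop in populations]


def permutation_test(populations, n_perm=10_000, seed=1):
    """Observed GST and its permutation p-value.

    populations: list of populations, each a list of (allele, allele) genotypes.
    Whole individuals are shuffled among populations; sample sizes stay fixed.
    """
    rng = random.Random(seed)
    observed = gst_from_alleles(_alleles(populations))
    sizes = [len(pop) for pop in populations]
    pooled = [genotype for pop in populations for genotype in pop]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if gst_from_alleles(_alleles(shuffled)) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

A call such as `permutation_test([[("11", "12"), ...], [("12", "12"), ...]])` with hypothetical genotypes returns the observed statistic together with its p-value.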
In addition to the above analyses of the Adi populations, we compared the Adi sub-groups with sixteen Tibeto-Burman speaking populations of north and northeast India and with other neighboring East and Southeast Asian populations that share physical features with Adi. Phylogenetic analysis (as described above) and Principal Component Analysis were performed on these populations, based on the available allele frequency data, to understand their underlying genetic affinity and to clarify the genetic status of the Adi populations relative to the regional Tibeto-Burman speaking populations and the linguistically diverse populations of East/Southeast Asia.

Supporting Information
Table S1. Sample size, geographical distribution, linguistic affiliation and subsistence pattern of the studied populations.
Government of Arunachal Pradesh, especially of the Siang districts. We thank Dr. Kashyap, CFSL, for providing the laboratory facilities and the materials required to carry out the experiments, and Dr. R Trivedi, CFSL, for technical support. We also thank the research scholars of CFSL for their help with the laboratory experiments. We acknowledge all the reviewers for their valuable comments, which have helped us to improve our manuscript substantially.