Revisiting the Diego Blood Group System in Amerindians: Evidence for Gene-Culture Comigration

Six decades ago the DI*A allele of the Diego blood group system was instrumental in proving that Native American populations originated from Siberia. Since then, it has received scant attention. The present study was undertaken to reappraise the distribution of the DI*A allele in 144 Native American populations based on current knowledge. Using analysis of variance tests, frequency distribution was studied according to geographical, environmental, and cultural parameters. Frequencies were highest in Amazonian populations. In contrast, DI*A was undetectable in subarctic, Fuegian, Panamanian, Chaco and Yanomama populations. Closer study revealed that this unequal distribution was correlated with language, suggesting that linguistic divergence was a driving force in the expansion of DI*A among Native Americans. The absence of DI*A in circumpolar Eskimo-Aleut and Na-Dene speakers was consistent with a late migratory event confined to North America. Distribution of DI*A in subtropical areas indicated that gene and culture exchanges were more intense within than between ecozones. Bolstering the utility of classical genetic markers in biological anthropology, the present study of the expansion of Diego blood group genetic polymorphism in Native Americans shows strong evidence of gene-culture comigration.


Introduction
Genetic diversity in Native Americans is an inexhaustible field of investigation. In the premolecular biology era, protein, enzyme, and red cell antigen polymorphisms were extensively studied [1,2]. An outstanding achievement using such classical genetic markers was the demonstration of a biological continuum between Siberian and Native American populations [3], in concordance with archaeological, craniofacial and molecular similarities on either side of the Bering Strait [4].
The Diego blood group system was the first, and perhaps most convincing, genetic marker linking Amerindians to Siberia [5,6]. It was discovered in 1953 in a Venezuelan woman who experienced three obstetric accidents. Study showed that severe hemolytic anemia after childbirth was due to isoimmunization caused by an irregular antibody. This antibody [...] along with gene-counting method with the formula $p_{DI*A} = 1 - \sqrt{1 - f(\mathrm{Di(a+)})}$, where f(Di(a+)) is the proportion of Di a antigen carriers in the population [23]. By assuming Hardy-Weinberg equilibrium, this method allows calculation of gene frequencies from phenotype frequencies even in the presence of ambiguous cases (e.g. recessive). This approach precludes any further selection tests based on those frequencies. It should also be noted that the method lacks reliability for genetic systems with more than three alleles [24].
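As an illustration of the Hardy-Weinberg step, the following minimal sketch converts a phenotype proportion into an allele frequency. It assumes a two-allele system in which Di(a+) behaves as a dominant phenotype, so that the Di(a-) proportion equals the squared frequency of the other allele. The function name and the input value in the usage note are hypothetical and are not taken from the study data.

```python
from math import sqrt


def allele_freq_from_phenotype(f_positive):
    """Estimate the DI*A allele frequency from the Di(a+) phenotype proportion.

    Assumes Hardy-Weinberg equilibrium and two alleles, with Di(a+)
    expressed by both homozygous and heterozygous DI*A carriers.
    The Di(a-) proportion is then q**2, so p = 1 - sqrt(1 - f).
    """
    if not 0.0 <= f_positive <= 1.0:
        raise ValueError("phenotype proportion must lie between 0 and 1")
    return 1.0 - sqrt(1.0 - f_positive)
```

For a hypothetical population in which 36% of individuals are Di(a+), `allele_freq_from_phenotype(0.36)` gives an allele frequency of about 0.20.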
The following cultural and environmental traits were collected for each of the 144 populations: linguistic family, subsistence strategy, crop landraces, climate conditions, and mosquito infestation (S1 File, S1 Table). As a basis for linguistic family assignment, we used Ruhlen's classification that divided the studied populations into 15 Native American language groups [25]. Pre-Columbian subsistence strategy was defined as hunter-gatherer/forager, marine hunter-gatherer, and agriculturist [26][27][28]. This 3-category breakdown is admittedly debatable since strategies surely changed over time and also because populations may have used several strategies or even switched back and forth on a seasonal basis. Subtropical Central and South America were subdivided into areas harboring wild taxa of either maize (Zea genus [29]) or cassava (Manihot genus) [30]. Lastly, ecological areas were defined according to the updated Köppen-Geiger climate classification [31] and consideration of eight Anopheline species [32].

Depicting the geographical dispersal of DI*A
All DI*A allele frequency data were plotted onto a single map using the Kriging algorithm of SURFER software 8.0. Spatial autocorrelation analysis was performed with the GENALEX software [33]. Using this approach, it was possible to assess genetic similarity in allele frequencies as a function of geographic separation [34]. We were thus able to depict the correlation coefficient (r) of DI*A frequency between pairs of populations whose geographic separation fell within a specified distance class.
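The idea of a correlation computed per distance class can be sketched as follows. This is not the GENALEX computation: Moran's I is swapped in here as a simpler, related autocorrelation statistic, great-circle distances are obtained with the haversine formula, and the function names and the input layout (latitude, longitude, frequency) are assumptions made for illustration only.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def morans_i_by_class(sites, class_edges_km):
    """Moran's I of allele frequency for each distance class.

    sites: list of (latitude, longitude, allele_frequency) tuples.
    class_edges_km: increasing list of class boundaries; class k covers
    the half-open interval [edges[k], edges[k + 1]).
    Returns one value per class, or None for a class with no pairs
    (or when all frequencies are identical).
    """
    n = len(sites)
    mean_f = sum(s[2] for s in sites) / n
    dev = [s[2] - mean_f for s in sites]
    var_sum = sum(d * d for d in dev)
    results = []
    for lo, hi in zip(class_edges_km[:-1], class_edges_km[1:]):
        cross, n_pairs = 0.0, 0
        # Each unordered pair is visited once; the factor 2 for ordered
        # pairs cancels between numerator and weight sum.
        for i in range(n):
            for j in range(i + 1, n):
                d = haversine_km(sites[i][0], sites[i][1],
                                 sites[j][0], sites[j][1])
                if lo <= d < hi:
                    cross += dev[i] * dev[j]
                    n_pairs += 1
        if n_pairs == 0 or var_sum == 0:
            results.append(None)
        else:
            results.append((n / n_pairs) * cross / var_sum)
    return results
```

Positive values in the short-distance classes and values near zero or negative in the long-distance classes would indicate that neighboring populations tend to have similar frequencies.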

Analysis of variance
In order to detect a pattern in DI*A allele frequency amongst geographical, cultural or ecological groups, we first ran an analysis of molecular variance (AMOVA) using ARLEQUIN software 3.5 [35]. To ensure symmetrical distribution of DI*A variance across the Americas, a preliminary AMOVA was performed assuming that the 144 Native American populations were a single group. Subsequent analyses tried to identify the highest proportion of variance among several pooled groups defined according to S1 Table. We then tried to determine whether culture explained a significant proportion of DI*A variance even when geography was held constant. This was done by considering geographic groups that showed a significant proportion of DI*A variance and running three procedures, i.e., a two-way ANOVA (S1 Fig), an ANOVA on geographical residual scores, and mixed models, with XLSTAT Version 2014.6.05. ANOVA models were evaluated using the Fisher F test to determine whether the amount of information provided by the selected factors was significant enough to explain the variation of DI*A allele frequency in comparison with a rudimentary model assumed to be constant and equal to the mean DI*A allele frequency. The use of residual scores allows testing of factors possibly underlying deviation of geographical residuals from normal distribution. Mixed models seek a linear explanation for quantitative variables based on factors associated with fixed and random effects [36]. Here we set geography and culture as fixed and random effects, respectively, and looked at the significance of Z, the matrix of the random effects.
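The comparison of a grouping factor against a constant-mean model can be illustrated with a plain one-way ANOVA. The sketch below is neither the ARLEQUIN AMOVA nor the XLSTAT procedure; it only shows how the F statistic and the proportion of variance attributable to a grouping arise from between-group and within-group sums of squares. The function name and the dictionary input are assumptions made for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, proportion_of_variance) for a one-way layout.

    groups: dict mapping a group label (e.g. a linguistic family) to a
    list of allele frequencies, one per population. At least two groups
    and some within-group variation are required.
    """
    values = [v for g in groups.values() for v in g]
    grand_mean = mean(values)
    k = len(groups)   # number of groups
    n = len(values)   # total number of populations
    # Variation of group means around the grand mean (the "constant" model).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of populations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2
                    for g in groups.values() for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    proportion = ss_between / (ss_between + ss_within)
    return f_stat, proportion
```

The F statistic is then compared with the F distribution on (k - 1, n - k) degrees of freedom to obtain a p-value; that step is left out here to keep the sketch free of external dependencies.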

Supplemental analysis of genetic variation associated with SLC4A1 locus
In order to distinguish patterns of variation amongst carriers of DI*A and DI*B, we looked for higher-resolution genetic markers surrounding the Diego locus. Assuming that variation could have been driven alongside SLC4A1, we reappraised the variation at 8 autosomal microsatellites (STRs) around SLC4A1 in 28 Native American populations previously screened for 678 genome-wide STRs [37]. The genetic markers are D17S1294, GATA169F02, GGAA19G04, D17S1299, D17S2180, AAT245_17, and D17S1290678. They are embedded in a 20-Mb-wide window on both sides of SLC4A1. Information can be found in the Mammalian Genotyping Marshfield Screening Sets 16 and 54 (http://research.marshfieldclinic.org/genetics/). We ran STRUCTURE [38] based on the admixture model from K = 2 to K = 10 with a burn-in period of 20,000 iterations followed by 10,000 iterations and displayed the results using DISTRUCT 1.1 [39].
When the Diego polymorphism (S1 Table) was portrayed in relation to language, the highest DI*A frequencies were observed for the Equatorial-Tucanoan and Ge-Pano-Karib branches of the Southern Amerind linguistic family, particularly among Tucanoan (mean frequency = 0.229), Panoan (0.223), Karib (0.194), Tupi (0.184), and Ge (0.171) speakers. The DI*A allele was almost completely absent (less than 2%) in Eskimo-Aleut, Na-Dene, and Keresiouan speakers from North America and Chibchan speakers from Lower Mesoamerica. Portrayal of polymorphism in relation to subsistence mode and ecological conditions indicated that DI*A was more frequent in hunter-gatherer/forager (mean frequency = 0.133) than in farmer (0.061) and marine hunter-gatherer (0.018) groups. Equatorial regions accounted for up to 12% of DI*A and frequencies decreased in gradient fashion from warm temperate (mean = 0.076) to arid (0.045), snowy (0.032) and polar (0.000) regions.
A higher frequency of DI*A was observed in populations living in areas where the domesticated crop was cassava (0.176) rather than maize (0.081) or neither (0.039).
In the second phase of study, particular attention was paid to determining whether areas with similar DI*A allele frequencies occurred randomly or rather coincided with geography, linguistics, lifestyle, or ecoregions. Table 1 presents the percentage of DI*A allele frequency variation obtained using various one-way models to identify the factor most closely associated with allele variance. Overall, the Diego blood group exhibited significant variation in the Americas, accounting for 11.95% of the total frequency variance between the 144 Native American populations.
Geography accounted for significant between-group allele variation (2.77%, p = 0.043; 3.24%, p = 0.006). Grouping according to the three main linguistic phyla, i.e. Eskimo-Aleut, Na-Dene, and Amerind, failed to reveal a significant genetic structure for the system of interest (variation, 3.54%, p > 0.05), whereas grouping according to Native American linguistic family as defined by Ruhlen was significant (variation, 4.71%, p = 0.001). Lifestyle accounted for 3.61% of DI*A allele frequency variation among Amerindians, with the highest values reached when populations were grouped by whether or not they inhabited areas with wild taxa of cassava (variation, 9.59% and 13.36%, p < 0.001). Grouping according to putative mode of subsistence and alimentary crop in relation to maize showed little (Zea species: p = 0.029) or no (presence/absence: p = 0.489) significant pattern of DI*A variance. Climate did not account for the observed DI*A allele differences (variation, 1.23%, p = 0.257 and 2.13%, p = 0.152), but ecological areas defined by Anopheles mosquito species were correlated with DI*A allele frequency (variation range, 7.85% to 12.38%, p < 0.001).
In the third phase of study, we tested the gene-culture concept with respect to the effects of four cultural traits on the dispersal of DI*A amongst geographically determined populations. Table 2 presents the results obtained by ANOVA and mixed models using four cultural traits and two geographic divisions, and S1 [...]
Lastly, we tried to determine whether highly polymorphic genetic markers in the vicinity of the SLC4A1 locus could help explain the observed variation (S2 Fig). Except for the Aché, Surui, Kogi, Pima, and a few Waunana individuals, who stand apart, the bar plots do not show any particular genetic structure.

Discussion
Six decades ago, demonstration that the Di a antigen was a shared red cell feature of most Native American and Asian populations served as proof that present-day North, Central and South Amerindians originated from Siberia. Despite their significant anthropological interest, the causative factors underlying the current distribution of Di a in Native American populations have remained unclear [21]. This study was designed to gain insight into this aspect by correlating reported data on the frequency of the allele coding for Di a (DI*A) with environmental and cultural factors. Our assumption was that expansion of the Diego blood group from North to South America was also driven by culture.

Diego reflects the cultural divergence of South Native Americans
Whether or not human genetic diversity occurred randomly is a fundamental question in the field of biological anthropology. Mapping genetic variation amongst populations has proven to be a useful technique to identify meaningful concordances with historic populations or ecological niches [2,40]. The data reported here indicate that DI*A allele frequencies show striking contrasts between areas and that variations tend to correlate significantly with environmental and cultural traits. The strongest correlations were observed in the Amazon region as opposed to the rest of the continent and as a function of linguistic family. Analysis according to linguistic family revealed a significant correlation with the phylogenetic structure of the Amerind language family, with high DI*A levels in Southern Amerind language speakers. High correlation with cassava and Tupi-speaking areas agrees with expansion driven by agriculture [41]. The current genetic structure of the Diego blood group may mirror the congruent divergence mechanisms that have shaped the South Amerindian genetic and linguistic diversity of Andean, Lower Meso, and Meso Amerindians [37,42,43]. Observation of significant genetic kinship between geographically proximate, linguistically similar groups has been interpreted as evidence of North-to-South population migration followed by divergence resulting in genetic and linguistic variation during the complex demographic history of the Andes and Amazon Basin [37,44,45].
Another striking finding of the present study involves areas with low DI*A levels. Local genetic differentiation associated with high genetic drift and uneven gene flow could explain this finding, especially in the Yanomama populations [46], who provide an outstanding example for gene-culture modeling. Villages form due to social or political tensions as dissident groups of related individuals break off from the main population and settle in nearby locations, where they possibly mix with other people before returning to the original population. This process involves several neighboring villages and recurs over generations. Fission-fusion is a formidable enhancer of genetic differentiation and may explain the differences in peripheral genetic behavior between the Yanomama and surrounding populations [47][48][49].
Previous studies have described the special genetic and craniometric characteristics of the Ayoreo (a.k.a. Moro) in the Chaco region. Like the Aché, Emerillon, and Yaghan people, the Ayoreo exhibit limited within-population diversity that is among the lowest in South America. These genetic peculiarities are most likely due to a severe founder effect, but the exact causative events remain unclear [50,51].
In the Isthmus of Panama, craniofacial, classical and molecular investigations have documented the genetic distinctiveness of present-day Chibchan-speakers, suggesting that their predecessors have lived in isolation from Central and South Amerindians since the early Holocene [37,52,53]. It is noteworthy that Chibchan-speakers are thought to be latecomers to cultivation of maize and manioc [53]. Hence, the Isthmus of Panama could represent a nexus for the southward expansion of maize cultivation and the northward spread of arrowroot and manioc cultivation, probably without gene flow since 7500 cal BP [54,55]. The lack of Di a antigen in the Isthmus of Panama would be consistent with the long-term preservation of the Chibchan genetic substratum.

Did comigration act alone?
Based on dispersal patterns coupled with the demographic history of Native Amerindians, it is not unexpected to see a loss of genetic diversity in most of the Eastern South Amerindians [18,44,45,56]. Herein, the parceling out of DI*A amongst South Amerindians, reinforced by spatial autocorrelation analysis, suggests that isolation by distance and genetic drift in Eastern South Amerindian populations also played a non-negligible role. These factors may have led to random accumulation or depletion of DI*A in the same way as for private polymorphisms [46,57]. This process is highly likely and could have occurred concomitantly with the above-mentioned population expansion.
Our study also revealed a significant correlation between DI*A allele frequency and warm tropical conditions, domesticated crop type, and presence of disease-carrying vector species. The circumscribed areas defined by these factors compose a mosaic of specific biocenoses and pathocenoses [31,32]. It is thus reasonable to consider natural selection in the distribution of human genetic polymorphisms [58]. However, testing for selection was not possible with the frequency data, and no pattern could be drawn from the reappraised autosomal microsatellites, certainly because of their distance from SLC4A1 (the closest being D17S2180 at >1.6 Mb). An alternative explanation for the correlation with tropical conditions is that ecology may have contributed to higher gene flow rates within a similar environment and induced distortion in the distribution of the DI*A allele frequency between distinct ecological centers [44]. Analysis of spatial autocorrelation and systematic observation of significant correlation with subtropical conditions support greater inter-ethnic exchanges within than between ecological environments [44].

Insights into peopling of the Americas
Unraveling the genetic structure of Native Americans holds the key to depicting the patterns of the initial peopling of the Americas. Deeper screening of the variation of the ABO-RH blood group, classical genetic markers, or Y-chromosome failed to show clear-cut genetic patterns amongst the O+ phenotypes or Q-M3 male lineages [59,60]. Two likely explanations involve either sample bias preventing accurate evaluation of the distribution of these sublineages or insufficient sampling not covering enough populations to allow tracing back genetic relationships at the continental level [17,19,61]. Another highly plausible reason is that most North, Central and South Amerindians have a common ancestry and thus share common genetic features throughout the Americas [16,62,63,64,65]. Our results agree with previous screening of high-resolution genetic markers that demonstrated genetic similarities in Native American populations in correlation with shared geographical, linguistic and even demographic background [37,42,45,59,66,67].
Absence of DI*A in the northernmost and southernmost regions of the American continent is another noteworthy finding. While isolation by distance with ensuing drift likely accounts for genetic variation in Fuegians [26], an alternative interpretation can be proposed for the present-day gene pool around Beringia: additional gene flow may have occurred as new populations continued to cross over and spread eastward and southward after the initial migratory wave that supposedly colonized the Americas 15-20 ky ago [3,42]. Secondary migration would account for the presence of non-Amerind-speakers, i.e., Eskimo-Aleut and Na-Dene speakers, from Alaska to Greenland and Southwestern North America. Though still controversial [68], reverse-migration of Beringian populations back into Siberia is also thought to have occurred [66], thereby explaining the relationship observed between the Yeniseian languages in Central Siberia and Na-Dene languages in North America [69,70]. Interestingly, absence of DI*A is a shared characteristic of Eskimo-Aleut, Na-Dene, and Yeniseian-speakers [2]. Our results are compatible with the hypothesis that an additional gene flow involving mostly DI*B allele carriers occurred in the northernmost areas of North America.
Whether or not it is realistic to attempt to reconstruct the genetic history of peopling from a single locus is a pertinent question. It is difficult to ascertain that deviation of allele frequency is due to population displacement only and not to genetic drift [59,60]. Capturing only a small part of the genetic variability also weakens the inferences. Fortunately, study of single genetic markers has furnished relevant insight into population ancestry, genetic relationships and migration. Studies of blood group systems [16,20,71] and haplogroup-specific works [72][73][74], which follow the expansion of no more than one lineage of SNPs, are devoted to this purpose. Uniparental loci are drift-sensitive but gain in accuracy when enriched with highly polymorphic linked genetic markers. Herein we assumed that DI*A and DI*B were sufficiently informative for a preliminary study of Native American population dispersal. In this era of fine-grain genetic screening [42], additional experiments on DI*A carriers will be needed to confirm our approach.

Conclusion
The aim of this study was to assess the geographical distribution of the DI*A allele of the Diego blood group system in Amerindians. Our data demonstrate large variations in frequency ranges and a significant correlation with environmental and cultural parameters. Our findings suggest that DI*A carriers crossed into the Americas from Siberia and diverged along the lines of the main linguistic families. Subsequent genetic differentiation appears to have been stronger in Lower Meso and South Amerindians depending on environmental conditions. In the Isthmus of Panama, Chibchan-speakers appear to have experienced intense cultural interactions with neighboring populations without relevant gene flow. A secondary migration by mostly DI*B carriers may have covered North America. In addition to providing strong evidence of gene-culture comigration in the Americas, the present study illustrates the extent to which geographical maps of genetic and cultural phenomena dovetail.

Supporting Information
S1 File. Readme file for S1 Table. (DOC)
S1 Table. Distribution of the 144 populations considered in the present study according to geographic, cultural and environmental parameters, listed according to latitude. The file is a MS Excel spreadsheet 97-2003 (.xls) and is coupled with a Readme file (S1 File). (XLS)
S1 Fig. Interactions between culture and geography from the two-way ANOVA of DI*A allele frequencies (Table 2). Left panels: three main areas were controlled. Right panels: eight main areas were controlled. Ruhlen's linguistic classes (Figure A). Main Pre-Columbian subsistence strategies (Figure B). Areas with/without cassava (Figure C). Areas with/without maize or cassava crops (Figure D). See S1 Table and S1 File for complete references.