Demographic history and selection at HLA loci in Native Americans

  • Richard M. Single,

    Contributed equally to this work with: Richard M. Single, Diogo Meyer

    Roles Conceptualization, Formal analysis, Methodology, Software, Writing – original draft

    Affiliation Department of Mathematics and Statistics, University of Vermont, Burlington, Vermont, United States of America

  • Diogo Meyer,

    Contributed equally to this work with: Richard M. Single, Diogo Meyer

    Roles Conceptualization, Formal analysis, Funding acquisition, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, Brazil

  • Kelly Nunes,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, Brazil

  • Rodrigo Santos Francisco,

    Roles Data curation, Investigation

    Affiliation Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, Brazil

  • Tábita Hünemeier,

    Roles Investigation, Writing – original draft

    Affiliation Departamento de Genética e Biologia Evolutiva, Universidade de São Paulo, São Paulo, Brazil

  • Martin Maiers,

    Roles Investigation, Writing – review & editing

    Affiliation Center for International Blood and Marrow Transplant Research, Minneapolis, Minnesota, United States of America

  • Carolyn K. Hurley,

    Roles Data curation, Writing – review & editing

    Affiliation CW Bill Young Marrow Donor Recruitment and Research Program, Georgetown University, Washington, DC, United States of America

  • Gabriel Bedoya,

    Roles Data curation

    Affiliation Instituto de Biología, Universidad de Antioquia Medellín, Medellín, Colombia

  • Carla Gallo,

    Roles Data curation

    Affiliation Laboratorios de Investigación y Desarrollo, Universidad Peruana Cayetano Heredia, Lima, Peru

  • Ana Magdalena Hurtado,

    Roles Data curation, Writing – review & editing

    Affiliation School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, United States of America

  • Elena Llop,

    Roles Data curation

    Affiliation Programa de Genética Humana, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile

  • Maria Luiza Petzl-Erler,

    Roles Data curation, Writing – review & editing

    Affiliation Departamento de Genética, Universidade Federal do Paraná, Curitiba, Paraná, Brazil

  • Giovanni Poletti,

    Roles Data curation

    Affiliation Facultad de Medicina, Universidad Peruana Cayetano Heredia, Lima, Peru

  • Francisco Rothhammer,

    Roles Data curation

    Affiliations Programa de Genética Humana, Instituto de Ciencias Biomédicas, Facultad de Medicina, Universidad de Chile, Santiago, Chile; Instituto de Alta Investigación, Tarapacá University, Arica, Chile

  • Luiza Tsuneto,

    Roles Data curation

    Affiliation Department of Basic Health Sciences, Universidade Estadual de Maringá, Maringá, Paraná, Brazil

  • William Klitz,

    Roles Conceptualization, Data curation, Writing – review & editing

    Affiliation Department of Integrative Biology, University of California, Berkeley, California, United States of America

  • Andrés Ruiz-Linares

    Roles Conceptualization, Data curation, Investigation, Writing – review & editing

    Affiliations Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Shanghai, China; CNRS, EFS, ADES, D Aix-Marseille University, Marseille, France



The American continent was the last to be occupied by modern humans, and native populations bear the marks of recent expansions, bottlenecks, natural selection, and population substructure. Here we investigate how this demographic history has shaped genetic variation at the strongly selected HLA loci. In order to disentangle the relative contributions of selection and demographic processes, we assembled a dataset with genome-wide microsatellites and HLA-A, -B, -C, and -DRB1 typing data for a set of 424 Native American individuals. We find that demographic history explains a sizeable fraction of HLA variation, both within and among populations. A striking feature of HLA variation in the Americas is the existence of alleles that are present in the continent but either absent or very rare elsewhere in the world. We show that this feature is consistent with demographic history (i.e., the combination of changes in population size associated with bottlenecks and subsequent population expansions). However, signatures of selection at HLA loci are still visible, with significant evidence of selection at deeper timescales for most loci and populations, as well as population differentiation at HLA loci exceeding that seen at neutral markers.

1. Introduction

The American continent was the last to be colonized by humans. Most genetic studies corroborate the hypothesis that the first individuals came to America from Northeast Asia, now Siberia, about 15,000 to 18,000 years before present (YBP) through the Beringia land bridge [1–3].

According to this hypothesis, the extant Native American populations are the result of a single migratory wave that entered the American continent at the end of the last glacial period, after a period from 5,000–8,000 YBP in Beringia, which allowed the genetic differentiation of the First Americans [1, 4–6]. After the expansion of this population into the American continent, subsequent waves of migration came to America from Siberia, leaving genetic traces in the current North American populations (e.g., Eskimos and Na-Dene populations) [3, 7–9]. Some studies have also detected a Polynesian genetic component in extinct Native South Americans [10, 11], as well as an Austromelanesian genetic component in contemporary [7] and ancient [12] South Americans. Although the hypothesis of direct contribution of these groups by marine routes was proposed a while ago [13], bioanthropological studies suggest that at the end of the Pleistocene the paleoamerican populations had greater morphological diversity shared with common ancestors of South Asia and Oceania, and that these characteristics may have been retained in some populations or individuals [14, 15].

This recent progress in our knowledge regarding Native American history and demography relies on studies with genomic datasets. These have refined the identification of genetic differences within the American continent, making it possible to cluster the autochthonous populations in major groups, such as Mesoamericans, Andeans, Amazonians and Eskimos [3, 16], and to provide reliable dates and ancestry sources.

While the study of small numbers of autosomal loci is less powerful for testing evolutionary scenarios with large numbers of parameters, specific loci with well-known functional roles can provide insights into the combination of demographic and selective events that shaped the distribution and variation of a specific gene or genomic region [17].

In this study, we compare the patterns of variation and differentiation at HLA loci with a set of putatively neutral markers, in Native American populations. Our primary goal is to provide a deeper understanding of the evolutionary forces that have shaped variation at HLA genes within the American continent. The demographic history of Native Americans, well explored in many studies [3, 18, 19], provides the context within which we analyze variation of three HLA class I genes (HLA-A, -B, and -C) and one HLA class II gene (HLA-DRB1).

Previous studies have shown that demographic history and selective forces interact and leave strong signatures in the variation of HLA genes [20–22], but understanding details regarding this interaction remains a challenge. Outstanding questions concern whether demographic effects can override selective signatures and if selection can revert gradients of allele frequencies generated by geography [23]. Native American populations provide a valuable case study to understand the interaction of natural selection and demographic history for several reasons. First, when occupying the American continent, populations encountered novel pathogens, types of food, and climates. Jointly, these factors posed new selective challenges to the populations which reached the continent. Second, the founding of the Americas likely involved an extreme bottleneck, causing a marked reduction in diversity with respect to other world regions [19, 24–27], highlighting the importance of analyzing the relative intensity of selection and drift in shaping extant Native American HLA diversity. Finally, for more than two decades we have known that Native Americans exhibit HLA alleles which are present exclusively in this continent, and are not found in any other region [28, 29], raising the question as to whether this pattern is a consequence of selection favoring locally advantageous alleles or a consequence of intense genetic drift.

In order to disentangle the relative contributions of demography and selection on specific loci of interest it is essential to have a large dataset, which includes markers that document the demographic history of the sampled populations. In the present study we take such an approach, using HLA sequence data and a genome-wide microsatellite dataset from a panel of 424 Native American individuals. We use the microsatellites as a demographic control and evaluate the differences between variation at these neutral markers and that of the HLA genes. Specifically, we examine how well the putatively neutral microsatellites predict variation and differentiation at the strongly selected HLA genes. We also systematically survey the geographic distribution of HLA alleles, with particular emphasis on those that are restricted to the American continent, so as to investigate the relative importance of selection and drift on patterns of geographic variation.

2. Materials and methods

Population sampling

We studied microsatellite and HLA variation in 424 individuals from 23 Native American and one Siberian population (Table 1, Fig 1). The Siberian population was not used in analyses concerning diversity and differentiation within the Americas, but provides insight into the differences between the Americas and a population likely related to their close ancestors.

Fig 1. Map of North and South America with location of Native American populations studied.

Locations indicated based on geographic coordinates given in Table 1.

The populations were divided into five groups based on geographic and linguistic criteria, according to Hünemeier et al. [16] and Reich et al. [3]: North America (NAM), Mesoamerica (MEA), South America Lowland (SAL), South America Andean (SAA), and Siberian (SIB). A previous in-depth study of these populations showed that most completely lacked European or African ancestry, or at most had contributions lower than 5%.

Microsatellite dataset

We assembled a dataset consisting exclusively of individuals typed for the set of 678 genome-wide microsatellites and also typed for HLA loci (described below). Details about the set of microsatellite markers are described in [18].

HLA typing

To identify the class I HLA-A, -B, and -C alleles carried by each individual, PCR primers were used to amplify each locus as previously described [30]. HLA typing was carried out on DNA samples previously extracted for the study of Wang et al. [18]. Applied Biosystems Big Dye terminator chemistry and sequencing primers were used to obtain the sequences of both strands of exons 2 and 3. Exon 2 of the class II HLA-DRB1 alleles was amplified and sequenced using the AlleleSEQR class II kit (Abbott Molecular Inc, Des Plaines, IL). Additional in-house PCR and sequencing primers were added when needed to obtain resolution. Reaction products were identified with an Applied Biosystems 3730xl DNA analyzer (PE Applied Biosystems, Foster City, CA) and sequence interpretation used Assign software (Conexio Genomics, Applecross, Western Australia).

For subjects with multiple possible class I genotypes, either allele-specific sequencing primers or allele-specific PCR amplification were used to identify the specific allele combination present. Interpretation of alternative genotypes (i.e. allele pairs) used the IPD-IMGT/HLA Database (database release 2.19.0, October 2007). In-house primer sequences are available at Alternative alleles identical in exons 2 and 3 (class I) or exon 2 (DRB1) were not resolved. Unresolved alleles of this type that differ in two-field names, i.e., encode allelic products that vary in amino acid sequence outside of the antigen binding site, are indicated by the use of a “G” following the name of the lowest numbered allele in the group. For example, A*02:01:01G includes alleles A*02:01:01:01, A*02:09, A*02:43N, A*02:66 as well as synonymous alleles A*02:01:01:02L and A*02:01:08. A listing of these unresolved alleles can be found at under database release 2.21.0. A few DRB1 alleles differing in the last three codons of exon 2 were also not distinguished.

The HLA-B alleles of a subset of Native Americans (n = 148) were sequenced in Dr. Meyer’s laboratory as part of another project. After resolution of discrepancies, typing for four samples (2.7%) was corrected.

For most analyses (see below) we transformed the allele calls to two-field allele definitions, where the first field identifies the serological group and the second field defines the peptide sequence, specifying HLA proteins. This reduces the molecular-level information about each allele’s sequence, but allows us to compare our dataset with that of previously published work, which is almost exclusively at the two-field level. The mapping between the molecular-level and two-field level of resolution was one-to-one for all but a small number of alleles (HLA-A: 24:03:01G, 24:03:02; HLA-B: 39:06:01, 39:06:02, 51:13:01, 51:13:02, 52:01:01G, 52:01:02; HLA-DRB1: 04:05:01, 04:05:04), implying that the information loss is minimal. For a subset of the analyses we used sequence information available from our typing to carry out tests that use this level of information (tests for equilibrium-neutrality that explore the site-frequency spectrum).
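The reduction to two-field names can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the helper name `to_two_field` and the regular expression are assumptions, and the sketch simply drops any trailing suffix letter (such as G or N), which is a simplification.

```python
import re

# Hypothetical pattern for names such as "A*02:01:01G" or "HLA-DRB1*04:05:01".
_ALLELE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+)*)(?P<suffix>[A-Z]?)$"
)

def to_two_field(name: str) -> str:
    """Reduce an allele name to locus*field1:field2 (suffix letters dropped)."""
    match = _ALLELE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    fields = match.group("fields").split(":")
    if len(fields) < 2:
        raise ValueError(f"fewer than two fields in: {name!r}")
    return f"{match.group('locus')}*{fields[0]}:{fields[1]}"

# Two molecular-level names that collapse to the same two-field name.
collapsed = {to_two_field(a) for a in ["B*39:06:01", "B*39:06:02"]}
```

Pairs such as B*39:06:01 and B*39:06:02 collapse to a single two-field name, which is the small many-to-one loss described above.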

The genotype data used in this study are available in S1 Table and at the Allele Frequency Net Database (AFND) repository for immune-related gene polymorphisms in worldwide populations. The AFND accession numbers for the 24 populations in this study are 3692–3715. For example, the TundraNentsi population from Siberia can be accessed at A listing of the 24 hyperlinked accession URLs can be found at

Ethics statement

The samples analysed here were collected as part of a previous research project [3] with informed consent encompassing genetic studies of population history. Institutional approval in the country of collection was obtained for the use of each set of samples in such research. Ethical oversight and approval for this project was provided by the National Health Service National Research Ethics Service, Central London committee (reference no. 05/Q0505/31).

Data analyses

Population variability.

We used the PyPop (v.0.7.0) software package [31, 32] to estimate summary statistics (the number of alleles (k), and sample heterozygosity (H)) and to test for deviation from Hardy-Weinberg proportions (HWP) using an exact test [33].
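As an illustration of the two summary statistics, the sketch below computes k and a heterozygosity estimate from a flat list of allele calls (two per individual). It is not PyPop code; the function name `allele_summary` is hypothetical, and whether a small-sample correction was applied in the study is not stated here, so both the uncorrected and corrected values are returned.

```python
from collections import Counter

def allele_summary(alleles):
    """Return (k, H, H_corrected) for a list of allele calls (2N gene copies)."""
    counts = Counter(alleles)
    n = sum(counts.values())          # number of gene copies sampled
    k = len(counts)                   # number of distinct alleles
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    h = 1.0 - homozygosity            # expected heterozygosity
    # Small-sample correction n/(n-1); an assumption, not necessarily what PyPop reports.
    h_corrected = h * n / (n - 1) if n > 1 else float("nan")
    return k, h, h_corrected

# Toy sample (made-up calls): two individuals, four gene copies.
k, h, h_corr = allele_summary(["A*02:01", "A*02:01", "A*24:02", "A*24:02"])
```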

Tests of neutrality.

We tested for departure from neutrality-equilibrium conditions using two methods, which capture different aspects of the time scale for selection. We used the Ewens-Watterson test [34, 35] implemented in PyPop to test for deviations from neutrality in the direction of balancing selection. The full molecular-level dataset was analyzed using Tajima’s D, as implemented in Arlequin (v.3.5) [36]. For each method, deviations in the direction of positive selection are indicated by p-values close to one.
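For readers unfamiliar with the statistic, the following sketch computes Tajima’s D from a set of aligned sequences using the standard constants of the test. It is not the Arlequin implementation: the function name `tajimas_d` is hypothetical, each gene copy is assumed to be supplied as one string, and the significance assessment performed by Arlequin is not reproduced.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (strings)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    ) / n_pairs

    # Standard constants of the test.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))

# Toy alignments (made up): one site at intermediate frequency vs. one singleton.
d_balanced = tajimas_d(["A"] * 3 + ["T"] * 3)
d_singleton = tajimas_d(["A"] * 5 + ["T"])
```

An excess of intermediate-frequency variants gives D > 0, the direction expected under balancing selection, whereas an excess of rare variants gives D < 0.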

Differentiation among populations.

Differentiation among pairs of populations was estimated using AMOVA, as implemented in the ADE4 (v.1.6–2) package of R to compute FST values. Pairwise estimates were then used to estimate mean differentiation among populations within a region, and among populations from different pairs of regions.

We developed an empirical approach to place the FST results for HLA loci in the context of differentiation of putatively neutral microsatellites. For each pair of populations, each HLA FST result was normalized by subtracting the mean FST for the 678 microsatellites and then dividing by the standard deviation of the FST values for the microsatellites:

$$F_{ST}^{\mathrm{norm}} = \frac{F_{ST}^{\mathrm{HLA}} - \overline{F_{ST}^{\mathrm{msat}}}}{\mathrm{SD}\left(F_{ST}^{\mathrm{msat}}\right)}$$

This measure summarizes how many standard deviations the HLA FST value deviates from that of the neutral loci. An empirical p-value was computed as the proportion of the 678 microsatellite loci with a higher FST than HLA.
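The normalization and the empirical p-value can be sketched for a single pair of populations as below. The function name `normalize_fst` is hypothetical, the input values are made up for illustration, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def normalize_fst(hla_fst, msat_fsts):
    """Return (normalized FST, empirical p-value) for one HLA locus.

    hla_fst   : FST of the HLA locus for one pair of populations
    msat_fsts : per-locus FST values of the microsatellites for the same pair
    """
    mu = mean(msat_fsts)
    sd = stdev(msat_fsts)  # sample SD; assumption, see lead-in
    z = (hla_fst - mu) / sd
    # Proportion of microsatellite loci with a higher FST than the HLA locus.
    p_emp = sum(f > hla_fst for f in msat_fsts) / len(msat_fsts)
    return z, p_emp

# Made-up values; the study used 678 microsatellite loci per comparison.
z, p_emp = normalize_fst(0.09, [0.02, 0.04, 0.06, 0.08, 0.10])
```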

Haplotype frequencies.

Haplotype frequencies were estimated using the EM algorithm [37] as implemented in PyPop.

Allele sharing between the Americas and other world regions.

For each allele in our dataset (at the two-field or peptide level of resolution), we compared the mean frequency across the Native American populations with the mean frequency from 364 non-Native American populations from a published meta-analysis of 497 worldwide populations [21] representing approximately 66,800 individuals. These mean frequencies, computed after omitting migrant populations and those whose continental origin classification was uncertain, were used to estimate a ratio of frequencies between the Americas and other world regions. We assigned alleles to one of three categories:

  1. endemic alleles: those present only in the Americas (and completely absent in all other regions);
  2. large frequency difference (LFD) alleles: those present both in the Americas and in other world regions, but at least 3-fold more common in the Americas than in other regions;
  3. other alleles: non-endemic, non-LFD alleles.

We next applied a set of filters to this classification. First, if an endemic or LFD allele was present in 3 or fewer copies in the Americas, we considered it "poorly supported". This category contains alleles that were rare in both the Americas and other world regions, but yielded large ratios due to small sample sizes and large sampling variances for frequencies in the Americas. Second, we performed a similar classification comparing allele frequencies in the Native American and non-Native American populations of the Solberg et al. [21] study, so as to validate the findings obtained from our dataset. If an LFD allele was poorly supported in both datasets, we removed it from our final set of classified alleles (21 LFD alleles were poorly supported: 3 HLA-A, 13 HLA-B, and 5 HLA-DRB1). This yielded a final list of alleles found in our Native American data which were classified as either endemic or LFD.
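The three categories and the first filter can be sketched as a simple decision rule. The function name `classify_allele` and its parameter names are hypothetical, and the second filter (cross-validation against the Native American populations of the meta-analysis) is not implemented here.

```python
def classify_allele(freq_americas, freq_elsewhere, copies_americas,
                    ratio_threshold=3.0, max_poor_copies=3):
    """Return (category, poorly_supported) for one allele.

    freq_americas   : mean frequency across Native American populations
    freq_elsewhere  : mean frequency across non-Native American populations
    copies_americas : number of copies observed in the Americas
    """
    if freq_americas <= 0:
        raise ValueError("allele must be present in the Americas")
    if freq_elsewhere == 0:
        category = "endemic"                      # absent everywhere else
    elif freq_americas / freq_elsewhere >= ratio_threshold:
        category = "LFD"                          # at least 3-fold more common
    else:
        category = "other"
    # First filter: endemic or LFD alleles seen in 3 or fewer copies.
    poorly_supported = category != "other" and copies_americas <= max_poor_copies
    return category, poorly_supported

# Made-up frequencies for illustration.
examples = [
    classify_allele(0.05, 0.0, 10),
    classify_allele(0.06, 0.01, 12),
    classify_allele(0.02, 0.015, 5),
    classify_allele(0.004, 0.001, 2),
]
```

The last example shows how a rare allele can reach a large frequency ratio with only two observed copies, which is the situation the "poorly supported" filter is meant to flag.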

Because the HLA typing for the Solberg et al. [21] dataset was carried out over an extended period of time, it is possible that the same allele could have received different names, depending on when it was typed, since earlier methods may not have been able to unambiguously distinguish among certain alleles. A consequence of this naming inconsistency is the possibility that a subset of alleles we classify as endemic are in fact shared among regions, but have been given different names across studies. To minimize the possibility that changes in allele names had any impact on our analyses, we used an empirical approach to quantify the degree of ambiguity in allele calling, and we report this information in S2 Table.

3. Results and discussion

3.1 HLA alleles within the Americas

For the 23 Native American populations and one Siberian population we identified 36 alleles at HLA-A, 80 at HLA-B, 29 at HLA-C, and 38 at DRB1, at the two-field level of resolution (S1 Table). This dataset was used in all analyses, with the exception of neutrality tests, which were carried out on a version of the data which was recoded at the molecular level (by assigning DNA sequences corresponding to each allele).

3.2 Deviation from HWP

Out of the total of 96 tests for deviation from HWP (23 Native American populations and one Siberian population, at 4 loci), five were significant at the 0.05 level, which is close to the number expected by chance alone (4.8 tests). The deviations are spread over all loci (one in each and two in HLA-B) and occur in five different populations (HLA-A: Zapotec, HLA-B: Kogi and Ticuna Tarapaca, HLA-DRB1: Waunana). In each case a small, but significant, overrepresentation of specific individual genotypes contributed to the overall locus-level deviation (A*02:06+A*31:01, B*35:43+B*40:02, C*04:01+C*05:01, DRB1*04:07+DRB1*04:11, DRB1*04:04+DRB1*14:02). Deviations are due to an excess of heterozygotes, arguing against the presence of null alleles or allele dropouts. The observed deviations were small and non-significant when we account for multiple testing, and thus differ from a classic finding for Native Americans which identified HWP deviations consistent with heterozygote advantage [38]. No population or locus was consistently overrepresented among the deviations, arguing against the possibility that demographic or technical factors account for deviations.
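The chance expectation quoted above is simply 96 × 0.05 = 4.8. The sketch below also computes the binomial probability of observing five or more significant tests, under the simplifying assumption (ours, not the study's) that the 96 tests are independent; the helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k, n + 1)
    )

n_tests, alpha = 96, 0.05
expected = n_tests * alpha                       # expected false positives
p_five_or_more = prob_at_least(5, n_tests, alpha)
```

Under this assumption, five or more significant results are unsurprising, in line with the interpretation that the deviations are compatible with chance.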

3.3 Geographic variation in heterozygosity

We examined the degree to which demographic history accounts for differences in heterozygosities among populations and regions of the Americas. We ordered populations within geographic regions based on the least cost path (as reported by [18]), which measures how far each population travelled from a putative Siberian source, providing a proxy for the amount of drift experienced. For the microsatellites there is a general decrease of heterozygosity, with the South American Lowland populations (SAL) having the lowest heterozygosities, Mesoamericans (MEA) showing slightly larger values, followed by the Andeans (SAA) (Fig 2a). North Americans (NAM) have higher heterozygosities than all South Americans. These differences are supported statistically, with the greatest pairwise contrasts in heterozygosity being NAM-MEA (p = 0.067), SAL-NAM (p = 0.001), and SAL-SAA (p = 0.017). Tests were controlled for multiple comparisons at the 0.05 level using Tukey’s HSD method; we conservatively excluded the Aché to avoid effects driven by this one extreme population (cf. [18]).

Fig 2. (a, b) Heterozygosity values per population and geographic region (a) Microsatellites, (b) HLA.

Populations are ordered within geographic regions based on the least cost path (Wang et al, 2007), which indicates the distance from Siberia along likely migration routes.

The variation in HLA genes among regions does not show a strong relationship with distance. Only the SAL-NAM contrast in heterozygosity was significant at the 0.05 level for all loci (p < 0.05 for all loci combined; Fig 2b).

Under the assumption that demographic forces shape variation throughout the entire genome, we expect a correlation in measures of diversity at different loci. We tested this by comparing heterozygosity between HLA loci and the microsatellites (Fig 3, Table 2). All HLA loci have heterozygosities which are highly correlated with that of microsatellites (p<0.01 for all contrasts), and the average heterozygosity for the 4 HLA loci combined has a correlation of r = 0.91 (p<0.01, Pearson correlation) with that of microsatellites (Fig 3e). Fig 3 also allows investigation of microsatellite diversity while conditioning on HLA heterozygosity. Taking a vertical slice of Fig 3 (e.g., HLA heterozygosity between 0.8 and 0.85) one finds lower microsatellite heterozygosity in SAL followed by MEA populations and higher microsatellite heterozygosity in SAA followed by NAM populations. This trend is consistent across populations with different ranges of HLA heterozygosity (i.e., different vertical slices) indicating that trends due to recent demographic history which are visible in microsatellites are also visible regardless of the level of HLA diversity.

Fig 3. (a–e) Relationship between heterozygosity for microsatellites and HLA.

Results are shown separately for each HLA locus in panels (a)-(d). Panel (e) shows results that are averaged over the four HLA loci.

Table 2. Correlation between mean microsatellite heterozygosity and HLA diversity (heterozygosity and abundance of endemic alleles).

These results underscore the fact that demographic processes overwhelm the idiosyncratic selective regimes which affect each locus individually. Thus, although HLA loci have unusually high heterozygosity as a consequence of balancing selection, the signatures of recent demographic history are clearly visible in differences of polymorphism levels among populations.

Individual-based levels of heterozygosity.

The availability of HLA and microsatellite data for a matched set of individuals allows us to also examine correlations in diversity defined at the level of individuals, in addition to the population level. In order to do this, we quantified the proportion (or absolute number, for HLA) of loci within an individual which are heterozygous and determined a correlation with microsatellite heterozygosity.
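The individual-level measures can be sketched as follows, assuming each individual's genotypes are stored as a mapping from locus name to a pair of allele calls. The function names and the data layout are hypothetical, and the genotypes shown are made up for illustration.

```python
def count_homozygous(genotypes):
    """Number of typed loci at which the two allele calls are identical."""
    return sum(
        a == b for a, b in genotypes.values() if a is not None and b is not None
    )

def proportion_homozygous(genotypes):
    """Proportion of typed loci that are homozygous (missing calls skipped)."""
    typed = [(a, b) for a, b in genotypes.values()
             if a is not None and b is not None]
    if not typed:
        raise ValueError("no typed loci for this individual")
    return sum(a == b for a, b in typed) / len(typed)

# One made-up individual: four HLA loci and three microsatellites (one untyped).
hla = {
    "A": ("A*02:01", "A*02:01"),
    "B": ("B*35:43", "B*40:02"),
    "C": ("C*04:01", "C*05:01"),
    "DRB1": ("DRB1*04:07", "DRB1*04:07"),
}
msat = {"m1": (120, 120), "m2": (98, 102), "m3": (None, None)}

n_hla_hom = count_homozygous(hla)
prop_msat_hom = proportion_homozygous(msat)
```

Pairing the HLA count with the microsatellite proportion for every individual gives the per-population data on which an individual-level correlation can be computed.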

For each population, the relationship between mean individual homozygosity at HLA and microsatellites is shown in Fig 4, which presents a conditional view of microsatellite variability based on categories of HLA heterozygosity at the individual level. The correlation between the proportions of heterozygous loci at HLA and non-HLA markers is low, with most populations showing a minimal increase in genome-wide proportion of homozygous loci along with the equivalent measure at HLA. Of the 22 populations tested, only 3 showed a significant association between HLA and microsatellite homozygosity, after correction for multiple testing (Ticuna Tarapaca, Huilliche, Embera; p < 0.0025).

Fig 4. Relationship between the proportion of homozygous loci at HLA and homozygosity at microsatellites averaged over individuals.

Each line represents the trend in a specific population.

Another way to assess the relationship between microsatellite and HLA diversity is to compare the proportion of microsatellite loci which are homozygous for different groups of individuals: (0) those heterozygous for all HLA loci, (1) those homozygous at one HLA locus, and (2) those homozygous for two or more HLA loci (Fig 5). Only one of the regions (SAL, p<0.001) had a significant increasing trend in microsatellite homozygosity for increasing numbers of HLA homozygous loci. Overall, these results show that being heterozygous at HLA loci is not a strong predictor for heterozygosity of the microsatellite loci at the within-individual level.

Fig 5. Relationship between zygosity at HLA loci and the average proportion of microsatellite loci for which individuals are homozygous.

The proportion of microsatellite loci that are homozygous in each individual is summarized over individuals from each geographic region, separately for individuals with zero, one, or at least two homozygous HLA loci. The number of individuals in the geographic region contributing to each boxplot is shown on the horizontal axis.

At first sight these results might appear to contradict the finding of a strong correlation between HLA and microsatellite heterozygosity (Fig 3). However, this discrepancy can be understood by considering that the correlation among HLA and microsatellite variation is explained by overall differences in demographic histories among the populations and not by the inter-individual differences in heterozygosity. Thus, while differences in the history among populations shape HLA and microsatellite variation, within each population (i.e., among individuals) variation is largely stochastic and there are few genome-wide systematic differences in diversity among HLA homozygous and heterozygous individuals. This underscores the importance of population history in shaping variation at both neutral and HLA markers.

3.4 Deviation from equilibrium-neutrality expectations

Although there is little controversy regarding the importance of balancing selection in shaping variability at HLA loci [20, 21, 39, 40], it is not clear how evidence for selection varies among populations and what the timescale is for selection on the HLA loci. We addressed these issues by applying two tests for selection to the HLA data. We first tested for deviations from the infinite alleles model using the Ewens-Watterson test (Table 3). We found weak evidence of selection, with the only significant deviations in the direction of balancing selection (HLA-B in Chipewyan, DRB1 in Wayuu, and HLA-C in Chipewyan and Zapotec). As was the case for tests of deviation from HWP, the number of significant results approaches that expected for a given level of significance.

The levels of deviation from neutrality at the allele level found in the present study are substantially lower than for other world regions, where up to 20% of populations deviate from neutrality-equilibrium [21]. This suggests that the greater strength of drift in the recent history of Native Americans, which is reflected in the decreased polymorphism with respect to other world regions [18], has resulted in a weaker signature of recent allele-level selection at the HLA loci. In this sense, if we take into account that the Native American populations have undergone a substantial bottleneck in the last 500 years (post-contact) [41], it is expected that the demographic signal is more intense than the selective one.

The finding that HLA variability is well accounted for by demographic history, as captured by the microsatellite data, is not incompatible with a role of selection on HLA loci at a deeper timescale [42]. To examine this possibility, we analyzed the HLA data after recoding it in the form of molecular level variation (i.e., by transforming each allele call into a DNA sequence), and using a method which is sensitive to selective and demographic history on a deeper timescale [43]. The DRB1 locus was not included since the typing resolution did not allow unique assignments of sequences to each allele. In strong contrast to the findings for the Ewens-Watterson (EW) test, for HLA-A, -B and -C alleles defined at the sequence level, we found positive and significant deviations from the neutrality-equilibrium expectation in 8, 5 and 10 of the Native American populations, respectively (Table 4), corresponding to 35%, 22% and 43% of the 23 sampled populations with a p-value < 0.05 for a one-sided test with an alternative hypothesis of balancing selection. These results confirm earlier findings which showed that tests using sequence-level data are substantially more powerful in detecting balancing selection at HLA loci than tests based on the infinite-alleles model (e.g., [39]). We also found a strong trend in the sign of D values: over all loci, at least 21 of the 23 populations show D>0. This overall trend is conservative with respect to expectations under neutrality, since genome-wide there is a shift to negative values of Tajima’s D, likely a consequence of population expansions [2, 19].

Table 4. Tajima’s D neutrality test for molecular-level data.

This can be explained by the fact that tests based on the site frequency spectrum, such as Tajima’s D, document the cumulative effect of balancing selection acting on much longer timescales than the EW test, and are thus more powerful [42].

3.5 Population differentiation

Adaptation of populations to their local environments is expected to favor alleles that are advantageous in that specific context, and thus drive an increase in the degree of differentiation at the selected loci [44]. On the other hand, models of balancing selection via heterozygote advantage show that genetic differentiation among populations under balancing selection is expected to be decreased with respect to neutral expectations, since this selective regime slows down the rate of genetic drift [45].

To better understand the patterns of genetic differentiation at HLA loci in the Americas we used two approaches. First, we catalogued alleles with large frequency differences between the Americas and other regions. Second, we examined how the degree of interpopulation differentiation at HLA differs from that seen at putatively neutral microsatellite loci.

Endemic alleles in the Americas.

In order to catalog alleles with marked geographic discontinuities, we compared HLA allele frequencies in the Native American dataset to those from a large meta-analysis [21].

Endemic alleles are present in appreciable numbers in our dataset: 8, 15, and 5 alleles were endemic for HLA-A, -B and -C, respectively (Table 5). However, these alleles contribute relatively little to the total frequency in each population, with mean values per region typically below 2% (Table 6). The exception is HLA-B, for which endemic alleles have a mean value of 8% in the Americas, and close to 20% in specific Central and South American populations. This differs from the original reports of endemic alleles [28, 46], according to which the total frequency of endemic alleles was much higher. Since then, many alleles that were previously assumed to be endemic have been found outside the Americas, often at low frequencies.

Table 5. Alleles classified as endemic or "large frequency difference" (LFD).

Table 6. Sum of frequencies for alleles in endemic and LFD categories.

Alleles with large frequency differences (LFD), on the other hand, make up a substantial proportion of Native American alleles at all loci, with the exception of HLA-C (Table 6). For HLA-A, the mean frequency of all LFD alleles taken together is 50% in MEA and 45% in SAL. For HLA-B, MEA, SAA, and SAL have mean frequencies for the set of LFD alleles of 43%, 39%, and 49%, respectively (Table 7; the Aché population was excluded from these calculations due to its outlier status). For DRB1 the results are even more extreme, with MEA, SAA, and SAL having mean values of LFD alleles over 55%. These results imply that, even for populations and loci in which endemic alleles are not common, the pool of alleles carried by Native American populations is frequently made up of alleles that are rare in other world populations, so that Native Americans have a distinctive genetic profile at HLA loci. The exceptions to this pattern are the North American populations, which on average display the lowest frequencies of alleles in the LFD class among the Native American groups. The populations of North America have a different demographic history from the other populations of the Americas. Several studies indicate that they derive in part from subsequent migration flow from Siberia up to 5,000 YBP [3, 5, 9]. This more intense migratory flow between North American and East Asian populations likely explains the greater HLA allele sharing between these regions.

Table 7. Mean total frequency for endemic and LFD alleles.

The sum of frequencies for LFD and endemic alleles varies substantially among populations and loci. To quantitatively assess the contribution of demographic factors, we estimated the correlation between mean microsatellite heterozygosity and the total frequency of endemic and LFD alleles, across all populations (Table 2). With the exception of HLA-C, which has very few endemic or LFD alleles, all other loci show a strong negative correlation over all populations between mean microsatellite heterozygosity and endemic/LFD frequency. Squaring these correlations shows that microsatellite heterozygosity (or homozygosity, since the correlation is negative) accounts for a substantial fraction of the endemic and LFD variance (43%, 34%, 24% for HLA-A, -B and -DRB1). Taken together, these results show that demographic factors play a central role in explaining the abundance of LFD alleles in Native American populations.
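The step from a correlation to a fraction of variance explained can be made explicit with a short sketch. The per-population values below are hypothetical and are not taken from Table 2; the helper name `pearson_r` is likewise an assumption. The last lines simply invert the reported fractions (43%, 34%, 24%) to show the magnitude of correlation they imply.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-population values (NOT data from this study):
# mean microsatellite heterozygosity and summed endemic/LFD frequency.
microsat_het = [0.55, 0.60, 0.65, 0.70, 0.75]
lfd_freq = [0.70, 0.55, 0.60, 0.40, 0.35]

r = pearson_r(microsat_het, lfd_freq)  # negative: less diversity, more LFD
r_squared = r ** 2                     # fraction of variance accounted for

# Magnitude of correlation implied by the reported variance fractions.
implied_abs_r = [round(sqrt(v), 2) for v in (0.43, 0.34, 0.24)]
# -> [0.66, 0.58, 0.49]
```

The sign of the correlation is lost when squaring, which is why the text notes that the relationship is negative.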

Differentiation based on FST.

We quantified genetic differentiation among populations and regions by computing FST for both HLA and microsatellites. We first averaged all pairwise FST values for populations within or between regions, to obtain estimates of mean pairwise FST for different geographic scales (Table 8). Nearly all pairs of regions have high levels of differentiation, particularly for contrasts involving SAL and NAM. The only case in which regions showed low differentiation was the contrast between the single SIB population and the NAM populations. This result is once again consistent with secondary migratory flow between these regions.
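The averaging of pairwise FST values by region can be sketched as follows. The estimator shown, FST = (HT − HS) / HT with equal weighting of the two populations, is an assumption chosen for simplicity and is not necessarily the estimator used in this study (see Methods). The function names, population labels and allele frequencies are all hypothetical; only the region codes NAM and SAL come from the text.

```python
from collections import defaultdict
from itertools import combinations


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), from {allele: frequency}."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(f1, f2):
    """FST = (HT - HS) / HT for two populations, equally weighted."""
    alleles = set(f1) | set(f2)
    pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in alleles}
    h_t = heterozygosity(pooled)                          # total
    h_s = (heterozygosity(f1) + heterozygosity(f2)) / 2   # within
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def mean_fst_by_region(pop_freqs, pop_region):
    """Average pairwise FST over all population pairs, per region pair."""
    values = defaultdict(list)
    for p, q in combinations(sorted(pop_freqs), 2):
        key = tuple(sorted((pop_region[p], pop_region[q])))
        values[key].append(pairwise_fst(pop_freqs[p], pop_freqs[q]))
    return {k: sum(v) / len(v) for k, v in values.items()}


# Hypothetical allele frequencies at one locus (NOT data from this study).
pop_freqs = {
    "pop1": {"a1": 0.5, "a2": 0.5},
    "pop2": {"a1": 0.6, "a2": 0.4},
    "pop3": {"a1": 0.1, "a3": 0.9},
}
pop_region = {"pop1": "NAM", "pop2": "NAM", "pop3": "SAL"}

region_means = mean_fst_by_region(pop_freqs, pop_region)
# Keys are region pairs, e.g. ("NAM", "NAM") and ("NAM", "SAL").
```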

Table 8. FST among pairs of regions, averaged over all population pairs.

The FST values at HLA loci are consistently higher than those for microsatellites (Fig 6), a result that remains unchanged when the RST measure is used for the microsatellites, and despite the fact that our method controls for differences in heterozygosity values at these two sets of markers (see Methods). The higher differentiation at HLA loci differs from the expectation under balancing selection, which is that of decreased differentiation relative to neutral markers [45]. These results indicate that HLA loci, although under balancing selection, show high levels of variation, deviations from neutrality-equilibrium, and relatively high levels of population differentiation. This scenario is consistent with that described by Brandt et al. [47], in which populations of the same geographic region tend to show higher FST values for HLA SNPs, while populations of distinct geographic regions have lower FST values. This suggests that local adaptation of HLA alleles may contribute to population differentiation on a regional scale without erasing signatures of long-term balancing selection.

Fig 6. FST for population pairs at HLA and microsatellites.

FST values were computed for each population pair and locus, and were summarized over the microsatellite and HLA loci separately. Results are shown for microsatellites in the highest (third) quartile of heterozygosity. Pairs involving the Aché are in red since this population has been excluded from other analyses due to its outlier status.

Standardized differentiation.

Comparing the FST and standardized ZST values allows us to examine whether populations from different regions are more (or less) differentiated at HLA relative to the set of putatively neutral microsatellite loci (Fig 7). The general trend for HLA FST results is that the South American regions have larger FST values than the NAM and MEA populations, with the MEA populations slightly higher than NAM populations. Based on ZST results, the difference between MEA and NAM populations is slightly larger. In South America, the Andean populations stand out based on the ZST results, with much higher HLA differentiation relative to the microsatellite background.

Fig 7. ZST values by region and locus.

Empirical p-value from the distribution of microsatellite FSTs. ZST measures the number of standard deviations the HLA FST values are from that of the neutral loci. The line at –log(p-value) = 3 (natural logarithm) corresponds to the 0.05 level of significance.
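The standardization described in the legend can be sketched as follows, assuming ZST is the HLA FST minus the mean of the microsatellite FST values, divided by their standard deviation, and that the empirical p-value is the proportion of microsatellite values at least as large as the HLA value. The add-one correction in the p-value, the function names and the numbers are assumptions made for illustration and may differ from the procedure described in the Methods.

```python
from math import log
from statistics import mean, stdev


def z_st(hla_fst, microsat_fsts):
    """Standard deviations separating the HLA FST from the neutral mean."""
    return (hla_fst - mean(microsat_fsts)) / stdev(microsat_fsts)


def empirical_p(hla_fst, microsat_fsts):
    """Share of neutral FST values >= the HLA value (add-one corrected)."""
    n_extreme = sum(1 for f in microsat_fsts if f >= hla_fst)
    return (n_extreme + 1) / (len(microsat_fsts) + 1)


# Hypothetical values (NOT data from this study).
microsat_fsts = [0.01, 0.02, 0.03, 0.04, 0.05]
hla_fst = 0.09

z = z_st(hla_fst, microsat_fsts)
p = empirical_p(hla_fst, microsat_fsts)
neg_log_p = -log(p)

# The reference line in Fig 7: -ln(0.05) is approximately 3.
threshold = -log(0.05)
```

With only five neutral loci, as in this toy example, the smallest attainable p-value is 1/6, so a large set of microsatellites is needed for the empirical p-value to reach the 0.05 threshold.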

Previous studies with uniparental and genome-wide data had also shown that Andean populations have higher within- and between-population variability than Amazonian natives [48–50]. In addition, the Andeans show an excess of rare alleles with respect to the mutation-drift equilibrium expectation. This finding could be explained by the expansion of the Andean population after the rise of complex societies around 4,000 YBP and population displacements from conquests in the Inca Empire. Although Mesoamerica went through the same transition, followed by population growth, in the Andean region the regional dynamics, including forced migration, led to greater gene flow among Andean groups [50, 51]. This process has resulted in low population differentiation for genomic markers, but not for HLA, suggesting that forces other than demography may be acting in this system.

3.6 Haplotype distributions in the Americas

In order to gain further insight into the history and dynamics of the endemic, large frequency difference, and non-LFD/non-endemic alleles, we analyzed the allele specific heterozygosity (ASH) for these allelic classes. ASH is the heterozygosity of alleles found in haplotypes with the allele that has been conditioned on. Because the allele specific heterozygosity is strongly affected by the heterozygosity of the conditioned upon locus (i.e., alleles at low frequencies typically have less diversity at linked sites) we stratified our analysis by frequency classes (Table 9).
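The definition of ASH given above can be sketched in a few lines. The sketch assumes phased two-locus haplotypes represented as tuples and uses the uncorrected heterozygosity 1 − Σp²; the function name, allele labels and haplotype counts are hypothetical, and any sample-size correction used in the actual analysis is not reproduced here.

```python
from collections import Counter


def allele_specific_heterozygosity(haplotypes, allele,
                                   cond_locus=0, target_locus=1):
    """Heterozygosity at target_locus among haplotypes carrying `allele`
    at cond_locus (the conditioned-upon locus)."""
    linked = [h[target_locus] for h in haplotypes if h[cond_locus] == allele]
    n = len(linked)
    if n == 0:
        raise ValueError("allele not found at the conditioned-upon locus")
    counts = Counter(linked)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical phased haplotypes (NOT data from this study):
# (allele at conditioned-upon locus, allele at linked locus).
haplotypes = [
    ("a1", "b1"), ("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
    ("a2", "b1"), ("a2", "b1"),
]

ash_a1 = allele_specific_heterozygosity(haplotypes, "a1")  # 0.625
ash_a2 = allele_specific_heterozygosity(haplotypes, "a2")  # 0.0
```

In this toy example the allele found on several haplotypic backgrounds (a1) has a higher ASH than the allele restricted to a single background (a2). Because ASH depends on how many haplotypes carry the conditioned-upon allele, comparisons are only meaningful within a frequency class.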

Table 9. Allele specific heterozygosity by allele category.

Overall, we find that LFD alleles reside on less diverse haplotypes than non-LFD/non-endemic alleles, again regardless of the frequency of the conditioned-upon allele (Table 9). This result is consistent with the endemic, LFD and non-LFD/non-endemic status of alleles serving as a proxy for allele age: younger alleles are expected to be more regionally restricted, and to be associated with a reduced number of haplotypes. Older alleles, on the other hand, are expected to be associated with diverse haplotypic contexts, as is the case for the non-LFD/non-endemic alleles. In addition, in the presence of selection, the alleles linked to a selected haplotype might be less diverse.

4. Conclusion

We have analyzed HLA variation in Native American populations while controlling for demographic history by using a large set of microsatellite loci from the same individuals. Our neutrality tests show that, as expected, there is evidence for long-term balancing selection at the HLA loci (as documented by Tajima’s test of neutrality). Thus, much of the signature of selection seen in America for HLA genes may reflect selection that took place much earlier, before America was occupied.

We also documented a higher degree of among-population genetic differentiation for HLA loci, as compared to microsatellites. Although this direct comparison is challenging to interpret due to the differences in mutational processes underlying these types of markers, we controlled for a major confounding factor upon differentiation, which is within population variability, and showed that the finding of increased differentiation was robust. Lindo et al. [52] also identified marked differentiation at an HLA locus (DQA1), but in the context of differences between an ancestral and a descendant population (based on sequencing of ancient samples). They argued that the pattern results from a shift of selective regimes, such that previously advantageous variants became negatively selected after European contact. Together, these results favor the interpretation that selection on HLA genes in the Americas involves a mixture of long-term balancing selection and a combination of episodes of more recent positive and balancing selection.

Andean populations stood out based on an empirically standardized measure of differentiation, with much higher HLA differentiation relative to the microsatellite background for Class I loci. For genomic markers, demographic events, including Inca Empire conquests and forced migrations, have been hypothesized to explain the lower population differentiation observed among Andean populations. The higher differentiation for HLA class I loci suggests that non-demographic forces have shaped these allele frequency distributions.

Our analyses are based on sequence-level variation in exons 2 and 3 of class I loci, and exon 2 of the HLA-DRB1 class II locus (see Methods). It is clear that contemporary techniques based on NGS can provide a higher resolution of HLA diversity, uncovering alleles defined by variation in exons, introns and regulatory sequences which were not surveyed in this study. Our analysis is not a complete molecular survey of HLA diversity in Native American populations, but it does provide a reliable survey of how HLA variation in exons 2 and 3 is related to neutral variation (captured by microsatellite polymorphism in the same samples). Our focus on a subset of coding regions also implies that alleles which we consider the same in America and in other regions of the world may in fact have originated from different ancestral sequences. To distinguish such identity by state from common origin, it will be necessary to survey either a longer portion of each locus or their haplotypic contexts.

Despite the evidence for selection seen in our dataset, we found that many features of HLA variation are accounted for by the recent demographic history of these populations. This is a recurrent feature of human evolutionary history, and there are in fact few cases where selection overrides the patterns of genetic differentiation that originate from demographic processes [23]. Specifically, we show that the abundance of alleles which are unique to America (or much more common in this continent than others) is to a large degree explained by demographic history, as are overall levels of polymorphism. Thus, despite the intense selection pressures acting on HLA loci, recent demographic history has played a substantial role in shaping their overall patterns of variation within America.


Acknowledgments

We thank Ramiro Barrantes, Maria Catira Bortolini, Kim Hill, Damian Labuda, Julio A. Molina, Maria V. Parra, and Winston Rojas for contributing DNA samples.


References

  1. Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, Mulligan CJ, et al. Beringian standstill and spread of native American founders. PLoS ONE. 2007. pmid:17786201
  2. Fagundes NJR, Kanitz R, Eckert R, Valls ACS, Bogo MR, Salzano FM, et al. Mitochondrial Population Genomics Supports a Single Pre-Clovis Origin with a Coastal Route for the Peopling of the Americas. American Journal of Human Genetics. 2008. pmid:18313026
  3. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing Native American population history. Nature. 2012. pmid:22801491
  4. Perego UA, Achilli A, Angerhofer N, Accetturo M, Pala M, Olivieri A, et al. Distinctive Paleo-Indian Migration Routes from Beringia Marked by Two Rare mtDNA Haplogroups. Current Biology. 2009:1–8. pmid:19135370
  5. Raghavan M, Steinrücken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, et al. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015. pmid:26198033.
  6. Pinotti T, Bergström A, Geppert M, Bawn M, Ohasi D, Shi W, et al. Y Chromosome Sequences Reveal a Short Beringian Standstill, Rapid Expansion, and early Population structure of Native American Founders. Current Biology. 2019. pmid:30581024
  7. Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hunemeier T, Petzl-Erler ML, et al. Genetic evidence for two founding populations of the Americas. Nature. 2015;525(7567):104–8. Epub 2015/07/22. pmid:26196601.
  8. Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, et al. Upper palaeolithic Siberian genome reveals dual ancestry of native Americans. Nature. 2014. pmid:24256729
  9. Flegontov P, Altınışık NE, Changmai P, Rohland N, Mallick S, Adamski N, et al. Palaeo-Eskimo genetic ancestry and the peopling of Chukotka and North America. Nature. 2019. pmid:31168094
  10. Malaspinas AS, Lao O, Schroeder H, Rasmussen M, Raghavan M, Moltke I, et al. Two ancient human genomes reveal Polynesian ancestry among the indigenous Botocudos of Brazil. Curr Biol. 2014;24(21):R1035–7. Epub 2014/12/03. pmid:25455029.
  11. Goncalves VF, Stenderup J, Rodrigues-Carvalho C, Silva HP, Goncalves-Dornelas H, Liryo A, et al. Identification of Polynesian mtDNA haplogroups in remains of Botocudo Amerindians from Brazil. Proc Natl Acad Sci U S A. 2013;110(16):6465–9. Epub 2013/04/12. pmid:23576724
  12. Moreno-Mayar JV, Vinner L, de Barros Damgaard P, de la Fuente C, Chan J, Spence JP, et al. Early human dispersals within the Americas. Science. 2018;362(6419). Epub 2018/11/10. pmid:30409807.
  13. Rivet P. Les origines de l’Homme Américain. Paris: Gallimard; 1957.
  14. Strauss A, Hubbe M, Neves WA, Bernardo DV, Atui JP. The cranial morphology of the Botocudo Indians, Brazil. Am J Phys Anthropol. 2015;157(2):202–16. Epub 2015/02/11. pmid:25663638.
  15. Hubbe M, Harvati K, Neves W. Paleoamerican morphology in the context of European and East Asian late Pleistocene variation: implications for human dispersion into the New World. Am J Phys Anthropol. 2011;144(3):442–53. Epub 2011/02/09. pmid:21302270.
  16. Hünemeier T, Amorim CEG, Azevedo S, Contini V, Acuña-Alonzo V, Rothhammer F, et al. Evolutionary responses to a constructed niche: Ancient mesoamericans as a model of gene-culture coevolution. PLoS ONE. 2012. pmid:22768049
  17. Akey JM. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Research. 2009;19:711–22. pmid:19411596
  18. Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, Bedoya G, et al. Genetic variation and population structure in native Americans. PLoS Genet. 2007;3(11):e185. Epub 2007/11/28. pmid:18039031.
  19. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics. 2009. pmid:19851460
  20. Meyer D, Single RM, Mack SJ, Erlich Ha, Thomson G. Signatures of demographic history and natural selection in the human major histocompatibility complex Loci. Genetics. 2006;173:2121–42. pmid:16702436.
  21. Solberg OD, Mack SJ, Lancaster AK, Single RM, Tsai Y, Sanchez-Mazas A, et al. Balancing selection and heterogeneity across the classical human leukocyte antigen loci: a meta-analytic review of 497 population studies. Human immunology. 2008;69:443–64. pmid:18638659
  22. Fernandez Vina MA, Hollenbach JA, Lyke KE, Sztein MB, Maiers M, Klitz W, et al. Tracking human migrations by the analysis of the distribution of HLA alleles, lineages and haplotypes in closed and open populations. Philos Trans R Soc Lond B Biol Sci. 2012;367(1590):820–9. Epub 2012/02/09. pmid:22312049.
  23. Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, Absher D, et al. The Role of Geography in Human Adaptation. PLoS Genetics. 2009;5. pmid:19503611
  24. Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, Bonatto SL, et al. Statistical evaluation of alternative models of human evolution. Proceedings of the National Academy of Sciences of the United States of America. 2007. pmid:17978179.
  25. Amos W, Hoffman JI. Evidence that two main bottleneck events shaped modern human genetic diversity. Proceedings of the Royal Society of London B: Biological Sciences. 2009. pmid:19812086
  26. Ray N, Wegmann D, Fagundes NJR, Wang S, Ruiz-Linares A, Excoffier L. A statistical evaluation of models for the initial settlement of the american continent emphasizes the importance of gene flow with Asia. Molecular Biology and Evolution. 2010;27:337–45. pmid:19805438.
  27. Gravel S, Zakharia F, Moreno-Estrada A, Byrnes JK, Muzzio M, Rodriguez-Flores JL, et al. Reconstructing native american migrations from whole-genome and whole-exome data. PLoS genetics. 2013;9:e1004023. pmid:24385924.
  28. Belich MP, Madrigal JA, Hildebrand WH, Zemmour J, Williams RC, Luz R, et al. Unusual HLA-B alleles in two tribes of Brazilian Indians. Nature. 1992;357:326–9. pmid:1317015.
  29. Parham P, Arnett KL, Adams EJ, Little AM, Tees K, Barber LD, et al. Episodic evolution and turnover of HLA-B in the indigenous human populations of the Americas. Tissue Antigens. 1997;50:219–32.
  30. Tu B, Mack SJ, Lazaro A, Lancaster A, Thomson G, Cao K, et al. HLA-A, -B, -C, -DRB1 allele and haplotype frequencies in an African American population. Tissue Antigens. 2007;69(1):73–85. Epub 2007/01/11. pmid:17212710.
  31. Lancaster A, Nelson M, Meyer D, Thomson G, Single R. PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data. Pac Symp Biocomput. 2003:514–25. pmid:12603054
  32. Single RM, Meyer D, Mack SJ, Lancaster A, Erlich HA, Thomson G. 14th International HLA and Immunogenetics Workshop: report of progress in methodology, data collection, and analyses. Tissue Antigens. 2007;69 Suppl 1:185–7. Epub 2007/04/21. pmid:17445197.
  33. Guo SW, Thompson EA. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics. 1992;48:361–72. pmid:1637966
  34. Ewens WJ. The sampling theory of selectively neutral alleles. Theor Pop Biol. 1972;3(1):87–112. pmid:4667078
  35. Watterson GA. The homozygosity test of neutrality. Genetics. 1978;88:405–17. pmid:17248803
  36. Excoffier L, Lischer HEL. Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources. 2010;10:564–7. pmid:21565059.
  37. Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Molecular Biology and Evolution. 1995;12:921–7. pmid:7476138
  38. Black FL, Salzano FM. Evidence for heterosis in the HLA system. Am J Hum Genet. 1981;33:894–9. pmid:7325154
  39. Buhler S, Sanchez-Mazas A. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events. PloS one. 2011;6:e14643. pmid:21408106.
  40. Meyer D, Vitor VR, Bitarello BD, Débora DY, Nunes K. A genomic perspective on HLA evolution. Immunogenetics. 2018. pmid:28687858.
  41. Lindo J, Haas R, Hofman C, Apata M, Moraga M, Verdugo RA, et al. The genetic prehistory of the Andean highlands 7000 years BP though European contact. Science Advances. 2018. pmid:30417096.
  42. Garrigan D, Hedrick PW. Detecting adaptive molecular polymorphism: lessons from the MHC. Evolution; international journal of organic evolution. 2003;57:1707–22. pmid:14503614.
  43. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–95. pmid:2513255.
  44. Lewontin RC, Krakauer J. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics. 1973. pmid:4711903.
  45. Schierup MH, Vekemans X, Charlesworth D. The effect of subdivision on variation at multi-allelic loci under balancing selection. Genetical research. 2000;76:51–62. pmid:11006634.
  46. Watkins DI, McAdam SN, Liu X, Strang CR, Milford EL, Levine CG, et al. New recombinant HLA-B alleles in a tribe of South American Amerindians indicate rapid evolution of MHC class I loci. Nature. 1992;357(6376):329–33. Epub 1992/05/28. pmid:1589035.
  47. Brandt DYC, César J, Goudet J, Meyer D. The Effect of Balancing Selection on Population Differentiation: A Study with HLA Genes. G3: Genes|Genomes|Genetics. 2018;8. pmid:29950428
  48. Tarazona-Santos E, Carvalho-silva DR, Pettener D, Luiselli D, Stefano GFD, Labarga CM, et al. Genetic Differentiation in South Amerindians Is Related to Environmental and Cultural Diversity: Evidence from the Y Chromosome. Methods. 2001:1485–96.
  49. Lewis CM Jr., Tito RY, Lizarraga B, Stone AC. Land, language, and loci: mtDNA in Native Americans and the genetic history of Peru. Am J Phys Anthropol. 2005;127(3):351–60. Epub 2004/12/08. pmid:15584069.
  50. Gnecchi-Ruscone GA, Sarno S, De Fanti S, Gianvincenzo L, Giuliani C, Boattini A, et al. Dissecting the pre-Columbian genomic ancestry of Native Americans along the Andes–Amazonia divide. Molecular Biology and Evolution. 2019. pmid:30895292
  51. Harris DN, Song W, Shetty AC, Levano KS, Cáceres O, Padilla C, et al. Evolutionary genomic dynamics of Peruvians before, during, and after the Inca Empire. Proceedings of the National Academy of Sciences of the United States of America. 2018. pmid:29946025
  52. Lindo J, Huerta-Sanchez E, Nakagome S, Rasmussen M, Petzelt B, Mitchell J, et al. A time transect of exomes from a Native American population before and after European contact. Nat Commun. 2016;7:13175. Epub 2016/11/16. pmid:27845766.