Evolutionary Responses to a Constructed Niche: Ancient Mesoamericans as a Model of Gene-Culture Coevolution

Culture and genetics rely on two distinct but not isolated transmission systems. Cultural processes may change the human selective environment and thereby affect which individuals survive and reproduce. Here, we evaluated whether the modes of subsistence in Native American populations and the frequencies of the ABCA1*Arg230Cys polymorphism were correlated. Further, we examined whether the evolutionary consequences of the agriculturally constructed niche in Mesoamerica could be considered as a gene-culture coevolution model. For this purpose, we genotyped 229 individuals affiliated with 19 Native American populations and added data for 41 other Native American groups (n = 1905) to the analysis. In combination with the SNP cluster of a neutral region, this dataset was then used to unravel the scenario involved in 230Cys evolutionary history. The estimated age of 230Cys is compatible with its origin occurring in the American continent. The correlation of its frequencies with the archeological data on Zea pollen in Mesoamerica/Central America, the neutral coalescent simulations, and the FST-based natural selection analysis suggest that maize domestication was the driving force in the increase in the frequencies of 230Cys in this region. These results may represent the first example of a gene-culture coevolution involving an autochthonous American allele.


Introduction
Human cultural practices have drastically modified environmental conditions and behaviors, promoting rapid and substantial genomic changes often associated with positive selection and adaptation (gene-culture dynamics [1,2]). In the history of Homo sapiens sapiens, a particularly important event that triggered a new and striking gene-culture-coevolution cycle was the development of agriculture and animal domestication during the Neolithic period (,10,000 years ago). Further, the human gene-culture coevolution mediated by the domestication of plants and animals has been argued to provide some of the clearest and most spectacular examples of niche construction. The Niche Construction Theory can be defined as a branch of evolutionary biology that emphasizes on the ability of organisms to modify the pressure of natural selection in their environment and thereby act as codirectors of their own evolution, as well as that of other directly associated species [3][4][5][6].
Although more than 100 regions/genes had been identified as the likely targets of recent positive selection resulting from cultural pressures in newly constructed niches [1], well-documented examples are scarce. One of the best-known cases of gene-culture coevolution is lactase persistence (LP; the ability of adult humans to digest the lactose found in fresh milk) and dairying. High frequencies of LP are generally observed in traditional pastoralist populations. For example, LP reaches ,64% in Beni Amir pastoralists from Sudan, whereas its frequency in a neighboring non-pastoralist community is only ,20%. In Europe, LP varies from 15-54% in eastern and southern regions, 62-86% in central and western regions, and 89-96% in northern regions [5,[7][8][9][10][11][12][13].
Multiple independent mutations have been associated with this characteristic, some of which are located in an intron of the MCM6 gene, a region fundamental to lactase expression [5,14]. The alleles that led to lactose persistence in Europe, such as MCM6 13,910*T, first underwent selection among dairying farmers around 7,500 years ago, possibly in association with the dissemination of the Neolithic Linearbandkeramik culture over Central Europe [13]. The high copy number variation of the amylase gene and the spread of the corresponding alleles in agricultural societies are another well-studied example [7,8,11,15,16]. Additionally, the West African Kwa-speaking agriculturalists cut and clear the forest to grow yams, increasing the amount of standing water after rain, therefore providing better breeding grounds for malaria-carrying mosquitoes [1] favoring the HbS allele, which confers protection against malaria in heterozygous individuals [17].
America was the last continent colonized by modern humans in prehistoric times. In less than 15,000 years before present (YBP), these first migrants had to adapt to an immensely wide variety of environments. In some regions during this evolutionary trajectory, as in Mesoamerica and the Andes, hunter-gatherer/forager societies gave rise to agriculturalist and urban communities, while others remained with a hunter-gatherer/forager subsistence system until the time of contact with Europeans or even until the present day. Thus, studies with Native American populations can provide useful information for better understanding geneculture coevolution and the niche construction processes.
Based on studies with blood groups and other classical genetic polymorphisms, J. V. Neel and F. M. Salzano were pioneers in identifying complex population processes highly dependent on cultural factors in Native Americans (e.g., fission-fusion dynamic; [18]). Other examples are related to the coevolution of genes and languages [19,20], but only two more recently reported examples might be associated with positive selection: (1) Tovo-Rodrigues et al. [21] investigated the distribution of D4 dopamine receptor (DRD4) alleles in several South Amerindian populations and found a significant difference in the allelic distributions between huntergatherers and agriculturalists, with an increase of the 7R allele among the former; and (2) Acuñ a-Alonzo et al. [22] showed that the 230Cys allele (Arg230Cys, rs9282541) of the ATP-binding cassette transporter A1 (ABCA1) gene, which was previously associated with low HDL-cholesterol levels and obesity-related comorbidities, was exclusively present in Native American and mestizo individuals. These authors verified that cells expressing the ABCA1*230Cys allele showed a 27% cholesterol efflux reduction, confirming that this Native American autochthonous variant has a functional effect in vitro. Other investigations have shown that the presence of ABCA1*230Cys explains almost 4% of the variation in plasma HDL-C concentrations in Mexican admixed populations [23]. This variation in HDL-C concentration was the highest one associated with a single nucleotide polymorphism (SNP) among different continental populations in these genome-wide association studies, corroborating its functionality [23].
Acuñ a-Alonzo et al. [22] demonstrated that 230Cys resides in a haplotype that was the target of an ongoing directional selective sweep, suggesting that 230Cys conferred an advantage during periods of food deprivation in the past. On the other hand, under the current modern lifestyle, 230Cys may have become a major susceptibility allele for low HDL levels and has been correlated with metabolic diseases [22]. This study provides an example of the ''thrifty'' genotype hypothesis, which postulates that variants that increase the efficiency of energy use and storage during periods of famine would have been positively selected in prehistoric times but can be associated with diseases of affluence in contemporary societies, where food is usually abundant [24].
Here, we expand the investigations of the Arg230Cys polymorphism in Native Americans and integrate the thrifty genotype concept with the gene-culture coevolution process, considering the human ability to create new ecological niches that may lead to the selection of genetic variants.

(a) Populations
New data for the Arg230Cys polymorphism were generated for 19 Amerindian populations (n = 229) from Meso/Central America and South America (Table 1). Additional information about these tribes can be found in Bortolini et al. [25,26], Wang et al. [27], and Mazières et al. [28,29]. These new Arg230Cys data were then analyzed together with those of an earlier published report [22], providing a total of 1905 investigated individuals. One hundred and twenty-six individuals of our sample were also previously genotyped for ,680,000 SNPs using Illumina Human 610-Quad BeadChips(Ruiz-Linares et al., unpublished data), and a part of this information was used in some of our analyses. The populations were clustered according to their geographical location and ancient mode of subsistence as follows: (1) Mesoamerican agriculturalists, (2) Andean agriculturalists, and (3) South American hunter-gatherers/foragers. Of course, caution is needed regarding this classification, since subsistence modes are not stable over time and may not be unique. However, the two categories adopted here (agriculturalists and hunter-gatherers/foragers) represent general pre-Colombian subsistence conditions, providing a starting point for research related to gene-culture dynamics in Native Americans.
Ethical approval for the present study was provided by the Individual and tribal informed oral consent was obtained from all participants, since they were illiterate, and they were obtained according to the Helsinki Declaration. The ethics committees approved the oral consent procedure as well as the use of these samples in population and evolutionary studies.

(b) SNP Genotyping and Intra-and Inter-subdivision Structures
The Arg230Cys polymorphism was genotyped using TaqMan assays (ABI Prism 7900HT Sequence Detection System; Applied Biosystems). Allele frequencies were obtained by direct counting. The level of the population structure observed within and between Mesoamerican agriculturalist, Andean agriculturalist, and South American hunter-gatherer/forager groups was estimated using F statistics and the Arlequin 3.5.1 software [30]. Allele frequencies were compared between the 3 population subdivisions with the student's t-test (alfa = 0.05) using the R Stats package (R Development Core Team; http://www.R-project.org/).

(c) Allele Age and Neutrality/selection Tests
A large Asian and Native American sample, including the 126 individuals investigated here, were genotyped for a major panel of  (Tables S1, S2 and S3), the following analyses were performed: (c.1) ABCA1*230Cys allele age. Since estimates of allele age depend on assumptions about demographic history and natural selection, we have performed two approaches to estimate the age of the variant allele: (1) Kimura and Ohta [31] were the first to consider the relations between allele age and its frequency. With this purpose they developed the equation E(t1) = [22p/(12p)]ln(p), where E(t1) = expected age, time is measured in units of 2N generations, and p = population frequency [31]. For ABCA1*230Cys, we considered the average of frequencies of all populations and only those from Mesoamerica/Central America (p = 9.6 and 15.4, respectively; Table 1). A generation time of 25 years and N = 720 (number of generation considering the upper limit for the peopling of America, 18,000 YBP [32]) were assumed. (2) Slatkin and Rannala [33] began to exploit Linkage Disequilibrium (LD) to estimate allele ages, based on variation among different copies of the same allele, where the age of an allele is estimated by the intra-allelic variation following the LD exponential decay due to recombination and mutation rates. Rannala and Reeve [34], on the other hand, explored the use of LD to map genes, as well as to obtain the allele age using a Markov Chain Monte Carlo framework. We applied this method to obtain a second ABCA1*230Cys age estimative using the DMLE+ v2.2 software (http://www.dmle.org). This program allows a Bayesian inference of the mutation age using an intra-allelic coalescent model to assess LD across the nineteen SNPs that occur around ABCA1*230 (rs2065412, rs2515601, rs2472386, rs2274873, rs2487054, rs4149290, rs2487039, rs2472384, rs2253174, rs2230806, rs2230805, rs2249891, rs4149281, rs4743764, rs1929841, rs2000069, rs2275542, rs3904998, and rs4149268). Taken  (c.2) Test to detect deviations from neutrality. Based on the long-range haplotype test [35] and integrated haplotype scores [9,36] Acuñ a-Alonzo et al. [22] suggested that the autochthonous Native American ABCA1*230Cys allele could have been positively selected. However, demographic events, population structure, and other stochastic processes can create complex patterns in the genome, obscuring signals of natural selection or mimicking adaptive processes [37]. Additionally, positive and balancing selections show different effects on the genetic diversity patterns within and between populations [38]. Therefore, we performed additional analyses to explore these issues and elucidate the factors responsible for the eventual effect of natural selection on the ABCA1*230 locus.
To detect loci under selection, we used a method that contrasted the observed population differentiation (F ST ) with that generated for a null simulated distribution under a hierarchical island model using a coalescent approach. In this model, demes exchange more migrants within groups than between groups to generate the joint distribution of genetic diversity within and between populations [38]. Thus, a p value can be estimated from the joint distribution for the population heterozygosity (H e ) and F ST using a kernel density estimation procedure [30]. The analysis was performed using Arlequin 3.5.1 in consideration of 126 Native Americans whose results for the ABCA1*230 locus were known. Data from 20 other autosomal SNPs (rs6559725, rs11140096, rs4877767, rs4014024, rs11140109, rs7872891, rs7850633, rs17086298, rs10746709, rs5014093, rs10868019, rs11140116, rs3860938, rs3860941, rs4097644, rs9942844, rs12551103, rs7863524, rs4877785, and rs70439590) from these same individuals were compiled from a major SNP panel (Table S2). These 20 additional SNPs were selected based on their location (chromosome 9: from position 85252250 to 85317359) inside the putative neutral region, defined by Schroeder and colleagues [39], which comprises ,76,000 bp around the D9S1120 locus. Using this database (Table S2 and Figure S1), we were able to evaluate whether the joint distribution of the observed H e and F ST for the ABCA1*230 Caution is needed regarding the classification of these modes of subsistence, since they are not stable over time and may not be unique. However, the two categories adopted here (agriculturalist and hunter-gatherer/forager) represent general pre- American hunter-gatherer/foragers. In addition, we simulated 100,000 neutral genealogies for a region containing two distinct sets of 20 biallelic markers under demographic scenarios mimicking the settlement of the Americas [32,40]. To accomplish these simulations, we used the msABC software [41], a modification of ms software [42]  (2) three demes that corresponded to the sampled subdivisions (Mesoamerican agriculturalists, Andean agriculturalists, South American hunter-gatherers/foragers); (3) a single ancestral population that existed from 6,350 to 18,000 YBP [32,40]; and (4) different exponential growth rates to include the possibility of an ancestral population of 70 to 830 individuals (i.e., constant population size [40]).
The genetic diversity obtained in the simulations-summarized by intra-(heterozygosity) and interpopulation (global and pairwise F ST ) statistics-was compared to the observed genetic diversity of two genetic datasets: (1) that of the ABCA1*230 locus plus the same 19 flanking SNPs listed in item c.1; (Table S3), and (2) the same 20 SNPs mentioned in section c.2 (Table S2; Figure S1). The summary statistics calculated for each simulation (S vectors) were then compared to the summary statistics of the observed data (S* vectors) using an Euclidean distance measure h = ||S-S*|| with the ABCestimator software, implemented with the ABCtoolbox [43]. The rationale of the analysis was to check which observed dataset could be reproduced with higher fidelity among the range of neutral simulations.

(d) Allele Frequencies vs. Maize Domestication
Genetic, archeological, botanical, and paleoecological data furnished evidence that maize (Zea mays ssp. mays) had a single domestication origin from the wild grass teosinte (Zea mays ssp. parviglumis) in the Río Balsas region, southwestern Mexico, approximately 6,300-10,000 calendar years before present [44][45][46][47][48][49][50][51][52][53]. Pollen samples taken from sediments in lakes, swamps, and archeological deposits have provided evidence for the presence or absence of Zea (maize and/or teosinte) in the Americas and have been used to estimate the age of maize domestication and dispersion [44]. Blake [44] summarized the Zea pollen dates from several American archeological sites, and we selected this data set to perform our analysis. To test the connection between maize culture and the ABCA1*230Cys variant, we used allele frequencies from Mesoamerica/Central America populations (Zapotec, Maya, Nahuatl,Kaqchikel-Quiche, Totonac, Cabecar, and Guaymí) as well as Zea pollen dates obtained in archeological sites located geographically near these populations ( Table 2). Spearman rank order correlations between the two data sets (Zea pollen archeological records and ABCA1*230Cys allele frequencies) were obtained using the Statistica 7.0 software (StatSoft, Incß). Table 1 presents the genotype and allele frequencies for the 1905 individuals analyzed, including the new samples genotyped in the present study. A molecular analysis of variance (AMOVA) test was performed to quantify the level of population structure observed within and between the 3 subdivisions adopted here (Mesoamerican agriculturalists, Andean agriculturalists, and South American hunter-gatherer/forager; Table 1). A significant difference was observed between the subdivisions (F CT = 0.036; p = 0.000). On the other hand, the highest F ST among populations within subdivisions was observed in the South American huntergatherers/foragers (0.053; p = 0.005), and a value 5 times lower was found among Mesoamerican agriculturalists (0.013; p = 0.008); no sign of structuration was found in the Andes area (p = 0.36).

SNP Genotyping and Intra-and Inter-group Structures
No significant differences in allelic frequencies were found when subsistence modes (hunter-gatherer/foragers vs. agriculturalists) were compared using the student's t-test (p = 0.1316; Table 1). However, significant differences were observed between Mesoamerican agriculturalists and Andean agriculturalists or South American hunter-gatherers/foragers (p = 0.0022 and p = 0.0174, respectively). A comparison of South American hunter-gatherers/ foragers and Andean agriculturalists revealed no significant differences (p = 0.351).
(c.1) ABCA1*230Cys allele age. The ABCA1*230Cys allele age estimates, using population frequency information, were 12,097 YBP and 19,409 YBP, considering data from all populations and only those from Mesoamerica/Central America, respectively. But the allele age obtained using the observed LD and a Bayesian approach, is relatively younger, 7,540 YBP, with a posterior probability of 99%. Although the numbers generated with these two distinct methods are compatible with an American origin of the ABCA1*230Cys allele [32], the last seems more realistic since methods based on LD, rather than frequencies, have the property of reflecting what happened to an allele more accurately [33]. Discrepancies between estimates obtained from these two approaches are usually taken as evidence that selection has increased the frequency of the allele to higher levels than expected by random genetic drift [33,54].
(c.2) Detecting candidate loci for selection. Patterns of genetic diversity between populations can be used to detect loci under selection [30]. The joint distributions of He and F ST of the ABCA1*230 locus and 20 other SNPs (listed in item c.2 in the Materials and Methods section) were examined to test whether the ABCA1*230 locus and these SNPs departed from neutral expectation. The values obtained indicated that only the ABCA1*230 polymorphism departed significantly from neutral expectation (p = 0.02; Figure 1). However, when the comparisons excluded the Mesoamericans (e.g., Andean agriculturalists vs. South American hunter-gatherer/forager subdivisions), no significant departure from the expected under neutrality was found (data not shown). These results suggest that the ABCA1*230 allele frequencies in Mesoamerica are incompatible with a simple neutral model.
Our neutral demographic simulation analysis showed results in the same direction. The region containing the ABCA1*230 polymorphism and 19 flanking SNPs presented a slightly lower average heterozygosity than the putative neutral region dataset (0.32 vs. 0.34 respectively); but global and pairwise F ST were higher for the ABCA1 region (global F ST 0.03 vs. 0.01; average pairwise F ST 0.05 vs. 0.01). Considering that both genomic regions were studied using the same quantity of markers and the same sampling strategy, in populations that were subjected to the same demographic history, the observed differences may occur due to diverse factors, one of them being natural selection. To test this hypothesis each dataset, summarized by the above-mentioned statistics, was also compared to each of 100,000 neutral simulations by means of Euclidian distances. The empirical dataset containing the ABCA1*230 polymorphism presented a poorer fit to neutrality than the putative neutral region dataset, showing Euclidian distances that were twofold higher than those of the neutral simulations (70.64 vs. 35.12). Interestingly, when the Mesoamerican agriculturalist subdivision was excluded from the analysis, this difference dramatically decreased (45.80 vs. 35.12). Thus, the poor fit to neutrality observed at the ABCA1*230 site and its flanking regions may be associated with the genetic pattern found in Mesoamerica.
(d) Allele frequencies vs. maize domestication. A significant correlation was observed between the ABCA1*230Cys allele frequencies and the distribution of the Zea pollen relics in Mesoamerica (Figure 2; r = 0.9, p = 0.002). It is important to note that the populations used in the correlation analysis performed here (Zapotec, Maya, Nahuatl, Kaqchikel-Quiche, Totonac, Guaymí, and Cabecar) were investigated for microsatellites and other genetic markers in previous studies conducted by our and other groups [27,55,56]. These studies indicated that these populations have a substantial Amerindian substrate, a generally small European contribution, and almost no African influence. For instance, Wang et al. [27] showed that the Maya and Guaymí showed the highest and the lowest numbers of individuals with some level of recent European and African admixture, respectively. This indicates an opposite trend from what would be expected if the level of admixture with non-Indians was influencing our findings, since the variant allele is absent in Europeans and Africans.

Discussion
We can now examine some hypotheses in an attempt to explain the results and to draw the evolutionary scenario associated with the pattern of diversity of the ABCA1* Arg230Cys polymorphism.
Other crops were also present in the pre-Columbian Mesoamerican civilizations (squash and beans; [50,59]), but maize was the dietary base for most of these civilizations. For example, Benedict and Steggerda [60] showed that 75% of the calories consumed by the Mayas were derived from maize. In addition, Mesoamerica was the only region in the world where an ancient civilization lacked a domesticated herbivore. Therefore, protein from domesticated animal sources would have been scarce in Pre-Hispanic Mesoamerica in comparison to other parts of the ancient urbanized world, including the Andes [61]. As a whole, these studies demonstrated that the diet of the first Mesoamerican sedentary communities was extremely dependent on maize. These early farmers, however, suffered periods of plantation loss, questioning the common assumption that farming and sedentary lifestyle brought increased dietary stability and health homeostasis [62]. Several studies have revealed that homeostasis should have declined with sedentary farming, and bioarcheologists and paleopathologists have also detected a deterioration in Mesoamerican health indices from ,8,000 to ,500 years before present-YBP ( [63] and references therein). Domestic crops are more vulnerable than wild ones, crowding promotes crop diseases, and storage systems often fail (estimates suggest that as much as 30% of stored food is lost even in a modern sophisticated system [62]). In other study, based on the molecular analysis of dietary diversity for three archaic Native Americans, Poinar et al. [64] found evidence that, as compared to individuals dependent on agriculture, the diet of hunter-gatherers seems to have been more varied and nutritionally sound. Clearly, a diet based on one or only a few crops should have been deleterious to health in the pre-Columbian era [65]. These different lines of evidence illustrate that the incipient farming niches of Mesoamerica, when communities of hunters/gatherers/foragers started to cultivate and domesticate wild plants, could have been remarkably unstable like those of other pre-industrial societies [6]. Based on what was discussed above, as well as in our results (allele age and neutrality/selection tests), it is reasonable to suppose that ABCA1*230Cys has an American origin and it could have had a selective advantage during the periods of food scarcity experienced by Mesoamericans during the implementation of the sedentary life style based on maize. The strong correlation between maize culture propagation and 230Cys frequencies in this region reinforces this suggestion, even when considering that the advantage of the allele may have been lost after technological innovations had been implemented and agricultural production stabilized. Peng et al. [66] presented evidence for a similar case of gene-culture coevolution, suggesting that positive selection for the ADH1B*47His allele was caused by the emergence and expansion of rice domestication in East Asia.
Noteworthy is that other environmental factors may also have been involved in the distribution of the ABCA1 alleles, since cholesterol plays an important role in various infectious processes, such as the entry and replication of Dengue virus type 2 and flaviviral infection [67]. Additionally, the ABCA1 transporter participates in infectious and/or thrombotic disorders involving vesiculation, since homozygous ABCA1 gene deletions confer complete resistance against cerebral malaria in mice [68,69]. These findings can be considered as additional causal factors to the ABCA1*230Cys selective sweep associated with agricultural development. A sedentary village lifestyle with a corresponding growth in the density of the local population can promote an increase in the mortality rate, particularly in children under 5 years of age [70]. For example, archeological and paleoecological evidence in Europe showed that during the Neolithic demographic transition, the causes of increased infant mortality would have included a lack of drinking water supplies, contamination by feces, emergence of highly virulent zoonoses, as well as an increase in the prevalences of other germs such as Rotavirus and Coronavirus (causing diarrhea, one of the main killers of children under 5 years of age), Streptococcus, Staphylococcus, Plasmodium (P. falciparum and P. vivax, which are believed to have emerged more recently), and Herpesvirus [70]. However, the real impact of the ABCA1*230Cys variant in these infectious processes will require additional functional studies.
In agreement with this historical scenario, the genetic variation in Arg230Cys presented a worse fit to neutrality than loci known to be neutral, indicating that selective mechanisms are necessary to explain the genetic diversity of Arg230Cys, especially when the Mesoamerican agriculturalist subdivision is considered in the analysis. This result also supports our hypothesis that maize domestication in Mesoamerica lead to changes in the gene pool of the natives from that region.
South America presents much more diversity in relation to habitats, people, and culture than Mesoamerica. For instance, maize arrived in South America, but apparently the level of consumption seen in Mesoamerica/Central America was rarely found there. Archaeological data indicate that only during the implantation and expansion of the Inca Empire (800-500 YBP) was the level of maize consumption important, but the level of consumption was not comparable to that of Mesoamerica/Central America [71][72][73]. Additionally, South Amerindian hunter-gatherers/foragers present lower intrapopulation genetic variation and higher levels of population structure when compared to those seen in Andean populations [27,74]. This same tendency was also observed in the present study. These results indicate low levels of gene flow between villages/populations and low effective population sizes, favoring the role of genetic drift. Conversely, the Andean groups show opposing characteristics. These findings correlate well with distinguishing patterns of gene flow and historical effective sizes in these indigenous populations, with cultural differences, as well as with paleoclimatic and environment changes in their habitats [74]. Therefore, the significant role of random processes and/or more heterogeneous cultural and ecological scenarios makes it difficult to define a particular pattern associated with the Arg230Cys polymorphism in South American groups, a situation different from that in Mesoamerica.
In conclusion, our analyses demonstrate for the first time a robust correlation between a constructed niche and a selected Native American autochthonous allele. The 230Cys allele, with a probable origin in America continent, seems have been the target for an ongoing directional selective sweep as a result of the origin and spread of the maize culture in ancient Mesoamerica. Figure S1 Twenty SNPs selected based on their location (chromosome 9: from position 85252250 to 85317359) inside the putative neutral region, defined by Schroeder and colleagues [39], which comprises ,76,000 bp around the D9S1120 locus.