Humans living at high altitude (≥2,500 meters above sea level) have acquired unique abilities to survive the associated extreme environmental conditions, including hypoxia, cold temperature, limited food availability and high levels of free radicals and oxidants. Long-term inhabitants of the most elevated regions of the world have undergone extensive physiological and/or genetic changes, particularly in the regulation of respiration and circulation, when compared to lowland populations. Genome scans have identified candidate genes involved in altitude adaptation in the Tibetan Plateau and the Ethiopian highlands, in contrast to populations from the Andes, which have not been as intensively investigated. In the present study, we focused on three indigenous populations from Bolivia: two groups of Andean natives, Aymara and Quechua, and the low-altitude control group of Guarani from the Gran Chaco lowlands. Using pooled samples, we identified a number of SNPs exhibiting large allele frequency differences among over 900,000 genotyped SNPs. A region on chromosome 10 (within the cytogenetic bands q22.3 and q23.1) was significantly differentiated between highland and lowland groups. We resequenced ~1.5 Mb surrounding the candidate region and identified strong signals of positive selection in the highland populations. A composite of multiple signals like test localized the signal to FAM213A and a related enhancer; the product of this gene acts as an antioxidant to lower oxidative stress and may help to maintain bone mass. The results suggest that positive selection on the enhancer might increase the expression of this antioxidant, and thereby prevent oxidative damage. In addition, the most significant signal in a relative extended haplotype homozygosity analysis was localized to the SFTPD gene, which encodes a surfactant pulmonary-associated protein involved in normal respiration and innate host defense.
Our study thus identifies two novel candidate genes and associated pathways that may be involved in high-altitude adaptation in Andean populations.
Citation: Valverde G, Zhou H, Lippold S, de Filippo C, Tang K, López Herráez D, et al. (2015) A Novel Candidate Region for Genetic Adaptation to High Altitude in Andean Populations. PLoS ONE 10(5): e0125444. https://doi.org/10.1371/journal.pone.0125444
Academic Editor: Yong-Gang Yao, Kunming Institute of Zoology, Chinese Academy of Sciences, CHINA
Received: December 2, 2014; Accepted: March 12, 2015; Published: May 11, 2015
Copyright: © 2015 Valverde et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: Guido Valverde acknowledges the DAAD (German Academic Exchange Service) Scholarship (Forschungsstipendium Referat: 414 / PKZ: A/07/97245). Kun Tang acknowledges the support by the Max-Planck-Gesellschaft <www.mpg.de> Partner Group Grant and the National Science Foundation of China (31371267). This research was supported by the Max Planck Society. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
It is generally accepted that anatomically modern humans emerged in Africa and radiated from there to colonize most of the world's land masses . During this “out of Africa” diaspora, modern humans encountered new habitats with a very diverse set of ecological conditions in contrast to the African homeland, e.g., in the form of new geographic environments, climates, diets and/or pathogens. Humans adapted successfully to these new conditions both culturally and biologically, the latter involving physiological acclimatization and/or genetic adaptation. One of the extreme habitats successfully colonized by humans is high altitude.
The main environmental stresses of living in elevated plateaus and mountainous regions are the decrease of temperature and humidity, the increase in solar electromagnetic radiation, and hypobaric hypoxia (defined as the decrease in oxygen intake for metabolic processes due to reduced barometric pressure) [2,3]. Although the term “high altitude” has no precise definition, it is generally taken to refer to regions that are 2,500–3,000 meters (m) or more above sea level, as the majority of newcomers arriving in such regions present certain clinical, physiological, anatomical and biochemical changes (reviewed in ). Moreover, some populations of humans have developed a physique that enables permanent habitation of high-altitude regions despite the severe conditions of hypoxia and other environmental stressors.
Three main high-altitude regions of the world have supported relatively large human populations for millennia: the Ethiopian highlands of the Semien Mountains [4,5], the Tibetan Plateau and Himalayan valleys [6,7], and the Andes of South America [2,8,9]. In order to overcome hypobaric hypoxia, the human body needs to adjust the cascade of metabolic processes for oxygen uptake and utilization. However, there is no universal pattern of response to hypoxia. People living in each of the above-mentioned regions exhibit diverse respiratory, circulatory, hematological, and even pathological patterns of acclimatization and/or adaptation. For example, there is a relatively low hemoglobin concentration in Ethiopian and Tibetan highlanders as opposed to Andean or European populations living at high altitude [4,10,11]. Tibetans, similar to sojourners, display a higher hypoxic ventilatory response (HVR), which results in increased ventilation compared with Andeans [12,13]. Furthermore, chronic mountain sickness (CMS), a disease defined as loss of adaptation to altitude , is common in the Andes, occasionally found in the Himalayas, and absent from the Ethiopian highlands [5,15]. CMS has a strong familial component, and it has also been noted that in Bolivian Andeans CMS is predominant in males of mixed or entirely European genetic background [16,17]. Moreover, having been born and raised within multigenerational high-altitude residence families appears to confer a substantial advantage in survival and performance in high-altitude environments (reviewed in ). This is in accordance with expectations that distinctive traits between high and low-altitude natives (or between different high-altitude native groups) may reflect genetic adaptations resulting from natural selection.
The characteristic morphology and physiology of high-altitude natives, in particular Tibetans and Andeans, has been studied in detail , enabling the identification of underlying candidate genes or groups of genes (e.g., ), such as the hypoxia-inducible transcription factor (HIF) pathway, renin-angiotensin system (RAS), and nitric oxide synthases (NOSs) [20–26]. However, because of the limited genomic scope of many candidate-gene studies, functionally relevant variation may have been overlooked. Moreover, the use of rather dissimilar lowland population outgroups, for example comparing Andeans with controls of European or indigenous North American genetic background (e.g., [20,21]), lessens statistical power. More recently, studies applying genome-wide scans have independently identified genes whose products participate in the HIF pathway (e.g. EGLN1 and EPAS1), and represent strong targets of selection at high altitude, especially for the Tibetan population [27–32]. Besides the HIF pathway, two genes involved in heart performance (VEGFB and ELTD1) have also been implicated in the elevated hematocrit characteristic of high-altitude populations in the Andes , while new candidate-gene sets have been proposed in Ethiopians as well [34–36].
In the present study, we focused on three indigenous populations from Bolivia, namely two groups of native inhabitants living at high altitude (≥3600 meters above sea level) in the Andes, Aymara and Quechua, and as a control group the Guarani from the Gran Chaco lowlands. Using a genome-scan approach and pooled population samples, we identified a candidate region in chromosome 10 that exhibited significant allele frequency differences between the high- and lowland populations. Targeted resequencing of approximately 1.5 Mb of the region of interest revealed strong signals of positive selection in both Andean groups, suggesting that this genomic region harbors genes for high-altitude adaptation in these populations.
We collected saliva samples for the extraction of genomic DNA from 55 (24 males / 31 females) Aymara (AYM) and 21 (18 males / 3 females) Quechua (QUE) from locations above 3,600 meters, and 23 (14 males / 9 females) Guarani (GUA) from the lowlands (see Materials and Methods) in Bolivia (Fig 1). Samples were obtained from healthy individuals with no known medical record or indication of CMS or any other altitude disease. All sampled individuals were unrelated and of self-identified ancestry as Aymara, Quechua or Guarani. Given the high historical rates of male-mediated admixture into Native American communities after post-Columbian colonization , we determined Y chromosome haplogroups in all of the male samples to confirm the donors’ ethnicity. Indeed, the most common haplogroup found in our collection of Bolivian males was Q (the predominant branch of the Y phylogeny observed in modern-day Amerindians of Central and South America), at frequencies of roughly 80% in all three groups (S1 Fig).
Pooled DNA Microarray genotyping and estimation of allele frequencies
To search for highly-differentiated genic regions among the high- and lowland groups, and therefore putatively involved in altitude adaptation, we pooled DNA samples from each population independently in triplicate (see Materials and Methods), genotyped each pool on the Affymetrix Genome-Wide Human SNP Array 6.0, and estimated allele frequency differences between each pair of populations based on the intensity of the hybridization signals. This approach has been used in several studies [38–41]. On average, the SNP call rate was 87.5% for the Aymara pools, 85.1% for the Quechua pools, and 88.2% for the Guarani pools. These rates are comparable to previous results with the Affymetrix platform for pooled DNA (e.g., ) or even with DNA from single samples (e.g., ).
We used frequency estimates for every called SNP obtained from the microarray experiments to perform pairwise comparisons among the populations, and we identified candidate regions that contained highly differentiated SNPs (see Materials and Methods). The tests were done between all pairs of the three groups as well as between the highlander group (combining Aymara and Quechua, from here on referred to as HL) and the lowlander group (Guarani, from here on LL). Aymara and Quechua share similar environments and lifestyles, and previous studies have found that they are genetically similar as well [33,43,44]. Since the main goal of this study was to investigate loci related to high-altitude adaptation, the following results and discussions will be mainly focused on the HL-LL comparison, as in previous studies [24,32], unless otherwise specified.
Based on the HL-LL comparison, we detected 9 candidate regions containing SNPs exhibiting large allele frequency differences (≥ 0.3); these regions were also supported by a t-test (Fig 2). In particular, a region on chromosome 10 (81.7~82.2 Mb) was enriched for several differentiated SNPs. In total, we found 56 SNPs having estimated allele frequency differences above 0.25, with four of them above 0.4 (Fig 2).
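The pooled comparison described above can be illustrated with a minimal Python sketch. It averages replicate allele-frequency estimates for one SNP in each group, takes the absolute difference, and computes a t statistic. The replicate values are hypothetical (not data from this study), the function names are illustrative, and the choice of Welch's unequal-variance t statistic is an assumption, since the text states only that a t-test was used.

```python
from math import sqrt
from statistics import mean, variance

def freq_difference(hl_estimates, ll_estimates):
    """Absolute difference between the mean pooled allele-frequency estimates."""
    return abs(mean(hl_estimates) - mean(ll_estimates))

def welch_t(a, b):
    """Welch's t statistic for two small samples of replicate estimates."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical replicate estimates for a single SNP.
hl = [0.72, 0.70, 0.74, 0.69, 0.71, 0.73]  # 3 Aymara + 3 Quechua pool replicates
ll = [0.30, 0.33, 0.31]                    # 3 Guarani pool replicates

diff = freq_difference(hl, ll)
is_candidate = diff >= 0.3                 # difference threshold quoted in the text
t_stat = welch_t(hl, ll)
```

In a genome-wide setting the same two functions would be applied to every called SNP, and runs of neighboring SNPs passing the threshold would be grouped into candidate regions.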
Fig 2. The 9 candidate regions and corresponding SNPs exhibiting significant allele frequency differences are labeled. The top red line indicates allele frequency differences of 0.4, and the bottom red line indicates genome-wide significance (P value = 0.5 x 10^-4).
Validation of SNPs with large allele frequency differences
We selected the 14 SNPs with the highest estimated allele frequency differences in the 9 detected candidate regions for individual genotyping validation, with an emphasis on the chr10:81.7~82.2 Mb region (with 6 SNPs); the individually genotyped SNPs are listed in Table 1 and also indicated in Fig 2. Each of the 14 SNPs was typed in the full set of individuals, not just those used in constructing the pools (see Materials and Methods). Observed allele frequencies were calculated by genotype counting and we performed a Fisher's exact test for significant differences in allele frequencies, correcting for multiple testing. We used a very conservative cutoff of 0.5 x 10^-8, assuming 1 million random markers and a single test level of 0.05. With this threshold, five out of the six SNPs from the region on chr10:81.7~82.2 Mb stood out as highly significant in the HL-LL as well as the Aymara-Guarani comparisons (Table 1). It is worth noting that Aymara and Quechua had very similar allele frequencies at these 14 SNPs, supporting the merging of these two groups into a single highlander group (Table 1) for further analysis.
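The validation step can be sketched as a two-sided Fisher's exact test on a 2x2 table of allele counts, followed by comparison with a multiple-testing cutoff. The sketch below uses only the Python standard library; the allele counts are hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is an assumption about the test variant. For reference, a plain Bonferroni division of 0.05 by 1,000,000 markers gives 5 x 10^-8, so the quoted cutoff of 0.5 x 10^-8 is ten-fold more stringent than that value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

bonferroni = 0.05 / 1_000_000   # plain Bonferroni level for 1 million markers
quoted_cutoff = 0.5e-8          # cutoff quoted in the text

# Hypothetical allele counts: rows are HL and LL, columns are the two alleles.
p_value = fisher_exact_two_sided(120, 32, 10, 36)
passes_cutoff = p_value < quoted_cutoff
```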
Signals of population differentiation and positive selection
As the region on chromosome 10 (spanning approximately 500 kb from 81.7 to 82.2 Mb) contained several SNPs exhibiting significant allele-frequency differences between high- and lowland populations (Table 1), we investigated this signal in more detail. We performed targeted resequencing of a 1.5 Mb segment (from 81.1 to 82.6 Mb) that encompassed this region (see Materials and Methods) in 20 Aymara, 18 Quechua and 20 Guarani. The average coverage was 9X, and 1,983 SNPs were identified.
When population differentiation between the lowlander Guarani and the two highlander groups was examined, both the AYM-GUA and QUE-GUA comparisons revealed extreme FST values  that were generally above 0.5, reaching peak values of ~0.7 in the chr10:82.0~82.3 Mb region (S2A Fig). By contrast, the differentiation between the two highlander groups was much lower, with the FST values generally below 0.1 in this region (S2A Fig). This suggests that Aymara and Quechua may have shared very similar demographic/selection events, and that the extreme differentiation between the highlanders and Guarani in this region is unlikely to be accounted for solely by neutral genetic drift.
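To make the scale of these values concrete, the following sketch computes a simple two-population FST from allele frequencies as (HT - HS) / HT with equal population weights. This basic heterozygosity-based definition is used here as an assumption for illustration; the estimator actually applied to the resequencing data may differ, and the frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Two-population FST at one biallelic site, from allele frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical frequencies: a strongly differentiated site and a similar one.
high = fst_two_pops(0.9, 0.1)    # (0.5 - 0.18) / 0.5 = 0.64
low = fst_two_pops(0.55, 0.45)
```

Under this definition, a value near 0.7 requires the two groups to be close to fixation for alternative alleles, whereas values below 0.1 correspond to modest frequency differences.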
To formally test the latter hypothesis, we computed neutral simulations assuming several different demographic scenarios. Previous studies showed that the peopling of the Americas occurred more than 15,000 years ago through Beringia [46–48], with the initial colonization of the Andes around 11,000 years ago . An admixture graph of Native American populations suggested that Guarani are descendants of Amazonia ancestry, which separated from high-altitude populations before the latter occupied the Andes . To simplify the model, we set the divergence time between high and low-altitude populations at 10,000 years ago, and the divergence of the two high-altitude populations at 5,000 years ago. Population size estimates were obtained by using the demographic trajectory of Mexicans based on 1000 Genomes data  (red line in S3 Fig) as a surrogate for the common ancestral history of Quechua, Aymara and Guarani (see Materials and Methods). Given the much smaller recent population sizes of these three groups compared to Mexicans, we set constant population sizes of 9,000 after divergence for Quechua, Aymara and Guarani, instead of the sharp growth in the Mexicans. This scenario is referred to as the standard demographic model (green line in S3 Fig). We also simulated a bottleneck model with half the recent population sizes (blue line in S3 Fig), and a constant population size model (purple line in S3 Fig).
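As a rough illustration of why neutral drift alone is expected to produce only modest differentiation under these parameters, the sketch below lets two populations of constant size 9,000 drift independently for 10,000 years and records per-site FST. It is a deliberately simplified stand-in, not the simulation procedure used in this study: it uses a normal (diffusion) approximation to Wright-Fisher drift, ignores the shared ancestral trajectory and the Aymara-Quechua split, and assumes a generation time of 25 years, which is not stated in the text.

```python
import random

def drift(p, n_diploid, generations, rng):
    """Normal approximation to Wright-Fisher drift in one population."""
    for _ in range(generations):
        sd = (p * (1 - p) / (2 * n_diploid)) ** 0.5
        p = min(1.0, max(0.0, rng.gauss(p, sd)))
    return p

def simulate_neutral_fst(n_sites, n_diploid=9000, years=10_000,
                         years_per_generation=25, seed=1):
    """Per-site FST between two populations that split `years` ago (neutral)."""
    rng = random.Random(seed)
    generations = years // years_per_generation  # generation time is an assumption
    values = []
    for _ in range(n_sites):
        p0 = rng.uniform(0.1, 0.9)               # hypothetical ancestral frequency
        p1 = drift(p0, n_diploid, generations, rng)
        p2 = drift(p0, n_diploid, generations, rng)
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = p1 * (1 - p1) + p2 * (1 - p2)      # mean of 2p(1-p) over both groups
        values.append(0.0 if h_t == 0 else (h_t - h_s) / h_t)
    return values

neutral_fst = simulate_neutral_fst(200)
```

With a divergence of 400 generations and 2N = 18,000, the expected FST is on the order of t / (2N), roughly 0.02, which is far below the peak values reported for the candidate region.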
None of these neutral models could account for the strong divergence observed between Guarani and the two highlander groups in the chromosome 10 candidate region. The significant divergence signals strongly support the occurrence of positive selection, either in HL, LL, or in both groups. Table 2 lists the top 5% thresholds of FST values in simulations under all models (see Materials and Methods). Aymara and Quechua revealed an obvious lack of derived alleles with intermediate frequencies, particularly around the 82.0~82.3 Mb region of chromosome 10. Guarani did not seem to show any specific patterns (S2B Fig).
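The comparison of an observed statistic with a simulated neutral distribution can be sketched as follows. The function names and the simulated values are hypothetical; the sketch shows one common way to derive a top 5% threshold and an empirical P value, and is not necessarily the exact procedure behind Table 2.

```python
def top_percent_threshold(simulated, top_percent=5):
    """Smallest simulated value such that at most `top_percent`% lie above it."""
    ordered = sorted(simulated)
    n = len(ordered)
    # Integer arithmetic avoids floating-point rounding in the rank.
    rank = ((100 - top_percent) * n + 99) // 100
    return ordered[max(rank, 1) - 1]

def empirical_p(observed, simulated):
    """One-sided empirical P value, with the +1 correction to avoid P = 0."""
    exceed = sum(1 for v in simulated if v >= observed)
    return (exceed + 1) / (len(simulated) + 1)

# Hypothetical neutral values: 0.01, 0.02, ..., 1.00
sims = [i / 100 for i in range(1, 101)]
threshold = top_percent_threshold(sims)   # 95th of 100 ordered values
p = empirical_p(0.99, sims)               # two simulated values are >= 0.99
```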
We also calculated Tajima’s D  and Fay and Wu’s H  for the resequenced data (see Materials and Methods). Interestingly, in both Aymara and Quechua, when compared to the standard demographic model, Tajima’s D values are marginally significant (P values = 0.0568 and 0.0565, S2C Fig) around chr10:82.0~82.3 Mb, and Fay and Wu’s H values are significant in both highlander groups in the same region (P values = 0.014 and 0.0212, S2D Fig). Guarani exhibits sporadic signs of selection around region 81.2 Mb and 82.6 Mb, but the signals are not consistent between Tajima’s D and Fay and Wu’s H (S2C and S2D Fig).
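For readers unfamiliar with these two statistics, the sketch below computes Tajima's D and the unnormalized Fay and Wu's H from the derived-allele counts at segregating sites, using the standard published formulas. The input counts are hypothetical, and the sketch omits the windowing and the comparison against simulations that give the P values reported above.

```python
from math import sqrt

def tajima_d_and_fay_wu_h(derived_counts, n):
    """Return (Tajima's D, Fay and Wu's H).

    derived_counts: derived-allele count (1..n-1) at each segregating site
    n: number of sampled chromosomes
    """
    s = len(derived_counts)
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in derived_counts) / pairs   # pairwise diversity
    theta_h = sum(k * k for k in derived_counts) / pairs    # weights high-frequency derived alleles
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return d, pi - theta_h

# Hypothetical spectra for n = 10 chromosomes and 5 segregating sites.
d_rare, h_rare = tajima_d_and_fay_wu_h([1] * 5, 10)   # only singletons
d_mid, h_mid = tajima_d_and_fay_wu_h([5] * 5, 10)     # intermediate frequencies
d_high, h_high = tajima_d_and_fay_wu_h([9] * 5, 10)   # high-frequency derived alleles
```

A deficit of intermediate-frequency derived alleles pushes D below zero, while an excess of high-frequency derived alleles pushes H below zero; both patterns are expected after a selective sweep.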
Since the genic region chr10:82.0~82.3 Mb in Aymara and Quechua hosts the strongest and most consistent signals of selection, and the genetic profiles in this region are highly similar between Aymara and Quechua as shown in previous studies [33,43,44], we carried out in-depth analyses of positive selection on the merged HL data. First, a composite of multiple signals like (CMSL) test was constructed based on six different selection tests: FST, ΔDAF , Tajima’s D, Fay & Wu’s H, XP-CLR  and iHS  (see Materials and Methods). As can be seen in Fig 3, individual tests in general showed consistent signals of positive selection in this region. The patterns of FST, ΔDAF, Tajima’s D, and Fay & Wu’s H are highly similar in Aymara and Quechua (Fig 3A–3D and S2 Fig); however, in Guarani there is no consistent pattern (S4 Fig). XP-CLR and iHS both exhibit the highest signals within the 82.0~82.3 Mb interval, although the iHS peak is located upstream from that of XP-CLR (Fig 3E and 3F). The maximum XP-CLR value is 46.99 (P value = 9.01 x 10^-4) and maximum |iHS| is 3.144 (P value = 0.00732, Fig 3E and 3F). When all these signals are combined together to derive a summary CMSL score (see Materials and Methods), the empirical CMSL scores are highly significant compared to the neutral CMSL distribution obtained from the simulations under the standard demographic (Fig 3G), extreme bottleneck (S5A Fig), and constant size models (S5B Fig). Signals in both individual tests and the CMSL test are consistently located within the 82.0~82.3 Mb interval, which provides strong evidence of positive selection in HL. The CMSL scores narrow the signal to a ~57 kb region that contains one protein coding gene, FAM213A, and an enhancer that significantly influences the expression of this gene . The peak region of CMSL is similar under all models (Fig 3G and S5 Fig), and the enhancer (chr10:82176099–82176325) is close to the highest CMSL signal (chr10:82174949).
Fig 3. (A) FST between HL and LL. (B) Derived allele frequencies of HL. (C) Tajima’s D of HL. (D) Fay & Wu’s H of HL. (E) XP-CLR of HL against LL. (F) Absolute iHS score of HL. (G) CMSL score of HL. Black, green, and red dashed lines are 5%, 1%, and 0.1% thresholds respectively of each test in standard simulations. FAM213A gene and its enhancer, covered by the peak of the CMSL scores, are labeled in red.
Another test commonly used to identify candidate regions of positive selection is the relative extended haplotype homozygosity (REHH) test, which is based on the principle of long extended haplotypes . We scanned the entire candidate region and found widespread REHH signals (Fig 4A). The strongest signal occurred between position chr10:81699238 and chr10:81701722, within the SFTPD gene, where a major core haplotype with a frequency of 52.6% (haplotype 1 in Fig 4B) decays much more slowly than the other two haplotypes (haplotypes 2 and 3 in Fig 4B). The P value for the observed excessive EHH of haplotype 1 is 8.4 x 10^-10, indicating a strong signal of positive selection.
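The quantity underlying this test, extended haplotype homozygosity (EHH), can be sketched as the probability that two randomly chosen chromosomes carrying the same core haplotype are identical over the whole interval from the core out to a given marker. The sketch below is a minimal illustration with hypothetical phased haplotypes encoded as strings; it extends in one direction only and does not compute the relative score (REHH), which compares the EHH of one core haplotype against that of the other core haplotypes at the same locus.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_start, core_end, core_allele, extend_to):
    """EHH of chromosomes carrying `core_allele` in columns [core_start, core_end),
    evaluated over columns [core_start, extend_to)."""
    carriers = [h for h in haplotypes if h[core_start:core_end] == core_allele]
    if len(carriers) < 2:
        return 0.0
    groups = Counter(h[core_start:extend_to] for h in carriers)
    # Fraction of carrier pairs that are identical across the extended interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(len(carriers), 2)

# Hypothetical phased haplotypes; the core is the first two columns.
haps = ["AAGTC", "AAGTC", "AAGTC", "AAGCC", "ACGTT", "ACTTT"]

ehh_core_only = ehh(haps, 0, 2, "AA", 2)   # all carriers identical at the core
ehh_extended = ehh(haps, 0, 2, "AA", 4)    # 3 of 6 carrier pairs still identical
ehh_other = ehh(haps, 0, 2, "AC", 4)       # the two "AC" carriers differ
```

A core haplotype under recent positive selection is expected to keep a high EHH over long distances despite a high frequency, which is the pattern described for haplotype 1.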
Fig 4. (A) P value of each haplotype based on standard simulations. (B) The most significant core haplotype, located in the SFTPD gene. Haplotype 1 reaches a frequency of 52.6% and is inferred to be the haplotype with the most significant signal of selection.
Nowadays, it is estimated that worldwide some 140 million  people reside permanently at an altitude of 2,500 meters or more above sea level, and that countless others sojourn to high plateaus and mountainous regions for leisure or professional activities. The physiology of humans living at high altitude has been the subject of over a century of research, especially in Tibetan, Ethiopian and Andean populations which have acquired long-term physiological, anatomical, and biochemical responses to high-altitude environmental stress when compared to lowland inhabitants. Recent advances in genomic technologies are providing opportunities to explore the genetic basis of their adaptive traits, particularly in the regulatory systems of respiration and circulation.
In the present study, we focused on three populations from Bolivia, namely two groups of native inhabitants of the Andes, Aymara and Quechua, and Guarani from the Gran Chaco lowlands as a neighboring control group. Special care was taken to obtain samples from members of long-term high-altitude residence families, avoiding the collection of recent immigrants. Moreover, we collected samples only from healthy individuals, especially with no known medical record or indication of CMS.
We performed a genome-wide scan of over 900,000 SNPs using microarray technology on pooled DNA samples (see Results), thus examining the genetic profile of each group in the search for large differences in allele frequencies among them. After validation of the individual genotypes for the variants exhibiting the largest allele frequency differences (on average more than 38%), we applied multiple-test corrections and detected a region on chromosome 10 harboring several SNPs that achieved statistical significance. Genotyping of pooled samples decreases the power to detect weaker signals of population differences; however, the fact that we do detect a strong signal of population differentiation that is likely to be due to selection further substantiates the utility of pooled data in genome scan studies [38–41].
The region on chromosome 10 is a novel candidate region for high-altitude adaptation, which has not been detected in previous studies. We further verified that the signal of high differentiation between HL and LL groups for the chromosome 10 region was unlikely to arise by demographic events alone after carrying out simulations under various demographic models. In order to investigate in more detail the potential signal of selection in chromosome 10, we performed targeted resequencing of ~1.5 Mb surrounding the region of interest. Table 3 lists all protein coding genes in the candidate region and all non-synonymous SNPs observed in the resequencing data; regulatory elements could also be the target of positive selection, and several enhancers are included in the candidate region (Fig 3G and S1 Table).
DAF stands for derived allele frequency.
The strongest evidence of positive selection was in the region of 82.0~82.3 Mb on chromosome 10, where several genes (ANXA11, MAT1A, DYDC1, DYDC2, FAM213A, TSPAN14 and SH2D4B) are included in or near the borders of this region (Fig 3G). One non-synonymous SNP (rs1049550) detected in the ANXA11 gene is predicted to be ‘damaging’ by Polyphen  and showed significant differentiation between highlanders and lowlanders. Strong associations have been repeatedly found between genetic polymorphisms of ANXA11 and sarcoidosis, a systemic immune disorder characterized by destructive, noncaseating epithelioid granulomatous lesions (i.e., nodules caused by inflammation that do not lead to cell death) [60–62]. The lesions are most often located in the lung or associated lymph nodes. The sarcoidosis-associated SNPs are listed in Table 4. In addition, a genome-wide association study of chronic obstructive pulmonary disease identified one SNP in an intron of ANXA11 . The risk allele is rs6585424-G (Table 4, P value = 1 x 10^-10).
The FAM213A gene was localized by the CMSL test under all models of population history (Fig 3G and S5 Fig). FAM213A is also known as PAMM: peroxiredoxin (PPX)-like 2 activated in M-CSF-stimulated monocytes . Its expression has been shown to protect cells from oxidative stress and modulate osteoclast differentiation through inhibition of NF-κB and c-Jun activation, which may affect bone resorption and help to maintain bone mass . Oxidative stress, one of the most detrimental effects of hypobaric hypoxia, is caused by increased reactive oxygen species (ROS), reactive oxygen and nitrogen species (RONS), decreased antioxidants and reduction in pulmonary nitric oxide (NO) bioavailability (reviewed in ). Antioxidant supplementation has been shown to have beneficial effects and to reduce the oxidative stress of some individuals . The expression levels of antioxidants were upregulated in hypoxia-tolerant rats , and also in sojourners after a high-altitude stay, even if not sufficiently to ameliorate oxidative stress completely . These studies suggest that antioxidants are quite important in protecting against oxidative stress, and adaptive effects on the antioxidant system could be influenced by genetic factors, which differ between highlanders and sojourners. Moreover, as a consequence of preventing oxidative damage, the expression of FAM213A could abolish osteoclast formation, resulting in the maintenance of bone mass. It is unclear if this function of FAM213A would be beneficial for high-altitude adaptation; however, studies have shown accelerated growth in lung volume and chest dimensions in highlanders vs. lowlanders [70–72], which might be a developmental compensatory response to high-altitude hypoxia .
In addition to the FAM213A gene itself, the target region of positive selection includes an enhancer of FAM213A. Two SNPs are located in this enhancer; one (rs77999529) exhibits a low minor allele frequency in various human populations, while the second (rs150230265) exhibits significant allele frequency differences between HL and LL (FST = 0.229, P value = 0.014, S1 Table). The global distribution of the allele frequencies of rs150230265 is shown in S6 Fig, which suggests that the derived G allele is restricted to Native American populations. Moreover, the derived allele is at highest frequency (0.382) in HL, and hence could be considered a candidate mutation. The fact that the rs150230265 SNP does exist at low frequency in low-altitude Native American populations (but nowhere else) makes selection on standing variation a possibility, which would further reduce the signal of selection in tests for selective sweeps. Our results suggest that elevated expression of FAM213A by positive selection on the enhancer could help protect against oxidative damage in a hypoxic environment. The mutation and the enhancer could thus be novel candidates for further experimental studies and therapeutic targets.
Although FAM213A was detected as a candidate gene in the CMSL analysis, it was not identified by the REHH analysis. Instead, a different candidate gene in the resequenced region, the SFTPD gene, was identified by this analysis. These different results are not surprising, given that different methods have different power to detect selection, especially in the case of partial selective sweeps and/or selection on standing variation . SFTPD encodes lung surfactant protein D (SP-D), which contributes to the lung’s defense against inhaled microorganisms and may participate in the extracellular reorganization or turnover of pulmonary surfactant. Pulmonary surfactant in turn lowers the surface tension at the air-liquid interface in the alveoli of the mammalian lung and is essential for normal respiration. Given the low oxygen levels at high altitude, altering the surfactant surface tension could be beneficial. A genome-wide association study of chronic obstructive pulmonary disease identified two risk alleles in an intron of SFTPD: the G allele of rs3923564 (P value = 2 x 10^-27) and the T allele of rs7078012 (P value = 5 x 10^-9) . Several non-synonymous SNPs in SFTPD were detected in our resequencing data (Table 3). The rs3088308 SNP involves a serine to threonine substitution, was predicted to be ‘damaging’ by Polyphen, and exhibits significant differentiation between HL and LL (FST = 0.537, P value = 5.46 x 10^-5). However, the derived allele frequency is higher in LL than in HL. Another SNP (rs721917) involves a methionine to threonine substitution and exhibits significant differentiation between HL and LL (FST = 0.271, P value = 7.6 x 10^-3) with a higher frequency of the methionine-encoding allele in HL. This mutation has been investigated intensively and influences oligomerization, function, and the concentration of SP-D in serum . The Thr/Thr genotype is associated with significantly lower SP-D serum levels and with increased disease susceptibility [76–78]. The Met allele was associated with defense against respiratory syncytial virus . The third non-synonymous SNP is rs2243639 (Thr/Ala), which also showed significant differentiation between HL and LL (FST = 0.181, P value = 0.028).
In addition to SFTPD, there are two other genes coding for surfactant pulmonary-associated proteins (SFTPA1 and SFTPA2) which are within the genomic region resequenced, but outside the region showing the highest signals in the CMSL and REHH tests. Mutations in SFTPA1 and SFTPA2 are associated with idiopathic pulmonary fibrosis , and (along with SFTPD) play an essential role in surfactant homeostasis and in the defense against respiratory pathogens [80,81]. Given that these surfactant proteins play a role in both lung function and disease resistance, it is unclear which of these (or perhaps both) might be the driving force behind the signals of selection that we detect in the HL populations.
The novel candidate genes for high-altitude adaptation identified here are in accordance with previous evidence that the functional adaptations of Andean, Tibetan, and Ethiopian natives to high altitude differ . Andeans exhibit lower levels of resting ventilation, a more ‘blunted’ HVR, higher levels of pulmonary hypertension and an increased frequency of CMS. In Tibetans, the exhaled NO is elevated compared to Andeans and lowlanders , which was associated with higher blood flow through the lung . Similar hemoglobin phenotypes among Tibetan and Ethiopian highlanders are associated with different genetic loci, and the variants at those loci are present in most populations regardless of altitude . Overall, populations in different continents have adapted to high altitude through different adaptation processes as a result of convergent evolution [85,86].
A recent study showed that altitude adaptation in Tibetans may have arisen via introgression of Denisovan-like DNA . Thus, modern humans could obtain genetic adaptations to local environments through admixture with other hominin species [88–90]. Native American populations migrated from Siberia, where admixture might have happened between ancestors of modern Asians and archaic humans (including Neanderthals and Denisovans) [91–93]. We therefore checked our sequence data and found no haplotype specifically shared with Denisovans in the region surrounding both FAM213A and SFTPD genes (S7 Fig). These results further support different routes to functional adaptation in Tibetan and Andean high-altitude natives .
In summary, we identified a novel candidate region for high-altitude adaptation in Andeans, with several genes and/or enhancers potentially under positive selection. In particular, multiple tests localized the signal to FAM213A, which encodes an antioxidant that reduces oxidative stress, and to a related enhancer; increased antioxidant activity might be beneficial for adaptation to high altitude in the Andes. However, further functional studies are needed to elucidate the role of this gene (as well as the other candidates) in high-altitude adaptation.
Materials and Methods
Sample collection and DNA extraction
We collected a total of 99 saliva samples from South American indigenous individuals from Bolivia. The participants were informed about our study objectives and provided written consent for the anonymous use of the biological material for academic research. This research was approved by the Ethics Committee of the University of Leipzig Medical Faculty. All sampled individuals were unrelated and of self-identified ancestry as either Aymara, Quechua or Guarani. They were members of long-term resident families from the places where samples were gathered; sample collection from recent immigrants was avoided. Furthermore, special care was taken to obtain samples only from healthy individuals, with no known medical record or indication of CMS or other altitude-related illness. The Aymara individuals were sampled in El Alto (N = 24, situated at 4,100 m altitude above sea level), Tiwanaku (N = 24, 3,885 m), and La Paz (N = 7, 3,600 m). The Quechua individuals were sampled in Oruro-Soracachi (N = 21, 3,750 m), and the Guarani individuals were sampled in Santa Cruz-Gran Chaco (N = 23, 416 m). Genomic DNA was extracted from the saliva samples following a protocol published elsewhere, and the fraction of endogenous (i.e., human) DNA present in the extracts was quantitated as described previously.
Y chromosome haplogroups
A total of 56 males from our Bolivian collection of samples were genotyped for 24 SNPs (12f2, M106, M124, M145, M168, M170, M172, M174, M175, M20, M201, M207, M213, M214, M269, M45, M52, M69, M9, M91, M96, MEH2, SRY10831, and Tat) defining the major branches of the Y chromosome tree. The 24 loci were typed and used for haplogroup assignment as described previously.
DNA pooling and microarray genotyping
To search for candidate genomic regions of high differentiation between high- and low-altitude Bolivian populations, we genotyped pooled samples on microarrays; this approach has been used successfully in other studies [38–41]. A total of nine equimolar DNA mixtures were constructed, consisting of one pool of 18 Aymara, one pool of 17 Quechua, and one pool of 18 Guarani samples; each pool was prepared independently in triplicate with the same individuals, thus resulting in three technical replicates for each pool. We selected the individual genomic extracts containing the highest fractions of endogenous DNA, with all selected extracts containing ≥ 30% human DNA. Each individual sample contributed 100 ng human DNA to the mixture. Pooled DNA solutions were diluted to a working concentration of 50 ng/μl with ddH2O. Affymetrix Reference Genomic DNA 103 was used as a positive control for the microarray experiments. Genotyping was performed using the Affymetrix Genome-Wide Human SNP Array 6.0 according to the manufacturer's protocol. Each of the nine DNA pools and the positive control sample was assayed on a separate microarray. Each array was scanned using the Affymetrix GeneChip Scanner 3000 with the High-Resolution Scanning Upgrade. The cell intensity files were analyzed using the Affymetrix Genotyping Console (GTC v2.1), and the concordance of called genotypes (excluding missing data) between replicates and between the positive control and its consensus genotypes provided by Affymetrix was analyzed using GTC v2.1. The concordance for the pooled Aymara genotypes was 97.5% (on average for the pairwise comparison among the three replicates), for the Quechua was 96.7%, and for the Guarani was 97.9%. The concordance of the positive control compared to the consensus genotypes provided by Affymetrix was 99.7%.
Allele frequencies from DNA pools and highly differentiated genic regions
The allele frequency per called SNP and population was estimated from the raw probe intensity data of each microarray as previously described [38–40]; the allele frequency data are available from the authors upon request. Briefly, we computed the Relative Allele Signal (RAS) score as an estimate of the allele frequency. To keep the calculations consistent, we considered only the first three probe sets for each SNP locus and removed SNPs whose standard deviation of RAS across technical replicates and/or probe sets in any group of individuals was greater than 0.1. The allele frequency for each group of individuals was then estimated by averaging across the technical replicates and the probe sets, $p = \frac{1}{JK}\sum_{j=1}^{J}\sum_{k=1}^{K}\mathrm{RAS}_{jk}$, where j indexes the J technical replicates and k indexes the K probe sets. The allele frequency difference between two groups was computed from p1 and p2, the allele frequencies in the two groups. We calculated the allele frequency differences between groups in a pairwise fashion, and we also compared the Guarani against Aymara and Quechua individuals combined together into a single highland group.
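The averaging and filtering step can be illustrated with a minimal sketch. It assumes that RAS is computed as A / (A + B) from the A- and B-allele probe intensities (a common definition in the pooling literature, not spelled out in this section), that the spread is measured as a single sample standard deviation over all replicate and probe-set values, and that the difference between groups is taken as an absolute value. The function names are hypothetical and this is not the authors' script.

```python
from statistics import mean, stdev


def ras(a_intensity, b_intensity):
    """Relative Allele Signal, assumed here to be A / (A + B)."""
    return a_intensity / (a_intensity + b_intensity)


def pooled_allele_frequency(ras_scores, max_sd=0.1):
    """Average RAS over technical replicates j and probe sets k.

    ras_scores[j][k] is the RAS of replicate j, probe set k (at least
    two values in total are required). Returns None when the standard
    deviation exceeds max_sd, i.e. the SNP would be discarded.
    """
    flat = [score for replicate in ras_scores for score in replicate]
    if stdev(flat) > max_sd:
        return None
    return mean(flat)


def allele_frequency_difference(p1, p2):
    """Absolute difference between two group frequencies (assumption)."""
    return abs(p1 - p2)
```

With three replicates and three probe sets per SNP, `ras_scores` is a 3 x 3 nested list, matching the design described above.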
We applied a t-test to formally evaluate the statistical significance of the calculated allele frequency differences; the test statistic T should follow a t distribution. The overall variance V consists of two parts, V = Vs + Vp, where Vs represents the sampling variance and Vp represents the component from the pooling process; n denotes the sample size in the expressions for these two components.
We applied a multi-locus approach to search for highly differentiated genic regions. SNPs were ranked according to either the allele frequency difference or P value significance. The top 0.1% SNPs were connected if they were within 100 kb distance, and a differentiated region was called if there were more than 10 top SNPs connected.
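The multi-locus rule above can be sketched as follows. The sketch assumes that "connected" means chaining neighbouring top SNPs on the same chromosome whenever the gap between them is at most 100 kb, and that higher scores are more extreme (for P values one could pass -log10 P). The function name and the tuple layout are hypothetical.

```python
def call_differentiated_regions(snps, top_fraction=0.001,
                                max_gap=100_000, min_snps=10):
    """Chain top-ranked SNPs into candidate regions.

    snps: list of (chrom, pos, score) tuples; a higher score means a
    more differentiated SNP. Returns (chrom, start, end, n_top_snps)
    for every chain containing more than min_snps top SNPs.
    """
    n_top = max(1, int(len(snps) * top_fraction))
    top = sorted(snps, key=lambda s: s[2], reverse=True)[:n_top]
    top.sort(key=lambda s: (s[0], s[1]))  # genomic order

    regions, cluster = [], []

    def flush():
        # Report the current chain only if it is large enough.
        if len(cluster) > min_snps:
            regions.append((cluster[0][0], cluster[0][1],
                            cluster[-1][1], len(cluster)))

    for chrom, pos, _score in top:
        new_chain = cluster and (chrom != cluster[-1][0]
                                 or pos - cluster[-1][1] > max_gap)
        if new_chain:
            flush()
            cluster.clear()
        cluster.append((chrom, pos))
    flush()
    return regions
```

Single-linkage chaining is only one reading of the rule; a fixed-window definition would give slightly different region boundaries.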
Validation of estimated allele frequencies
Confirmation of the allele frequencies estimated from the RAS scores for the 14 SNPs with the largest HL-LL allele frequency differences was performed using the ABI PRISM SNaPshot Multiplex System (Applied Biosystems by Life Technologies) according to the manufacturer's protocol. Primers for the single PCRs and for the subsequent extension reactions were designed using the UCSC In-Silico PCR tool (http://genome.csdb.cn/cgi-bin/hgPcr/). Primer interactions within the multiplex were evaluated and minimized using the NetPrimer online software (http://www.premierbiosoft.com/). Briefly, single PCRs amplified the target region surrounding the SNP of interest for each individual contained in the full collection set. The amplicons were then assembled into four separate multiplexes and analyzed on an ABI 3130xl Genetic Analyzer. The SNP calling was performed using the ABI GeneMapper ID v3.2 software. As a positive control for the SNaPshot experiments, the sample HapMap #NA06985 CEPH/UTAH Pedigree 1341 was assayed along with the Bolivian samples. The called genotypes for the control were compared with the consensus genotypes for the same 14 SNP loci obtained from the HapMap website; the concordance was 100%. Additionally, one Bolivian sample was assayed in single primer extension reactions for each of the 14 SNPs, and the called genotypes were compared to the genotypes obtained from the multiplex approach; the concordance was 100%. For the 14 SNPs re-genotyped individually, we performed a Fisher’s exact test to validate the results obtained from the DNA pooling approach. The Fisher’s exact test was performed using R (http://www.r-project.org/).
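The validation test was run in R; as an illustration only, the same two-sided Fisher's exact test on a 2 x 2 table of allele counts (for example, rows for highland and lowland, columns for the two alleles) can be sketched in plain Python. The two-sided P value here sums the probabilities of all tables that are no more likely than the observed one, which is the convention used by R's `fisher.test`; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are counted despite rounding.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

For the large allele frequency differences targeted here, the table is strongly unbalanced and the P value becomes very small.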
Capture array and resequencing
We used Agilent custom 1M capture arrays in order to resequence the target region of interest. We designed overlapping microarray probes of 60 bases targeting over 1.5 Mb of the region of interest in chromosome 10 (chr10:81113000–82664000). Probes were tiled every 3 bases across the target region. Probes containing repetitive elements were discarded. We used the human reference sequence NCBI Build 37 (hg19) to design the probes.
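The tiling scheme can be made concrete with a short sketch that enumerates candidate probe coordinates before any repeat filtering. It assumes 1-based inclusive coordinates and that every probe must lie entirely inside the target interval; both are assumptions, and the function name is hypothetical.

```python
def tile_probes(start, end, probe_len=60, step=3):
    """Yield (probe_start, probe_end) for probes tiled every `step` bases."""
    pos = start
    while pos + probe_len - 1 <= end:
        yield pos, pos + probe_len - 1
        pos += step


# Candidate probes across the chromosome 10 target region.
probes = list(tile_probes(81_113_000, 82_664_000))
```

Under these assumptions the interval yields roughly half a million candidate probes, which is in line with the capacity of a 1M array once repetitive probes are removed.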
Illumina GAIIx libraries were prepared following Meyer and Kircher, with some differences noted below. All samples were sheared with the Bioruptor UCD-200 (Diagenode) down to a range of approximately 200–800 bp. The adapter fill-in step was performed using Dynabeads MyOne Streptavidin C1 (Invitrogen). The beads were prepared and libraries immobilized by aliquoting 25 μl bead suspensions for each sample, washing twice with 2X-BWT buffer and eluting in 25 μl 2X-BWT buffer. A magnetic plate was used for all washing steps. The adapted sample libraries were added to the bead suspension, pipette-mixed, and incubated for 15 minutes at room temperature. The supernatant was then discarded while the plate was on a magnet and the beads were washed twice with 100 μl 1X-BWT buffer. The fill-in step was performed by adding the master mix used in Meyer and Kircher after removing the buffer, and no subsequent SPRI purification was necessary.
Individual-specific indexes were used to multiplex the libraries prior to hybridization enrichment. These were attached by performing a PCR amplification using the Phusion Mastermix (New England Biolabs, NEB). After indexing, samples were pooled in equimolar ratios and hybridized. After hybridization, quantitative PCRs were performed on the sample pools with the DyNAmo qPCR kit (NEB). Based on the resulting qPCR amplification plots, the sample pools were amplified using the Phusion Mastermix so that they did not reach plateau. Each sample pool was sequenced on a single lane of an Illumina GAIIx run by single-end sequencing using 36 cycles.
Resequence data processing
The raw sequencing reads were aligned to the human reference genome sequence GRCh37 with BWA v0.70 using default parameters. The alignments were converted to indexed binary alignment map (BAM) files with SAMtools, and duplicates were removed with the Picard tool v.1.66.
Genotypes were called with the GATK UnifiedGenotyper v1.4 using the following parameters: the minimum base quality score was 20, the minimum mapping quality score was 30, and the Phred-scaled confidence threshold for genotype calling was 50; the default settings were used for all other parameters. Furthermore, the GATK VariantRecalibrator tool was used to score variant calls by a machine-learning algorithm and to identify a set of high-quality SNPs using the Variant Quality Score Recalibration (VQSR) procedure. Insertions and deletions (indels) were filtered out with GATK, resulting in 1,983 SNPs (with an average coverage of 9X) for the following analyses.
Population Genetic Analyses and Selection Tests
To analyze the population differences and detect signals of natural selection in either high or low-altitude populations, we employed several methods with both empirical polymorphism data and simulated data. These methods were based on population differentiation, the allele frequency spectrum, properties of haplotypes, and composite signals:
FST is a measure of population differentiation. We calculated it in a pairwise manner using the unbiased estimator of Weir and Cockerham.
We calculated ΔDAF between a putative selected population and a non-selected population. ΔDAF scores range between -1 and 1. SNPs with positive scores have a higher derived allele frequency in the selected population. The ancestral allele states were as determined by the 1000 Genomes Project.
The XP-CLR test is a likelihood method for detecting selective sweeps based on multilocus allele frequency differentiation between a putative selected population and a non-selected population. We used sliding windows of 0.05 cM and uniform grid points with a spacing of 2 kb. The maximum number of SNPs was set to 200 for each window.
Tajima’s D test.
Tajima’s D was calculated with a sliding window of 20 kb and no overlap between adjacent windows. The calculation was performed with an in-house Perl script.
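The in-house Perl script is not shown; as a stand-in, the following Python sketch computes Tajima's D from per-site allele counts using the standard formulas of Tajima (1989) and applies it to non-overlapping 20 kb windows. It assumes complete data (every site typed in all n chromosomes) and windows anchored at multiples of the window size; the function names are hypothetical.

```python
from math import sqrt


def tajimas_d(n, allele_counts):
    """Tajima's D for n chromosomes.

    allele_counts holds, for each segregating site, the count of one
    of its two alleles (0 < count < n).
    """
    s = len(allele_counts)
    if s == 0 or n < 2:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / sqrt(variance)


def windowed_tajimas_d(n, sites, window=20_000):
    """sites: iterable of (position, allele_count); returns {window_start: D}."""
    bins = {}
    for pos, count in sites:
        if 0 < count < n:  # keep segregating sites only
            bins.setdefault(pos // window * window, []).append(count)
    return {start: tajimas_d(n, counts) for start, counts in sorted(bins.items())}
```

Negative values indicate an excess of rare alleles, as expected after a selective sweep or population expansion, which is why the thresholds in this study are taken from demographic simulations rather than from a theoretical distribution.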
Fay & Wu’s H test.
Fay & Wu’s H was also calculated with an in-house Perl script, using the same sliding window approach as for Tajima’s D.
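As with Tajima's D, the original script is not available here, so the following is only a sketch of the unnormalized statistic H = π − θ_H of Fay and Wu (2000), computed from derived allele counts. It assumes the ancestral state of every site is known and that all n chromosomes are typed; the function name is hypothetical.

```python
def fay_wu_h(n, derived_counts):
    """Unnormalized Fay & Wu's H for n chromosomes.

    derived_counts holds the derived allele count (0 < k < n) at each
    segregating site of the window.
    """
    if n < 2 or not derived_counts:
        return None
    pairs = n * (n - 1)
    # Mean pairwise differences.
    pi = sum(2 * k * (n - k) / pairs for k in derived_counts)
    # Estimator weighted towards high-frequency derived alleles.
    theta_h = sum(2 * k * k / pairs for k in derived_counts)
    return pi - theta_h
```

Strongly negative values reflect an excess of high-frequency derived alleles, the pattern expected from hitchhiking with a positively selected variant.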
The iHS test is based on the length of the haplotype associated with ancestral vs. derived alleles; derived alleles subject to positive selection tend to have unusually long haplotypes, as such alleles have risen to high frequency too quickly for recombination and/or new mutations to break down the associated haplotype. The iHS test partitions haplotypes into an ancestral group and a derived group according to the allele states of core SNPs; iHS is defined as the log ratio of the integrated EHH (extended haplotype homozygosity) for these two groups.
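The quantities behind iHS can be sketched as follows, assuming phased haplotypes coded 0 (ancestral) and 1 (derived), trapezoidal integration of EHH over physical position, and the unstandardized form ln(iHH_A / iHH_D). The sketch omits the EHH cutoff and the frequency-binned standardization used in practice, and all function names are hypothetical.

```python
from collections import Counter
from math import comb, log


def ehh(haps, core, end):
    """EHH of a haplotype group between the core and `end` marker indices."""
    n = len(haps)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    groups = Counter(tuple(h[lo:hi + 1]) for h in haps)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


def ihh(haps, positions, core):
    """Integrated EHH on both sides of the core (trapezoid rule)."""
    if len(haps) < 2:
        return 0.0
    total = 0.0
    for direction in (-1, 1):
        prev_ehh, prev_pos = 1.0, positions[core]
        i = core + direction
        while 0 <= i < len(positions):
            cur = ehh(haps, core, i)
            total += 0.5 * (prev_ehh + cur) * abs(positions[i] - prev_pos)
            prev_ehh, prev_pos = cur, positions[i]
            i += direction
    return total


def unstandardized_ihs(haps, positions, core):
    """ln(iHH_ancestral / iHH_derived) at the core SNP, or None if undefined."""
    ancestral = [h for h in haps if h[core] == 0]
    derived = [h for h in haps if h[core] == 1]
    ihh_a = ihh(ancestral, positions, core)
    ihh_d = ihh(derived, positions, core)
    if ihh_a <= 0 or ihh_d <= 0:
        return None
    return log(ihh_a / ihh_d)
```

With this sign convention, a negative score means the derived allele sits on unusually long haplotypes.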
The REHH is another test based on haplotype length and structure, and was calculated with the Sweep software. We set the option ‘matching distance’ to be ‘marker H of about 0.04’.
Numerous methods have been developed to detect positive selection based on various patterns of genetic variation, and hundreds of candidate regions have been identified. However, these regions are typically large and the causal variants remain unknown. The composite of multiple signals method narrows down the candidate regions and aids in identifying the causal variant. Recently, another framework that combines P values in large-scale genomic data was used to detect selection. This test is based on Fisher’s combination test. The statistic is computed as $Z_F = -2\sum_{i=1}^{k}\ln P_i$, where k is the number of SNPs in one region and Pi is the empirical P value of one test for SNP i. In Luisi’s study, FST, ΔDAF and iHS statistics were calculated, and regions with high ZF scores indicate positive selection. Following this approach, we used a CMSL method that combines FST, ΔDAF, Tajima’s D, Fay & Wu’s H, XP-CLR and the iHS test. The ZF statistic is computed as above, where k is the number of tests and Pi is the P value of the SNP in test i. We obtained the P values from empirical distributions generated by simulations.
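A minimal sketch of the combination step, assuming the standard Fisher form of the score (minus twice the sum of the natural logarithms of the P values) and a small floor to guard against P values of exactly zero; the function name and the floor are illustrative choices, not part of the published method.

```python
from math import log


def fisher_combined_score(p_values, floor=1e-300):
    """Fisher-style combined score for the P values of one SNP.

    p_values: the per-test empirical P values (e.g. FST, ΔDAF,
    Tajima's D, Fay & Wu's H, XP-CLR, iHS). Larger scores indicate
    stronger joint evidence.
    """
    return -2.0 * sum(log(max(p, floor)) for p in p_values)


# Example: a SNP in the top 5% of all six tests.
score = fisher_combined_score([0.05] * 6)
```

Because the six tests are correlated, the score is not compared with a chi-square distribution here; its significance is judged against the simulated neutral distribution, consistent with the text above.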
Simulations were used to calculate the P value of the scores in the empirical data. To account for the impact of demography on the detection of selection, we ran simulations under a wide range of demographic scenarios inferred by the pairwise sequentially Markovian coalescent (PSMC) model, which is a method to infer the history of population size change based on a single genome sequence. In this study, we sequenced nearly 1.5 Mb, which is not enough for inferring a high-resolution Ne trajectory. We therefore used the Ne estimated from the Mexican (MXL) population in the 1000 Genomes Project (red line in S3 Fig), with the modification of a constant Ne in recent history instead of a sharp expansion, as our standard demographic model. We set the divergence time between high- and low-altitude populations at 10,000 years ago, and the divergence of the two high-altitude populations at 5,000 years ago. We also used two other models, one with a more intense bottleneck (Ne reduced by 50% during the most recent 10,000 years) and one with a constant Ne of 7,000 for the entire history.
We used a different formula for the time interval boundaries in PSMC and set n to 30 to reduce the complexity of the search space. The squared exponential growth of time intervals results in more intervals in the recent past and far fewer intervals in the ancient past, as recent Ne needs more information for accurate inference and is more important for our purposes. We simulated 2 Mb neutral segments with MSMS, with 100 replicates for each of the three demographic scenarios.
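The simulated neutral replicates provide a null distribution for each statistic. One way to turn an observed score into an empirical P value is sketched below; it assumes that larger scores are more extreme and uses the common "+1" correction so that the P value is never exactly zero. Both choices, and the function name, are assumptions rather than details given in the text.

```python
from bisect import bisect_left


def empirical_p_value(observed, simulated_scores):
    """Share of neutral simulated scores at least as large as `observed`."""
    ordered = sorted(simulated_scores)
    n_at_least = len(ordered) - bisect_left(ordered, observed)
    return (n_at_least + 1) / (len(ordered) + 1)
```

For statistics where small values are the extreme ones (for example, negative Tajima's D), the scores can be negated before calling the function.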
S1 Fig. Y Chromosome haplogroup frequencies (%) in the analyzed individuals and populations.
S2 Fig. Population differentiation, derived allele frequencies and signature of selection in individual groups.
(A) FST in all comparisons. (B) Derived allele frequencies. (C) Tajima’s D. (D) Fay & Wu’s H. Black dashed lines are the 5% threshold of corresponding tests from the standard simulations.
S3 Fig. Demographic models used in simulations.
The red line is the population size trajectory of MXL as estimated by PSMC. The green line is from the standard model modified from the MXL trajectory by assuming a constant population size in recent history. The blue line is the extreme bottleneck model modified from the standard model by reducing Ne by half beginning 10,000 years ago. The purple line is the constant model with a constant Ne of 7,000.
S4 Fig. Signature of positive selection in the lowland population as revealed by multiple tests.
(A) FST between HL and LL. (B) Derived allele frequencies of LL. (C) Tajima’s D of LL. (D) Fay & Wu’s H of LL. (E) XP-CLR of LL against HL. (F) Absolute iHS score of LL. (G) CMSL score of LL. Black dashed lines are the 5% threshold of each test in standard simulations, green lines are the 1% threshold, and red lines are the 0.1% threshold.
S5 Fig. Signature of positive selection.
CMSL in highland populations under (A) the extreme bottleneck model and (B) the constant population size model.
S6 Fig. Global distribution of the allele frequencies of rs150230265.
Data are from this study and 1000 Genomes.
S7 Fig. Common haplotype frequency.
Haplotype frequencies in modern humans (from this study and 1000 Genomes data) and the Denisovan genome sequence in FAM213A and SFTPD (low-quality Denisovan sites were filtered as described previously). Green is the haplotype shared with Denisovan; blue is the first dominant haplotype in HL; red is the second dominant haplotype in HL, which contains the derived allele of rs150230265; purple is the first dominant haplotype in LL; gray are haplotypes with frequencies <10% in Bolivian populations. The radii are scaled by sample sizes. (A) Haplotypes extend 10 kb on both sides of rs150230265. (B) Haplotypes extend 10 kb on both sides of the three non-synonymous SNPs in SFTPD.
S1 Table. Enhancers and associated genes in the candidate region.
The ‘robust set’ enhancers and the ‘enhancer_tss_association’ were downloaded from the FANTOM5 project (http://enhancer.binf.ku.dk/Pre-defined_tracks.html).
The authors are grateful to all donors who generously contributed the biological material for this study. We thank Takashi Bravo for helping with the sample collection, and Hernán A. Burbano for assistance in designing the resequence capture array.
Conceived and designed the experiments: GV MS. Performed the experiments: GV SL DLH. Analyzed the data: HZ KT DLH JL. Contributed reagents/materials/analysis tools: SL CdF MS. Wrote the paper: GV HZ KT DLH JL MS. Collected the samples: GV.
- 1. Cavalli-Sforza LL, Menozzi P, Piazza A. The history and geography of human genes. Princeton, NJ: Princeton University Press. 1994: 518 p.
- 2. Cassinell CM, Velarde FL, de Bigio DL. El reto fisiológico de vivir en los Andes. l'Institut. 2003: 435 p.
- 3. West JB, Schoene RB, Milledge JS. High Altitude Medicine and Physiology. Hodder Arnold. 2007: 480 p.
- 4. Beall CM, Decker MJ, Brittenham GM, Kushner I, Gebremedhin A, Strohl KP. An Ethiopian pattern of human adaptation to high-altitude hypoxia. Proc Natl Acad Sci U S A. 2002;99: 17215–17218. pmid:12471159
- 5. Xing G, Qualls C, Huicho L, Rivera-Ch M, Stobdan T, Slessarev M, et al. Adaptation and mal-adaptation to ambient hypoxia; Andean, Ethiopian and Himalayan patterns. PLoS One. 2008;3: e2342. pmid:18523639
- 6. Aldenderfer MS. Moving Up in the World. American Scientist. 2003;91: 542.
- 7. Zhao M, Kong QP, Wang HW, Peng MS, Xie XD, Wang WZ, et al. Mitochondrial genome evidence reveals successful Late Paleolithic settlement on the Tibetan Plateau. Proc Natl Acad Sci U S A. 2009;106: 21230–21235. pmid:19955425
- 8. Rupert JL, Hochachka PW. The evidence for hereditary factors contributing to high altitude adaptation in Andean natives: a review. High Alt Med Biol. 2001;2: 235–256. pmid:11443004
- 9. Rothhammer F, Dillehay TD. The late Pleistocene colonization of South America: an interdisciplinary perspective. Ann Hum Genet. 2009;73: 540–549. pmid:19691551
- 10. Beall CM, Brittenham GM, Strohl KP, Blangero J, Williams-Blangero S, Goldstein MC, et al. Hemoglobin concentration of high-altitude Tibetans and Bolivian Aymara. Am J Phys Anthropol. 1998;106: 385–400. pmid:9696153
- 11. Beall CM. Two routes to functional adaptation: Tibetan and Andean high-altitude natives. Proc Natl Acad Sci U S A. 2007;104 Suppl 1: 8655–8660. pmid:17494744
- 12. Zhuang J, Droma T, Sun S, Janes C, McCullough RE, McCullough RG, et al. Hypoxic ventilatory responsiveness in Tibetan compared with Han residents of 3,658 m. J Appl Physiol (1985). 1993;74: 303–311.
- 13. Brutsaert TD. Population genetic aspects and phenotypic plasticity of ventilatory responses in high altitude natives. Respir Physiol Neurobiol. 2007;158: 151–160. pmid:17400521
- 14. Monge C. Life In the Andes And Chronic Mountain Sickness. Science. 1942;95: 79–84. pmid:17757318
- 15. Leon-Velarde F. Pursuing international recognition of chronic mountain sickness. High Alt Med Biol. 2003;4: 256–259. pmid:12855057
- 16. Ergueta J, Spielvogel H, Cudkowicz L. Cardio-respiratory studies in chronic mountain sickness (Monge's syndrome). Respiration. 1971;28: 485–517. pmid:5140341
- 17. Mejia OM, Prchal JT, Leon-Velarde F, Hurtado A, Stockton DW. Genetic association analysis of chronic mountain sickness in an Andean high-altitude population. Haematologica. 2005;90: 13–19. pmid:15642663
- 18. Julian CG, Wilson MJ, Moore LG. Evolutionary adaptation to high altitude: a view from in utero. Am J Hum Biol. 2009;21: 614–622. pmid:19367578
- 19. Shriver MD, Mei R, Bigham A, Mao X, Brutsaert TD, Parra EJ, et al. Finding the genes underlying adaptation to hypoxia using genomic scans for genetic adaptation and admixture mapping. Adv Exp Med Biol. 2006;588: 89–100. pmid:17089882
- 20. Rupert JL, Devine DV, Monsalve MV, Hochachka PW. Beta-fibrinogen allele frequencies in Peruvian Quechua, a high-altitude native population. Am J Phys Anthropol. 1999;109: 181–186. pmid:10378457
- 21. Rupert JL, Monsalve MV, Devine DV, Hochachka PW. Beta2-adrenergic receptor allele frequencies in the Quechua, a high altitude native population. Ann Hum Genet. 2000;64: 135–143. pmid:11246467
- 22. Droma Y, Hanaoka M, Basnyat B, Arjyal A, Neupane P, Pandit A, et al. Genetic contribution of the endothelial nitric oxide synthase gene to high altitude adaptation in sherpas. High Alt Med Biol. 2006;7: 209–220. pmid:16978133
- 23. Bigham AW, Kiyamu M, Leon-Velarde F, Parra EJ, Rivera-Ch M, Shriver MD, et al. Angiotensin-converting enzyme genotype and arterial oxygen saturation at high altitude in Peruvian Quechua. High Alt Med Biol. 2008;9: 167–178. pmid:18578648
- 24. Bigham AW, Mao X, Mei R, Brutsaert T, Wilson MJ, Julian CG, et al. Identifying positive selection candidate loci for high-altitude adaptation in Andean populations. Hum Genomics. 2009;4: 79–90. pmid:20038496
- 25. Wang P, Ha AY, Kidd KK, Koehle MS, Rupert JL. A variant of the endothelial nitric oxide synthase gene (NOS3) associated with AMS susceptibility is less common in the Quechua, a high altitude Native population. High Alt Med Biol. 2010;11: 27–30. pmid:20367485
- 26. Pagani L, Ayub Q, MacArthur DG, Xue Y, Baillie JK, Chen Y, et al. High altitude adaptation in Daghestani populations from the Caucasus. Hum Genet. 2012;131: 423–433. pmid:21904933
- 27. Beall CM, Cavalleri GL, Deng L, Elston RC, Gao Y, Knight J, et al. Natural selection on EPAS1 (HIF2alpha) associated with low hemoglobin concentration in Tibetan highlanders. Proc Natl Acad Sci U S A. 2010;107: 11459–11464. pmid:20534544
- 28. Simonson TS, Yang Y, Huff CD, Yun H, Qin G, Witherspoon DJ, et al. Genetic evidence for high-altitude adaptation in Tibet. Science. 2010;329: 72–75. pmid:20466884
- 29. Yi X, Liang Y, Huerta-Sanchez E, Jin X, Cuo ZX, Pool JE, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329: 75–78. pmid:20595611
- 30. Peng Y, Yang Z, Zhang H, Cui C, Qi X, Luo X, et al. Genetic variations in Tibetan populations and high-altitude adaptation at the Himalayas. Mol Biol Evol. 2011;28: 1075–1081. pmid:21030426
- 31. Xu S, Li S, Yang Y, Tan J, Lou H, Jin W, et al. A genome-wide search for signals of high-altitude adaptation in Tibetans. Mol Biol Evol. 2011;28: 1003–1011. pmid:20961960
- 32. Bigham A, Bauchet M, Pinto D, Mao X, Akey JM, Mei R, et al. Identifying signatures of natural selection in Tibetan and Andean populations using dense genome scan data. PLoS Genet. 2010;6: e1001116. pmid:20838600
- 33. Eichstaedt CA, Antao T, Pagani L, Cardona A, Kivisild T, Mormina M. The Andean adaptive toolkit to counteract high altitude maladaptation: genome-wide and phenotypic analysis of the Collas. PLoS One. 2014;9: e93314. pmid:24686296
- 34. Scheinfeldt LB, Soi S, Thompson S, Ranciaro A, Woldemeskel D, Beggs W, et al. Genetic adaptation to high altitude in the Ethiopian highlands. Genome Biol. 2012;13: R1. pmid:22264333
- 35. Alkorta-Aranburu G, Beall CM, Witonsky DB, Gebremedhin A, Pritchard JK, Di Rienzo A. The genetic architecture of adaptations to high altitude in Ethiopia. PLoS Genet. 2012;8: e1003110. pmid:23236293
- 36. Huerta-Sanchez E, Degiorgio M, Pagani L, Tarekegn A, Ekong R, Antao T, et al. Genetic signatures reveal high-altitude adaptation in a set of ethiopian populations. Mol Biol Evol. 2013;30: 1877–1888. pmid:23666210
- 37. O'Rourke DH, Raff JA. The human genetic history of the Americas: the final frontier. Curr Biol. 2010;20: R202–207. pmid:20178768
- 38. Kirov G, Nikolov I, Georgieva L, Moskvina V, Owen MJ, O'Donovan MC. Pooled DNA genotyping on Affymetrix SNP genotyping arrays. BMC Genomics. 2006;7: 27. pmid:16480507
- 39. Docherty SJ, Butcher LM, Schalkwyk LC, Plomin R. Applicability of DNA pools on 500 K SNP microarrays for cost-effective initial screens in genomewide association studies. BMC Genomics. 2007;8: 214. pmid:17610740
- 40. Wilkening S, Chen B, Wirtenberger M, Burwinkel B, Forsti A, Hemminki K, et al. Allelotyping of pooled DNA with 250 K SNP microarrays. BMC Genomics. 2007;8: 77. pmid:17367522
- 41. Jawaid A, Sham P. Impact and quantification of the sources of error in DNA pooling designs. Ann Hum Genet. 2009;73: 118–124. pmid:18945289
- 42. Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, et al. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. 2006;16: 1575–1584. pmid:17122084
- 43. Salzano F, Callegari-Jacques SM. South American Indians: A Case Study in Evolution. Oxford: Clarendon Press. 1988.
- 44. Gaya-Vidal M, Moral P, Saenz-Ruales N, Gerbault P, Tonasso L, Villena M, et al. mtDNA and Y-chromosome diversity in Aymaras and Quechuas from Bolivia: different stories and special genetic traits of the Andean Altiplano populations. Am J Phys Anthropol. 2011;145: 215–230. pmid:21469069
- 45. Weir BS, Cockerham CC. Estimating F-Statistics for the Analysis of Population Structure. Evolution. 1984;38.
- 46. Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, Mulligan CJ, et al. Beringian standstill and spread of Native American founders. PLoS One. 2007;2: e829. pmid:17786201
- 47. Fagundes NJ, Kanitz R, Eckert R, Valls AC, Bogo MR, Salzano FM, et al. Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet. 2008;82: 583–592. pmid:18313026
- 48. Goebel T, Waters MR, O'Rourke DH. The late Pleistocene dispersal of modern humans in the Americas. Science. 2008;319: 1497–1502. pmid:18339930
- 49. Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, et al. Reconstructing Native American population history. Nature. 2012;488: 370–374. pmid:22801491
- 50. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491: 56–65. pmid:23128226
- 51. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123: 585–595. pmid:2513255
- 52. Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155: 1405–1413. pmid:10880498
- 53. Grossman SR, Shlyakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science. 2010;327: 883–886. pmid:20056855
- 54. Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20: 393–402. pmid:20086244
- 55. Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biology. 2006;4: e72. pmid:16494531
- 56. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507: 455–461. pmid:24670763
- 57. Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002;419: 832–837. pmid:12397357
- 58. Moore LG, Niermeyer S, Zamudio S. Human adaptation to high altitude: regional and life-cycle perspectives. Am J Phys Anthropol. 1998;Suppl: 25–64.
- 59. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7: 248–249. pmid:20354512
- 60. Hofmann S, Franke A, Fischer A, Jacobs G, Nothnagel M, Gaede KI, et al. Genome-wide association study identifies ANXA11 as a new susceptibility locus for sarcoidosis. Nat Genet. 2008;40: 1103–1106. pmid:19165924
- 61. Morais A, Lima B, Peixoto M, Melo N, Alves H, Marques JA, et al. Annexin A11 gene polymorphism (R230C variant) and sarcoidosis in a Portuguese population. Tissue Antigens. 2013;82: 186–191. pmid:24032725
- 62. Levin AM, Iannuzzi MC, Montgomery CG, Trudeau S, Datta I, McKeigue P, et al. Association of ANXA11 genetic variation with sarcoidosis in African Americans and European Americans. Genes Immun. 2013;14: 13–18. pmid:23151485
- 63. Kim DK, Cho MH, Hersh CP, Lomas DA, Miller BE, Kong X, et al. Genome-wide association analysis of blood biomarkers in chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2012;186: 1238–1247. pmid:23144326
- 64. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42: D1001–1006. pmid:24316577
- 65. Xu Y, Morse LR, da Silva RA, Odgren PR, Sasaki H, Stashenko P, et al. PAMM: a redox regulatory protein that modulates osteoclast differentiation. Antioxid Redox Signal. 2010;13: 27–37. pmid:19951071
- 66. Pandey P, Pasha MQ. Oxidative stress at high altitude: genotype—phenotype correlations. Advances in Genomics and Genetics. 2014;4: 29–43.
- 67. Schmidt MC, Askew EW, Roberts DE, Prior RL, Ensign WY Jr, Hesslink RE Jr. Oxidative stress in humans training in a cold, moderate altitude environment and their response to a phytochemical antioxidant supplement. Wilderness Environ Med. 2002;13: 94–105. pmid:12092978
- 68. Padhy G, Sethy NK, Ganju L, Bhargava K. Abundance of plasma antioxidant proteins confers tolerance to acute hypobaric hypoxia exposure. High Alt Med Biol. 2013;14: 289–297. pmid:24067188
- 69. Sinha S, Ray US, Tomar OS, Singh SN. Different adaptation patterns of antioxidant system in natives and sojourners at high altitude. Respir Physiol Neurobiol. 2009;167: 255–260. pmid:19454326
- 70. Meer K, Heymans HS, Zijlstra WG. Physical adaptation of children to life at high altitude. Eur J Pediatr. 1995;154: 263–272. pmid:7607274
- 71. Frisancho AR. Human growth and pulmonary function of a high-altitude Peruvian Quechua population. Hum Biol. 1969;91: 365–379.
- 72. Hurtado A. Respiratory adaptation in the Indian natives of the Peruvian Andes. Studies at high altitude. Am J Phys Anthropol. 1932;17: 137–165.
- 73. Frisancho AR. Developmental functional adaptation to high altitude: review. Am J Hum Biol. 2013;25: 151–168. pmid:24065360
- 74. Ferrer-Admetlla A, Liang M, Korneliussen T, Nielsen R. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol Biol Evol. 2014;31: 1275–1291. pmid:24554778
- 75. Leth-Larsen R, Garred P, Jensenius H, Meschi J, Hartshorn K, Madsen J, et al. A common polymorphism in the SFTPD gene influences assembly, function, and concentration of surfactant protein D. J Immunol. 2005;174: 1532–1538. pmid:15661913
- 76. Lahti M, Lofgren J, Marttila R, Renko M, Klaavuniemi T, Haataja R, et al. Surfactant protein D gene polymorphism associated with severe respiratory syncytial virus infection. Pediatr Res. 2002;51: 696–699. pmid:12032263
- 77. Ishii T, Hagiwara K, Kamio K, Ikeda S, Arai T, Mieno MN, et al. Involvement of surfactant protein D in emphysema revealed by genetic association study. Eur J Hum Genet. 2012;20: 230–235. pmid:21934714
- 78. Ryckman KK, Dagle JM, Kelsey K, Momany AM, Murray JC. Genetic associations of surfactant protein D and angiotensin-converting enzyme with lung disease in preterm neonates. J Perinatol. 2012;32: 349–355. pmid:21960125
- 79. Choi EH, Ehrmantraut M, Foster CB, Moss J, Chanock SJ. Association of common haplotypes of surfactant protein A1 and A2 (SFTPA1 and SFTPA2) genes with severity of lung disease in cystic fibrosis. Pediatr Pulmonol. 2006;41: 255–262. pmid:16429424
- 80. Jack DL, Cole J, Naylor SC, Borrow R, Kaczmarski EB, Klein NJ, et al. Genetic polymorphism of the binding domain of surfactant protein-A2 increases susceptibility to meningococcal disease. Clin Infect Dis. 2006;43: 1426–1433. pmid:17083016
- 81. Herrera-Ramos E, Lopez-Rodriguez M, Ruiz-Hernandez JJ, Horcajada JP, Borderias L, Lerma E, et al. Surfactant protein A genetic variants associate with severe respiratory insufficiency in pandemic influenza A virus infection. Crit Care. 2014;18: R127. pmid:24950659
- 82. Beall CM, Laskowski D, Strohl KP, Soria R, Villena M, Vargas E, et al. Pulmonary nitric oxide in mountain dwellers. Nature. 2001;414: 411–412. pmid:11719794
- 83. Hoit BD, Dalton ND, Erzurum SC, Laskowski D, Strohl KP, Beall CM. Nitric oxide and cardiopulmonary hemodynamics in Tibetan highlanders. J Appl Physiol (1985). 2005;99: 1796–1801. pmid:16024527
- 84. Beall CM. Human adaptability studies at high altitude: research designs and major concepts during fifty years of discovery. Am J Hum Biol. 2013;25: 141–147. doi:10.1002/ajhb.22355. pmid:23349118
- 85. Scheinfeldt LB, Tishkoff SA. Living the high life: high-altitude adaptation. Genome Biol. 2010;11: 133. pmid:20979669
- 86. Beall CM. Andean, Tibetan, and Ethiopian patterns of adaptation to high-altitude hypoxia. Integr Comp Biol. 2006;46: 18–24. pmid:21672719
- 87. Huerta-Sánchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;
- 88. Abi-Rached L, Jobin MJ, Kulkarni S, McWhinnie A, Dalva K, Gragert L, et al. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science. 2011;334: 89–94. pmid:21868630
- 89. Vernot B, Akey JM. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343: 1017–1021. pmid:24476670
- 90. Sankararaman S, Mallick S, Dannemann M, Prufer K, Kelso J, Paabo S, et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507: 354–357. pmid:24476815
- 91. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, et al. A draft sequence of the Neandertal genome. Science. 2010;328: 710–722. pmid:20448178
- 92. Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338: 222–226. pmid:22936568
- 93. Prufer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505: 43–49. pmid:24352235
- 94. Quinque D, Kittler R, Kayser M, Stoneking M, Nasidze I. Evaluation of saliva as a source of human DNA for population and association studies. Anal Biochem. 2006;353: 272–277. pmid:16620753
- 95. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF. New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res. 2008;18: 830–838. pmid:18385274
- 96. de Filippo C, Barbieri C, Whitten M, Mpoloka SW, Gunnarsdottir ED, Bostoen K, et al. Y-chromosomal variation in sub-Saharan Africa: insights into the history of Niger-Congo groups. Mol Biol Evol. 2011;28: 1255–1269. pmid:21109585
- 97. Herraez DL, Stoneking M. High fractions of exogenous DNA in human buccal samples reduce the quality of large-scale genotyping. Anal Biochem. 2008;383: 329–331. pmid:18804445
- 98. Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D, Brizuela L, et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat Protoc. 2009;4: 960–974. pmid:19478811
- 99. Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. 2010;2010: pdb.prot5448.
- 100. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26: 589–595. pmid:20080505
- 101. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079. pmid:19505943
- 102. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20: 1297–1303. pmid:20644199
- 103. Luisi P, Alvarez-Ponce D, Dall'Olio GM, Sikora M, Bertranpetit J, Laayouni H. Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol Biol Evol. 2012;29: 1379–1392. pmid:22135191
- 104. Zaykin DV, Zhivotovsky LA, Czika W, Shao S, Wolfinger RD. Combining p-values in large-scale genomics experiments. Pharm Stat. 2007;6: 217–226. pmid:17879330
- 105. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475: 493–496. pmid:21753753
- 106. Ewing G, Hermisson J. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics. 2010;26: 2064–2065. pmid:20591904