Shared and Unique Signals of High-Altitude Adaptation in Geographically Distinct Tibetan Populations

Recent studies have used a variety of analytical methods to identify genes targeted by selection in high-altitude populations located throughout the Tibetan Plateau. Despite differences in analytic strategies and sample location, hypoxia-related genes, including EPAS1 and EGLN1, were identified in multiple studies. By applying the same analytic methods to genome-wide SNP information used in our previous study of a Tibetan population (n = 31) from the township of Maduo, located in the northeastern corner of the Qinghai-Tibetan Plateau (4200 m), we have identified common targets of natural selection in a second geographically and linguistically distinct Tibetan population (n = 46) in the Tuo Tuo River township (4500 m). Our analyses provide evidence for natural selection based on iHS and XP-EHH signals in both populations at the p<0.02 significance level for EPAS1, EGLN1, HMOX2, and CYP17A1 and for PKLR, HFE, and HBB and HBG2, which have also been reported in other studies. We highlight differences (i.e., stratification and admixture) in the two distinct Tibetan groups examined here and report selection candidate genes common to both groups. These findings should be considered in the prioritization of selection candidate genes in future genetic studies in Tibet.


Introduction
Native highlanders have specific traits that enable their survival despite the stress imposed by decreased oxygen availability at altitude. Several regions of the genome, including those that harbor hypoxia-sensing and -regulated genes, were recently identified as putative targets for high-altitude adaptation in Tibetans [1][2][3][4][5][6][7][8][9][10][11]. Two of these regions contain genes involved in hypoxia sensing and response: the EPAS1 gene, which encodes the hypoxia inducible factor (HIF)-2a subunit, and the EGLN1gene, which encodes a proline hydroxylase, PHD2, that regulates hypoxia-induced factors in an oxygen-dependent manner. Variation in the EPAS1 region is associated with hemoglobin concentration ([Hb]) in Tibetan populations examined in two separate studies [2,10], and haplotypes at loci containing EGLN1 and PPARA (peroxisome proliferator activated receptor alpha) are associated with [Hb] in Tibetans from the Maduo township in the northeast section of the Qinghai-Tibetan Plateau [7] and in a sample of Tibetans located throughout the Plateau [12]. EPAS1 and PPARA haplotypes in Tibetans from the Tuo Tuo River examined here are associated with elevated serum lactate and free fatty acid concentrations, respectively, providing further support for important biological roles [13]. In addition to these phenotypeassociated selection targets, many other genes have been reported as strong targets of selection in other studies and are likely associated with additional adaptive traits in Tibetan populations [6]. Here we highlight selection signals identified at the p,0.02 empirical significance level in previously examined Maduo [7] and Tuo Tuo River [13] Tibetan populations using the same analytical strategies.
Tibetans inhabit a vast area of the Qinghai-Tibetan Plateau, which spans approximately 1.5 million square km (0.96 million square miles). At least three major Tibetan dialects are spoken among these geographically distinct groups (Amdo, U-Tsang, and Kham in the northeastern, southwestern, and southeastern regions of the Plateau, respectively [14]), suggesting potential genetic isolation among different Tibetan populations. The demographic history of these populations is, however, highly debated. Archaeological evidence suggests that the ancestors of present-day Tibetan groups migrated to the Qinghai-Tibetan Plateau at various times, ranging from 25,000 to 5,000 years ago [15]. Patterns of genome-wide SNPs support a single-route migration into this region [8], although analyses of mitochondrial DNA variation suggest that different migrations, dating to pre-and post-Last Glacial Maximum, contributed to genetic variation observed among present-day inhabitants [16,17]. Neighboring populations have also mixed with various Tibetan groups located on the periphery of the plateau, contributing to present-day levels of variation in these regions [8].
While various studies have focused on a few select candidate gene regions [1,4,5], many genetic loci have probably been targeted by selection in high-altitude populations (see review [6]). In order to evaluate whether our previously reported selection targets [7] are significant in a different Tibetan group [13], we carried out genome-wide SNP-based selection scans in a linguistically distinct population from a different region of the Qinghai-Tibetan Plateau. Several of the same haplotypes exhibit extreme signals of selection in this second population, highlighting genomic regions that warrant further investigation in genetic studies of high-altitude adaptation in Tibet. We also determined which regions with extreme selection signals contain hypoxiaassociated microRNAs. We report differences in the prevalence of admixture and prevalence of mitochondrial haplogroups in two distinct Tibetan groups, highlighting the value of studying different Tibetan groups and accounting for varied genetic backgrounds.

Stratification of Tibetan populations
We used pair-wise allele sharing distances to calculate F ST [18] among Tibetan populations and neighboring Asian groups. The two Tibetan populations exhibit the least amount of genetic differentiation from each other (F ST = 0.004) compared to other Asian populations (HapMap Chinese [CHB] and Japanese [JPT] [19], and Buryat and Deedu Mongolians [20,21]. Tuo Tuo River and Maduo Tibetans are similarly differentiated from the Hap Map CHB (F ST = 0.014 and 0.012, respectively), which are commonly used for genetic comparisons (Table S1).
To determine the extent of population structure among Tibetans and neighboring populations, we used the program Admixture [22] to examine the group-specific proportion of each individual's genome ( Figure 1). Each of the Tibetan populations, as well as Buryat Mongolians, Deedu Mongolians, HapMap CHB, and HapMap JPT populations, form a distinct group. As previously reported, Maduo Tibetans exhibit components associated with Han Chinese (CHB), suggesting either recent admixture or may reflect different founder populations in this group [7]. Individuals from the Tuo Tuo River population do not, however, exhibit this signal, likely reflecting the more insulated geographic location of this group (Figure 1).

High-altitude selection candidate genes identified in two distinct Tibetan populations
We employed the same analytic strategies (iHS [23] and XP-EHH [24] statistics) used in our previous study of a Tibetan highaltitude adaptation to identify mutual targets of selection in a second linguistically distinct Tibetan population. By applying the same analytical methods in both populations, we provide further evidence for positive selection in each of two distinct Tibetan populations at the p,0.02 significance level for EPAS1, EGLN1, HMOX2, and CYP17A1 gene regions (Table 1) and further support for adaptation involving PKLR, HFE, and HBB and HBG2 genes identified in other recent studies. The first population described [7] is a group of Qinghai-Tibetans from Maduo who speak the Amdo Tibetan dialect (one of the three major Tibetan dialects spoken on the Qinghai-Tibetan Plateau); the second population examined, who speak the Kham dialect, are from the Tuo Tuo River area [13] (Figure 1). In order to identify high-altitude specific selection candidates, genomic regions with iHS significance levels at p,0.02 identified in our other Asian populations were excluded from analysis as described previously [7]. Of the selection candidate genes reported in our previous study of Maduo Tibetans [7], seven were identified at the p,0.02 significance level in Tuo Tuo River Tibetans and/or have been reported as strong selection candidates in other studies of Tibetan adaptation to altitude (Table 1).

Genotype-phenotype analysis of hemoglobin concentration
The putatively adaptive EPAS1 genomic region is associated with [Hb] in two independent studies of high-altitude adaptation in Tibetans [2,10]. While an extreme signal of selection near the EPAS1 gene was detected in both the Maduo and Tuo Tuo River Tibetan populations, the EPAS1 haplotype did not exhibit an association with [Hb] in Maduo Tibetans as previously reported [7], nor was there a significant association in Tuo Tuo River Tibetans (n = 36) examined here (Table S3; Figure S1). It is likely that our modest sample size yields reduced power to detect phenotype-genotype associations ( Figure S2).
In our previous study of Maduo Tibetans, we identified associations between EGLN1 and PPARA selected regions and [Hb] [7]. There is no association between the EGLN1 and PPARA haplotypes and [Hb] in the 36 Tuo Tuo River Tibetans examined here (Table S3; Figure S1). We speculate that genetic heterogeneity may influence genotype-phenotype relationships as a result of the genetic admixture that was detected in Maduo but not Tuo Tuo River Tibetans ( Figure 2). It will be important to test, in a larger sample, whether admixture differences underlie reduced power to detect a signal in this modest sample ( Figure S2).

Non-protein coding regions of the genome highlighted in both Tibetan groups
In order to determine whether non-genic microRNAs not previously examined as a priori candidates are contained within our top selection candidates, we examined the intersection between the top two percent of iHS and XP-EHH selection candidate regions and genomic locations that contain any of 54 hypoxia-related miRNAs [25]. Interestingly, the PPARA selection candidate region identified in our previous study contains a hypoxia-associated miRNA (Table S2) that may be involved in high-altitude response, although this region was not identified as a top selection candidate in the Tuo Tuo River Tibetans. A recent report of adaptation in Ethiopian highlanders also reports an extreme signal of selection near PPARA [26], although it is unclear whether the hypoxiarelated miRNA in this region is involved in adaptation of either Tibetan or Ethiopian highland groups.

Mitochondrial DNA analyses highlight variation in Maduo and Tuo Tuo River Tibetan populations
Entire mtDNA control region sequence and genotyping of present-day inhabitants of the Tibetan Plateau indicate distinct patterns of variation among Tibetan groups [16,27]. In order to determine whether the two Tibetan populations exhibit differences in mitochondrial DNA (mtDNA) haplogroups, we genotyped 12 haplotype-defining SNPs in18 and 40 individuals from Maduo and Tuo Tuo River (Figure 1), respectively. Consistent with previously reported frequencies for Tibeto-Burman groups, the most common major mtDNA haplogroup is M (94% and 67.5% for Maduo and TTR, respectively), although a small proportion of haplogroup N is also detected in our samples. M9, which is found at elevated frequency at high-altitude inhabitants in Tibet and India [28], is predominant in both Maduo and Tuo Tuo River populations ( Table 2). The sub-M haplogroups D4, M9E, and M8 are most prevalent among Maduo Tibetans, whereas M9E and G are the most common mtDNA haplogroups within the TTR sample. Haplogroup A, which exhibits greater diversity in the southern region of the Qinghai-Tibetan Plateau [17], is detected among Tibetan inhabitants of the Tuo Tuo River area (8%) and Maduo Tibetans who reside further north (7%). Haplogroup B, a major group common among Native American populations, is also present in Tuo Tuo River Tibetans (TTR: 10%). The power to detect mtDNA variants at marginal frequencies (,10%) is limited in our modest sample of mtDNA from the Maduo population, and additional sample collection will be necessary to place these samples in the context of global mtDNA variation and determine precise genetic relationships among these populations.

Discussion
Recent reports of high-altitude adaptation in Tibetans are based on a range of analytical methods applied to different Tibetan populations located throughout the Qinghai-Tibetan Plateau [1,[4][5][6]. While some of the selection candidate genes reported by these studies are the same, there are also selection targets unique to each study. This could be due in part to differences in analytical methods, which range from sequence-based examination of allelefrequency differences between Tibetan and non-Tibetan groups to genome-wide analyses of extended SNP haplotype variation [1,[4][5][6].
Highly differentiated SNPs in the EPAS1 gene region are related to relatively lower [Hb] in two independent studies of high-altitude adaptation in Tibet [2,10], and recent sequencing efforts have identified two variants in the first exon of EGLN1 that are highly differentiated in Tibetans [12,29]. Another study of the EGLN1 region shows that two polymorphisms within the first intron are found at elevated frequency (71%) in a high-altitude population from India and are associated with EGLN1 expression and highaltitude pulmonary edema (HAPE) in this population [30]. These SNPs are in complete linkage disequilibrium (r 2 = 1.0) with the putatively adaptive selected haplotypes identified in both of the Tibetan populations examined here, providing additional evidence for an adaptive role of EGLN1 across the Qinghai-Tibetan Plateau. Interestingly, EGLN1 was identified as a target of selection in Andean highlanders, although it is unclear whether the putatively adaptive variants are the same as those reported in Tibetan highlanders [3,31]. A study of highland Daghestani populations also indicates highly differentiated intronic SNPs in a wellconserved region of EGLN1 [32].
While EPAS1 [2,3,7,[9][10][11], EGLN1 [3,7,[9][10][11]31], and PPARA [7,26] have been identified as selection candidates in one or more studies of altitude adaptation and/or are associated with Tibetan putatively adaptive phenotypes, several other selection candidates have also been identified in multiple studies. The HMOX2 gene, a heme oxygenase involved in HIF-independent hypoxia sensing, was a top selection candidate in our Maduo and Tuo Tuo River populations and was also reported as a selection candidate in a pooled sample of 50 Tibetan samples collected from various regions of the Qinghai-Tibetan Plateau [11]. Furthermore, PKLR and HBB/HBG2, respectively identified in the Tuo Tuo River and Maduo Tibetan populations at the p,0.02 level, were also reported as top selection candidates in an independent examination of genetic variation in 50 Tibetan exomes [10]. Variants in the CYP17A1 gene, contained within a top selection candidate region identified in both Maduo and Tuo Tuo River populations, are associated with hypertension in European and Asian populations [33][34][35]. While these selection candidates have not been associated with the phenotypes measured thus far, they should be considered as strong candidates for future studies of genetic and physiological adaptations.
Considering the lack of differentiation detected through analysis of the protein-coding regions of Han Chinese and Tibetan samples [10], it is also possible that many genetic targets of selection are in non-coding, regulatory regions of the genome. Our analyses suggest natural selection on a miRNA near the PPARA gene, highlighting regulatory variation as a potential factor underlying adaptive advantages within Tibetan genomes.
Differences among studies published to date could also result from variation in the genetic background of Tibetan groups. Such factors could influence the extent to which selection signals are captured (the early stages of a selection sweep versus those that are fixed or nearly fixed in the population) or the potential for selection events to occur in different groups. In order to better understand differences in Tibetan adaptation to altitude, it will be necessary to fully characterize population histories in various Tibetan groups, determine the precise functional variants and timing of selection events, and evaluate additional phenotypic differences related to adaptive functional variants.

Conclusions
Selection and association signals identified in Tibetan populations may be influenced by the demographic history of these groups and by the choice of analytic methods. Despite genetic heterogeneity between Tibetan groups as shown here, several putatively adaptive genetic variants are common to Maduo and Tuo Tuo River Tibetans. Further genetic and phenotypic characterizations of physiological traits (e.g., oxygen transport, maternal/fetal responses during pregnancy) are required to determine if the remaining selection targets highlighted here are of biological significance, whether they are related to or The

DNA sample collection
We extracted DNA from whole blood samples obtained from 85 highland natives (non-smokers, no chronic diseases) residing in the Tuo Tuo River township in Qinghai Province (,4,500 m).

Ethics Statement
Participants provided written agreement as indicated by a signature or mark on a sheet of paper that described the study in their native language. Consent was obtained for all participants, and this study was approved by the Institutional Review Board at the High Altitude Medical Research Institute (Xining, Qinghai, People's Republic of China).

SNP genotyping
We employed the Affymetrix 6.0 SNP Array technology (.900,000 SNPs) to genotype 70 DNA samples at Capital Bio Corporation (Beijing, China). Default parameters for the Birdseed algorithm (version 2) were used to determine genotypes for all samples (Affymetrix, Santa Clara, CA, USA). Genotypic data were analyzed using the Affymetrix Genotyping Console 3.1 (Affymetrix) and included all autosomes but excluded the X and Y chromosomes and the mitochondrial genome.

Estimates of relatedness
We attempted to exclude all first-degree relatives who visited the clinic from our study. We used the ERSA software package [36] to determine genetic relatedness among all individuals examined and excluded one member of any pair exhibiting relatedness closer than second cousins. Based on these criteria, a total of 46 unrelated individuals were included in the analyses. The genotype data of the 46 individuals are available at http://jorde-lab.genetics.utah. edu/ under published data.

Principal components analysis
We performed principal components analysis on genetic distances as previously reported [21]. This analysis included two Tibetan populations from distinct regions and other Asian groups such as the HapMap CHB and JPT populations (CHB = Chinese in Beijing, China; JPT = Japanese in Tokyo, Japan). The CEU (U.S. Utah residents with ancestry from northern and western Europe) and YRI (YRI = Yoruba in Ibadan, Nigeria) HapMap populations provide context for the patterns of variation observed among these populations [19].

Functional candidate gene list
We had previously generated a list of genes likely related to high-altitude adaptation based on Gene Ontology and Panther Pathway categories as described in [7] and examined the intersection between these candidates and selection candidate genes identified at the p,0.02 level. Potential candidate genes identified in the mitochondrial genome and on the X chromosome were not considered for this study.

Admixture analysis
A model-based algorithm implemented in ADMIXTURE [22] was used to determine the genetic ancestries of each individual in a given number of populations without using information about population designation. To eliminate the effects of SNPs that are in linkage disequilibrium (r 2 = 1.0), we first filtered out SNPs that had r 2 .0.2 within 100 kb using PLINK [37], as recommended by the authors of ADMIXTURE.

Selection analyses
We performed the iHS [23] and XP-EHH [24] tests of selection on phased data estimated by the Beagle software package as previously described [7]. These tests are based on extended haplotype homozygosity and measure the reduction in haplotype diversity based on the probability that, as distance from a focal SNP increases, two extended haplotypes are the same. XP-EHH compares across populations (in this case, Tuo Tuo River Tibetans and HapMap Han Chinese and Japanese); iHS compares withinpopulation profiles based on the focal SNP's ancestral state (derived or ancestral).
We focused on the selection candidate genes contained within 200 kb regions significant at the p,0.02 level in either test. We further excluded regions where the iHS test was significant at this level in neighboring populations as previously described [7].

Phenotype collection
Hemoglobin concentration, hematocrit, and percent oxygen saturation were determined from venous blood samples using the Mindray Hematology Analyzer (BC-2300, Shenzhen, People's Republic of China) and the Pulse Oximeter (Ohmeda 3700 Pulse Oximeter, Datex-Ohmeda, Boulder, Colorado, USA), respectively.

Genotype-phenotype association
We identified the putatively advantageous haplotypes as previously described [7] and tested whether the three-SNP allele haplotype exhibiting the most extreme iHS scores within each 200 kb genomic region were associated with [Hb] in each population and both populations combined. Stepwise linear regression (MATLAB R2010a) was used to detect significant relationships between these genotypes and [Hb].  Figure S2 Statistical power to detect an association of [Hb] with a haplotype. Simulated data sets were constructed with varying sample size (n = 30-500), assuming that the putatively selected haplotype at one locus decreases [Hb] by e g/dl when present in two copies and e/2 if present in one copy (additive model, e = 0.5, 1.0, 2.0).
[Hb] was simulated as a normallydistributed variable with mean 19.6 and standard deviation of 1.6 g/dl, as observed in the Tuo Tuo River sample, and the effect of adaptive haplotype copies was added to that variate for each individual. The frequency f of the adaptive haplotype was set at 0.65, 0.75 or 0.85, per the legend. To mirror the actual tests performed, haplotypes with no effect on [Hb] were simulated for two additional loci (haplotype frequency of 0.65 for both). Genotypes were assigned in Hardy-Weinberg equilibrium. Ages were assigned from a normal distribution, mean 37 years and standard deviation 11.5, then truncated to the range of 18-68, mirroring the observed distribution. Sex was assigned randomly with a 50/50 ratio. Multiple stepwise linear regression was performed using the five simulated predictors: age, sex and haplotype copies at three loci (as used in Table S3). Power to detect a significant association of [Hb] with the simulated adaptive haplotype was estimated as the fraction of 1000 iterations for each parameter set that yielded a significant result at the alpha = 0.5 level. Effect size e has the largest impact on statistical power.
Haplotype frequency has a modest influence (Tuo Tuo River EGLN1, EPAS1, and PPARA frequencies = 0.68, 0.81, and 0.77, respectively). Considering our modest sample size, it will be necessary to collect more data from the Tuo Tuo River population to achieve greater power to detect genotype-phenotype associations. (TIF)