A non-synonymous SNP with the allele frequency correlated with the altitude may contribute to the hypoxia adaptation of Tibetan chicken

The hypoxia adaptation to high altitudes is of considerable interest in the biological sciences. As a breed with adaptability to highland environments, the Tibetan chicken (Gallus gallus domestics), provides a biological model to search for genetic differences between high and lowland chickens. To address mechanisms of hypoxia adaptability at high altitudes for the Tibetan chicken, we focused on the Endothelial PAS domain protein 1 (EPAS1), a key regulatory factor in hypoxia responses. Detected were polymorphisms of EPAS1 exons in 157 Tibetan chickens from 8 populations and 139 lowland chickens from 7 breeds. We then designed 15 pairs of primers to amplify exon sequences by Sanger sequencing methods. Six single nucleotide polymorphisms (SNPs) were detected, including 2 missense mutations (SNP3 rs316126786 and SNP5 rs740389732) and 4 synonymous mutations (SNP1 rs315040213, SNP4 rs739281102, SNP6 rs739010166, and SNP2 rs14330062). There were negative correlations between altitude and mutant allele frequencies for both SNP6 (rs739010166, r = 0.758, p<0.001) and SNP3 (rs316126786, r = 0.844, P<0.001). We also aligned the EPAS1 protein with ortholog proteins from diverse vertebrates and focused that SNP3 (Y333C) was a conserved site among species. Also, SNP3 (Y333C) occurred in a well-defined protein domain Per-AhR-Arnt-Sim (PAS domain). These results imply that SNP3 (Y333C) is the most likely casual mutation for the high-altitude adaption in Tibetan chicken. These variations of EPAS1 provide new insights into the gene’s function.


Introduction
Low oxygen is one of the more severe environmental challenges for animals living in high-altitude regions such as the Tibetan Plateau, where the average altitude is above 4000 m. The partial pressure of oxygen at 4000 m is approximately 50% of that at sea-level, and the hypoxia imposes severe constraints on aerobic metabolism and leading to high-altitude illness [1,2] [3] and the mechanisms of hypoxic adaptation [4]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 DNA extraction, PCR amplification, and genotyping DNA was extracted by the phenol-chloroform method [25]. The high altitude DNA pool was constructed from 50 samples, each containing 100 ng DNA from randomly selected TC. The LC pool was constructed in the same way. These two gene pools were used to detect SNPs.
After blasting chicken EPAS1 mRNA sequence (Ensembl accession ENSGALT00000016253) in the chicken genome, we designed 15 primer pairs ( Table 2) to amplify the 15 exons using Primer V6.0 software [26]. PCR was performed in 25 μL of reaction volume. The ingredient contain 50 ng DNA template, 1×buffer (including 1500 μmol L -1 Mg 2 Cl 2 , 200 μmol L -1 dNTPs, and 1.5 U of Taq DNA polymerase) and 1 μmol L -1 of each primer. Besides, the PCR procedure was as following: initial denaturation for 5 min at 9500, followed by 35 cycles of 95˚C for 30 s; annealing at prescribed annealing temperature for 30 s; and primer extension at 72˚C for 45 s. The final extension was performed at 72˚C for 7 min. PCR products were sequenced on an ABI 3730 DNA sequencer. Then, the SNPs were determined and pairs of 4 primers (P1, P7, P12, and P14) were used for individual genotyping. PCR procedures and systems were the same as above.

Data analysis
The sequences were edited and aligned by DNAstar Software (DNAstar Inc. Madison, WI, USA). Sequence variations were identified using MEAG5 [27]. The complete EPAS1 sequence of Red Jungle Fowl (G.g gallus, Ensemble accession number: ENSGALG00000010005) was used as reference for determining the variable sites of chicken. Hardy-Weinberg equilibrium, pairwise linkage disequilibrium (LD), and association analysis were conducted by Haploview software (version 4.2, http://www.broad.mit.edu/mpg/haploview/). We calculated the nucleotide diversity (Pi) and average number of nucleotide differences (K) for the TC and LC populations by DNAspV5 [28]. The Fixation index (F st ) between the high (!3300m) and low altitude groups ( 600m) were calculated by Arlequin 3.5 [29].
Differences of allele frequencies between TC and LC groups were analyzed using Pearson' Chi-square test, odds ratio (OR), 95% confidence intervals (95% CLs), and MAF (Minor allele frequencies) were calculated. Correlation coefficients between the altitude and frequencies of mutant alleles were calculated. All these parameters were calculated by SPSS software, Version 22 [30] and P < 0.05 was considered as statistical significance.

Protein secondary structure prediction and variable sites analysis
The protein domains of EPAS1 were predicted by the Pfam web service [31]. The EPAS1 ortholog proteins of 16 vertebrates were retrieved from Ensembl [32]. Multiple sequence alignments were performed by Clustal X (V2.1) [33] and displayed by Jalview (v2.8) [34]. To find the best model for a Maximum Likelihood (ML) based tree, we use Prottest V3.4 to compare

SNP genotypes
Genotype frequencies are presented in Table 4    95% CI = 0.120-0.374) were different between TC and LC groups. Results of odds ratios shared that the probability of the ability to adapt hypoxia for chickens with a C allele at SNP1 polymorphic site was more than 1.826 times that of chickens with T allele. The odds ratios of allele A at SNP3 and allele C at SNP6 were less 0.257 and 0.212 fold than chickens with G and T alleles, respectively.
Correlation coefficients between altitudes and allele frequencies of the EPAS1 gene Table 5 contains the observed allelic frequencies of each polymorphic site for each sample from the 15 populations (Allele and genotype frequencies of all SNPs were shown in Tables B, C, and D in S1 File.). We found that SNP2 was unique to TC with the allele frequency of 1.5%. The other five SNPs (SNP1, SNP4 and SNP6, SNP3, and SNP5) were shared by TC and LC. Using Pearson chi-square test and Fisher exact test except the SNP5 (P = 0.077), the other four SNPs whose allele frequencies were significantly different between TC and LC, implying that these four SNPs were associated with high-altitude adaptation. Notably, The non-synonymous substitutions SNP3 (P = 1.94E-06) was significantly different between TC and LC. There were significant negative linear correlations between altitude and the frequency of mutant allele at SNP6 (rs739010166, r = 0.758, p<0.001) and SNP3 (rs316126786, r = 0.844, P<0.001). There was no correlation for the other SNPs. There was a striking change of allele frequency along with the altitude elevation in SNP3. As shown in Fig 3A, SNP6 (Fig 3C).
These results suggested that the frequencies of SNP3 and SNP6 in EPAS1 decreased with the elevation of altitude. The Hardy-Weinberg equilibrium, nucleotide diversity, and population differentiation of the EPAS1 To test whether all of the SNPs at EPAS1 loci were in equilibrium in each population, the Hardy-Weinberg equilibrium (HWE) tests was performed on each SNP locus (Table 6). Except for SNP4 and SNP5, all alleles in TC were in accordance with Hardy-Weinberg equilibrium (P>0.05). In contrast, there was a departure from Hardy-Weinberg equilibrium (P < 0.05) for LC in SNP1, SNP4, SNP5, and SNP6. Notably, SNP3 in both populations was in accordance with Hardy-Weinberg equilibrium. Besides, there were no significant diferences between TC and LC for nucleotide diversity (Pi) and the average number of nucleotide differences (K) ( Table E in S1 File). Then we calculated the LD block of Tibetan chicken and Lowland chicken and we found close linkage between SNP3 and SNP6 in TC ( Figure A in S1 File). Besides, the observed heterozygosity of all SNPs was from 0.03 to 0.44 in TC and from 0 to 0.30 in LC ( Table 6). The minor allele frequencies (MAF) of all mutations were more than 0.005 (Table 5).
To fully understand the population differentiation that occurred between TC and LC, we analyzed the distribution of alleles at the EPAS1 for the high altitude (!3300m) and low altitude groups ( 600m). The Fixation index (F st ) was calculated to evaluate the population differentiation degree. As shown in Table 6, the F st values for SNP1 (0.181, P < 0.001), SNP3 (0.229, P < 0.001), and SNP6 (0.278, P < 0.001) for two groups reflected remarkable population differentiation.
Protein secondary structure prediction and non-synonymous sites analysis SNP3 (Y333C) was located in a well-defined protein domain (PAS domain, Fig 4). Using the crystal structure model of PAS domain for human EPAS1, we found that mutation SNP3 in chicken (Y333C) occurred in a beta sheet which may affect the thermodynamic stability of the domain. To evaluate the its functional, we aligned the mutant EPAS1 protein with its ortholog proteins of diverse vertebrates (Fig 4) and found that the amino acid mutation (Y333C) were conserved among species. By contrast, M765V was not found in other species. Because mutant functional effects showed that Y333C was deleterious, while M765V were tolerated implied that Y333C was the most likely casual mutation for the EPAS1 in LC but not TC.

Discussion
Long-term exposure to hypoxia reduced metabolic activity, retarded development, and increased embryonic mortality in animal fetes [37] and chronic hypoxia decreased growth of developing chick embryos [38,39]. The Tibetan chicken, a Chinese native chicken breed, adapts itself well to the low pressure and high altitude plateau environment [7,40]. Here, we investigated specific SNPs in EPAS1 gene and discuss their possible contributions to the highaltitude adaptation of the Tibetan chicken.
Amino acid replacement at key sites can difference protein function [41]. EPAS1 protein has an important role in regulating transcriptional responses to hypoxia in physiological and pathologic conditions including up-regulating the expression of erythropoietin (EPO) [42], Thus, it is not surprising that Mutations at EPAS1 are associated with hematologic phenotypes [43].
Here, we found 4 synonymous mutations and 2 non-synonymous substitutions which resulted in amino acid mutations of Tyr to Cys at SNP3 and Met to Val at SNP5, respectively. Except for SNP5, the other 5 SNPs were significantly different between TC and LC for genotypic frequencies. Both genotypic and allelic results for SNP1, SNP3, and SNP6 revealed ** represent significant P-value of less than 0.001. 1.2 Ho and He represent the observed heterozygosity and expected heterozygosity.
significant differences between two chicken groups. Except for SNP4 and SNP5, all the alleles in the TC were in accordance with Hardy-Weinberg equilibrium. While for LC there was a significant departure from Hardy-Weinberg equilibrium (P < 0.05) in SNP1, SNP4, SNP5, and SNP6. These results implied an association of SNP1 and SNP6 might be correlate with highaltitude adaptation. Also there is linkage between SNP3 and SNP6 in TC ( Figure A in S1 File).
The Fst values for SNP1, SNP3, and SNP6 of EPAS1 between the high altitude group (HIC!3300m) and low altitude group (LOC 600m) reflected remarkable population differentiation. Overall, both the Fst value and the LD block demonstrate a remarkable population differentiation between the TC and LC. We deduce reasons why the previous study [24] did not find SNP3 are that compared to other SNPs, SNP3 may have low frequency in the TC population and sample size, only 40 Tibetan chickens from 3 different areas were involved. A correlation coefficient is a number that quantifies some type of association between random variables and observed data [44]. Pearson correlation between altitude and allele frequency showed that in TC there was a significant relationship in synonymous mutation SNP6 and non-synonymous mutation SNP3. The frequencies of SNP3 (Y333C) and SNP6 were high in LC but decreased along with the elevation of altitude. The same relationship also found for the EPAS1 gene in cattle [45]. There was an association of double variants in the oxygen degradation domain (ODDD) of EPAS1 in Angus cattle with High-altitude pulmonary hypertension (HAPH). These mutations were prevalent in lowland cattle, but they disappeared for the animals located at high altitudes [45]. Like Angus cattle, the mutation we found was located in a well-defined domain and quite conserved. Human EPAS1 is a basic helix-loop-helix transcription factor that contains a Per-AhR-Arnt-Sim (PAS) domain and shares 48% homology [46,47]. The transcriptional response to hypoxia appears in several physiological and pathologic conditions, including up-regulation in the expression of erythropoietin (EPO) [48]. Mutations at EPAS1 in humans are associated with hematologic phenotypes [43]. HIF proteins are constantly degraded with sufficient oxygen availability, but one released under hypoxia condition. HIFs are protected by inhibition of oxygen-dependent hydroxylation of specific residues in the oxygen-dependent degradation domain under hypoxic condition. This prevents interactions with the von Hippel-Lindau ubiquitin ligase complex and proteasome destruction. HIF2a is found in all human tissues and its effects on downstream cascades include regulation of angioenic factors VEGF and TGFa, and cell permeability and stimulation of erythropoietic and glycolytic protein [46,49,50]. Clinical studies found that several gain-of-function mutations that occurred in the primary hydroxylation site (Pro531) at EPAS1 caused erythroctosis and pulmonary hypertension [51].While, the unique EPAS1 mutation in the Tibetan people were associated with lower hemoglobin concentrations [52,53], suggesting that it played a loss-of-function role to high blood viscosity and cardiovascular disorders. As erythrocytosis is a common symptom of chronic mountain sickness which can lead to high blood viscosity and cardiovascular disorders, a decrease of hemoglobin level may provide a protective mechanism for Tibetan people [43]. Nonetheless, it is worth noting that all EPAS1 variations detected preciously in the Tibetan population are in introns [52,53].
Transgenic mice carrying a G536W variant in EPAS1 developed high right ventricular systolic pressure and medial hypertrophy of pulmonary arteries in addition to erythrocytosis under normoxic condition [54]. The key amino acid mutation G305S was identified at EAPS1 PAS domain in dogs was predicted to be damaging, which is also likely to cause the loss of function of EPAS1 and it is also associated with blood flow resistance [16]. Recently, the missense mutation Q597L was identified in EPAS1 in the Tibetan Cashmere goat, which occurred next to the hypoxia-inducible factor-1 (HIF-1) domain, and was exclusively enriched in the high-altitude population [55]. Thus, HIF2a dysregulation is well recognized in association with pulmonary hypertension, frequency in context of erythrocytosis, with or without severe hypoxemia. Here, we found a non-synonymous mutation SNP3 (Y333C) in PAS domain which is similar to the mutation in dogs and Angus cattle. In order to adapt to the same selection pressure, mutations that produce an adaptive change in one species may preclude possibilities in other species because of differences in genetic background and experiments involving resurrected ancestral proteins have revealed continuous effects of historical substitutions [56]. Although the function of mutation (SNP3 Y333C) in chickens is unknown, based on the previously studies, we inferred that SNP3 (Y333C) might also be harmful to chickens at high altitudes. The lower oxygen and extreme environment selectively swept out mutation SNP3 (Y333C). Taking together, these results indicate that EAPS1 gene may play a key role in the adaptation of the Tibetan chicken to a high-altitude environment at the molecular level. The adaptation patterns however, may differ in human, dog, grey wolf, and Tibetan Cashmere goat.
In conclusion, genetic analysis in the present study for EPAS1 gene in TC and LC populations indicated that the non-synonymous SNP3 (Y333C) may have critical role in the high-altitude adaptation of TC. Although the adaptation patterns may not be the same in humans and dogs, these novel variations of EPAS1 could put new insights into the gene function.
Supporting information S1 File. The LD Block, production performance, allele frequency and nucleotide diversities. Figure A. The LD block of Tibetan chicken. TC means Tibetan chicken. Table A. The partial production performance of Tibetan and 7 lowland chicken breeds. The production performance data of 8 breeds of chicken is based on poultry genetic resources in China. Table B. Allele and genotype frequencies of the SNP1 rs315040213 and SNP2 rs14330062 in the EPAS1. "C" represents the reference allele and "T" represents the mutant allele. Numbers represent allele/genotype frequency, with the figures in brackets representing the number of individuals with each genotype. Table C. Allele and genotype frequencies of the SNP3 rs316126786 and SNP4 rs739281102 in the EPAS1. "A" represents the reference allele and "G" represents the mutant allele. Numbers represent allele/genotype frequency, with the figures in brackets representing the number of individuals with each genotype. Table D. Allele and genotype frequencies of the SNP5 rs740389732 and SNP6 rs739010166 in the EPAS1. "A" represents the reference allele and "G" represents the mutant allele. Numbers represent allele/genotype frequency, with the figures in brackets representing the number of individuals with each genotype. "C" represents the reference allele and "T" represents the mutant allele. Numbers represent allele/genotype frequency, with the figures in brackets representing the number of individuals with each genotype. Table E. Nucleotide diversities of EPAS1 gene in Tibetan chicken and Lowland chicken. 1 Pi is the abbreviation of nucleotide diversity. 2 K is the abbreviation of average number of nucleotide differences. 3 TC means Tibetan chicken. 4 LC means Lowland chicken. (DOCX)