Allele polymorphism and haplotype diversity of HLA-A, -B and -DRB1 loci in sequence-based typing for Chinese Uyghur ethnic group.

Background Previous studies indicate that the frequency distributions of HLA alleles and haplotypes vary from one ethnic group to another or between members of the same ethnic group living in different geographic areas. It is therefore necessary and meaningful to study the high-resolution allelic and haplotypic distributions of HLA loci in different groups.
Methodology/Principal Findings This is the first report of high-resolution HLA typing of the Uyghur ethnic minority group using the polymerase chain reaction sequence-based typing method. HLA-A, -B and -DRB1 allelic distributions were determined in 104 unrelated healthy Uyghur individuals, and haplotypic frequencies and linkage disequilibrium parameters for the HLA loci were estimated using the maximum-likelihood method. A total of 35 HLA-A, 51 HLA-B and 33 HLA-DRB1 alleles were identified at the four-digit level in the population. The high-frequency alleles were HLA-A*1101 (13.46%), A*0201 (12.50%), A*0301 (10.10%); HLA-B*5101 (8.17%), B*3501 (6.73%), B*5001 (6.25%); HLA-DRB1*0701 (16.35%), DRB1*1501 (8.65%) and DRB1*0301 (7.69%). The two-locus haplotypes at the highest frequencies were HLA-A*3001-B*1302 (2.88%), A*2402-B*5101 (2.86%); HLA-B*5001-DRB1*0701 (4.14%) and B*0702-DRB1*1501 (3.37%). The three-locus haplotype at the highest frequency was HLA-A*3001-B*1302-DRB1*0701 (2.40%). Significantly high linkage disequilibrium was observed in six two-locus haplotypes, with their corresponding relative linkage disequilibrium parameters equal to 1. A neighbor-joining phylogenetic tree relating the Uyghur group to other previously reported populations was constructed on the basis of standard genetic distances among the populations, calculated using the four-digit sequence-level allelic frequencies at HLA-A, HLA-B and HLA-DRB1 loci. The phylogenetic analyses reveal that the Uyghur group belongs to the northwestern Chinese populations and is most closely related to the Xibe group, and then to the Kirgiz, Hui, Mongolian and Northern Han groups.
Conclusions/Significance The present findings could be useful to elucidate the genetic background of the population and to provide valuable data for HLA matching in clinical bone marrow transplantation, HLA-linked disease-association studies, population genetics, human identification and paternity tests in forensic sciences.


Introduction
Almost all the Uyghurs in China live in the Xinjiang Uyghur Autonomous Region. The region, by far the biggest of the country's regions and provinces, covers more than 1,709,400 square kilometers or approximately one sixth of China's total landmass. Although Han, Kazak, Hui, Mongolian, Kirgiz, Tajik, Xibe, Ozbek, Manchu, Daur, Tatar and Russian people also live in Xinjiang, the Uyghur, who believe in Islam, are the largest ethnic group there (http://www.fmprc.gov.cn/eng/ljzg/3584/t17921.htm). The Uyghur ethnic minority has its own language and alphabet. The Uyghur language, formerly known as Eastern Turki, belongs to the Uyghur Turkic branch of the Turkic language family, whose classification as a branch of the Altaic language family is controversial. The Uyghurs have two written languages, one based on Arabic letters and the other on Latin letters [1,2].
The human major histocompatibility complex (MHC), also called human leukocyte antigen (HLA), is located on chromosome 6p21.31 [3]. HLA is the most gene-dense region and plays an important role in the generation of immune responses. According to the IMGT (the international ImMunoGeneTics information project)/HLA database (http://www.ebi.ac.uk/imgt/hla/stats.html, June 18, 2010), a total of 3391 alleles, including 1001 HLA-A, 1605 HLA-B and 785 HLA-DRB1 alleles, have been identified at these HLA class I and class II loci worldwide, which indicates that the HLA system is the most complex and highly polymorphic genetic system discovered so far in the human genome. Previously published population data [4][5][6][7][8][9][10][11] have revealed that the frequency distributions of HLA alleles and haplotypes vary from one ethnic group to another or between members of the same ethnic group living in different geographic areas. Furthermore, each ethnic group is characterized by a unique pattern of high diversity genetic linkage disequilibria (LD) among HLA loci [6]. Therefore, the extensive allelic polymorphisms and the linkage disequilibrium among different HLA loci in different populations are commonly used as highly polymorphic genetic markers in anthropological studies. Genetic distance calculation, cluster analysis and principal component analysis based on the allelic frequencies at HLA loci in different populations have become valuable tools for studying the genetic relationships among different ethnic groups as well as the origin, evolution and migration of populations [12,13].
Previous studies have examined the genetic polymorphisms of HLA loci in the Uyghur population using low- to medium-resolution techniques. Yan CX et al. analyzed the HLA-A locus in a Chinese Uyghur population by PCR amplification with sequence-specific oligonucleotide probes (PCR-SSOP) [14]; Mizuki N et al. studied HLA class II (DRB1, DQA1, DQB1 and DPB1) genotypes in a Uyghur population in the Silk Route of Northwest China using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method [15]; Xu MY et al. investigated low-resolution HLA-B locus polymorphisms in the Xinjiang Uyghur ethnic group using the polymerase chain reaction-sequence specific primers (PCR-SSP) method [16]. However, those researchers studied the genetic polymorphisms of either HLA class I or HLA class II in the Uyghur population, so their studies provided limited genetic information about the population. No high-resolution data covering both HLA class I and class II loci of the Uyghur ethnic group in the same region are currently available. In the present study, we investigated the allelic distributions at HLA-A, -B and -DRB1 loci, calculated the linkage disequilibrium parameters of two- and three-locus haplotypes, estimated the haplotypic frequencies of HLA-A, -B and -DRB1 loci, and constructed a phylogenetic tree based on the allelic frequencies of HLA loci of the Uyghur ethnic minority group in Yining city, Xinjiang Uyghur Autonomous Region of China, using the polymerase chain reaction sequence-based typing (PCR-SBT) approach. The study is expected to enrich the knowledge of HLA genetic polymorphisms in Chinese ethnic groups and to further the understanding of the genetic relationships between the Uyghur and other ethnic groups.

Hardy-Weinberg Equilibrium tests of HLA-A, -B and -DRB1 loci
Polymorphisms of HLA-A, -B and -DRB1 loci in the Uyghur ethnic minority of Chinese Xinjiang Uyghur Autonomous Region were investigated using the SBT method. The P values for Hardy-Weinberg equilibrium tests of HLA-A, -B and -DRB1 loci were 0.5785, 0.9696 and 0.4242, respectively, which indicate that the HLA genotypic frequency distributions in the Uyghur ethnic minority were consistent with the Hardy-Weinberg equilibrium at the three HLA loci.

Allelic frequency distributions at HLA-A, -B and -DRB1 loci
The allelic frequencies at HLA-A, -B and -DRB1 loci obtained by high-resolution DNA typing of 104 unrelated healthy Uyghur individuals are summarized in Table 1. Thirty-five distinct alleles were detected at the HLA-A locus in this population. The HLA-A*2, A*11, A*24 and A*33 groups accounted for 17.78%, 16.34%, 11.05% and 9.62% of the total, respectively. The HLA-A*2 group was the most diverse allelic family at the HLA-A locus and consisted of nine alleles: A*0201, A*0202, A*0203, A*0205, A*0206, A*0211, A*0214, A*0222 and A*0236. The most common allele, HLA-A*1101, was observed at a frequency of 13.46%; the allelic frequencies of the next four most common alleles, A*0201, A*0301, A*3301 and A*2402, were 12.50%, 10.10%, 9.62% and 9.13%, respectively. The five most common alleles have a cumulative frequency of 54.81%.
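The percentages above are consistent with direct gene counting over 2N = 208 chromosomes. The following Python sketch illustrates that arithmetic. The allele counts are an assumption: they are back-calculated from the reported percentages and are not taken from Table 1, and the helper name `allele_frequencies` is hypothetical.

```python
# Allele frequency by direct gene counting: f = count / (2N).
N_INDIVIDUALS = 104                 # sample size reported in the text
N_CHROMOSOMES = 2 * N_INDIVIDUALS   # each individual carries two alleles per locus

# ASSUMPTION: counts inferred from the reported percentages, not read from Table 1.
assumed_counts = {"A*1101": 28, "A*0201": 26, "A*0301": 21, "A*3301": 20, "A*2402": 19}


def allele_frequencies(counts, n_chromosomes):
    """Return {allele: frequency} for the given allele counts."""
    return {allele: count / n_chromosomes for allele, count in counts.items()}


freqs = allele_frequencies(assumed_counts, N_CHROMOSOMES)
cumulative = sum(freqs.values())

for allele, f in freqs.items():
    print(f"{allele}: {f:.2%}")
print(f"cumulative frequency of the five alleles: {cumulative:.2%}")
```

Under these assumed counts the five frequencies reproduce the values quoted in the text and sum to 114/208.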

HLA haplotypic frequencies and linkage disequilibriums
A total of 133 HLA A-B haplotypes, 118 HLA B-DRB1 haplotypes and 173 HLA A-B-DRB1 haplotypes were estimated using the expectation maximization (EM) algorithm. Estimated haplotypes were considered statistically significant only when their frequencies were greater than or equal to 2/(3N), where N is the sample size [17]. Only 41 HLA A-B haplotypes, 37 HLA B-DRB1 haplotypes and 22 HLA A-B-DRB1 haplotypes had a haplotypic frequency higher than 0.64% and were thus considered statistically significant. The significant haplotypes of the three groups had cumulative haplotypic frequencies of 55.48%, 54.93% and 27.4%, respectively.
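The cutoff and the estimation step described above can be illustrated with a small Python sketch. It is not the Arlequin implementation used in the study; it is a minimal two-locus EM under Hardy-Weinberg proportions, run on hypothetical toy genotypes. The function names are hypothetical, and reading the cutoff as 2/(3N) is an assumption that matches the 0.64% value quoted for N = 104.

```python
from collections import defaultdict


def significance_threshold(n_individuals):
    """Frequency cutoff 2/(3N), assumed reading of the '2/3N' rule."""
    return 2.0 / (3.0 * n_individuals)


def phasings(genotype):
    """Distinct haplotype pairs compatible with an unphased two-locus genotype."""
    (a1, a2), (b1, b2) = genotype
    first = tuple(sorted([(a1, b1), (a2, b2)]))
    second = tuple(sorted([(a1, b2), (a2, b1)]))
    return sorted({first, second})  # the set collapses identical phasings


def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Estimate two-locus haplotype frequencies by expectation maximization."""
    options = [phasings(g) for g in genotypes]
    haplotypes = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    n_chromosomes = 2 * len(genotypes)
    for _ in range(max_iter):
        expected = defaultdict(float)
        for opts in options:
            # E-step: posterior weight of each phasing under Hardy-Weinberg proportions.
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in opts]
            norm = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                expected[h1] += w / norm
                expected[h2] += w / norm
        # M-step: expected haplotype counts become the new frequencies.
        new_freq = {h: expected[h] / n_chromosomes for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Hypothetical toy data: two unambiguous individuals and one double heterozygote.
toy_genotypes = [
    (("A1", "A1"), ("B1", "B1")),
    (("A2", "A2"), ("B2", "B2")),
    (("A1", "A2"), ("B1", "B2")),
]
toy_freqs = em_haplotype_frequencies(toy_genotypes)

print(f"cutoff for N = 104: {significance_threshold(104):.2%}")
for hap, f in sorted(toy_freqs.items()):
    print(hap, round(f, 4))
```

In the toy data the double heterozygote is ambiguous, and the EM iterations assign it almost entirely to the A1-B1/A2-B2 phasing because those haplotypes are already supported by the unambiguous individuals.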
The haplotypic frequencies of the estimated significant HLA haplotypes (haplotypic frequencies ≥1.00%) are summarized in Table 2. Three HLA-A-B haplotypes had a frequency greater than 2%, i.e., HLA-A*3001-B*1302 (2.88%), A*2402-B*5101 (2.86%) and A*3201-B*3501 (2.
Table 3 shows the values of haplotypic frequency (HF), absolute linkage disequilibrium (ALD), maximal linkage disequilibrium (MLD) and relative linkage disequilibrium parameter (RLD) in the population, which have been commonly used in the evaluation of the linkage disequilibrium (LD) in population genetics. HLA-A*0201, A*0205 and A*2501 alleles were found to be tightly associated with HLA-B*4006, B*5001 and B*1801, respectively, with their relative linkage disequilibrium values equal to 1. The HLA-A*2301, A*3001 and A*3604 alleles were also found to have

Construction of a phylogenetic tree
The phylogenetic tree in Figure 1 was constructed using the allelic frequencies at HLA-B locus in the Xinjiang Uyghur population and its neighboring ethnic groups. The neighboring ethnic groups used in the phylogenetic tree construction were Miao, Bouyei and Shui ethnic minorities [6], Jinuo and Wa populations [18], Maonan people [19], Yi ethnic minority [13], Tujia nationality [20], Dai population [21], Tibetan ethnic [24], Han from Southern China [25], Han population in Northern China [26], Taiwanese [27], Singapore Chinese [28], Javanese [29], Vietnamese population (the Kinh population in Vietnam) [30], Korean population [31] and Japanese population [32]. The three main clusters of populations obtained are: (1) the populations living in Korea, Japan and the northwest of China,

Forensic parameters
The values of the major forensic parameters for HLA-A, -B and -DRB1 loci in the Uyghur population are listed in Table 4.

Discussion
With the continuous development of the economy and society and progress in human culture, inter-ethnic marriages, especially marriages between national minorities and the Han population, have markedly increased, resulting in assimilation and a decrease in ethnic minority populations. In addition, ethnic minority populations previously isolated in remote areas are gradually migrating to more developed areas in pursuit of a better life, where they may also marry people of other ethnic groups. As a result, the number of unadmixed members of ethnic minorities is decreasing rapidly. Therefore, the preservation and study of the genetic information resources of these ethnic minorities have been a priority on the research agenda. The authors have collected and stored blood samples from more than ten national minorities and have studied the population genetics of HLA loci and of Y-chromosomal and autosomal short tandem repeats (STRs) in these national minorities, including the genetic polymorphisms of Y-chromosomal and autosomal STRs in the Uyghur ethnic minority in Xinjiang Uyghur Autonomous Region, China [33,34]. The authors have also studied the HLA polymorphisms (low resolution) of the Mongol ethnic group in Inner Mongolia [35], the Hui population in Ningxia Hui Autonomous Region [36] and the Han population in the Guanzhong region of Shaanxi province [12] using the SSO and SSP methods, and the HLA polymorphisms (high resolution) of the Yi population in Yunnan [13] and the Han population in Beijing using the SBT method [37]. In this study, the distributions of HLA class I and class II allelic and haplotypic frequencies in 104 unrelated Uyghur individuals living in Yining city of the Xinjiang Uyghur Autonomous Region, China were analyzed for the first time by the high-resolution PCR-SBT method.
The HLA-DRB1 locus exhibited a high degree of polymorphism in the Uyghur population. HLA-DRB1*0701, DRB1*1501, DRB1*0301, DRB1*1301, DRB1*0401 and DRB1*1502 were the predominant alleles at the HLA-DRB1 locus in the population. The results were similar to the frequency distributions of DRB1*0701 (16.7%) and DRB1*0301 (14.0%) in the Uyghur population in the Silk Route of Northwest China [15] and of DRB1*0301 (13.1%) and DRB1*0701 (10.7%) in the Kazak population in the Silk Route of Northwest China [45]. These HLA data indicate that the Uyghur ethnic group shows a relatively close genetic relationship to the Kazak population inhabiting the same area. HLA-DRB1*0701 was found to be the most frequent allele in the Uyghur ethnic group, as well as in some other populations, such as the Han population in Beijing, the Southern Han, the Korean population, the population of Madeira Island (Portugal) and the Caucasian population of Northern Ireland [46]. However, it is rare or less common in the Dai, Jinuo, Wa, Maonan, Drung, Naxi, Miao and Yao ethnic groups in the southwest of China. HLA-DRB1*1501 and DRB1*0901, two high-frequency alleles in the Chinese population, exist in most ethnic groups in the north and south of China, with an average allelic frequency of more than
In all, the distributions of these haplotypes further suggest that the Uyghur population is a characteristic northwestern Chinese population. The haplotypes in this study were estimated using the EM algorithm rather than derived from family data; therefore, the potential errors inherent to haplotype estimation methods must be acknowledged [47].
The phylogenetic tree reveals that the Uyghur population belongs to the northwestern Chinese populations and is most closely related to the Xibe group, and then to the Kirgiz, Hui, Mongolian and Northern Han groups. The HLA-B locus was chosen for the phylogenetic tree construction because it is highly polymorphic and because high-resolution HLA-B data are available for a large number of comparable populations. Similar clustering results were obtained using the HLA-A or HLA-DRB1 locus alone or the HLA A-B-DRB1 haplotype. The close relationship of the Uyghur population with these groups may be partly explained by its history. The origin of the Uyghur population can be traced back to the Dingling nomads in the third century BC. The majority of the Uyghurs moved to the Western Region (present-day Xinjiang area) and some went to the Tufan principality in western Gansu Province after the mid-ninth century. The Uyghurs who settled in the Western Region intermarried with the Han people in Southern Xinjiang and with Tibetan, Qidan and Mongol tribes, and evolved into the group now known as the Uyghurs.
Genetic distance here refers to the genetic divergence between populations within a species. A small genetic distance indicates a close genetic relationship between two populations, whereas a large genetic distance indicates a distant genetic relationship [48]. The best explanation for the close genetic relationship of the Uyghur population with the Kirgiz and Xibe ethnic groups is that these populations live in the same region and that gene flow occurs at a high level among them. These results agree with those of some previous studies. A genetic landscape based on 14 single nucleotide polymorphisms (SNPs) and 12 Y-STR loci shows that the Uyghur population is closer to the Han Chinese and Mongolian populations [49]. A dendrogram based on the allelic frequencies of four variable number of tandem repeats (VNTR) loci and one STR locus reveals that the Uyghur, Hui, Northern Han and Japanese form one cluster [50]. Yu MS et al. analyzed mitochondrial DNA polymorphism using RFLP and found that the Uyghur population was relatively close to the Han and Hui populations [51]. The genetic relationships between the Uyghur and the Xibe [52,53], Mongolian [53,54], Hui [53,55] and Tibetan [56] populations were also supported by Y-chromosome STR variation analysis.
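As an illustration of how a genetic distance of this kind is computed from allelic frequencies, the Python sketch below implements Nei's standard genetic distance for a single locus, D = -ln(Jxy / sqrt(Jx Jy)), where Jx and Jy are the sums of squared allele frequencies in each population and Jxy is the sum of their products. The populations and frequencies are hypothetical and chosen only to show the behavior; the neighbor-joining step performed in MEGA is not reproduced here.

```python
import math


def nei_standard_distance(p, q):
    """Nei's standard genetic distance for one locus.

    p and q map allele names to frequencies in two populations.
    """
    alleles = set(p) | set(q)
    jx = sum(p.get(a, 0.0) ** 2 for a in alleles)
    jy = sum(q.get(a, 0.0) ** 2 for a in alleles)
    jxy = sum(p.get(a, 0.0) * q.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles: identity is zero
    return -math.log(jxy / math.sqrt(jx * jy))


# HYPOTHETICAL frequencies, for illustration only (not data from this study).
pop_a = {"B*5101": 0.5, "B*3501": 0.5}
pop_b = {"B*5101": 0.4, "B*3501": 0.6}
pop_c = {"B*5101": 0.1, "B*3501": 0.9}

d_ab = nei_standard_distance(pop_a, pop_b)
d_ac = nei_standard_distance(pop_a, pop_c)
print(f"D(a, b) = {d_ab:.4f}")
print(f"D(a, c) = {d_ac:.4f}")
```

Populations with similar frequency profiles give a distance near zero, and the distance grows as the profiles diverge, which is the property the clustering relies on.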
The polymorphisms of genetic markers can be evaluated by forensic parameters such as HO, HE, PD, PIC and PPE. A higher heterozygosity means that more allelic diversity exists and, therefore, that there is less chance of a random sample matching. A locus is considered highly polymorphic when its PIC is higher than 0.5 [57]. PD and PPE are indicators of the discrimination capability of a genetic marker, and the PD and PPE values of a highly polymorphic marker are normally higher than 0.8 and 0.5, respectively [13]. The forensic parameters HO, HE, PD, PIC and PPE in the Uyghur population were higher than 0.82. The combined probability of exclusion, combined power of discrimination and combined probability of matching for HLA-A, -B and -DRB1 loci were 0.998199, 0.999994 and 5.27 × 10⁻⁶, respectively, which suggests that the HLA loci were highly polymorphic in the study population and that HLA loci could be applied to personal identification and paternity testing in forensic science. Take a paternity testing case as an example. When one or two STR loci violate the rules of inheritance and exclude the alleged father or mother, it is suspected that the alleged parent may not be the biological parent. However, a mutation may have occurred at the excluding STR loci. In such a case, additional genetic markers are needed to further determine paternity. Although a large number of new STR loci have been studied, they have not yet passed quality and metrological certification for forensic application. Therefore, the highly polymorphic HLA loci may be the best option for further paternity testing. The results show that combinations of HLA loci and STR loci may be a powerful tool for individual identification and paternity testing for the Chinese Uyghur population in the region.
The present study may provide basic and valuable data for anthropological analysis and studies of HLA-associated disease susceptibility, organ transplantation (especially bone marrow transplantation), population genetics, human identification and paternity testing in forensic sciences.

Ethical statement
This study was approved by the Ethics Committee of Xi'an Jiaotong University, China. All the participants provided their written informed consent for the collection of the samples and the subsequent analysis, and the investigation was conducted in accordance with the humane and ethical research principles of Xi'an Jiaotong University, China.

Population samples
One hundred and four unrelated healthy Uyghur individuals were randomly chosen from Yining city, Xinjiang Uyghur Autonomous Region, China. All participants were interviewed to ensure that no two individuals shared common ancestry going back at least three generations.

DNA extraction
Whole blood samples were collected from the participants and stored at -20 °C until DNA extraction. Genomic DNA was extracted from whole blood containing ethylenediaminetetraacetic acid (EDTA) using a standard salting out method, which yielded good quality high molecular weight DNA suitable for sequencing [58].

Polymerase chain reaction-sequence based typing of HLA-A, -B and -DRB1 loci
All individuals were typed for HLA-A, -B and -DRB1 loci. Sequence-based typing (SBT) of exons 2 and 3 at HLA-A and -B loci was performed according to Kurz et al. [59] and Pozzi et al. [60], with minor modifications; SBT of exon 2 and part of the intron sequences on both sides of exon 2 at the HLA-DRB1 locus was performed as described by Jia et al. [61] and Deng et al. [62].
PCR amplification was performed using a GeneAmp PCR system 9700 (Applied Biosystems, Foster City, CA, USA). PCR-amplified DNA fragments were purified and sequenced with ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kits (Applied Biosystems, Foster City, CA, USA) using an ABI 3730XL DNA sequencer (Applied Biosystems, Foster City, CA, USA), according to the manufacturer's instructions. Sequencing was always performed in both the forward and reverse directions, and the sequences were processed using the Sequencing Analysis, MatchTools and Navigator software (MatchTools Allele Identification package, Applied Biosystems, Foster City, CA, USA). The software was used to detect the heterozygous positions within each electropherogram and to assign the type, based on an alignment of the processed sequence with a library of HLA sequences and alleles updated to October 2007. Ambiguous types were resolved to four digits according to the updated database (IMGT release 2.19.0).

Statistical analysis
Allelic frequencies of HLA-A, -B and -DRB1 loci were estimated using SPSS 11.0 software (SPSS Inc., Chicago, Illinois). Haplotypic frequencies were calculated from genotype data by the expectation maximization (EM) algorithm using the Arlequin software package version 3.0 (Laurent Excoffier, CMPG, Zoological Institute, University of Bern, Switzerland) [63]. Linkage disequilibrium, the non-random association of alleles at two different loci, expressed as the delta (D') coefficient, was calculated as described by John Lee [64]. The genetic distances between different populations were calculated as previously described by Nei [65], and a phylogenetic tree was constructed based on the allelic frequencies of the HLA-B locus with the Mega 3.1 software package (Center for Evolutionary Functional Genomics, the Biodesign Institute, Tempe, AZ, USA) using the neighbor-joining method [66]. The exact test of Guo and Thomson [67] was used to evaluate the deviation from the expected Hardy-Weinberg genotypic proportions.
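The linkage disequilibrium quantities reported for each haplotype (HF, ALD, MLD and RLD) can be related to one another with the usual two-locus definitions. The Python sketch below assumes the standard forms D = HF - pA·pB for the absolute value, the largest D attainable given the allele frequencies for the maximal value, and their ratio for the relative value; the exact formulation in reference [64] is not reproduced here. The function name and the input frequencies are hypothetical.

```python
def linkage_disequilibrium(hf, p_a, p_b):
    """Return (ALD, MLD, RLD) for one two-locus haplotype.

    hf  : estimated haplotype frequency
    p_a : frequency of the allele at the first locus
    p_b : frequency of the allele at the second locus
    """
    ald = hf - p_a * p_b  # absolute LD (D)
    if ald > 0:
        mld = min(p_a * (1 - p_b), (1 - p_a) * p_b)  # largest possible positive D
    else:
        mld = min(p_a * p_b, (1 - p_a) * (1 - p_b))  # largest possible |D| when D <= 0
    rld = ald / mld if mld > 0 else 0.0  # relative LD (D'), sign retained
    return ald, mld, rld


# HYPOTHETICAL example: an allele that is only ever found on one haplotype,
# so the haplotype frequency equals the frequency of the rarer allele.
ald, mld, rld = linkage_disequilibrium(hf=0.0288, p_a=0.0288, p_b=0.05)
print(f"ALD = {ald:.5f}, MLD = {mld:.5f}, RLD = {rld:.3f}")
```

An RLD of 1 arises when every copy of the rarer allele sits on the same haplotype, which is the situation described for the tightly associated allele pairs in the results.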
Forensic parameters, including observed heterozygosity (HO), expected heterozygosity (HE), power of discrimination (PD), polymorphism information content (PIC) and probability of paternity exclusion (PPE), were computed using the PowerStat version 1.2 spreadsheet (Promega Corporation, USA), as described by Tereba [68], and were used to evaluate the polymorphisms of the genetic markers. HO is defined as the number of heterozygotes divided by the sample size, and HE is defined as the estimated fraction of all individuals who would be heterozygous for any randomly chosen locus. PIC indicates the polymorphism of a locus and is often used to measure the informativeness of genetic markers for linkage studies. PD is defined as the probability that two randomly selected individuals will have different genotypes. PPE is defined as the fraction of individuals whose genotype differs from that of a randomly selected individual, or as the power of a locus to exclude a person from the possibility of being the biological father [69].
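The definitions above can be made concrete with a short Python sketch. It uses simple textbook forms: HO as the share of heterozygotes, HE as one minus the sum of squared allele frequencies, PIC in the Botstein form, and PD as one minus the sum of squared observed genotype frequencies. These forms are assumptions for illustration; the PowerStat spreadsheet may apply corrections that are not reproduced here, and PPE is omitted. The genotypes and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Compute HO, HE, PIC and PD for one locus from a list of (allele, allele) genotypes."""
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    p = [count / (2 * n) for count in allele_counts.values()]

    ho = sum(1 for a, b in genotypes if a != b) / n            # observed heterozygosity
    he = 1 - sum(x * x for x in p)                              # expected heterozygosity
    pic = he - sum(2 * x * x * y * y for x, y in combinations(p, 2))
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1 - sum((c / n) ** 2 for c in genotype_counts.values())  # power of discrimination
    return {"HO": ho, "HE": he, "PIC": pic, "PD": pd}


def combine(values):
    """Combine per-locus probabilities across independent loci: 1 - prod(1 - v)."""
    remaining = 1.0
    for v in values:
        remaining *= 1 - v
    return 1 - remaining


# HYPOTHETICAL genotypes at one locus, for illustration only.
toy_genotypes = [("A", "B"), ("A", "A"), ("B", "C"), ("A", "C")]
params = forensic_parameters(toy_genotypes)
for name, value in params.items():
    print(f"{name} = {value:.4f}")
print(f"combined value for two loci at 0.5 each: {combine([0.5, 0.5]):.2f}")
```

The `combine` helper reflects the usual way per-locus powers of discrimination or exclusion are pooled across loci, under the assumption that the loci are independent; for tightly linked HLA loci that assumption is only approximate.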