Allele and Haplotype Diversity of 26 X-STR Loci in Four Nationality Populations from China

Background
Haplotype analysis of closely linked markers has proven to be a powerful tool in kinship analysis, especially when short tandem repeats (STR) fail to resolve uncertainty in relationship analysis. STR located on the X chromosome show stronger linkage disequilibrium than autosomal STR. Because linkage disequilibrium is population-specific, haplotype frequencies must be estimated directly from population studies.

Methodology and Findings
Twenty-six X-STR loci were typed in samples from four nationality populations (Han, Uigur, Kazakh and Mongol) from China (n = 1522; 876 males and 646 females). The loci comprised six clusters of linked markers, DXS6807-DXS8378-DXS9902 (Xp22), DXS7132-DXS10079-DXS10074-DXS10075-DXS981 (Xq12), DXS6801-DXS6809-DXS6789-DXS6799 (Xq21), DXS7424-DXS101-DXS7133 (Xq22), DXS6804-GATA172D05 (Xq23) and DXS8377-DXS7423 (Xq28), together with the loci DXS6800, DXS6803, DXS9898, GATA165B12, DXS6854, HPRTB and GATA31E08. Allele and haplotype frequencies as well as linkage disequilibrium data for kinship calculation were obtained, and the allele frequency distributions of the different populations were compared. Between 5 and 20 alleles were observed at each locus, for a total of 289 alleles across all selected loci. The allele frequency distribution of most X-STR loci differed among populations. The 876 male samples were used for haplotype and linkage disequilibrium analysis. A total of 89, 703, 335, 147, 39 and 63 haplotypes were observed, and haplotype diversity was 0.9584, 0.9994, 0.9935, 0.9736, 0.9427 and 0.9571, for clusters I, II, III, IV, V and VI, respectively. Eighty-two percent of the cluster II haplotypes were found only once, and 94% of the cluster III haplotypes showed a frequency of <1%.

Conclusions
These results indicate that the allele frequency distribution of most X-STR loci is population-specific and that the haplotypes of the six clusters provide a powerful tool for kinship testing and relationship investigation. It is therefore necessary to obtain allele frequency and haplotype data for the linked loci for forensic application.


Introduction
Autosomal short tandem repeats (AS-STR) and Y-chromosomal STR (Y-STR) are powerful tools for human identification and kinship testing. Many multiplex PCR systems for AS-STR and Y-STR have been reported, and many commercial AS-STR and Y-STR kits are available. X-chromosomal STR (X-STR) are also recognized as important tools in forensic applications. In recent years, numerous X-STR systems have been studied in the fields of population genetics and forensics [1][2][3][4][5]. However, few kits include X-STR markers, apart from the Mentype® Argus X-8 Kit and the Investigator Argus X-12 Kit (Biotype AG, Dresden, Germany). As forensic cases grow more complex, AS-STR and Y-STR markers, even together with these two X-STR kits, are not sufficient for forensic application. We therefore developed two multiplex PCR systems covering twenty-six X-STR loci: DXS6800 (Xq13), DXS6803 (Xq21), DXS9898 (Xq21), GATA165B12 (Xq25), DXS6854 (Xq25), HPRTB (Xq26), GATA31E08 (Xq27), and six clusters of closely linked markers, cluster I: DXS6807-DXS8378-DXS9902 (Xp22); II: DXS7132-DXS10079-DXS10074-DXS10075-DXS981 (Xq12); III: DXS6801-DXS6809-DXS6789-DXS6799 (Xq21); IV: DXS7424-DXS101-DXS7133 (Xq22); V: DXS6804-GATA172D05 (Xq23); and VI: DXS8377-DXS7423 (Xq28). (Fig. 1 shows the physical localization of these markers.) In addition, the allele frequency distribution of most X-STR loci varies among populations [6,7]. Moreover, the use of X-STR requires precise knowledge not only of allele and haplotype frequencies, but also of the genetic linkage and linkage disequilibrium (LDE) status among markers [8]. This study investigated the polymorphism and the linkage and/or independence of the selected markers in four nationality populations from China.

Sampling and DNA extraction
Blood samples were collected from 1,522 unrelated individuals from four nationality populations in Mainland China. The study included 745 subjects of Han nationality from Guangdong (477 males and 268 females), 234 subjects of Uigur nationality (100 males and 134 females) from Yi-ning City, Ili, Xinjiang Province, 386 subjects of Kazakh nationality (173 males and 213 females) from Tacheng Prefecture of Xinjiang and 157 subjects of Mongol nationality (126 males and 31 females) from Inner Mongolia. In addition, 325 family trios (father-mother-daughter), 286 family duos (mother-son) and 40 three-generation families (grandmother-father-granddaughter) from Guangdong were analyzed. Parents of the trios and mothers of the duos were included among the unrelated individuals. Samples were prepared and DNA was extracted using the Chelex-100 method [9].

Ethics Statement
The research protocol was approved by the Human Subjects Committee at the Zhongshan School of Medicine, Sun Yat-sen University and written informed consent was obtained from all participants or guardians involved in the study.

Sample electrophoresis
Electrophoresis was performed on a 24-capillary ABI 3500 Genetic Analyzer (Applied Biosystems, USA). For each sample, 1 µl of PCR product was added to 10 µl of deionized formamide (Applied Biosystems, USA) and 0.25 µl of GeneScan™-500 LIZ™ size standard (Applied Biosystems, USA). The matrix standards for spectral calibration were developed according to the manufacturer's instructions (AGCU ScienTech Incorporation, China). The results were analyzed with GeneMapper ID-X Analysis Software. DNA from the K562 and 9947A cell lines (Promega Corporation, Madison, WI, USA) was typed to calibrate the allelic ladder.

Sequence analysis
The alleles of the ladder were sequenced to ensure correct allele designation. Samples were amplified by single PCR in a GeneAmp PCR System 9700 Thermal Cycler (Applied Biosystems, Foster City, CA, USA) under the following conditions: initial denaturation at 94°C for 11 min, followed by 30 cycles of 94°C for 45 min, 61°C for 45 min, 72°C for 45 min, and additional 72 min at 5°C. PCR products were purified or cloned with the TOP10F Cloning Kit (TIANGEN Biochemical Technology Co., Beijing, China) following the manufacturer's instructions. The purified PCR products or the chosen clones were then sequenced on an ABI 3100 Genetic Analyzer using a BigDye® Terminator Cycle Sequencing Kit (Applied Biosystems, USA) according to the manufacturer's instructions.

Statistical analysis
The software ARLEQUIN 3.5 [12] was used for the following statistical analyses: estimation of allele and haplotype frequencies, the exact test for Hardy-Weinberg equilibrium (HWE) on female data, exact tests of population differentiation between the allele frequencies of males and females, and the linkage disequilibrium (LDE) test between all pairs of markers. The exact test of differentiation of allele frequency distributions among populations was performed with SPSS v.15.0. Polymorphism information content (PIC) was estimated according to Botstein et al. [13]. The power of discrimination in females (PD F) and in males (PD M) and the mean exclusion chance (MEC) were calculated according to Desmarais et al. [14].
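
The forensic parameters named above can be illustrated with a short sketch. It assumes the closed-form expressions commonly quoted for X-STR markers (PIC after Botstein et al. [13]; PD M, PD F and the two MEC variants after Desmarais et al. [14]). The function name and the example frequencies are hypothetical and are not taken from Table 1.

```python
from math import isclose


def xstr_parameters(freqs):
    """Forensic efficiency parameters for a single X-STR locus.

    freqs: allele frequencies at the locus (assumed to sum to 1).
    The formulas are the ones usually cited for X-linked markers;
    they are an assumption about how the study computed its values.
    """
    if not isclose(sum(freqs), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies must sum to 1")

    s2 = sum(p ** 2 for p in freqs)  # sum of squared frequencies
    s3 = sum(p ** 3 for p in freqs)  # sum of cubed frequencies
    s4 = sum(p ** 4 for p in freqs)  # sum of fourth powers

    return {
        # Botstein et al.: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
        "PIC": 1 - s2 - (s2 ** 2 - s4),
        # Males carry one X, so two males match with probability sum(p_i^2)
        "PD_M": 1 - s2,
        # Females carry two X, so genotypes are compared
        "PD_F": 1 - 2 * s2 ** 2 + s4,
        # Trios with daughters (algebraically identical to PIC)
        "MEC_I": 1 - s2 + s4 - s2 ** 2,
        # Father/daughter duos
        "MEC_II": 1 - 2 * s2 + s3,
    }


# Hypothetical three-allele locus, for illustration only
params = xstr_parameters([0.5, 0.3, 0.2])
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

All five quantities depend only on the sums of powers of the allele frequencies, which is why loci with many alleles of similar frequency score highest.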

Results
Sequences of some alleles of the ladder are shown in the electronic supplementary material (ESM: Figs. S1 to S28 in File S1). When the 1,522 samples were tested, between 5 and 20 alleles were observed at each locus, and altogether 289 alleles were found across the selected loci. The allele frequencies and further statistical information for the twenty-six loci in the Han, Uigur and Mongol populations are shown in Table 1. The allele frequencies and further statistical information for the Kazakh have been described for MX15-STR [10] and MX12-STR [11]. HWE was tested on female samples, and the P values were greater than 0.05 at all twenty-six loci. The comparisons among our studied populations, as well as between our populations and those reported by others, show that the allele frequency distribution of most X-STR loci differs among populations. The P values for population differentiation are listed in Table S1 and Table S2. The 876 male samples were used for haplotype and linkage disequilibrium analysis. The P values of the exact test for LDE are listed in Table 2. The haplotype number and haplotype diversity of the six clusters are shown in Table 3. The haplotype frequencies of the six clusters are shown in Tables S3, S4, S5, S6, S7 and S8. Thirty-one cases of mutation were detected from the fifteen loci in 9,480 meioses. Mutation information is listed in Table 4.
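
As an illustration of how the haplotype number and haplotype diversity of a cluster can be obtained from male samples, the following minimal sketch counts distinct haplotypes and applies Nei's unbiased estimator, H = n/(n - 1) × (1 - Σ p_i²). The source does not state which estimator was used, so this choice is an assumption; the function name and the toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_summary(haplotypes):
    """Summarize the haplotypes of one cluster observed in male samples.

    haplotypes: one hashable haplotype (e.g. a tuple of alleles) per male.
    Males are hemizygous for the X chromosome, so each male contributes
    exactly one directly observed haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")

    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)

    return {
        "n_haplotypes": len(counts),
        # Nei's unbiased haplotype diversity (assumed estimator)
        "diversity": n / (n - 1) * (1 - sum_sq),
        # Share of distinct haplotypes that were seen only once
        "singleton_fraction": singletons / len(counts),
    }


# Toy two-locus cluster with four males (illustrative values only)
males = [("12", "10"), ("12", "10"), ("13", "11"), ("14", "9")]
summary = haplotype_summary(males)
print(summary)
```

The same summary also yields the fraction of haplotypes observed only once, the quantity reported for cluster II.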

Discussion
Polymorphism
HWE was tested on female samples, and the genotype distributions did not deviate from HWE at any of the twenty-six loci. Allele frequencies did not differ significantly between female and male samples at any of the examined loci. The allele frequencies ranged from 0.0010 to 0.8164. The PIC of all selected loci was above 0.59, with the exception of DXS7133, DXS6800 and DXS7423. The power of discrimination in females (PD F) ranged from 0.3827 to 0.9849. Notably, DXS8377, DXS10079, DXS101 and DXS981 were highly polymorphic, with the highest power of discrimination and probability of paternity exclusion among the twenty-six loci studied. These results suggest that the twenty-six X-STR loci are highly polymorphic and have satisfactory forensic efficiency.

Comparisons among different populations
Allele frequency distributions were compared among our studied populations as well as between our populations and those reported by others, such as Sichuan Han [1], Taiwan [3], Japan [4], Pakistan [16], Northern Italy [17], Brazil [18], Algeria [19], Ghana [20] and Ivory Coast [21]. Significant differences were found at 21 of the selected loci between Han and Uigur, at 24 loci between Han and Kazakh, and at 16 loci between Han and Mongol. However, no significant differences were found between Guangdong Han and Sichuan Han or Taiwanese Han, probably because most Taiwanese descend from Han populations living in Mainland China. Significant differences were found between Uigur and Mongol at 13 of the selected loci, but no significant differences were found between Uigur and Kazakh at 20 of the selected loci. Because of differences in nationality origin, language and culture, marriage between different nationalities or regions is uncommon, and marriage within the same nationality or region is prevalent. The Uigur originated from the ancient HuiGe, and the Kazakh originated in the central Asian steppes. In the middle of the sixth century, both the Kazakh and the Uigur were influenced by Turkish culture, and there are many similarities among the Uigur, Kazakh and Turkish languages and cultures. Intermarriage among the Uigur, Kazakh and Turkish is therefore common, which may explain why no significant difference was found between the Uigur and the Kazakh at these loci. Nevertheless, the haplotype distributions of the Uigur and the Kazakh differed significantly in five clusters, the exception being cluster VI (DXS8377-DXS7423). Notably, only nine haplotypes of cluster II (DXS7132-DXS10079-DXS10074-DXS10075-DXS981) were shared between the Uigur and the Kazakh. Significant differences were found between Kazakh and Mongol at 10 of the selected loci. Significant differences were also found at a large number of loci between our populations and those of other countries (Table S2). The allele frequency distribution of most X-STR loci therefore differs among populations, and it is important to develop population-specific data for forensic analysis.

Mutation
In the kinship cases, 40 three-generation families (grandmother-father-granddaughter) were tested using MX15-STR and MX12-STR. The grandmaternal alleles were found to be transmitted to the granddaughters through the sons. Thirty-one mutations were detected at the twenty-six loci in 24,336 meioses. The average mutation rate for the twenty-six loci was estimated to be 1.27 × 10⁻³ per meiosis. Of the mutations, 96.77% were shifts of one repeat unit. Our results are consistent with those of Fracasso [22], Shin [23] and Szibor et al. [24]. A mutation rate of the same order has also been described for autosomal STR [25].
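
The reported rate can be checked with simple arithmetic. The sketch below assumes that each father-mother-daughter trio contributes two meioses per locus (one paternal, one maternal) and each mother-son duo contributes one. This counting rule is an assumption, but it reproduces the 24,336 meioses stated above.

```python
# Pedigree counts and mutation count as reported in the text
TRIOS = 325       # father-mother-daughter trios
DUOS = 286        # mother-son duos
LOCI = 26         # X-STR loci typed
MUTATIONS = 31    # mutations observed

# Assumed counting rule: 2 meioses per trio, 1 per duo, at every locus
meioses_per_locus = 2 * TRIOS + DUOS
total_meioses = meioses_per_locus * LOCI

# Average mutation rate per meiosis across all loci
mutation_rate = MUTATIONS / total_meioses

print(f"meioses per locus: {meioses_per_locus}")
print(f"total meioses:     {total_meioses}")
print(f"mutation rate:     {mutation_rate:.2e} per meiosis")
```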

Conclusion
Our results suggest that the allele frequency distribution of most X-STR loci is population-specific and that the haplotypes of the six clusters may provide a powerful tool for haplotype analysis in kinship testing and relationship identification. It is therefore necessary to acquire allele frequency and haplotype data for the linked loci in different ethnic groups for forensic application.

Figure 1. Idiogram of 26 X-STR loci. doi:10.1371/journal.pone.0065570.g001

Table 1. Allele frequencies and statistical parameters of the 26 loci in the three nationality populations from China.
PD M: power of discrimination in males. PD F: power of discrimination in females. MEC I: mean exclusion chance for X-STR in standard trios with daughters. MEC II: mean exclusion chance for X-STR in father/daughter duos. PIC: polymorphism information content. doi:10.1371/journal.pone.0065570.t001

Table 2. P values for the test of linkage disequilibrium.

Table 3. Haplotype number and diversity of the six clusters in the four nationality populations from China.

Table 4. Mutations detected in the pedigree analysis of the 325 father-daughter-mother trios and the 286 mother-son duos.