Inner- and inter-population structure construction of the Chinese Jiangsu Han population based on the Y23 STR system

In this study, we analyzed the genetic polymorphisms of the 23 Y-STR loci in the PowerPlex® Y23 System in 916 unrelated healthy males from the Chinese Jiangsu Han population and observed 912 different haplotypes, comprising 908 unique haplotypes and 4 duplicate haplotypes. The haplotype diversity reached 0.99999, and the discrimination capacity and match probability were 0.9956 and 0.0011, respectively. Gene diversity values ranged from 0.3942 at DYS438 to 0.9607 at DYS385a/b. Population differentiation among 10 Jiangsu Han subpopulations, and the relationships between the Jiangsu Han population and 18 other Eastern Asian populations, were evaluated with RST values and visualized in Neighbor-Joining trees and Multi-Dimensional Scaling plots. These results indicate that the 23 Y-STR loci are highly polymorphic in the Jiangsu Han population and valuable for both forensic application and population genetics. For the first time, we report the genetic diversity of male lineages in the Jiangsu Han population at the high resolution of a 23 Y-STR set, contributing to familial searching, offender tracking, and anthropological analysis of this population.


Introduction
Genetic markers on the Y chromosome, which are largely unaffected by recurrent mutation and recombination [1], play special roles in uncovering genetic structure [2,3] and in inferring human dispersal and the timing of major migrations [4,5]. Because Y-chromosomal variation is inherited paternally, from father to son, the Y chromosome reflects the gene flow and genetic differentiation of male lineages [6]. In addition, Y-chromosomal markers have a smaller effective population size than autosomal markers. Advances in sequencing technologies and growing attention to the Human Genome Project have provided abundant genetic markers on Y-DNA [3,7,8]. Among the various Y-chromosomal markers, short tandem repeats (STRs) are widely employed because of their hypervariability and hypermutability, thus it is

Statistical analysis
Allele and haplotype frequencies were obtained by direct counting in Arlequin 3.0 [21], and gene diversity (GD) was calculated following Nei [22,23]. We also calculated several standard forensic parameters, namely haplotype diversity (HD), discrimination capacity (DC), and match probability (MP), according to Sabine et al. [24]. The multi-copy locus DYS385a/b was analyzed as a combined haplotype, and the DYS389II allele was obtained by subtracting the DYS389I allele. Following the approach used by YHRD and in many publications [15,19,27-32], we combined RST values, which measure the excess similarity among alleles chosen randomly within a subgroup relative to the entire group [25], with multidimensional scaling (MDS) plots [26]. Linkage disequilibrium (LD) patterns and analysis of molecular variance (AMOVA), which generated the pairwise RST values and their P values, were computed in Arlequin 3.0 from the detailed haplotypes of eligible individuals (samples with null alleles, intermediate alleles, or copy-number variations were removed). A Neighbor-Joining (N-J) tree was constructed and visualized in MEGA 6.0 [33], as in previous studies [31,34]. Based on the RST matrix, we constructed MDS plots as recommended by YHRD and obtained the initial stress values with the R package "MASS" (https://cran.r-project.org/web/packages/MASS/index.html).
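A minimal sketch of how a pairwise RST and its permutation P value can be computed, assuming each haplotype is a tuple of integer repeat counts with null alleles, intermediate alleles, and copy-number variants already removed, as described above. It follows the AMOVA framework with the sum of squared repeat-number differences as the distance between haplotypes (the RST-like distance); it illustrates the technique and is not Arlequin's code. The function names (`rst_amova`, `rst_permutation_p`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def squared_size_distance(h1, h2):
    """Sum over loci of squared differences in repeat number (RST-like distance)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: (1/n) times the sum of distances over unordered pairs."""
    n = len(haplotypes)
    return sum(squared_size_distance(a, b) for a, b in combinations(haplotypes, 2)) / n


def rst_amova(pop_a, pop_b):
    """Pairwise RST as the AMOVA Phi_ST of two populations (P = 2)."""
    n_a, n_b = len(pop_a), len(pop_b)
    n = n_a + n_b
    ssd_within = ssd(pop_a) + ssd(pop_b)
    ssd_among = ssd(list(pop_a) + list(pop_b)) - ssd_within
    msd_among = ssd_among / (2 - 1)        # P - 1 degrees of freedom
    msd_within = ssd_within / (n - 2)      # N - P degrees of freedom
    n0 = (n - (n_a ** 2 + n_b ** 2) / n) / (2 - 1)
    sigma_among = (msd_among - msd_within) / n0
    sigma_within = msd_within
    total = sigma_among + sigma_within
    return sigma_among / total if total > 0 else 0.0


def rst_permutation_p(pop_a, pop_b, n_perm=999, seed=1):
    """Observed RST and a permutation P value (individuals shuffled between samples)."""
    observed = rst_amova(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = rst_amova(pooled[:len(pop_a)], pooled[len(pop_a):])
        if permuted >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy one-locus example (hypothetical data, not from this study).
a = [(10,), (10,), (11,)]
b = [(14,), (15,), (14,)]
print(rst_permutation_p(a, b, n_perm=199))
```

Small negative estimates can occur when two samples differ no more than random halves of one population would; such values are often reported as zero.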

Quality control
Our laboratory's Y-STR genotyping was certified by YHRD through its Y-STR haplotyping quality control test. The population accession number is YA004256 (https://www.yhrd.org). Following YHRD practice, null alleles, intermediate alleles, and copy-number variations were removed from the RST analysis. The study was carried out following the ISFG recommendations on the analysis of DNA polymorphisms described by Schneider [35].

Haplotypes and forensic parameters
After statistical analysis, we found 912 different haplotypes in total, comprising 908 singletons (99.6%) and 4 duplicates (0.4%). Two null alleles were also observed, at DYS448 and DYS456; null alleles at these loci have been reported in several previous publications (S1 Table; [12,19,36,37]). The geographic locations of all samples are presented in S2 Table (available at the ...). S3 and S4 Tables display the allele frequencies and GD values of the 21 single-copy Y-STR loci and the multi-copy DYS385 locus. Alleles ranged from 6 repeats at DYS391 to 33 repeats at DYS389II. The number of alleles per locus ranged from 5 at DYS437 to 13 at DYS458, and 76 different allele combinations were observed at DYS385a/b, slightly more than the 53 in 12 worldwide populations [31] and the 69 in four U.S. populations [32]. All loci except DYS391, DYS438, and DYS437 had GD values above 0.5. The highest GD value was 0.9607 at DYS385, a multi-copy locus scored as 76 allele combinations, and the lowest was 0.3942 at DYS438, where allele 10 reached a frequency of 0.7533. Allele frequency distributions at this panel of 23 Y-STRs nevertheless vary among populations. At DYS438, the most frequent allele was 11 in Tibetan (F = 0.7016) [38] and Bangladeshi (F = 0.5330) [39] populations, whereas, as in Jiangsu Han, it was 10 in Zhuang (F = 0.6880) [40], Mongolian (F = 0.7483) [41], Miao (F = 0.8095) [42], South Korean (F = 0.4967) [43], Liaoning Han (F = 0.7016) [34], Chengdu Han (F = 0.6861) [44], and Hunan Han (F = 0.7323) [30]. Moreover, as in the Jiangsu Han population, allele 10 at DYS438 had the highest frequency of any allele across the targeted Y-STRs in Liaoning Han, Guangdong Han, Hunan Han, and South Korean populations. The underlying reason is unclear; this pattern may be connected to a population bottleneck and the genetic drift that follows it [45].
It could also result from the founder effect described by Ernst Mayr [46]: in a newly established group, the genetic diversity of the offspring depends heavily on the few migrants who came from a large source population, so random sampling of these founders can make the most common allele differ among populations. Excluding DYS385, the highest GD was 0.8233 at DYS458, which also had the most alleles of any single-copy locus. Together, these forensic parameters show that the Jiangsu Han population has high genetic diversity.
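The forensic parameters above can be sketched from haplotype counts alone. This minimal example assumes the standard definitions: Nei's unbiased diversity n/(n-1)(1 - Σp²) for GD (applied to alleles) and HD (applied to haplotypes), MP = Σp², and DC = number of distinct haplotypes divided by sample size; we assume these match the formulas in [24]. The function names are hypothetical.

```python
from collections import Counter


def nei_diversity(counts):
    """Nei's unbiased diversity n/(n-1) * (1 - sum p_i^2); GD for alleles, HD for haplotypes."""
    n = sum(counts)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


def match_probability(counts):
    """MP: sum of squared haplotype frequencies."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def discrimination_capacity(counts):
    """DC: number of distinct haplotypes divided by the number of individuals."""
    return len(counts) / sum(counts)


def haplotype_counts(haplotypes):
    """Count identical haplotypes (each given as a hashable tuple of alleles)."""
    return list(Counter(haplotypes).values())


# Counts implied by the text: 908 singletons plus 4 haplotypes each seen twice (916 men).
counts = [1] * 908 + [2] * 4
print(round(discrimination_capacity(counts), 4),
      round(match_probability(counts), 4),
      round(nei_diversity(counts), 5))
```

With 908 singletons and 4 haplotypes seen twice (908 + 2 × 4 = 916 men), the sketch gives DC ≈ 0.9956 and MP ≈ 0.0011, and HD rounds to 0.99999, in line with the values in the Abstract.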

Linkage disequilibrium analysis
The results of LD analysis of the 23 Y-STRs in Jiangsu Han are shown in Fig 2. Of the 253 pairwise comparisons, 233 (92.1%) showed significant LD (red cells outlined by bold black boxes). This high level of LD in the Jiangsu Han population is explained by previous genomic studies showing that about 95% of the Y chromosome, the so-called male-specific region (MSY), does not undergo X-Y crossing over and is therefore non-recombining [47].
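A minimal sketch of a pairwise LD test on haplotypic data, assuming one list of alleles per locus, aligned by individual. It uses a likelihood-ratio (G) statistic for association between two loci with a permutation null, a simple stand-in for the exact test implemented in Arlequin rather than a reproduction of it; the function names are hypothetical. With 23 loci the test is repeated for 23 × 22 / 2 = 253 pairs, the number of comparisons reported above.

```python
import math
import random
from collections import Counter


def g_statistic(locus_x, locus_y):
    """Likelihood-ratio (G) statistic for association between alleles at two loci."""
    n = len(locus_x)
    joint = Counter(zip(locus_x, locus_y))
    count_x, count_y = Counter(locus_x), Counter(locus_y)
    g = 0.0
    for (a, b), observed in joint.items():   # empty cells contribute 0
        expected = count_x[a] * count_y[b] / n
        g += observed * math.log(observed / expected)
    return 2.0 * g


def ld_permutation_test(locus_x, locus_y, n_perm=999, seed=1):
    """Observed G and a permutation P value; shuffling one locus breaks any association."""
    observed = g_statistic(locus_x, locus_y)
    shuffled = list(locus_y)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(locus_x, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


N_PAIRS = math.comb(23, 2)  # 253 locus pairs for a 23-locus panel
```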

Inner-population differentiation
Pairwise RST genetic distances and their P values among 10 prefecture-level divisions of Jiangsu province are shown in Table 1. Zhenjiang, Yangzhou, and Taizhou Han subpopulations were excluded because their sample sizes were too small to represent their genetic composition. Few significant differences were found among these 10 prefectures. The smallest genetic distance was between Xuzhou Han and Nantong Han (RST = 0.0000, P = 0.1441), while the largest, which was significant, was between Lianyungang Han and Nantong Han (RST = 0.0292, P = 0.0451). Four further significant differences were found: between Xuzhou and Wuxi Hans (RST = 0.0101, P = 0.0000), Yancheng and Wuxi Hans (RST = 0.0153, P = 0.0360), Lianyungang and Wuxi Hans (RST = 0.0128, P = 0.0360), and Yancheng and Nantong Hans (RST = 0.0269, P = 0.0180). After Bonferroni correction (P < 0.05/55 ≈ 0.0009), only the comparison between Wuxi Han and Xuzhou Han remained significant. As the map of Jiangsu province shows (Fig 1), Xuzhou and Wuxi are geographically far apart, located in the northernmost and southernmost parts of Jiangsu respectively, which likely contributed to their significant genetic differentiation. Fig 3 shows the optimal N-J tree of the 10 prefecture-level divisions of Jiangsu, with a sum of branch lengths (SBL) of 0.01566328. Overall, the clustering of the 10 Han populations in the N-J tree did not follow their geographical distribution. The phylogenetic reconstruction indicated close relationships between Huaian and Wuxi Hans and between Nantong and Suqian Hans. Nanjing Han clustered first with Yancheng Han, then with Lianyungang Han, and then with Soochow Han. Han groups from southern Jiangsu were close to various Han groups from northern Jiangsu, indicating population homogeneity among the Jiangsu Han subgroups.
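The Bonferroni step can be checked with a few lines, using the P values reported above for the five nominally significant pairs (P = 0.0000 is the rounded value from Table 1). The divisor m = 55 is taken from the text; 10 populations give 45 unordered pairs, and the threshold 0.05/45 ≈ 0.0011 leaves the same single comparison significant. The function name is hypothetical.

```python
import math

# P values for the five nominally significant pairs, as reported in the text.
REPORTED_P = {
    ("Xuzhou", "Wuxi"): 0.0000,
    ("Lianyungang", "Nantong"): 0.0451,
    ("Yancheng", "Wuxi"): 0.0360,
    ("Lianyungang", "Wuxi"): 0.0360,
    ("Yancheng", "Nantong"): 0.0180,
}


def bonferroni_significant(p_values, m, alpha=0.05):
    """Return the corrected threshold alpha/m and the pairs whose P falls below it."""
    threshold = alpha / m
    return threshold, sorted(pair for pair, p in p_values.items() if p < threshold)


for m in (55, math.comb(10, 2)):
    threshold, kept = bonferroni_significant(REPORTED_P, m)
    print(m, round(threshold, 5), kept)
```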
For further validation, we constructed an MDS plot (Fig 4); its stress of 0.00824 indicates a near-perfect fit. We observed close genetic relationships between Yancheng and Lianyungang Hans and between Nantong and Wuxi Hans, similar to the N-J tree. Wuxi, Soochow, Changzhou, and Nanjing Hans were scattered across the plot. Overall, the MDS results agreed with the N-J reconstruction. Changzhou, Lianyungang, and Nantong Han populations appeared as three outliers along both Dimensions 1 and 2; all three are situated at the margins of Jiangsu province (Fig 1) and have had frequent gene exchange with populations from neighboring provinces.
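A minimal sketch of an MDS fit with a stress check, assuming a symmetric matrix of pairwise distances (for RST, negative values would typically be set to 0 first). It uses classical (Torgerson) metric MDS and Kruskal's stress-1, a swapped-in technique: MASS::isoMDS in R fits non-metric Kruskal MDS and reports stress in percent, so values from this sketch are not directly comparable to those in the text. Function names are hypothetical.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering       # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop negative eigenvalues
    return eigvecs[:, top] * scale


def kruskal_stress1(dist, coords):
    """Kruskal stress-1 between input distances and distances in the fitted configuration."""
    d = np.asarray(dist, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    fitted = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices_from(d, k=1)
    return float(np.sqrt(((d[upper] - fitted[upper]) ** 2).sum() / (d[upper] ** 2).sum()))
```

Kruskal's rough guide reads stress-1 values of 0.2, 0.1, 0.05, 0.025, and 0 as poor, fair, good, excellent, and perfect fits.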
We attribute the genetic mixture of southern and northern Jiangsu Han to migration: because of its higher level of economic development, living conditions, education, and transportation, southern Jiangsu has absorbed many migrants from northern Jiangsu. The genetic characteristics of the 10 Han subpopulations in Jiangsu province thus cover two main aspects: first, the significant difference between southern and northern Han populations in Jiangsu; second, the continuing influence of modern lifestyles on the population structure of Han populations in southern Jiangsu, where frequent exchange among Jiangsu Han populations has drawn them together through gene flow.
To the best of our knowledge, only Huai'an Han [48] and Nantong Han [19,49] have previously been genotyped and analyzed at the subpopulation level. Our 23 Y-STR database of 10 Jiangsu Han populations gives a comprehensive picture of gene exchange among them and provides a reference that can help investigating authorities target potential paternal pedigrees. Several studies have used Y-STRs to describe inner-population structure in order to infer the most likely geographic origin of a Y-chromosome haplotype. In 2014, Zeng et al. [27] discussed the Taiwanese origin of the Austronesian expansion among 9 major aboriginal tribes inhabiting Taiwan, and in 2016, Ulises et al. [29] reported a comprehensive study of the population structure of 13 Argentine provinces. In this research, we focused for the first time on the relationships among 10 Han groups within Jiangsu and point out their value for deducing historical population events and inferring current population trends. More importantly, our findings can provide investigating authorities with information on admixture among populations within Jiangsu province when solving cross-regional crimes.
... of Han ancestry. Notably, Singapore Indian showed the greatest genetic distance from Jiangsu Han, consistent with geographic location. We also constructed an N-J tree (Fig 5) and an MDS plot (Fig 6) of these populations, labelled by language family. The sum of branch lengths of the 19 East Asian populations (SBL = 0.07132081), larger than that of the Jiangsu prefecture tree (SBL = 0.01566328), again indicated greater genetic differences between the Jiangsu Han population and its neighboring East Asian populations. In the phylogeny, Jiangsu Han and Dai joined at the same node and then clustered with Xuanwei and Beijing Hans, all belonging to the Sino-Tibetan language family.
Clearly, the five populations from the Altaic language region (Gunma Japanese, Shizuoka Japanese, South Korean, Tokyo Japanese, and Ibaraki Japanese) were close to each other and located on the opposite flank of the N-J tree from the cluster containing the Jiangsu Han group.
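A minimal sketch of the neighbor-joining algorithm (Saitou and Nei) on a distance matrix such as pairwise RST, returning a Newick string and the sum of branch lengths (SBL) used above to compare trees. It is not MEGA's implementation: tie-breaking and the handling of negative branch lengths, which NJ can produce from non-additive data, may differ. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def neighbor_joining(dist, labels):
    """Neighbor-joining on a symmetric distance matrix; returns (Newick string, SBL)."""
    d = np.asarray(dist, dtype=float).copy()
    names = list(labels)
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    branch_lengths = []
    while len(names) > 3:
        n = len(names)
        r = d.sum(axis=1)
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)                 # never join a taxon with itself
        i, j = np.unravel_index(np.argmin(q), q.shape)
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        branch_lengths += [li, lj]
        to_new = 0.5 * (d[i] + d[j] - d[i, j])     # distances from the new internal node
        keep = [k for k in range(n) if k not in (i, j)]
        new_d = np.zeros((n - 1, n - 1))
        new_d[:-1, :-1] = d[np.ix_(keep, keep)]
        new_d[-1, :-1] = to_new[keep]
        new_d[:-1, -1] = to_new[keep]
        d = new_d
        names = [names[k] for k in keep] + [f"({names[i]}:{li:.6g},{names[j]}:{lj:.6g})"]
    # Join the last three nodes at a single internal node.
    la = 0.5 * (d[0, 1] + d[0, 2] - d[1, 2])
    lb = 0.5 * (d[0, 1] + d[1, 2] - d[0, 2])
    lc = 0.5 * (d[0, 2] + d[1, 2] - d[0, 1])
    branch_lengths += [la, lb, lc]
    newick = f"({names[0]}:{la:.6g},{names[1]}:{lb:.6g},{names[2]}:{lc:.6g});"
    return newick, float(sum(branch_lengths))


# Toy additive matrix for the tree ((A:1,B:2):5,C:3,D:4), whose SBL is 15.
toy = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(neighbor_joining(toy, ["A", "B", "C", "D"]))
```

On an additive matrix like the toy one, NJ recovers the generating tree exactly, so the SBL equals the true total branch length.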
As shown in Fig 6, the MDS plot had an initial stress of 0.1067, indicating an acceptable fit. Jiangsu Han, Beijing Han, Chengdu Han, and Singapore Han populations clustered together in the left ... [50] and genetic evidence from Y-SNPs [51], microsatellites [52], and autosomal SNPs [53]. The relative contributions of Southern and Northern Han to the Jiangsu Han population remain ambiguous because of Jiangsu's position in a marginal zone and its complex history of population events. Yao et al. [54] previously reported, based on 15 autosomal STRs, that Jiangsu Han was closer to Southern Han (Guangdong Han) than to Northern Han (Liaoning Han). These two opposing findings suggest that Jiangsu Han is admixed from Han populations of both the south and the north. In human population genetics, mtDNA has been the most widely used molecular marker [55] and helped reveal the out-of-Africa origin of modern humans [56-58]. Comparing Y-specific markers with DNA markers from other systems, Eric et al. [59] found striking differences between the histories and behavioral patterns of male and female lineages. Thus, strictly speaking, our results indicate that the male portion of Jiangsu Han originated from Northern Han. In the future, mtDNA and X-chromosome data from Jiangsu Han would help integrate the complex network underlying the dispersal of this population. In this research, we enriched the forensic molecular database of Jiangsu Han with 916 profiles typed at 23 Y-STRs, which furthers the understanding of its population genetics and molecular anthropology. The inner- and inter-population structure of Jiangsu Han was illustrated for the first time. More genotyping profiles from different genetic markers are needed to improve inference on where Jiangsu Han originated and how it dispersed.
A limitation of our research is the unequal sample sizes of the Jiangsu subpopulations, which may have affected the structure reconstruction. In the next stage, we plan to type more representative samples from the Jiangsu Han population to explain its genetic background and population exchange more comprehensively and rigorously for forensic applications.

Conclusion
As noted above, 916 unrelated healthy Han males from Jiangsu province were genotyped at 23 Y-STR loci. In total, 912 different haplotypes were found, including 908 singletons (99.6%) and 4 duplicates (0.4%), revealing that the 23 Y-STRs of the PowerPlex® Y23 System are highly polymorphic (HD = 0.9999952) in the Jiangsu Han population and highly valuable for forensic application (DC = 0.9956, MP = 0.0011). The highest GD value was 0.9607 at DYS385, and the lowest was 0.3942 at DYS438. LD analysis showed the necessity of treating these markers together as haplotypes. Pairwise RST genetic distances and their significance, both within Jiangsu and between Jiangsu Han and other populations, together with N-J trees and MDS plots, were used to describe the genetic background of the Jiangsu Han population. For the first time, we depicted the inner-population differentiation and genetic characteristics of the Jiangsu Han population. Additionally, at the East Asian scale, our results indicated that the major male component of the Han population in Jiangsu came from Northern Chinese Han.
Supporting information
S1 Table. The