Genetic Evidence of an East Asian Origin and Paleolithic Northward Migration of Y-chromosome Haplogroup N

The Y-chromosome haplogroup N-M231 (Hg N) is distributed widely in eastern and central Asia, Siberia, as well as in eastern and northern Europe. Previous studies suggested a counterclockwise prehistoric migration of Hg N from eastern Asia to eastern and northern Europe. However, the root of this Y-chromosome lineage and its detailed dispersal pattern across eastern Asia are still unclear. We analyzed haplogroup profiles and phylogeographic patterns of 1,570 Hg N individuals from 20,826 males in 359 populations across Eurasia. We first genotyped 6,371 males from 169 populations in China and Cambodia, generating data for 360 Hg N individuals, and then combined these with published data on 1,210 Hg N individuals from Japanese, Southeast Asian, Siberian, European and Central Asian populations. The results showed that the sub-haplogroups of Hg N have distinct geographical distributions. The highest Y-STR diversity of the ancestral Hg N sub-haplogroups was observed in the southern part of mainland East Asia, and further phylogeographic analyses support an origin of Hg N in southern China. Combined with previous data, we propose that the early northward dispersal of Hg N started from southern China about 21 thousand years ago (kya), expanding into northern China 12–18 kya, and reaching further north to Siberia about 12–14 kya before a population expansion and westward migration into Central Asia and eastern/northern Europe around 8.0–10.0 kya. This northward migration of Hg N coincides with retreating ice sheets after the Last Glacial Maximum (22–18 kya) in mainland East Asia.


Introduction
In recent years, extensive studies of Y-chromosome lineages in East Asian populations have found that the dominant haplogroups O-M175, D-M174, C-M130, and N-M231 all have a southern origin [1][2][3][4][5][6][7][8]. Among these East Asian Y-chromosome lineages, D-M174 represents the earliest northward migration, beginning in the southern part of East Asia (what is now mainland Southeast Asia and southern China) about 50-60 kya [5]. The northward migration of C-M130 occurred about 40 kya, following a coastal route up mainland China, then reaching further north to Siberia around 15 kya and finally making its way to North America [8][9][10][11]. The northward expansion of O-M175 within the Asian continent (about 25-30 kya) had the greatest impact on current East Asian Y-chromosomal profiles, as reflected by the dominance of O-M175 lineages (ranging from 18% to 75%) in East Asia and in both mainland and island Southeast Asia [4].
In the present study, we systematically analyzed Hg N profiles in East Asian and Southeast Asian populations (a total of 6,371 males from 169 geographic populations) to trace the origin and prehistoric migration patterns of the Hg N lineage.

Samples
A total of 6,371 unrelated males from 169 populations in East Asia (Figure 1 and Table S1) were recruited and asked to sign written informed consent for the use of their samples in this study. The protocol of this study was approved by the Institutional Review Board of Kunming Institute of Zoology, Chinese Academy of Sciences (Approval ID number, SWYX-2012008). In addition, to compare the population structure of Y-chromosome Hg N among geographic populations, we retrieved previously published data (Y-SNP and Y-STR) on 1,210 Hg N individuals from different geographic areas [3,12,13,16,19,28,29].

Y-Chromosome Marker Genotyping
Following a hierarchical genotyping strategy, M231 was typed first, and samples from M231-positive individuals were then subtyped according to the high-resolution Y-chromosomal haplogroup tree so that each could be assigned to a specific sub-haplogroup [30]. The Y-chromosome bi-allelic markers (LLY22g, M128, P43 and M46 (Tat)) were genotyped by the SNaPshot method (Applied Biosystems, USA). Additionally, 7 commonly used Y-STR markers (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393) were typed using fluorescence-labeled primers on an ABI 3130XL Genetic Analyzer (Applied Biosystems, USA). The Y-STR nomenclature follows a system proposed previously [31].
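The hierarchical assignment described above can be illustrated with a minimal Python sketch. It assumes each sample is summarized as a mapping from marker name to a boolean (True for the derived state); the function name `assign_subhaplogroup` and this data layout are hypothetical, and the labels follow the sub-haplogroup names used in the Results. It is an illustration only, not the laboratory workflow itself.

```python
from typing import Mapping


def assign_subhaplogroup(derived: Mapping[str, bool]) -> str:
    """Assign an Hg N sub-haplogroup from derived/ancestral marker states.

    `derived[marker]` is True when the sample carries the derived allele.
    Markers missing from the mapping are treated as ancestral (an assumption).
    """
    # Step 1: M231 defines Hg N as a whole and is checked first.
    if not derived.get("M231", False):
        return "non-N"
    # Step 2: LLY22g separates the N1 branch from the remaining N*-M231.
    if not derived.get("LLY22g", False):
        return "N*-M231"
    # Step 3: downstream markers under LLY22g; the first derived one wins.
    for marker, label in (
        ("M128", "N1a-M128"),
        ("P43", "N1b-P43"),
        ("M46", "N1c-M46"),
    ):
        if derived.get(marker, False):
            return label
    # Derived at LLY22g but ancestral at all three downstream markers.
    return "N1*-LLY22g"


if __name__ == "__main__":
    sample = {"M231": True, "LLY22g": True, "P43": True}
    print(assign_subhaplogroup(sample))
```

A sample that is derived at more than one of M128, P43 and M46 would indicate a typing inconsistency; the sketch does not detect this case and simply returns the first match.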

Data Analysis
To visualize the geographic distributions of Hg N and its sublineages, Golden Software Surfer 10.0 (Golden Software Inc., USA) with the Kriging algorithm was used to construct a contour map; the data used are listed in Table S3. Median-joining networks for STR variations of the Y-chromosomal haplogroups were constructed using NETWORK 4.6 (Fluxus Engineering) [32] with equal weights across loci.
Figure 1. (See Table S1.) doi:10.1371/journal.pone.0066102.g001
For each Y-chromosomal haplogroup/sub-haplogroup (defined by Y-SNPs), we estimated its age by Y-STR variations using the published method [27,33,34]. An effective mutation rate of 0.0069 was used [34].
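As a rough illustration of STR-based dating, the sketch below computes the average squared difference (ASD) between each haplotype and a founder haplotype, then divides by a per-locus mutation rate. The text only cites the published method [27,33,34], so the exact estimator, the use of the per-locus median allele as the founder, and the generation length are all assumptions here. The function names are hypothetical, and the mutation rate is left as a caller-supplied parameter because its units are not stated in the text.

```python
from statistics import mean, median_low
from typing import Sequence


def asd_to_founder(haplotypes: Sequence[Sequence[int]]) -> float:
    """Mean over loci of the average squared repeat-count difference
    between each haplotype and the assumed founder (per-locus median)."""
    n_loci = len(haplotypes[0])
    per_locus = []
    for j in range(n_loci):
        alleles = [h[j] for h in haplotypes]
        founder = median_low(alleles)  # assumption: median allele as founder
        per_locus.append(mean([(a - founder) ** 2 for a in alleles]))
    return mean(per_locus)


def estimate_age_years(
    haplotypes: Sequence[Sequence[int]],
    mu_per_generation: float,
    years_per_generation: float,
) -> float:
    """Age estimate: ASD / mutation rate gives generations, scaled to years."""
    generations = asd_to_founder(haplotypes) / mu_per_generation
    return generations * years_per_generation


if __name__ == "__main__":
    # Toy data: 4 haplotypes typed at 2 loci; the rate and generation
    # length below are illustrative values only, not those of the study.
    toy = [[14, 10], [15, 10], [14, 11], [14, 10]]
    print(asd_to_founder(toy))
    print(estimate_age_years(toy, mu_per_generation=0.001, years_per_generation=25))
```

The estimate scales inversely with the mutation rate, so the choice of rate and its time unit dominates the resulting age.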
The genetic diversity of the different geographic populations under Hg N and its sub-haplogroups was calculated from STR data using GenAlEx 6.5 [35].
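For readers unfamiliar with STR diversity measures, the following sketch computes Nei's unbiased gene diversity per locus and averages it across loci. Whether this is exactly the statistic GenAlEx reported for this study is an assumption; the function names are hypothetical and this is not GenAlEx code.

```python
from collections import Counter
from typing import Sequence


def unbiased_gene_diversity(alleles: Sequence[int]) -> float:
    """Nei's unbiased gene diversity at one locus: n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def mean_diversity(haplotypes: Sequence[Sequence[int]]) -> float:
    """Average the per-locus diversity over all STR loci."""
    n_loci = len(haplotypes[0])
    per_locus = [
        unbiased_gene_diversity([h[j] for h in haplotypes]) for j in range(n_loci)
    ]
    return sum(per_locus) / n_loci


if __name__ == "__main__":
    # Toy population: 4 haplotypes typed at 2 loci (illustrative values).
    population = [[14, 10], [14, 10], [15, 10], [15, 11]]
    print(mean_diversity(population))
```

Comparing this value between southern and northern population groups mirrors the kind of contrast drawn in Table 3, although small samples make such estimates noisy.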
For the analysis of Y-chromosomal STR alleles, DYS389II was named DYS389b after subtracting DYS389I because the PCR product of DYS389II contains both DYS389II and DYS389I loci.
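The DYS389 correction amounts to a single subtraction, sketched below. The function name `dys389b` and the example repeat counts are illustrative only.

```python
def dys389b(dys389i: int, dys389ii: int) -> int:
    """Return the DYS389b repeat count, i.e. DYS389II minus DYS389I.

    The DYS389II PCR product spans both loci, so the DYS389I count is
    removed to avoid counting that variation twice.
    """
    if dys389ii < dys389i:
        raise ValueError("DYS389II must be at least as large as DYS389I")
    return dys389ii - dys389i


if __name__ == "__main__":
    # Illustrative repeat counts, not taken from the study data.
    print(dys389b(dys389i=13, dys389ii=29))  # prints 16
```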

Results
We systematically screened a total of 6,371 unrelated males from 169 populations in China and Cambodia (Figure 1 and Table S1). By genotyping the Y-chromosome bi-allelic marker M231, we identified 390 males (6.12%) belonging to the Hg N lineage. Further typing of 4 additional bi-allelic markers and 7 Y-chromosome STRs generated complete data for 360 Hg N males, which were used in the following analyses (Table 2). We also retrieved data on 1,210 Hg N individuals from other published studies, including 1,197 Hg N males identified from 68 populations in Siberia, Central Asia and Europe [3,12,16,28,29], and 13 Hg N males from 4 populations in Japan, Laos and southern China [3,13,19]. Collectively, we analyzed a total of 1,570 Hg N individuals, covering all major geographic regions possessing the Hg N lineage (from 20,826 males in 359 populations across Eurasia, Table S2).
Hg N is prevalent (>5%) in East Asia (e.g., among Han Chinese, Tibeto-Burman and Austro-Asiatic speaking populations), as well as in northern/central Asia and eastern/northern Europe, with the highest average frequency in Siberia (38.27%). Meanwhile, Hg N is relatively rare in southeastern, southern and western Asia, and completely absent in southern/western Europe. Within the Hg N lineage, there are 5 sub-haplogroups with distinctive geographic distributions. N*-M231 is presumably the ancestral haplogroup in Hg N and is mostly present in southern East Asian populations, including Daic, southern Han Chinese, Tibeto-Burman and Hmong-Mien in southern China (Figure 2A); however, it is totally absent in Siberia, Central Asia and eastern/northern Europe, consistent with the previously proposed southern origin of Hg N in East Asia [3,12,16,28,29]. The other 4 sub-haplogroups share a common mutation at the LLY22g locus (Figure 3). Under LLY22g, N1*-LLY22g is both the ancestral and the most dominant sub-haplogroup, with a distribution extending from southern to northern East Asia and the highest frequency observed in Tibeto-Burman populations. The distribution pattern of N1a-M128 is similar to that of N1*-LLY22g, but N1a-M128 is much less prevalent (Figure 2B and 2C). By contrast, N1b-P43 and N1c-M46 are largely restricted to North Asia and East/North Europe, rare in East Asia and Central Asia, and absent in Southeast and South Asia (Figure 2D and 2E). Collectively, this geographic distribution pattern suggests a clear divergence between regional populations, with the ancestral lineages occurring in multiple ethnic populations throughout southern China.
We constructed contour maps of the five N-M231 sub-haplogroups based on the geographic distributions of these lineages in Eurasian populations (Table S3). The two presumably ancestral haplogroups (N*-M231 and N1*-LLY22g) likely originated in southern China, as there is a clear south-to-north decline in their frequencies (Figure 2A and 2B). Conversely, N1b-P43 and N1c-M46 are both enriched in Siberia, with N1b-P43 showing a north-to-south decline and N1c-M46 showing an east-to-west decline (Figure 2D and 2E). The contour map of N1a-M128 differs from the others, with the highest frequency observed in Central Asia owing to the relatively high frequency of N1a-M128 among Kazakhs (8.1%) (Figure 2C).
To examine the detailed diversity of each N-M231 sub-haplogroup, we constructed STR networks for the 5 sub-haplogroups based on data from 7 Y-chromosome STR loci (Figure 3). Among the two ancestral lineages of Hg N, we observed relatively diverged STR haplotypes, and the core STR haplotypes are mostly from southern populations in China, suggesting a likely origin in southern China. By comparison, the core STR haplotypes of N1b-P43 are mostly from the northern populations of China and Siberia, suggesting that it may have originated in northern East Asia. Moreover, the STR networks of N1b-P43 indicate that the STR haplotypes in Europeans were derived from Siberia and Central Asia, consistent with the proposed counterclockwise prehistoric migration of the Hg N lineages into East/North Europe [3]. Interestingly, N1a-M128 displayed a star-like STR network, implying a recent expansion of this Hg N lineage (Figure 3). The high frequency of N1a-M128 in Central Asia is therefore likely due to a recent local expansion of this sub-haplogroup. Further comparison of the STR variation levels among the different populations also supports an East Asian origin of Hg N. For the two ancestral lineages, N*-M231 and N1*-LLY22g, the STR diversity of southern populations is higher than that of northern populations in East Asia (Table 3). We observed similar patterns for the other three sub-haplogroups, which expanded outside of East Asia and into Siberia, Central Asia and East/North Europe (Table 3). However, because of the limited sample sizes used to calculate the STR diversity of the different Hg N haplotypes, we are cautious about drawing definitive conclusions from the STR diversity data.
To date the major prehistoric population events along the northward and westward migration routes of the Hg N lineages, we used the STR data to calculate the STR variation ages of the 5 Hg N sub-haplogroups (Table 4). As expected, the ancestral lineage under LLY22g (N1*-LLY22g), the oldest among all N-M231 sub-haplogroups, was dated to 21.66 kya, falling in the Upper Paleolithic. The age of N1b-P43 was also very old (18.90 kya), indicating a relatively rapid northward migration during the Paleolithic period from southern China into Siberia. N1c-M46 was relatively young (11.70 kya). The age of N*-M231 (13.69 kya), presumably the ancestral lineage of Hg N, is younger than expected, likely because N*-M231 as currently defined includes individuals who belong to derived N-M231 sub-haplogroups that will only be identified when new Y-SNP markers are uncovered in the future. By comparison, the age of N1a-M128 is strikingly young (3.75 kya), consistent with the observed star-like STR network suggesting a recent expansion of this lineage (Figure 3). Because the reported Central Asian population (Kazakhs) with a relatively high frequency of N1a-M128 did not have enough STR data to calculate diversity, we were unable to infer the time of N1a-M128's migration from East Asia into Central Asia.

Discussion
Hg N is the most widely distributed Y-chromosome haplogroup in Eurasia (Table 1). By extending the population coverage into East Asia, we showed that Hg N is present in most East Asian populations, though the frequencies are low (Table 1 and Table S1). Previously, Hg N was speculated to have originated in Southeast Asia, split from its sister haplogroup O-M122 about 34 kya, and then migrated northward to mainland East Asia during the late Pleistocene-Holocene [3]. However, we demonstrated that Hg N is in fact extremely rare in Southeast Asian populations. For example, in our analysis of 293 multi-ethnic Cambodian males, we detected only one Hg N individual (0.34%), in contrast to the previous report of a much higher frequency of one in six males (16.67%) in Cambodia, a discrepancy likely caused by the small sample size of that report. Hg N is also rare in other Southeast Asian populations (<1.5%), including those in Laos, Vietnam, Thailand, Indonesia, Malaysia and the Philippines (Table 1), suggesting that Southeast Asia may not be the homeland of Hg N. Instead, the southern part of mainland East Asia (presumably southern China) is the more likely origin of Hg N, as reflected by the distribution of the ancestral Hg N lineages (N*-M231 and N1*-LLY22g) and the higher STR diversity observed in multiple southern ethnic populations in China (Table 3). The STR network analysis and contour map further support a southern East Asian origin of Hg N.
Figure 3. Median-joining networks for sub-haplogroups of Hg N lineage using Y-STR alleles. The diagnostic mutations used to classify the sub-haplogroups are labeled on the tree branches. Each node represents a haplotype and its size is proportional to the haplotype frequency, and the length of a branch is proportional to the mutation steps. The colored areas indicate the geographic origins of the studied populations or language groups. doi:10.1371/journal.pone.0066102.g003
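The small-sample point in the Cambodian comparison above (1 of 293 versus 1 of 6) can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is a choice made here for illustration and is not part of the original analysis; the function name is hypothetical.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    for label, k, n in (("previous report", 1, 6), ("this study", 1, 293)):
        low, high = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {k / n:.4f}, interval {low:.4f} to {high:.4f}")
```

With only six males, the interval around one in six spans a large share of the possible range, whereas the interval around 1 in 293 is narrow, which is the sense in which the earlier estimate is unreliable.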
As proposed previously, the initial prehistoric migration of Hg N proceeded from south to north, starting in southern China. We are now able to draw a more detailed migratory picture of the Hg N lineage by estimating the ages of the Hg N haplotypes using STR variations. The initial northward migration probably started around 21 kya, as reflected by the age of N1*-LLY22g (21.66 kya), the most prevalent N-M231 sub-haplogroup in East Asia. Along the path of northward migration in mainland China, two other N-M231 sub-haplogroups arose at about 12-18 kya, later becoming the dominant Y-chromosome lineages in Siberian populations as a result of local population expansion. Previously, N1b-P43 and N1c-M46 were proposed to have experienced serial bottleneck events in northern East Asia and then dispersed into Siberia, Central Asia and Europe [3]. As the age difference between N1b-P43/N1c-M46 and N1*-LLY22g is comparatively small (3-5 kya), we can infer that the prehistoric migration of Hg N was relatively quick, coinciding with the end of the Last Glacial Maximum (LGM) in East Asia (22-18 kya). The postglacial migration of modern humans in East Asia is likewise reflected by the northward migration of the C-M130 haplogroup along the coastline of mainland China, before it moved further north to Siberia around 15 kya [8][9][10][11].
With the application of next-generation sequencing to the Y chromosome, more Y-SNPs will be discovered, which can help increase the resolution of the Hg N haplogroup tree and provide more detailed phylogeographic information about the origin and prehistoric migration of this important Eurasian Y-chromosome lineage.

Conclusion
Based on the dating of the Hg N haplotypes and their geographic distributions, paired with the suggested counterclockwise migratory route across Eurasia [3], we propose a migratory map (Figure 4) of the Hg N lineages beginning in southern China about 21 kya, expanding into northern China 12-18 kya, reaching further north to Siberia about 12-14 kya [3], and followed by a population expansion and westward migration into Central Asia and East/North Europe around 8.0-10.0 kya [16].

Supporting Information
Table S1. The 169 sampled populations in this study. (DOCX)