The authors have declared that no competing interests exist.
Conceived and designed the experiments: HS BS. Performed the experiments: X-BQ HZ YP X-MZ. Analyzed the data: HS X-BQ. Contributed reagents/materials/analysis tools: R-LZM. Wrote the paper: HS BS.
The Y-chromosome haplogroup N-M231 (Hg N) is widely distributed in eastern and central Asia and Siberia, as well as in eastern and northern Europe. Previous studies suggested a counterclockwise prehistoric migration of Hg N from eastern Asia to eastern and northern Europe. However, the root of this Y-chromosome lineage and its detailed dispersal pattern across eastern Asia are still unclear. We analyzed haplogroup profiles and phylogeographic patterns of 1,570 Hg N individuals from 20,826 males in 359 populations across Eurasia. We first genotyped 6,371 males from 169 populations in China and Cambodia, generating data on 360 Hg N individuals, and then combined these with published data on 1,210 Hg N individuals from Japanese, Southeast Asian, Siberian, European and Central Asian populations. The results showed that the sub-haplogroups of Hg N have distinct geographical distributions. The highest Y-STR diversity of the ancestral Hg N sub-haplogroups was observed in the southern part of mainland East Asia, and further phylogeographic analyses support an origin of Hg N in southern China. Combined with previous data, we propose that the early northward dispersal of Hg N started from southern China about 21 thousand years ago (kya), expanding into northern China 12–18 kya, and reaching further north to Siberia about 12–14 kya before a population expansion and westward migration into Central Asia and eastern/northern Europe around 8.0–10.0 kya. This northward migration of Hg N likewise coincides with retreating ice sheets after the Last Glacial Maximum (22–18 kya) in mainland East Asia.
In recent years, extensive studies of the Y-chromosome lineages in East Asian populations have found that the dominant haplogroups in these populations, O-M175, D-M174, C-M130, and N-M231, all have a southern origin
By contrast, N-M231, as a sister-clade of O-M175, is relatively less prevalent in East Asian populations (averaging around 6%) (
Population | Sample size | N-M231 | N% | References |
South Europeans | 1579 | 0 | 0 | Rootsi, |
West Europeans | 361 | 0 | 0 | Rootsi, |
North Europeans | 3595 | 1267 | 35.24 | Rootsi, |
East Europeans | 2508 | 510 | 20.33 | Derenko, |
Caucasus (pooled) | 1404 | 3 | 0.21 | Rootsi, |
Turks | 523 | 20 | 3.82 | Rootsi, |
Iranians | 185 | 0 | 0 | Derenko, |
West Asians | 668 | 23 | 3.44 | Cinnioğlu, |
Siberians | 3381 | 1294 | 38.27 | Derenko, |
Central Asians | 824 | 53 | 6.43 | Derenko, |
Koreans | 297 | 10 | 3.37 | Hammer et al., 2006; Derenko et al., 2007; Rootsi et al., 2007; Zhong et al., 2011; present study |
Japanese | 877 | 16 | 1.82 | Rootsi, |
Altai (Northeastern China) | 874 | 78 | 8.92 | Hammer, |
Altai (Northwestern China) | 377 | 13 | 3.45 | present study |
Tibetans | 2459 | 147 | 5.98 | Rootsi, |
Northern Han | 947 | 69 | 7.29 | Rootsi, |
Southern Han | 1114 | 82 | 7.36 | Hammer, |
Taiwan Aborigines | 139 | 1 | 0.72 | Hammer, |
Taiwan Chinese | 110 | 6 | 5.45 | Rootsi, |
Tibeto-Burmans (Southwestern China) | 409 | 57 | 13.94 | Rootsi, |
Hmong-Miens (Southwestern China) | 477 | 6 | 1.26 | Rootsi, |
Daic people (Southwestern China) | 528 | 17 | 3.22 | Rootsi, |
Austro-Asiatic people (Southwestern China) | 155 | 16 | 10.32 | Zhong, |
Cambodians | 371 | 1 | 0.27 | Rootsi, |
Laotians | 803 | 4 | 0.50 | Cai, |
Vietnamese | 285 | 4 | 1.40 | Rootsi, |
Thai | 17 | 0 | 0 | He, |
Indonesian | 2291 | 2 | 0.09 | Rootsi, |
Malaysians | 72 | 0 | 0 | Rootsi, |
Filipinos | 135 | 0 | 0 | Rootsi, |
Southeast Asians | 230 | 3 | 1.30 | Hammer, |
South Asians | 2505 | 2 | 0.08 | Rootsi, |
Oceanians | 646 | 0 | 0 | Rootsi, |
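The N% column in the table above is the number of N-M231 carriers divided by the sample size, expressed as a percentage. The following minimal sketch recomputes a few rows as a consistency check; the function name `haplogroup_frequency` and the two-decimal rounding are illustrative assumptions, not part of the original analysis.

```python
def haplogroup_frequency(carriers: int, sample_size: int) -> float:
    """Return the haplogroup frequency as a percentage, rounded to 2 decimals.

    Hypothetical helper for checking the table; not the authors' code.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= carriers <= sample_size:
        raise ValueError("carriers must lie between 0 and sample_size")
    return round(100.0 * carriers / sample_size, 2)


# (sample size, N-M231 carriers) copied from three rows of the table
rows = {
    "North Europeans": (3595, 1267),
    "Siberians": (3381, 1294),
    "Koreans": (297, 10),
}

for population, (size, carriers) in rows.items():
    print(population, haplogroup_frequency(carriers, size))
```

Applied to these rows, the sketch gives 35.24, 38.27 and 3.37, matching the N% values listed for those populations.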
In the present study, we systematically analyzed Hg N profiles in East Asian and Southeast Asian populations (a total of 6,371 males from 169 geographic populations) to trace the origin and prehistoric migration patterns of the Hg N lineage.
A total of 6,371 unrelated males from 169 populations in East Asia (
Population details are given in Table S1.
Following a hierarchical genotyping strategy, M231 was typed first, and samples from M231-positive individuals were then subtyped according to the high-resolution Y-chromosomal haplogroup tree so that each could be assigned to a specific haplotype
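The hierarchical assignment can be illustrated with a short sketch. It assumes, based only on the sub-haplogroup names used in this paper, that LLY22g lies downstream of M231 and that M128, P43 and M46 each lie downstream of LLY22g; it also assumes that an untyped marker is treated as ancestral. The function name and input format are hypothetical.

```python
from typing import Mapping, Optional

# Terminal markers assumed to sit below LLY22g (inferred from the N1a/N1b/N1c labels).
TERMINAL_MARKERS = {
    "M128": "N1a-M128",
    "P43": "N1b-P43",
    "M46": "N1c-M46",
}


def assign_sub_haplogroup(derived: Mapping[str, bool]) -> Optional[str]:
    """Assign an Hg N sub-haplogroup from derived/ancestral marker states.

    `derived` maps a marker name to True when the derived allele was observed.
    Markers missing from the mapping are treated as ancestral (an assumption).
    Returns None for samples outside Hg N.
    """
    if not derived.get("M231", False):
        return None  # M231 is typed first; negative samples are not subtyped
    if not derived.get("LLY22g", False):
        return "N*-M231"  # paragroup: M231-derived, no downstream marker
    for marker, label in TERMINAL_MARKERS.items():
        if derived.get(marker, False):
            return label
    return "N1*-LLY22g"  # paragroup: LLY22g-derived, no terminal marker


print(assign_sub_haplogroup({"M231": True, "LLY22g": True, "M46": True}))
```

Typing the upstream marker first means that the downstream markers only need to be tested in the M231-positive subset of the samples.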
To visualize the geographic distributions of Hg N and its sub-lineages, Golden Software Surfer 10.0 (Golden Software Inc., USA) with the Kriging algorithm was used to construct a contour map; the data used are listed in
Median-joining networks for STR variations of the Y-chromosomal haplogroups were constructed using NETWORK 4.6 (Fluxus Engineering)
For each Y-chromosomal haplogroup/sub-haplogroup (defined by Y-SNPs), we estimated its age by Y-STR variations using the published method
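The text does not name the dating method, so the following is only a sketch of one widely used approach: the average squared difference (ASD) in repeat number between each chromosome and a founder haplotype, divided by a per-locus mutation rate. The median haplotype as founder, the rate of 6.9e-4 per locus per generation, and the 25-year generation time are all assumptions made here for illustration and may differ from what the authors used.

```python
from statistics import median
from typing import Sequence


def str_variation_age_kya(
    haplotypes: Sequence[Sequence[int]],
    mutation_rate: float = 6.9e-4,   # assumed rate per locus per generation
    generation_years: float = 25.0,  # assumed generation time
) -> float:
    """Estimate the age of STR variation (in kya) with an ASD-based approach.

    Each haplotype is a sequence of repeat counts in the same locus order.
    The founder haplotype is taken as the per-locus median (an assumption).
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of loci")

    founder = [median(h[i] for h in haplotypes) for i in range(n_loci)]
    # Average squared difference from the founder, per locus, then over loci.
    per_locus_asd = [
        sum((h[i] - founder[i]) ** 2 for h in haplotypes) / len(haplotypes)
        for i in range(n_loci)
    ]
    asd = sum(per_locus_asd) / n_loci
    generations = asd / mutation_rate
    return generations * generation_years / 1000.0


# Toy data (two loci, four chromosomes); not taken from the study.
toy = [(10, 14), (10, 14), (11, 14), (10, 13)]
print(round(str_variation_age_kya(toy), 2))
```

Because the estimate scales inversely with the assumed mutation rate, a different choice of rate would shift all of the ages proportionally.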
The genetic diversity of the different geographic populations within Hg N and its sub-haplogroups was calculated from the STR data using GenAlEx 6.5
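As an illustration of what a per-population STR diversity value represents, the sketch below computes a standard haploid gene diversity, h = 1 − Σp², at each locus and averages it across loci, with an optional n/(n − 1) small-sample correction. This is a generic estimator written for illustration; whether it matches the exact statistic and settings the authors took from GenAlEx is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def locus_diversity(alleles: Sequence[int], unbiased: bool = True) -> float:
    """Haploid gene diversity at one locus: 1 - sum(p_i^2).

    With unbiased=True the value is multiplied by n / (n - 1).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(alleles)
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1) if unbiased else h


def mean_diversity(haplotypes: Sequence[Sequence[int]], unbiased: bool = True) -> float:
    """Average the per-locus diversity over all STR loci."""
    n_loci = len(haplotypes[0])
    per_locus = [
        locus_diversity([h[i] for h in haplotypes], unbiased)
        for i in range(n_loci)
    ]
    return sum(per_locus) / n_loci


# Toy data (two loci, four chromosomes); not taken from the study.
toy = [(10, 14), (10, 14), (11, 14), (11, 13)]
print(round(mean_diversity(toy), 3))
```

The correction matters most for the small groups in the study (some have fewer than ten chromosomes), where the uncorrected value underestimates diversity.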
For the analysis of Y-chromosomal STR alleles, DYS389II was named DYS389b after subtracting DYS389I because the PCR product of DYS389II contains both DYS389II and DYS389I loci.
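The DYS389b correction described above is a simple subtraction of repeat counts. A minimal sketch follows; the function name and the example allele values are illustrative only.

```python
def dys389b(dys389i: int, dys389ii: int) -> int:
    """Return the DYS389b repeat count: DYS389II minus DYS389I.

    The DYS389II PCR product spans both loci, so its raw repeat count
    includes the DYS389I repeats.
    """
    if dys389ii < dys389i:
        raise ValueError("DYS389II cannot be smaller than DYS389I")
    return dys389ii - dys389i


# Illustrative alleles, not taken from the study.
print(dys389b(dys389i=13, dys389ii=29))  # prints 16
```

Using the difference avoids counting variation at DYS389I twice when both loci enter a network or a diversity calculation.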
We systematically screened a total of 6,371 unrelated males from 169 populations in China and Cambodia (
Population | Sample size | N% | N*-M231 | N1*-LLY22g | N1a-M128 | N1b-P43 | N1c-M46 |
Altai (Northeastern China) | 198 | 10.10 | 4.55 | 1.01 | 2.02 | 2.53 | |
Altai (Northwestern China) | 377 | 7.43 | 0.53 | 2.12 | 0.27 | 0.53 | |
Koreans | 64 | 6.25 | 3.13 | 1.56 | 1.56 | ||
Northern Han | 853 | 6.80 | 0.23 | 4.22 | 0.47 | 1.64 | |
Southern Han | 876 | 6.74 | 1.26 | 3.54 | 0.57 | 0.11 | 0.80 |
Tibetans | 2442 | 5.90 | 0.04 | 5.32 | 0.08 | 0.04 | 0.41 |
Tibeto-Burmans | 325 | 12.92 | 0.62 | 7.38 | 3.08 | 0.92 | |
Hmong-Miens | 308 | 1.95 | 0.32 | 0.65 | 0.32 | 0.65 | |
Daic people | 463 | 3.67 | 1.51 | 1.94 | 0.22 | ||
Austro-Asiatic people (Southwestern China) | 100 | 11.00 | 5.00 | ||||
Austro-Asiatic people (Cambodian) | 293 | 0.34 | 0.34 | ||||
Austronesians | 72 |
Note: samples were merged by language families.
Hg N is prevalent (>5%) in East Asia (e.g., among Han Chinese, Tibeto-Burman and Austro-Asiatic speaking populations), as well as in northern/central Asia and eastern/northern Europe, with the highest average frequency in Siberia (38.27%). In contrast, Hg N is relatively rare in southeastern, southern and western Asia, and completely absent in southern/western Europe. Within the Hg N lineage, there are five sub-haplogroups with distinctive geographic distributions. N*-M231 is presumably the ancestral haplogroup in Hg N and is mostly present in southern East Asian populations, including Daic, southern Han Chinese, Tibeto-Burman and Hmong-Mien populations in southern China (
A, N*-M231; B, N1*-LLY22g; C, N1a-M128; D, N1b-P43; E, N1c-M46 (Tat). (The regional populations used are listed in
The diagnostic mutations used to classify the sub-haplogroups are labeled on the tree branches. Each node represents a haplotype and its size is proportional to the haplotype frequency, and the length of a branch is proportional to the mutation steps. The colored areas indicate the geographic origins of the studied populations or language groups.
We constructed contour maps of the five N-M231 sub-haplogroups based on the geographic distributions of these lineages in Eurasian populations (
To examine the detailed diversity of each N-M231 sub-haplogroup, we constructed STR networks for the 5 sub-haplogroups based on data of 7 Y-chromosome STR loci (
Further comparison of the STR variation levels among the different populations also supports an East Asian origin of Hg N. For the two ancestral lineages, N*-M231 and N1*-LLY22g, the STR diversity of southern populations is higher than that of northern populations in East Asia (
Haplogroup | Populations | Sample size | Y-STR diversity ± SE |
N* | Northern Chinese | 4 | 0.268±0.100 |
N* | Southern Chinese | 27 | 0.332±0.070 |
N1* | Altai (Northeastern China) | 18 | 0.437±0.065 |
N1* | Han Chinese (mainland China) | 68 | 0.506±0.056 |
N1* | Tibeto-Burmans (Southwestern China) | 154 | 0.437±0.063 |
N1* | Hmong-Miens, Daic and Austro-Asiatic people (Southwestern China) | 18 | 0.475±0.050 |
N1a | Altai (Northwestern China) | 5 | 0.206±0.076 |
N1a | Han Chinese (mainland China) | 11 | 0.201±0.051 |
N1a | Tibeto-Burmans (Southwestern China) | 12 | 0.087±0.031 |
N1b | Altai (Northwestern China) | 6 | 0.286±0.056 |
N1b | Siberians | 92 | 0.193±0.071 |
N1b | Europeans | 38 | 0.303±0.084 |
N1c | Altai (Northwestern China) | 8 | 0.286±0.066 |
N1c | Han Chinese (mainland China) | 21 | 0.277±0.074 |
N1c | Tibeto-Burmans (Southwestern China) | 13 | 0.519±0.021 |
N1c | Hmong-Miens, Daic and Austro-Asiatic people (Southwestern China) | 6 | 0.143±0.071 |
N1c | Siberians | 119 | 0.283±0.054 |
N1c | Europeans | 944 | 0.352±0.055 |
In order to date the major prehistoric population events along the northward and westward migration routes of the Hg N lineages, we used the STR data to calculate the STR variation ages of the 5 Hg N sub-haplogroups (
Haplogroup | Sample size | Age of STR variation (Kya ± SE) |
N*-M231 | 31 | 13.69±3.37 |
N1*-LLY22g | 258 | 21.66±4.48 |
N1a-M128 | 28 | 3.75±0.94 |
N1b-P43 | 136 | 18.90±7.73 |
N1c-M46 | 1111 | 11.70±1.87 |
Hg N is the most widely distributed Y chromosome haplogroup in Eurasia (
As proposed previously, the initial prehistoric migration of Hg N began in southern China and proceeded northward. We are now able to draw a more detailed migratory picture for the Hg N lineage by estimating the ages of the Hg N haplotypes using STR variations. The initial northward migration probably started around 21 kya, as reflected by the age of N1*-LLY22g (21.66 kya), the most prevalent N-M231 sub-haplogroup in East Asia. Along the path of northward migration in mainland China, two other N-M231 sub-haplogroups arose about 12–18 kya and later became the dominant Y-chromosome lineages in Siberian populations as a result of local population expansion. Previously, N1b-P43 and N1c-M46 were proposed to have experienced serial bottleneck events in northern East Asia and then dispersed into Siberia, Central Asia and Europe
With the application of next-generation sequencing to the Y chromosome, more Y-SNPs will be discovered, which can help increase the resolution of the Hg N haplogroup tree and provide more detailed phylogeographic information about the origin and prehistoric migration of this important Eurasian Y-chromosome lineage.
Based on the dating of the Hg N haplotypes and their geographic distributions, paired with the suggested counterclockwise migratory route across Eurasia
The shaded areas represent the haplogroup N distributions.
We are grateful to all the volunteers who donated blood samples for this study.