Kazak mitochondrial genomes provide insights into the human population history of Central Eurasia

As a historical nomadic group in Central Asia, Kazaks have mainly inhabited the steppe zone from the Altay Mountains in the East to the Caspian Sea in the West. Fine-scale characterization of the genetic profile and population structure of Kazaks would be invaluable for understanding their population history and modeling prehistoric human expansions across the Eurasian steppes. With this in mind, we characterized the maternal lineages of 200 Kazaks from Jetisuu at the mitochondrial genome level. Our results reveal that Jetisuu Kazaks have unique mtDNA haplotypes, including those belonging to the basal branches of both West Eurasian (R0, H, HV) and East Eurasian (A, B, C, D) lineages. The great diversity observed in their maternal lineages may reflect the pivotal geographic location of Kazaks in Eurasia and implies a complex history for this population. Comparative analyses of mitochondrial genomes of human populations in Central Eurasia reveal a common maternal genetic ancestry for Turko-Mongolian speakers and indicate that their expansion was responsible for the presence of East Eurasian maternal lineages in Central Eurasia. Our analyses further indicate a maternal genetic affinity between the Sherpas from the Tibetan Plateau and the Turko-Mongolian speakers.


Estimation of Demographic History
To estimate demographic history with the Extended Bayesian Skyline Plot (EBSP) method of BEAST [1][2][3], mitochondrial genomes were first partitioned into HVS-I, HVS-II, and the coding region. The coding region was then divided further by codon position. Different mutation rates (µ) were used for the coding and control regions of mtDNA: 31.43 × 10⁻⁸ per site per year for the control region (i.e., HVS) and 1.71 × 10⁻⁸ per site per year for the coding region (Table 1). As shown in the EBSP (Figure 1), there was a sharp increase in the effective population size of the ancestors of Jetisuu Kazaks about 25 kya. This date predates the Last Glacial Maximum (LGM), implying that the ancestral population(s) was not affected by glaciation in the northern hemisphere and might have experienced population growth due to rich natural resources for subsistence. However, this inference holds only under the following assumptions: (1) the maternal gene pool of Kazaks was formed 25 kya and its maternal lineages have remained unchanged until now; (2) the mtDNA mutation rates used in the EBSP are reliable; and (3) the mitochondrial sequences under investigation have never been subject to selection. Our data may violate assumption (1). It is evident from the mitochondrial haplogroups found in Jetisuu Kazaks (Table 1) that although East Eurasian lineages comprised the core maternal gene pool of Kazaks, West Eurasian lineages might have been introduced to the population relatively recently. The estimated ages of individual haplogroups are relatively young (Table 2) compared to the time of population expansion estimated by the EBSP.
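As a rough illustration of the partitioning and rate choices described above, the following minimal Python sketch splits an in-frame coding sequence by codon position and compares the two mutation rates quoted in the text. The function names and the toy sequence are hypothetical and are not taken from the BEAST configuration used in the study. Real mitochondrial genes also lie in different reading frames and on both strands, which this sketch ignores.

```python
# Mutation rates quoted in the text (per site per year).
RATE_CONTROL = 31.43e-8  # control region (HVS)
RATE_CODING = 1.71e-8    # coding region


def split_codon_positions(cds):
    """Split an in-frame coding sequence into three strings,
    one per codon position (1st, 2nd, 3rd).
    Hypothetical helper: assumes `cds` starts at codon position 1."""
    return [cds[i::3] for i in range(3)]


def expected_subs_per_site(rate, years):
    """Expected substitutions per site along one lineage,
    assuming a constant rate (a strict-clock simplification)."""
    return rate * years


if __name__ == "__main__":
    # Toy sequence for illustration only (two codons: ATG GCC).
    print(split_codon_positions("ATGGCC"))

    # Expected per-site change over the ~25,000 years since the
    # expansion inferred from the EBSP.
    for name, rate in [("control", RATE_CONTROL), ("coding", RATE_CODING)]:
        print(name, expected_subs_per_site(rate, 25_000))

    # How much faster the control region is assumed to evolve.
    print(round(RATE_CONTROL / RATE_CODING, 2))
```

Under these rates the control region is assumed to evolve roughly 18 times faster than the coding region, which is why the two are treated as separate partitions.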

Coalescence Time Estimation
As given in Table 2, the estimated TMRCA (Time to the Most Recent Common Ancestor) of haplogroup C was 55,795 years, making it the oldest haplogroup in Kazaks. However, this estimate was obtained by analyzing 15 haplogroup C sequences, including a single C7 sequence. The presence of C7 in Kazaks must have resulted from recent admixture. Both C4 and C5 are about 20,000 years old, while their combined TMRCA is about 29,000 years (Table 2). The haplogroups with the largest TMRCAs include F1 (~31K), G2 (~36K), U4 (~45K), and U5 (~30K). These haplogroups seem to be older than haplogroups A, B5, C4, C5, and D4, which constitute the core of the maternal gene pool of Kazaks. West Eurasian lineages T2, H, and HV are about 23K, 15K, and 21K years old, respectively.
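To make the comparison with the EBSP result concrete, the short Python sketch below sorts the approximate TMRCAs quoted in this section into those older and younger than the ~25 kya expansion. The values are the rounded figures from the text, not the exact entries of Table 2, and the helper name is hypothetical.

```python
# Approximate TMRCAs (years) as quoted in the text; Table 2 holds the exact values.
TMRCA_YEARS = {
    "C": 55_795,
    "C4": 20_000,
    "C5": 20_000,
    "C4+C5": 29_000,
    "F1": 31_000,
    "G2": 36_000,
    "U4": 45_000,
    "U5": 30_000,
    "T2": 23_000,
    "H": 15_000,
    "HV": 21_000,
}

# Approximate date of the population expansion inferred from the EBSP.
EXPANSION_YEARS = 25_000


def split_by_threshold(ages, threshold):
    """Return (older, younger) haplogroup names relative to `threshold`.
    Hypothetical helper for illustration only."""
    older = sorted(name for name, age in ages.items() if age > threshold)
    younger = sorted(name for name, age in ages.items() if age <= threshold)
    return older, younger


if __name__ == "__main__":
    older, younger = split_by_threshold(TMRCA_YEARS, EXPANSION_YEARS)
    print("older than expansion:", older)
    print("younger than expansion:", younger)
```

With these rounded values, C4, C5, T2, H, and HV fall after the inferred expansion, while C, F1, G2, U4, U5, and the combined C4+C5 node predate it.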

Additional Lineages of Haplogroups C, D, and T
C4a2 has three branches, and two of them, C4a2a and C4a2c, have representatives from Jetisuu Kazaks (Figure 2). C4a2a comprises three newly named subbranches, C4a2a1, C4a2a2, and C4a2a3. C4a2a3 is represented by a single sequence (JA143), while C4a2a1 has multiple subbranches represented by Altaian, Buryat, Kazak, and Uyghur sequences, clearly suggesting C4a2a as one of the maternal founder lineages of Turko-Mongolian speakers. The subbranches of C4a2a1 were named C4a2a1a through C4a2a1h. The Jetisuu Kazak sequence SH130 formed a subclade (C4a2a1c) together with an Altaian sequence. As shown in Figure 2, under C4a2c1, one Kazak sequence (JA101) clustered together with a Tajik sequence and a sequence from India (FJ383607, Jenu Kuruba [6]). Within C4a2c3, there were two Kazak sequences, AL009 and JA135, which were named C4a2c3a and C4a2c3b, respectively. The C4a2c3a Kazak sequence has a sister sequence in Kirghiz (Figure 2). A Barghut sequence (FJ951548) formed the basal branch of C4a2c3, implying an eastern origin for the haplotype. C4a2c2, another branch of C4a2c, is represented by a Ladakh sequence (HM036530) from India [7]. The tree topology given in Figure 2 may imply that C4a2 originated in a Turko-Mongolian stock population in Central Asia or Siberia. In addition, the presence of C4a2c2 in India could be viewed as a genetic legacy of the steppe nomads from Central Asia.
A single Jetisuu Kazak sequence (JA013) belonged to a subbranch of C4b6 (Figure 3). C4b6 forms a single branch represented by two sequences (Alt202 and Tuba7) in the current version of Phylotree (v17.0). The mtDNA sequences included in our dataset have produced additional twigs on this branch, as shown in Figure 3. The branch formed by the Kazak sequence (JA013) from Jetisuu was named C4b6b1. C4d comprised three sequences, one each from a Kazak, a Tibetan, and a Uyghur. The branch formed by the Kazak sequence (SH106) was named C4d1a (Figure 4). The genetic link between Kazaks and Tibetans could have been established through either Kirghiz or Uyghurs.
As shown in Figure 6, D4a3b had three subbranches, one represented by a Japanese sequence (AP009445), another by a Han Chinese sequence (AY255160), and a third by a Jetisuu Kazak sequence. Thus, D4a3b has a wide geographic distribution. The Jetisuu Kazak sequence, JA110, was named D4a3b3. D4b2b, a subbranch of D4b, has sequences in multiple ethnic groups including Altaians, Buryats, Kazaks, Tatars, Tubas (Tuvans), Uyghurs, and Japanese (e.g., AP008264) (Figure 7). Thus, D4b2b could be considered a founder haplotype for Altaic speakers. As shown in Figure 8, two Kazak sequences belonged to D4j, one from the current study and one from a previously published study [8]. The clade is diverse, with D4j5 and D4j8 being present in Buryat, Kirghiz, and Uyghur sequences, along with those of Kazaks, suggesting a common Turko-Mongolian origin. Interestingly, D4j5a was represented by a Yukaghir sequence (EU482325). The Yukaghir language is believed to belong to, or at least to be distantly related to, the Uralic language family. As shown in Figure 9, another Tatar sequence (Tat7.BM10) is ancestral to T1a1 subbranches. This may reflect the geographic origin of haplogroup T in Central Asia and Siberia. T1a and T2b were discovered in archaeological samples from Yamna Culture sites [9] in the Volga region, Russia, whence the Tatars were sampled [10].