Haplotype diversity in mitochondrial DNA reveals the multiple origins of Tibetan horse

The Tibetan horse is a species endemic to the Tibetan plateau, with considerable economic value in the region. However, we currently have little genetic evidence to verify whether the breed originated in Tibet or if it entered the area via an ancient migratory route. In the present study, we analyzed the hypervariable segment I sequences of mitochondrial DNA (mtDNA) in 2,050 horses, including 290 individuals from five Tibetan populations and 1,760 from other areas across Asia. Network analysis revealed multiple maternal lineages in the Tibetan horse. Component analysis of sub-lineage F3 indicated that it decreased in frequency from east to west, a trend reflected both southward and northward from Inner Mongolia. Analysis of population genetics showed that the Deqen horse of eastern Tibet was more closely related to the Ningqiang horse of northern China than to other Tibetan horses or the Yunnan horse. These results indicated that the Tibetan horse migrated first from Central Asia to Mongolia, moved south to eastern Tibet (near Deqen), then finally westward to other regions of Tibet. We also identified a novel lineage K that mainly comprises Tibetan and Yunnan horses, suggesting autochthonous domesticated origin for some Tibetan horse breeds from local wild horses. In conclusion, our study demonstrated that modern Tibetan horse breeds originated from the introgression of local wild horses with exotic domesticated populations outside China.


Introduction
Tibetan horses currently number at over 270,000 individuals [1] and are mainly distributed in the Qinghai-Tibet Plateau, the highest ecosystem in the world (average elevation >4,000 m). With outstanding adaptability to extreme, high-altitude environments, the Tibetan horse has historically been of high economic value for Tibetans, playing a critical role in transportation, stock farming, and trade (e.g., horse-silk trade with Han Chinese during the Tang and Song dynasties, 618~1279 AD). Archaeological records indicate that horses have been present in Tibet at least since the Neolithic period over 4,000 years ago. Generally, indigenous domestic a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 animals in Tibet are thought to have been introduced with the migration of Tibetan ancestors from northern China approximately 7,000 years ago [2,3]. Research based on mitochondrial DNA (mtDNA) revealed that Tibetan mastiff [4], Tibetan sheep [5] and Tibetan chicken [6] were derived from outside of Tibet, and Tibetan yak [7] and Tibetan pig [8] were appear to have a local origin. However, we currently have no direct genetic evidence regarding whether Tibetan horses originated in the region or were descended from exotic breeds.
The earliest horse domestication is widely accepted to have occurred on the Eurasian steppes of Kazakhstan between the fifth and fourth millennium BC [9]. Furthermore, as they spread out from their original range, domestic horses were repeatedly restocked with local wild horses [10], and multiple domestication events or introgressions of wild females were thought to have occurred [11][12][13][14][15][16][17]. Thus, most modern horses have several maternal origins that can be divided into seven major lineages (A~G) and 19 clusters [13][14][15][16]18]. Lineage F is more common in Eastern populations, supporting an ancestral line that entered the domestic horse gene pool from East Asia [13,14]. Existing Chinese domestic horses are grouped into lineage F and a new lineage H in the southwest, suggesting a combined origin of horses introduced from outside of China and local wild horses [19]. However, it is not clear whether this conclusion applies to Tibetan horses.
A previous study using seven mtDNA sequences tentatively concluded that Tibetan horses have multiple maternal lineages, similar to other horse breeds [20]. To verify this outcome, here we performed a comparative mtDNA analysis of the Tibetan horse with relevant populations across Asia. Our results should elucidate the origin of this ancient breed, clarifying the degree and nature of introgression from exotic horses.

Sequence alignment and sequence variation
Available mtDNA sequences were trimmed to be as long as possible for improved accuracy in genetic analysis. A 421 base-pair (bp) segment that was the hypervariable segment I region of mtDNA was trimmed from 15436-15856 bp (Genbank accession no. X79547) for all 721 sequenced horses in this study to analyze genetic diversity. The accession numbers of 721 sequences had been uploaded Genbank (MH032886~MH033606). Additionally, a 247 bp (15494-15740 bp) segment that was used to study maternal origin of horse in many studies [13][14][15][16] was trimmed for all 2,050 sequences to examine evolutionary relationships. Sequences were aligned in Muscle 3.8.31 [21] on the Linux virtual server, then trimmed and edited in BioEdit (http://www.mbio.ncsu.edu/bioedit/bioedit.html). The number and proportion of substitutions, transition/transversion ratio, number of haplotypes, nucleotide diversity (Pi), and haplotype diversity (h) were calculated in DnaSP v.5.1 [22]. The major maternal lineages The origin of Tibetan horse are defined by specific diagnostic mutations. Component analysis for lineage frequencies in each breed or population was estimated through counting.

Phylogenetic analysis
Network 4.0 (http://www.fluxus-engineering.com) was used to construct median joining networks for phylogenetic analysis of all haplotypes. Default parameter settings were used, except for the following. Star-like phylogenetic clusters were contracted into one ancestral type, epsilon (0) was chosen, weights of mutational hotspot site (15650) identified from previous studies [16,18] and mutational hotspot site (15600) from the present work were set to zero, and haplotypes over 2% were shown in final networks [19]. Bayesian consensus trees using Monte Carlo Markov chains (MCMC) were also constructed in MrBayes 3.1 [23] to infer the evolutionary relationship among haplotypes. Two independent runs were performed, each with two million generations, starting from a random tree. The sampling frequency was set to 1/1,000 with the first 20% tree discarded as burn-in. Average standard deviation of split frequencies was <0.05 to ensure clustering quality.

Detection of genetic relationships among horse populations
Fifteen horse populations (12 populations in this study, plus populations from Central Asia; Guizhou, 26˚36 0 N, 106˚42 0 E; and Guangxi, 23˚54 0 N, 106˚36 0 E) were used to analyze genetic relationships across geographical distribution. Small or geographically close populations were merged to ensure reliability of the results. For example, the NQ population included horses from Ningqiang, Guanzhong, and Guan mountain. Pairwise mean distances among the 15 populations were calculated using the distance option in MEGA 7.0 [24], with a bootstrap value of 1,000. Based on the resultant pairwise population mean distances matrix, a neighborjoining phylogenetic tree was then constructed by Phylip 3.695 [25].

Genetic diversity of Tibetan horses
We detected 91 haplotypes with 57 polymorphic sites in 272 Tibetan horses (S2 Table). The variable sites included 53 transitions (A/G and T/C) and four transversions (A/T). Maximum and minimum haplotype counts were 33 in Xigazê and 21 in Zogang, respectively. In Tibetan horses, nucleotide diversity ranged from 0.01798 ± 0.00113 in Qamdo to 0.02266 ± 0.00073 in Xigazê, while haplotype diversity (h) ranged from 0.920 ± 0.025 in Zogang to 0.977 ± 0.010 in Xigazê, suggesting considerable genetic diversity in Tibetan horse populations (Table 1). Overall, genetic diversity was highest in northern China horses, followed by Tibetan horses and then Yunnan horses, although nucleotide and haplotype diversity in the latter two populations were also relatively high.

Phylogenetic analysis of Chinese horses
Across 2,050 individuals, we found 262 haplotypes (S3 Table) for constructing a phylogenetic network (Fig 2). Nine major lineages (A-G, X4, and K) were identified. Eight lineages (A-G and X4) were named based on previous studies [15,16,18], while K was a novel lineage named in this study (S4 Table). Tibetan horses were in most of these lineages except G (Fig 2 and S3  Table).
Lineage K including three haplotypes was almost entirely composed of Chinese indigenous horses (98.3%, 121 individuals) (S3 Table), with 72.7% being Tibetan and Yunnan horses. Mirroring the genetics, most lineage K horses were distributed along the border of Yunnan and Tibet. Lineage X4 [18] could be divided into two sub-lineages both containing many Chinese native horses, especially those from Yunnan.
The Bayesian consensus tree including 262 haplotypes (S1 Fig) was consistent with the phylogenetic network analysis for major lineages. Lineages B, D, E, G, and X4 were clearly separated from lineage A. However, lineages C, F, and K were unexpectedly clustered with lineage A, likely because the diagnostic mutational motifs were similar across these four lineages.

Lineage F3 in the whole Asia horse populations
Lineage F included many Chinese indigenous horses and can be divided into three sub-lineages: F1, F2, and F3. Respectively, they contained 42.1%, 53.6%, and 55.4% Chinese horses. In F3, we found five ancient horse individuals from Siberia (FJ204322; >4,000 years ago) and Inner Mongolia (FJ204337, DQ900930, EU931606, EU931608; >2,500 years ago). Thus, sublineage F3 appeared to have existed at least 4,000 years ago, approximately 1,500 years before the horse's initial domestication in Kazakhstan.
The component analysis of sub-lineage F3 across Asia (Fig 3) revealed that the sub-lineage gradually increased from west to east, peaking in Inner Mongolia. Additionally, starting from Inner Mongolia, F3 decreased gradually southward and northward. Taken together, our results suggest that F3 likely originated in Inner Mongolia or nearby.

Lineage F3 and K in Tibetan and Yunnan populations
The component analysis of lineages F3 and K in ten Tibetan and Yunnan populations ( Table 2) further indicated that DQ and LJ contained 12.8% and 12.0% F3, respectively, a greater percentage than in the other Tibetan and Yunnan populations. Moreover, among the five Tibetan horse populations, the percentage distribution of lineage K was highest (24.4%) in ZG, followed by QD (23.0%), DQ (10.5%), XGZ (10.2%), and NQ-T (9.5%). Among the five Yunnan horse populations, the percentage distribution of lineage K was highest (13.9%) in LJ, then in WS (10.7%), KM (7.1%), TC (5.8%), and ZT (2.9%).

Genetic relationships among Asian populations
Pairwise genetic distances of 15 populations (S5 Table) revealed that the Central Asian and MG horses had genetic distances >0.025, higher than other 13 populations. The neighborjoining tree indicated that these 15 populations separated into two major clusters (Fig 4).      Tibetan horses (QD, ZG, NQ-T, and XGZ) formed their own group, while Yunnan horses (LJ, TC, KM, WS, and ZT) formed a separate group with Guizhou and Guangxi horses. Interestingly, the Tibetan DQ population was more closely related to northern China NQ population and Yunnan horses than to the other four Tibetan populations, possibly because Deqen is at the junction of Tibet and Yunnan.

Discussion
We found that the five Tibetan horse populations can be clustered into most previously reported lineages, suggesting a complex origin. Most Tibetan haplotypes were present in the originating Kazakhstan lineage, indicating that ancestors of Tibetan horses generally immigrated from other regions. Generally, livestock, such as horse and sheep, were usually migrated along with nomadic tribes as they were important transport tools and food resources for people. In comparison, domestic sheep appear to have undergone two migratory waves [26], spreading in the upper and middle reaches of the Yellow River about 3,000-5,000 years ago in conjunction with the Di-Qiang expansion [27]. Data such as these are extremely useful in helping researchers better understand the demographic forces underlying animal domestication and migration throughout history [28,29]. Generally, genetic diversity is higher in a domestic species' originating region, and the frequencies of certain lineages tend to exhibit a geographical gradient as they disperse outwards [13,30]. In this study, we found that Inner Mongolia was likely the ancestral region of sublineage F3, collaborating previous reports suggesting a Far Eastern origin for lineage F overall [13][14][15]18]. Furthermore, we found that the percentage of sub-lineage F3 was highest in Deqen (the boundary between Tibet and Yunnan), decreasing gradually westward and southeastward. However, upon reaching Xigazê, F3 percentage was higher than in adjacent areas. This pattern may be due to gene flow or immigration between Xigazê and other Tibetan or Yunnan horse populations through the ancient Tea-Horse Road. This was a trade route from approximately 1,000 years ago that saw the exchange of tea and other commodities from Yunnan for Tibetan horses.
We also observed a closer genetic relationship between Deqen and Ningqiang than between any of the remaining Tibetan horse populations. Additionally, northern China horse had higher genetic diversity than both Tibetan horse and Yunnan horse, suggesting increased gene flow in northern China. These data corroborate the close association between Qamdo and native Mongolian horses [31], likely reflecting introgression from northern horses into Tibetan breeds. Historical records provide some clues regarding the likely path of gene flow. Notably, the South Silk Road of the Tang dynasty (618~907 AD) connected central China with the Tibetan plateau, beginning from Xi'an of Shanxi province, then passing through Gansu, Qinghai, Qamdo, Deqen, westward to Xigazê, and finally reaching Nepal. Furthermore, the Mongolian westward expansion on horseback during the Yuan dynasty likely exerted a direct genetic influence on Tibetan horses. Considering this evidence, we propose that the migratory route of some ancestral Tibetan horses began in Central Asia, moved up to Inner Mongolia, then south to eastern Tibet (Deqen), and finally spread throughout the entire Tibet Autonomous Region.
Lineage K was novel to this study, but is very similar to a previously reported lineage H [19], comprising most indigenous horses of southwest China. Lineage K was mainly composed of Tibetan and Yunnan horses. Similar to our findings for sub-lineage F3, lineage K percentage was highest in ZoGang, gradually decreasing with westward and eastward movement. Low occurrence and restricted geographical distribution of a lineage indicate that its presence likely represents additional introgression into an existing domestic population rather than independent domestication events [26]. Overall, our data suggest that the domestication of Tibetan horses involved introgression from local wild horses.
To conclude, we demonstrated multiple maternal origins in Tibetan horses and provided evidence of an external migratory route into Tibet. Moreover, we found a novel lineage K with restricted geographical distribution (mainly in southwest China). Its presence in Tibetan populations suggests subsequent introgression into ancient local Tibetan horses and thus the latter group's direct contribution to the domestication process. In summary, our results support an origin for the Tibetan horse that includes introduced breeds from outside of China and introgression from local wild populations.