Domestication Process of the Goat Revealed by an Analysis of the Nearly Complete Mitochondrial Protein-Encoding Genes

Goats (Capra hircus) are one of the oldest domesticated species, and they are kept all over the world as an essential resource for meat, milk, and fiber. Although recent archeological and molecular biological studies suggested that they originated in West Asia, their domestication processes such as the timing of population expansion and the dynamics of their selection pressures are little known. With the aim of addressing these issues, the nearly complete mitochondrial protein-encoding genes were determined from East, Southeast, and South Asian populations. Our coalescent time estimations suggest that the timing of their major population expansions was in the Late Pleistocene and significantly predates the beginning of their domestication in the Neolithic era (≈10,000 years ago). The ω (ratio of non-synonymous rate/synonymous substitution rate) for each lineage was also estimated. We found that the ω of the globally distributed haplogroup A which is inherited by more than 90% of goats examined, turned out to be extremely low, suggesting that they are under severe selection pressure probably due to their large population size. Conversely, the ω of the Asian-specific haplogroup B inherited by about 5% of goats was relatively high. Although recent molecular studies suggest that domestication of animals may tend to relax selective constraints, the opposite pattern observed in our goat mitochondrial genome data indicates the process of domestication is more complex than may be presently appreciated and cannot be explained only by a simple relaxation model.


Introduction
The goats (Capra hircus) are one of the oldest domesticated animals, and based on archaeological evidence are thought to have been domesticated initially in the Fertile Crescent (<10,000 years ago) [1]. A recent molecular study by Naderi et al. [2] suggested that goats were domesticated from bezoars (C. aegagrus) in West Asia. Afterward, goats spread globally and played an important role in the Neolithic agricultural revolution and advance of human civilization. Nowadays, goats are distributed on all continents excluding Antarctica, and are also found on many peripheral and remote islands. About 840 million goats are kept in the world spanning humid tropical rain forest regions, dry, hot desert regions, and cold, hypoxic high altitude regions [3], and provide essential sources of meat, milk, and fiber.
In the last decade, numerous detailed molecular phylogeographic studies of goats have been carried out to clarify their origin of domestication and their transportation routes [2,[4][5][6][7][8][9][10] mainly based on mitochondrial D-loop sequences. These extensive studies revealed that there are six major haplogroups in the mitochondrial lineages of goats, namely haplogroup A, B, C, D, F, and G. According to Naderi et al. [9], haplogroup A is the most frequent haplogroup and more than 90% of goats inherit this haplogroup, and are largely distributed throughout the Old World. In addition, goats in the New World (South and Central America) all belong to haplogroup A [10]. In contrast, the other haplogroups show regional distributions [9]. The distribution areas of haplogroup B (5.92% of all goats [9]) are mainly in East and Southeast Asia. Interestingly, this haplogroup is dominant in Southeast Asia in contrast to haplogroup A being dominant in other regions. Chen et al. [8] suggested a secondary origin of domestication in China, and haplogroup B arose in this region. From the distribution area of the bezoars and from the results of the molecular phylogeographic study by Naderi et al. [2], Chen et al [8]'s hypothesis seems unlikely because the bezoar is not distributed in East Asia [11], and haplogroup B can be observed in the bezoar from West Asia [2]. According to Naderi et al. [9], haplogroup B can be divided into the sub-haplogroup B1 and sub-haplogroup B2.
Haplogroup C (1.44%) is mainly distributed in the European region. Haplogroup D (0.54%) is mainly in South and Central Asia. Haplogroup F (0.12%) is in Sicily, and Haplogroup G (1.11%) is in West Asia. Naderi et al. [2,9] showed that the phylogenetic relationships among these haplogroups are (F,(C,(B,(G,(D,A))))) based on the mitochondrial D-loop, and studies based on the D-loop are essentially all concordant with these relationships.
Despite this phylogenetic concordance, the coalescent times of these haplogroups remain controversial. The estimated time of the MRCA (most recent common ancestor) of goats (based on the split of haplogroup C from others in the simplified sampling scheme of early studies) were from 201,380 to 597,806 years ago [4,6,8]. These estimates were mainly based on 3 rd codon positions of the mitochondrial cytochrome b gene assuming a divergence time between goat and sheep at 5,7 Ma [12,13]. The timing of the population expansion of goats is also unclear. Luikart et al. [4] estimated the timing of population expansion of haplogroup B (<2,130 years ago) and haplogroup C (<6,110 years ago) assuming that the expansion of haplogroup A occurred <10,000 years ago (approximate time for initial domestication). However, Fang and Anderson [14] suggested that population expansion of the Asiatic pigs occurred around 275,000 years ago and about 190,000 years ago for European pigs which drastically predates the beginning of pig domestication (about 9,000 years ago) based on analysis of the D-loop. To avoid circular arguments, the assumed date for calibration should be independent from the evidence for domestication events. In this case, the biological or geological events at the inter-species level should be used as calibration points. Horai et al. [15] demonstrated that the use of mitochondrial genomes can provide accurate high-resolution estimates of intra-species coalescent times as compared to only D-loop sequence. Thus, in this study we used nearly complete mitochondrial protein coding genes to estimate the coalescent times of the major haplogroups of goats.
The second point that requires clarification is the difference in selection pressure among different lineages which can also be revealed accurately by complete mitochondrial genome analysis. Recently, Björnerfeldt et al. [16] and Wang et al. [17] indicated a higher v (non-synonymous to synonymous rate ratio) in domesticated animals compared with their wild progenitors. They interpreted this to indicate a relaxation of selective constraints during domestication. However, little is known about the difference between selective constraints at early versus late phases of the domestication process and goats provide a valuable opportunity for investigating this issue. As mentioned above: (1) goats have a long domestication history; (2) goats are distributed in variable environments all over the world and haplogroup A covers all areas, whereas the other haplogroups are distributed regionally; and (3) multiple maternal lineages were involved in the domestication process.
The aim of this study is to analyze a nearly complete set of mitochondrial protein encoding genes to reveal: 1) the molecular evolutionary circumstances prior to goat domestication (e.g., the coalescent times of major lineages and the timing of population expansion); 2) the process at the onset of domestication (e.g., the bottleneck and selection profile in wild progenitors); and 3) the posterior profile of domestication (e.g., the differential selection pressures before, during and after domestication).

Phylogenetic tree
The NJ tree based on the mitochondrial D-loop is shown in Figure 1. Our samples were classified into four haplogroups, namely A, B, C and D. None of our samples came from the haplogroup F or haplogroup G. Monophyly of each haplogroup was supported with relatively high bootstrap values (.80%) except for haplogroup A (46%).
Haplogroup A consists of the breeds from Mongolia (Mongolian indigenous goat), Japan (Japanese Saanen), Korea (Korean indigenous goat), Indonesia (Etawa), Bangladesh (Black Bengal) as well as two wild individuals of the bezoar (C. aegagrus). Haplogroup B consists of the breeds from Indonesia (Kambing Katjang, Etawa), Bangladesh (Black Bengal), and the Philippines (Philippine indigenous goat). All of them are belonging to the subhaplogroup B1. Haplogroup C consists of the breed from Mongolia (Mongolian indigenous goat). Haplogroup D also consists of the breeds from Mongolia (Mongolian indigenous goat) ( Table 1). In previous studies, samples from Southeast Asia were quite limited (e.g., [9]) especially from island nations such as Indonesia and the Philippines. The ML tree based on the nearly complete mitochondrial protein-encoding genes ( Figure S1) also shows a largely concordant branching pattern with the D-loop data. In the study of Naderi et al. [9] on the basis of D-loop data, bootstrap support values for haplogroup A were low (53%), and substantially higher values (94%) were obtained in the present analysis. The phylogenetic relationships among haplogroups are harmonious with previous studies [2,4,[6][7][8][9]. Although our samples do not include haplogroup F or G, the bootstrap values of the internal nodes are relatively high (94% for haplogroup A+D; 99% for haplogroup A+D+B).
Recently, Hassanin et al. [18] reported that several mitochondrial genome sequences of goats deposited in GenBank, including the reference sequence NC_005044 contain numts (pseudogenes of mitochondrial DNA transferred to the nuclear genome) and an unusually high percentage of sequence errors. These errors would tend to mislead inference of tree topologies. In order to evaluate the possible error caused by numts and other sources of sequence errors, we inferred the phylogenetic tree based on each of the 12 mitochondrial protein-coding genes separately. All 12 trees showed consistent topologies (data not shown), providing evidence that our data do not contain such spurious sequence regions.

Time scale for the evolution and domestication of the goat
The estimated divergence times within the comprehensive evolutionary framework of the Cetartiodactyla are shown in Figure S2. The estimates from amino acid sequences and nucleotide sequences were mostly concordant. Remarkably, our estimates are 1.5 times older than those of Hassanin and Ronpiquet [19]. They assumed the Bovinae/Caprinae splitting at 18.5 Ma (mega annum). This is the younger limit for this splitting [20]. Accordingly, their estimates can be considered as minimal ages.
The Bovinae/Caprinae splitting was estimated to be 25.362.3 Ma (amino acid) and 27.661.0 Ma (nucleotide). The goat/sheep splitting was 14.762.1 Ma and 16.261.2 Ma. This is much older than the fossil calibrations used by Luikart et al. [4] (5,7 Ma). This implies that if this younger fossil age is used as the calibration point, the divergence time will be grossly underestimated. The goat/Gobi ibex splitting was estimated to be 7.961.5 Ma and 7.961.0 Ma, and the goat/markhor splitting was 3.460.9 Ma and 3.460.8 Ma. As mentioned in the Materials and Methods, this age for the goat/markhor splitting was applied as the calibration point to estimate divergence times among extant goat haplogroups.
The estimates are shown in Figure 2. The split between haplogroup C and the other haplogroups was 0.8460.1 Ma, and the split between haplogroup B and haplogroup A + D was 0.3560.08 Ma. These estimates are much older than those of previous studies [4,6,7].
We also estimated the times of MRCAs of each haplogroup. Since our samples widely cover each haplogroup (Figure 1), the MRCAs of our samples are expected to approximate real MRCAs of domestic populations in each haplogroup. The times of the MRCA of haplogroup A was estimated to be 90,9506 16,460 years ago, the haplogroup B (sensu stricto sub-haplogroup B1) was 41,930618,980 years ago, the haplogroup C was 77,350629,570 years ago, and the haplogroup D was 32,300618,910 years ago. Our sample mainly consists of haplogroup A and haplogroup B (sub-haplogroup B1). Both of them show star-like branching patterns (the asterisks in Figure 2) and age estimates of 90,950616,460 years ago (haplogroup A) and 17,21068,900 years ago (sub-haplogroup B1), respectively. v (Non-synonymous rate/synonymous rate) ratios among lineages Based on the ML tree topology from the nearly complete mitochondrial protein-encoding genes, we estimated the v ratios for the inter-species branches (0.0564 for the smaller data, 0.0609 for the larger data), the deep branches (0.0448, 0.0478), and the shallow branches (0.1107, 0.1020). See Materials and Methods regarding the definitions for ''smaller and larger data'' and ''deep and shallow branches''.
Among shallow branches, the v ratios were different for different haplogroups (Figure 3a and Figure S3a). The v ratios of haplogroup A ($ $ A ) were 0.049 (smaller data set) and 0.053 (larger data set). Those of haplogroup B ($ $ B ) were (0.345, 0.387), those of Samples that were used for the small data set indicated by . The v ratios of the deep internal branches were also estimated. These branches were defined as follows: v a (common ancestral branch of haplogroup A), v b (common ancestral branch of haplogroup B), v c (common ancestral branch of haplogroup C), v d (common ancestral branch of haplogroup D), v x (common ancestral branch of haplogroup A+D), and v y (common ancestral branch of haplogroup A+D+B). The estimates of these v ratios are shown in Figure 3b (based on the smaller data set) and Figure S3b (based on the larger data set), respectively. The v ratios for the deep branches were as follows:$ $ a (0.2206, 0.1451),$ $ b (0.1483, 0.1395),$ $ c (0.0246, 0.0360),$ $ d (0.0356, 0.0360),$ $ x (0.0001, 0.0001),$ $ y (0.0001, 0.0001). The average v ratio of these six deep branches was 0.0448,0.0478, as mentioned above.

Differences of selection pressure
The v ratios of the deep branches (0.0448,0.0478) were almost the same as the ratios of inter-species branches (0.0564,0.0609). This implies that the slightly deleterious mutations [21] were mostly swept out from these ancestral lineages [22]. On the other hand, there was a statistically significant difference between the deep branches (0.0448,0.0478) and the shallow branches (0.1020,0.1107) based on a likelihood ratio test (LRT); p-values were 0.032 (smaller data set) and 0.041 (larger data set) ( Table 2: 3v model vs. 2v model). This suggests that most of the slightly deleterious mutations are still retained in the extant populations.
There were also significant differences of v ratios within the shallow and deep branches, respectively. Concerning the shallow branches, the$ $ A (0.049,0.053) is substantially smaller, and$ $ B (0.345,0.387) is substantially higher than those of the other ratios. To evaluate the difference of v in each haplogroup, the LRT was applied comparing the variable levels of heterogeneity of v among lineages. The results are summarized in Table 2. There was no significant difference between v C and v D . Therefore, we assumed these two haplogroups to be homogenous and estimated the average of v C and v D (v C+D ). Moreover the differences between v C +D and v B , and v C+D and v A were also not significant (5v model vs. 4v1 model, see table 2; 5v model vs. 4v2 model data not shown). However, the difference between v A and v B was significant ( Table 2). Insignificant difference between v C+D and v B , or v C+D and v A is probably due to the small sample size in our data set (2 sequences for haplogroup C, and 3 for haplogroup D). This analysis suggests that high selection pressure has continued to operate on haplogroup A.
The results of the McDonald and Kreitman's Test [23] are shown in Table 3. When haplogroup A and other haplogroups were compared, the differences of the synonymous and nonsynonymous substitutions among inter-or intra-haplogroups were not significantly different. However, when the other haplogroups were compared (e.g., haplogroups B vs. haplogroups C, haplogroups B vs. haplogroups D, and haplogroups C vs. haplogroups D), the differences were significant. In the latter cases, the relative non-synonymous substitution numbers were higher in the intra-haplogroup than in the inter-haplogroup. In contrast, the relative non-synonymous substitution numbers among haplogroup A are almost the same with that of the inter-haplogroup. This also supports the hypothesis that most of the non-synonymous substitutions have already been swept out from haplogroup A.
Concerning the deep branches,v v b (0.1395,0.1483) is significantly higher than others, andv v y (0.0001) is significantly smaller than others (Table 4: 4v2 model vs. 3v model, 4v6 vs. 3v model). The extremely small v ratios can be expected in this case due to their association with deep ancestral branches. Althoughv v a (0.1451,0.2206) is relatively large, it was not significantly larger than that of other branches, probably due to its short branch length (Table 4: 4v1 model vs. 3v model). Evolutionary history of domestic goat Our estimated times for the MRCAs of each haplogroup (32,300,90,950 years ago) and the times of nodes with star-like branching pattern (17,210 years ago and 90,950 years ago) substantially predate the beginning of goat domestication (about 10,000 years ago) [1]. Does this mean the population expansions of the wild progenitors of domestic goats occurred prior to the domestication events? Fang and Anderson [14] also reported a similar result for D-loop sequences of domestic pigs. Their mismatch distribution analysis suggested that the expansions of European and Asian domestic pig populations occurred nearly 190,000 and 275,000 years ago, respectively, and these estimated times extensively predate the domestication of the pig (9000 years ago) [24]. In this and Fang and Anderson's study [14], only domestic populations were used. Therefore, we re-analyzed Dloop sequences of the wild bezoars reported by Naderi et al. [2]. Although not significant in most cases, bezoars that are close to domestics (haplogroup A,F, except haplotype D) show negative values for Tajima's D [25]. This implies recent weak population expansion events. Conversely, all bezoars that are not close to domestics (wild haplogroups) show positive values for Tajima's D (Table S1). The results of the Bayesian Skyline Plot analysis also indicate the population expansion events occurred prior to domestication ( Figure 4). Haplogroups A and C show conspicuous, rapid expansions, and haplogroups B and G show slow expansions. In contrast to Tajiama's D, population size of haplogroup F has slowly declined.
To explain the population growth of the wild progenitors, two scenarios are possible. The first is Horwitz's incipient domestication [26]. Naderi et al. [2] suggested that ''this evidence of a population growth suggests a phase of demographic control and protection of some populations of bezoars in the wild, before the isolation of true early domestic herds by humans'' (p.17661-17662). The second is that the population growth was not caused by human activity. It is likely, however, that such a well established, expanded population frequently encountered ancient humans and some of them were involved in the domestication process.
The fluctuation of the population size estimated by the Bayesian Skyline Plot reveals that the wild population not involved in domestication experienced a rapid expansion around 250,000 years ago, and subsequently remained relatively constant in size for a long time. From around 10,000 years ago, the population size suddenly showed a rapid decrease. The timing of this decline is nearly concordant with the beginning of domestication [1]. It is possible that this was due to human activities such as hunting, destruction of suitable habitat for the wild population, and resource competition with domestic populations.
Since our time estimates suggest the MRCAs of each haplogroup extensively predated the beginning of domestication,  the difference of the v ratios of the ''deep'' branches do not appear to bear directly on the process of domestication (e.g., humanmediated transportation, artificial selection of breeds, etc.), but rather on prior historical events (e.g., natural selection on wild progenitors and isolation of the wild progenitor population at the beginning of domestication). The extremely small value ofv v x and v v y (0.0001) can be attributed to these being deep internal branches in which deleterious mutations have been largely swept out leaving mostly neutral fixed mutations. The large value ofv v b (0.1483) may be related to the isolation of wild progenitors at the beginning of domestication. In the case of haplogroup A, domestic goats seem to have been chosen widely from many lineages of wild progenitors ([2]'s Fig. 1). Similar patterns can be observed in haplogroups D and G. In contrast, domesticates in haplogroup B most likely were sampled more narrowly from a specific lineage of wild progenitors. We hypothesize that this biased isolation may have caused a severe bottleneck effect.
According to the nearly neutral hypothesis [21,27,28], negative selection is strict and slightly deleterious mutations are removed quickly in a large population, however, as population size gets small, slightly deleterious mutations behave as if neutral. Hence slightly deleterious mutations could be easily fixed in haplogroup B. Although similar biased isolation can be observed in haplogroup C ([2]'s Figs. 1 and 2), the v ratio of this deep branch is smaller than those of other deep branches (v v c = 0.0246: smaller data). Naderi et al. suggested that domestic goats that inherit haplogroup C originated from a wild population in Eastern Turkey and that this population had relatively recently differentiated from a much larger Iranian (Zagros) population ([2]'s Fig. 2). Probably, after differentiation, most of the slightly deleterious mutations were swept out from the Turkish population for an unknown reason.
On the other hand, the difference of the v ratios of the ''shallow'' branches seems directly related to the process of domestication. Here, we would like to focus attention on the small v ratio of the haplogroup A. There are approximately 840 million goats in the world [3]. Naderi et al. [9] showed that more than 90% of them inherit the mitochondrial type represented by haplogroup A, and this is also the dominant type in most regions of the Old World. Moreover, Amills et al. [10] reported that all of the Central and South American goats inherit this type of mitochondria. In contrast, the population sizes of other haplogroups are much smaller and the distribution areas are limited. To take an example, sub-haplogroup B1 (second largest haplogroup) is inherited by 4.4% of the goats and their distribution areas are limited to East, Southeast, and South Asia. This can also be explained in the framework of the nearly neutral theory. As we mentioned above, the population size of haplogroup A is very large. In addition, it is the most dominant in the early stages of domestication (no less than 87% at the beginning of domestication: [2]). Accordingly, it can be implied that negative selection has been strict in such a large population and thus most of the slightly deleterious mutations would be removed by natural selection. Björnerfeldt et al. [16] and Wang et al. [17] reported higher v ratios in domesticated animals than their wild progenitors, and suggested relaxation of selective constraints during domestication events. In addition, the reduction of the effective population size caused by a bottleneck at the beginning of the domestication process [29] and subsequent inbreeding during breeding-improvement seems to have contributed to the higher v ratios observed in domesticated animals. In our study the samples of wild progenitors were limited (only 2 individuals), and therefore we could not evaluate the difference of the v ratios between the domestic goat   [16]), domestic yaks (v = 0.231 [17]), and pigs (v = 0.172; Yonezawa and Hasegawa, unpublished data). However, the v ratio seen in haplogroup A was as low as the deep ancestral branches. This implies that most of the slightly deleterious mutations were already swept from the population, and that severe selective constraints have continued to operate in this haplogroup. The goats are also known as the ''poor man's cow'' [30] and are often not subject to highly developed industrialized agricultural practices. It is reasonable to expect that relatively strong selection pressures similar to those present in wild progenitors may have remained intact for domesticated goats that are subjected to extensive pastoralism such as being raised in seminatural habitats or herded within completely natural free-range environments. The goat is also known to be one of the oldest domestic animals [1]. This implies that even though slightly deleterious mutations may have accumulated in the early phase of domestication, such mutations in a large and old population would be expected to be eventually eliminated from the population given a long enough time such as 10,000 years. Goats are also kept in widely variable environments spanning the humid tropical rain forest, the cold and hypoxic condition of the high altitude, extremely dry desert regions, and the remote isolated habitats of small islands resulting from human transportation. This broad diversity of conditions may have served to help keep selection pressure relatively intense in the goat population from historical periods up to the current state of goat breeding.
It is worth noting that our$ $ B (0.345, 0.387) is even higher than ratios reported for other domestic animals. In the present study, our data includes only sub-haplogroup B1 which is distributed widely in East, South, and Southeast Asia [9]. As such, the unique sub-haplogroup B2, which is observed only in goats kept in the China-Mongolian region and the wild bezoars in West Asia [2,8,9] are not included in our analysis. Chen et al. [8] pointed out that haplogroups B1 and B2 show star-like tree structures and that South Asia haplotypes are derived from the Eastern Asian haplotypes. Therefore, it is possible that the China-Mongolian region was a secondary domestication site that served as a ''transportation'' center in Asia. Since Southeast Asia is the only place where haplogroup A is not dominant, it is possible that there was a severe bottleneck in the transportation of goats from Western Asia to South Asia and Southeast Asian regions via China-Mongolia. Thus, the analysis of the mitochondrial genome data for haplogroup B2 should shed light on the enigmatic process of drift vs. selection operating during historical transportation of goats.
Regardless of these open questions, the extremely low v ratio of haplogroup A and the extremely high v ratio of haplogroup B add significant insight into the complex relationship between the population genetic structure of domestic animals and the importance of selection, breeding environment, demographic history, and the purpose of domestication in shaping their biological diversity.

Conclusion
Our analyses of the nearly complete mitochondrial proteinencoding genes of the goat revealed that: (1) the timing of population expansion of goats occurred in the Late Pleistocene and extensively predates the beginning of goat domestication ,10,000 years ago. This result is consistent with the population expansion of Asian and European pigs; (2) the v ratio of the most dominant type represented by haplogroup A is extremely low which implies that most of the slightly deleterious mutations have been swept out of the goat population by selection. The apparently strong selective constrains in goats are probably due to the extremely large population size of haplogroup A present from the beginning of domestication, and also the highly variable and often extreme breeding environments of this animal; (3) conversely, the v ratio of haplogroup B (both for shallow and deep branches) was extremely high. This suggests that the biased sampling of domestic goats from their wild progenitors in haplogroup B during the beginning of domestication, and subsequent transportation to South and Southeast Asian via China-Mongolia likely created extreme bottlenecks that facilitated fixation of slightly deleterious mutations in the lineage of this haplogroup.

Ethics Statement
All of the experimental works involving animals in this study followed the guidelines of the Animal Experimental Ethics Committee of the Tokyo University of Agriculture, Japan, and has been approved by them.

Samples
Blood samples were collected from 450 goats within the original areas of the populations, for 9 breeds from 6 Asian countries, i.e., Japan, Korea, Mongolia, Indonesia, Philippines and Bangladesh ( Table 1). The detailed information of the samples is described in Nomura et al. [31]. Blood samples of bezoar were collected from Gunma Safari World Co. Ltd, Japan, the markhor (Capra falconeri) was collected from the Yumemigasaki zoo, and the Gobi ibex (Capra sibirica) was provided by the Research Institute of Animal Husbandry, Mongolian State University of Agriculture.

Experimental protocols
DNA was extracted from blood samples using a proteinase K digestion and a phenol-chloroform extraction [32]. To identify the mtDNA haplogroups, a 481 bp fragment of the first hypervariable segment (HVI) of the control region was amplified for all samples using the primers D-HVI-CAP-FI and D-HVI-CAP-RI (Table S2), and sequenced. Subsequently, 40 goats that represent haplogroup A, B, C, and D (see below section ''Haplogroup identification'') were selected, and each protein coding region was amplified and sequenced using 12 pairs of PCR primers (Table S2). PCR amplifications were conducted in 25 ml reaction volume with 1.5 mM MgCl 2 , 0.3mM of each primer, 200 mM of each dNTP, 1U TaKaRa Taq TM HS (TaKaRa BIO INC. Otsu, Japan) and 100 ng DNA. The PCR profile consisted of an initial denaturation step at 94uC for 5 min, 35 amplification cycles (denaturation at 94uC for 1 min, annealing at 52-66uC for 1 min and extension at 72uC for 2 min) and a final extension at 72uC for 10 min. PCR products were purified using the Wizard H SV Gel and PCR Clean-Up System (Progema, Madison, WI, USA) and used for sequencing with the Big Dye H Terminator v3.1 Kit (Applied Biosystems, Foster City, CA, USA) on an ABI prism 3100 Avant DNA analyzer. The sequences were aligned using the MEGA v. 4.0 program [33], against reference sequences for goat (GenGank accession number NC_005044), cattle (V00654) and sheep (AY858379). All newly determined sequence data were deposited in DDBJ (DNA Data Base of Japan: www.ddbj.nig.ac. jp/) and accession numbers were shown in Table S3.

Haplogroup identification
Naderi et al. [9] analyzed the mitochondrial D-loop (hypervariable region 1: HVR1) of 2430 domestic goats (1540 haplotypes) from all regions of the Old World and indicated that there are six major haplogroups (A, B, C, D, F, and G). They also characterized 22 reference sequences that define the 6 haplogroups. For the purpose of haplogroup identification, we inferred the phylogenetic tree based on the D-loop sequences of our samples together with these 22 reference sequences. The sequences were manually aligned and carefully checked by eye. The phylogenetic tree was inferred by the neighbor joining (NJ) method [34] with the K80+G model [35,36] using the MEGA v. 4.0 program [33]. The shape parameter (a) of the G distribution was fixed at 0.22, as estimated using the BASEML program of PAML v. 4.2 [37]. The confidence values for internal branches were evaluated by the bootstrap method [38] with 10,000 replications.

Alignment and inference of the phylogenetic tree based on the protein-encoding genes
The nucleotide sequences of protein-encoding genes on the H strand were manually aligned and carefully checked by eye. The following regions were excluded from the alignment: initiation and termination codons, and overlapping regions between ATP6 and ATP8, ND4 and ND4L, ND5 and ND6. The mitochondrial genomes from two goats were downloaded from NCBI (GU068049, GU295658) and were aligned together with our original sequences. The markhor was included in this alignment as an outgroup.
Since we could not determine nucleotide sequences of all genes in several individuals, our full data set contains missing regions. Therefore, we made two sets of alignments, defining ''the smaller data set'' to have no gaps, and ''the larger data set'' to contain gaps and missing sequence regions. After concatenating all the data, identical sequences were excluded from the final alignment. The smaller data set consists of 34 individuals (33 goats and one markhor) with 10 genes (ATP6, COX1, COX2, COX3, ND1, ND2, ND4L, ND4, ND5, cytochrome b: 10,188 bp in total), and the larger data set consists of 40 individuals (39 goats and one markhor) with 12 genes (ATP6, ATP8, COX1, COX2, COX3, ND1, ND2, ND3, ND4L, ND4, ND5, cytochrome b: 10,683 bp in total). The list of the variant sites is shown in Table S3.
The phylogenetic tree was inferred by the maximum likelihood (ML) method [39] using the RAxML v. 7.0.3 program [40] with the GTR+I+G 4 model [36,41,42]. Taking into account the different tempo and mode of nucleotide substitutions, the three codon positions were analyzed separately. The branch lengths were also estimated independently. Gap sites were treated as missing data. To evaluate the confidence at nodes of the internal branches, we applied the rapid bootstrap method [40] with 1000 replications. For the bezoar, females become sexually mature at 2-3 years [11]. Thin auxiliary lines indicate 500,000 increments, bold auxiliary lines indicate 1,000,000 increments. doi:10.1371/journal.pone.0067775.g004

Estimation of selection pressure and population expansion
To evaluate the selection pressure, the non-synonymous/ synonymous rate ratio v (dN/dS) was analyzed. The branch model that allows different levels of heterogeneity for the v ratio among the lineages [43] was applied using the CODEML program in PAML v. 4.2 [37]. It is known that the amino acid substitution rate estimated by intra-species comparisons is much faster than those of inter-species comparisons, probably because slightly deleterious mutations are not completely swept out of populations within the smaller time scales of intra-species comparisons [22]. Moreover, the v ratios of the shallow branches are usually much higher than those of the deep branches probably from the same reason [17,44]. Therefore, we placed the branches into three distinct categories: (1) Shallow branches, which are defined as all terminal and internal branches descended from the MRCAs (most recent common ancestor) of each haplogroup; (2) Deep branches, which are defined as the internal branches between the MRCA of all goats and MRCAs of each of the haplogroups; and (3) Inter-species branches, which connect the markhor and the MRCA of all goats.
McDonald and Kreitman's Test [23] was also applied to evaluate differential selection pressure between intra-and interhaplogroups using the DnaSP v. 4.2 program [45]. Tajima's D [25] was also estimated using the same program to detect recent population expansion events. Fluctuation of the ancestral population sizes were estimated by the Bayesian Skyline Plot method [46] using the BEAST program v. 1.7.4 [47] with the HKY+G [36,42] model under a strict clock. We newly estimated the mutation rate of D-loop for goats using our alignment for the ''Haplogroup identification'' with the BASEML program of PAML [37], assuming the MRCA of haplogroup A lived 91,000 years ago. This new rate was estimated at 2.73610 27 / site/year.

Divergence times estimations
Previous studies [4,6,7] estimated the divergence times among the major goat haplogroups, assuming the goat/sheep split to be 5.0 to 7.0 Ma (mega annum) based on the ungulate fossil record [13]. However, the fossil record in general does not always point to the ''real'' divergence time because the first stratigraphic appearance of taxa in the fossil record may be subjected to sporadic sedimentary disruptions due to erosion or lack of sedimentation during regression and/or irregular sedimentary processes. Because of these uncertainties, an assumed phylogeny implies such gaps if two sister taxa have different times of first appearance or if a gap exists between the last appearance of an inferred ancestor and the first appearance of its inferred descendant [48]. Accordingly, we should not regard the first appearance of the first fossil record as the ''timing'' of the split, but rather that the split of two lineages was older than the age of the first fossil record (the younger limit). Additionally, in many cases, it is difficult to assume a particular split is younger than the confirmed age based on the fossil record (the older limit). For this reason, at first we estimated the divergence times of the goat and the markhor within the comprehensive evolutionary framework of the Cetartiodactyla using several reliable fossil records for calibration. The phylogenetic tree was inferred based on the amino acid sequences of the concatenated 12 mitochondrial protein coding genes encoded in the H strand with the RAxML v. 7.0.3 program using the mtREV+F+G 4 model [49]. Assuming the basal position of Camelidae [50], the divergence times were estimated using the relaxed clock model [51] using the MCMCTREE program [52] implemented in PAML v. 4.4. In this analysis, the normal approximation method was used to reduce the computational burden. To estimate the Hessian matrix (variance -covariance matrix of branch lengths), we used the mtmam+G 5 model [53] for the amino acid sequences using the CODEML program in PAML, and the GTR+G 8 model for the nucleotide sequence using the BASEML program in PAML, where the model was applied separately to each of the three codon positions. The independent rates model between ancestral and descendant lineages was also applied (e.g., [54]). The prior distributions were set as follows: For the amino acid sequence, rgene_gamma = (4, 5), and s2_gamma = (1, 0.8). For the nucleotide sequence, rgene_gamma = (4, 7), and s2_gamma = (1, 0.8).
For the divergence time estimation among goats, the strict molecular clock method was applied, which assumes homogeneity of the substitution rate among lineages. Even if the evolutionary rate does not differ among lineages, evolutionary divergence cannot be read directly from the observed difference between two sequences because of multiple substitutions with extreme site heterogeneity [61]. In addition, as mentioned above, the substitution rates in intra-species comparisons are higher than that of inter-species comparisons. Ho et al. [62] demonstrated a high evolutionary rate in the short term (,1 Ma) and a low rate in the long term (.1 Ma), and approximated this rate change using the exponential decline curve. Williamson and Orive [63] investigated the impact of purifying selection on the shape of genealogy and the distribution of mutations. As a result they demonstrated that although the shapes of trees topology remained largely unchanged, the distributions of the mutations on the trees shifted. Since a majority of coalescent analyses assume neutrality, this effect of purifying selection could create a bias in associated estimations. Therefore, we used only 3 rd codon positions because most of the substitutions in the 3 rd codon positions are synonymous and are therefore likely to be neutral. Thus, they are thought to be relatively free from this rate change. The GTR+G 4 model was used for the present analysis. We confirmed that there are almost no multiple substitutions among the goat and markhor (data not shown) and estimated the split of the goat and markhor to be 3.4 Ma. Figure S1 The maximum likelihood tree of domestic goats based on the nearly complete mitochondrial protein-encoding genes of the smaller data set. The GTR+I+G 4 model was used. Taking into account the different tempo and mode of nucleotide substitution, each of the three codon positions was analyzed separately. The branch lengths are proportional to numbers of nucleotide substitutions. The markhor was used as an outgroup. Nodal numbers indicate bootstrap probabilities (rapid bootstrap method: 1,000 replications).