A genome-wide scan for diversifying selection signatures in selected horse breeds

The genetic differentiation of the current horse population was evolutionarily created by natural or artificial selection which shaped the genomes of individual breeds in several unique ways. The availability of high throughput genotyping methods created the opportunity to study this genetic variation on a genome-wide level allowing detection of genome regions divergently selected between separate breeds as well as among different horse types sharing similar phenotypic features. In this study, we used the population differentiation index (FST) that is generally used for measuring locus-specific allele frequencies variation between populations, to detect selection signatures among six horse breeds maintained in Poland. These breeds can be classified into three major categories, including light, draft and primitive horses, selected mainly in terms of type (utility), exterior, performance, size, coat color and appearance. The analysis of the most pronounced selection signals found in this study allowed us to detect several genomic regions and genes connected with processes potentially important for breed phenotypic differentiation and associated with energy homeostasis during physical effort, heart functioning, fertility, disease resistance and motor coordination. Our results also confirmed previously described association of loci on ECA3 (spanning LCORL and NCAPG genes) and ECA11 (spanning LASP1 gene) with the regulation of body size in our draft and primitive (small size) horses. The efficiency of the applied FST-based approach was also confirmed by the identification of a robust selection signal in the blue dun colored Polish Konik horses at the locus of TBX3 gene, which was previously shown to be responsible for dun coat color dilution in other horse breeds. FST-based method showed to be efficient in detection of diversifying selection signatures in the analyzed horse breeds. Especially pronounced signals were observed at the loci responsible for fixed breed-specific features. Several candidate genes under selection were proposed in this study for traits selected in separate breeds and horse types, however, further functional and comparative studies are needed to confirm and explain their effect on the observed genetic diversity of the horse breeds.


Introduction
The present horse population abounds in the variety of phenotypes resulting mainly from selective breeding directed at improvement of particular phenotypic features. Since the domestication, various selection criteria aimed towards the improvement of horse usability in transportation, agriculture or horsemanship have been applied [1][2]. This drove the specialization of particular populations in terms of utility and finally resulted in a creation of formal breeds constituting largely closed populations with high genetic uniformity of individuals within breeds [3]. The current selection in the most of horse breeds is primarily directed at the improvement of traits connected with appearance and performance. However, apart from the highly specialized breeds, there are also horse populations which are valued for their primitive nature manifesting in their robust constitution for primitive living conditions [4]. These horse populations were mainly shaped by natural selection, however, selective breeding for breed standard and mating in closed populations makes their genetic characteristics largely similar to that found in other horse breeds.
Both natural and artificial selection lead to changes in allele frequencies between populations, and over time, different variants and haplotype structures are being fixed within separate breeds [5]. Various concepts and statistics have been used to detect such selection signatures from genome-wide SNP data in livestock species. Some of them are based on within breed genome characteristics and other relay on genetic variation among breeds. Most of the available methods are based on: (i) the high frequency of derived alleles and other consequences of hitchhiking within population, (ii) the length and structure of haplotypes, measured by extended haplotype homozygosity (EHH) or EHH-derived statistics, and (iii) the genetic differentiation between populations, measured by F ST or related statistics [6]. The genetic differences arising as a result of selection are presumably concentrated at the functional variants loci being beneficial for the selected traits. Thanks to the linkage disequilibrium across the genome, the approximate position of these functional variants can be identified using neutral genomewide SNP panels and a comparative analysis of allele frequency distribution between different breeds [5]. A commonly used approach to detect diversifying selection signatures is based on the measure of population differentiation due to locus-specific allele frequencies variation between populations, which is quantified using the F ST statistic [7] and further averaged at specific chromosome distances to account for stochasticity [8]. F ST provides information on the genomic variation at a locus among populations relative to that found within populations [9]. These among-breed differences, which can be considered as diversifying selection signals, have been previously successfully used to map genomic loci spanning the genes with variants responsible for a number of phenotypic features in different species, including coat color, size, muscling, production and reproduction [10][11][12][13][14]. The identification of selection signals across the genome connected with candidate gene approach was also useful to detect loci associated with size, performance, coat color and gait in several horse breeds [2,4,15,16].
To extend the knowledge in this area and provide more data on candidate gene loci responsible for important breed specific features, in this study, we attempt to identify selection signatures by the analysis of genetic diversity of six different horse breeds which can be classified into three major categories: light, draft and small size primitive horses. Among the light horses, apart from well-known Arabian horses we analyzed Małopolski horse which is currently wellbalanced riding horse that was originally developed mainly from native Polish population crossed with Thoroughbreds and Arabians. The primitive breeds analyzed in this study included Hucul and Polish Konik. Both these breeds are represented by a small size horses with several characteristics of feral population. Hucul horses originate from Carpathian Mountains and most likely, they are descendants of various types of horses: Tatar, Oriental, Arabian, Turkish, Przewalski's horses, as well as horses with Norse blood. The Polish Konik originate from a now-extinct Tarpan horse from native Polish population and is characterized by several primitive features, including mouse-dun coat color and a dorsal stripe. The draft horses analyzed in this study included Sokolski and Sztumski breeds. Sokolski horse derives from crossbreeding of local Polish mares of Polish Coldblood type with imported Ardennais and Breton sires. The Sztumski horse is the largest and the heaviest of cold-blooded horses maintained in Poland which was originally created on the basis of local population crossbred mainly with Ardennes and Belgian sires [17]. The direct comparison between these horse types and breeds can provide candidate gene loci presumably responsible for constitution, size and coat color and may also contribute to the recognition of the genetic background and source of variation of horse phenotypic features. Five out of six breeds analyzed in this study are included in the conservation programs following FAO National Rare Livestock Breeds Preservation guidelines and additionally, according to our best knowledge, these native horse breeds were not studied before in terms of diversifying selection and thus this study provides new data on horse breeds genetic diversity and variation.

Animals, samples and genotyping
The study material comprised blood samples obtained from 571 horses (both males and females) randomly sampled from different herds and belonging to six different breeds. The breeds were selected to represent three major horse types: light horses-Arabians (n = 124; AR) and Malopolski horse (n = 56; MLP); primitive horses-Hucul (n = 116; HC) and Polish Konik (n = 99; KN) breeds; and draft horses-Sokolski (n = 107; SOK) and Sztumski (n = 69; SZTUM). The blood was collected from jugular vein by a veterinary doctor in an amount of 10 ml to a vial containing EDTA K3. Samples from Arabian horses were obtained from three different studs: SK Janów (Lubelskie province), SK Michałów (Świętokrzyskie Province) and SK Białka (Lubelskie province) which were our project partners. In the case of Małopolski horsesthe material was sampled in all above studs and in farms of individual breeders with their personal consent. The biological material of Hucul horses was obtained in the Gładyszów Stud (Małopolskie province) and ZDIZ PIB Odrzechowa (Podkarpackie province) with the consent of their Chairmans. In the case of Polish Konik, the material was sampled in: Popielno Research Station, IRiŻZ PAN (Warmian-Masurian province) and Kalitnik-PTOP Research Station (Podlasie province) with the consent of the Presidents. The biological material from draft horses was collected in herds participating in the programs for the conservation of genetic resources of Sztumski and Sokolski horses. The herds were located in the Podlasie (Sokólski) and Pomorskie (Sztumski) provinces. Each farmer participating in the program signed a cooperation agreement with the NRIAP, where they undertake to provide data and biological material for research purposes. Breeders participating in the conservation program are obliged to keep animals in accordance with the criteria of animal welfare, which is a subject to the control of district veterinarians. The genomic DNA was purified from the blood using a Sherlock AX kit (A&A Biotechnology) and after a quality control was genotyped with the use of the Neogen Equine Community BeadChip assay (Illumina) according to the standard Infinium Ultra protocol. All animal procedures were approved by the Local Animal Care Ethics Committee No. II in Kraków-permission number 1293/2016 in accordance with EU regulations.
Only genotypes with call rate >0.97 were used for the analysis. The initial SNPs set was filtered to remove markers located on the sex chromosomes (EquCab2.0 genome build). The initially filtered SNP panel included 61,268 markers, which was further reduced by applying population-wide filters. The filters included MAF threshold of 5% and <20% of missing genotypes in the whole studied population. Additionally, SNPs with critical p-value for HWE <1.0E-06 in each breed separately were excluded. This resulted in the final SNP panel of 52,023 markers, scattered across the genome with an average inter-marker distance of 43.0 kb.

Identification of diversifying selection signatures
The genome regions with differentially fixed variants or strongly differing in alleles frequency between separate breeds were identified using pairwise Wright's F ST [9], the conventional measure of population genetic differentiation. The F ST values obtained for pairwise comparisons at each SNP were breed-standardized according to a methodology proposed by Akey et al. 8]. The standardized F ST values were calculated (d i ) as: where E½F ij ST � is expected value and sd½F ij ST � denotes the standard deviation of F ST between breeds i and j calculated from all analyzed SNPs. To account for random locus-by-locus variation, a 10-SNP sliding window was applied on the obtained d i values. Candidate regions affected by diversifying selection were defined as the 99.9th percentile of the empirical distributions of window-averaged d i values. The overlapping regions under selection were merged and (while searching for potentially linked genes) regions were extended on both ends by 25kb. Additional comparisons using d i values were made between major horse types (light, primitive and draft) to detect potential selection differences between them. The linkage disequilibrium (LD) and haplotype block structure at the most diversified regions among the studied breeds were analyzed using HaploView 4.2 [18] software examining pairwise LD on the distance up to 500kb and detecting blocks based on a method proposed by Gabriel et al. [19]. Additionally, a detailed analysis was performed for previously detected candidate genes loci for horse body size (LCORL/NCAPG) [2,20] and the dun coat color phenotype (TBX3) [21].
Populations differentiation visualization was performed using principal component analysis on SNP genotypes and cladogram created based on weighted F ST distances using neighborjoining method [22].
The functional annotation of genes detected within the strongest selection signals was performed using the KOBAS 3.0 web server [23] and Panther Classification System [24]. A gene list enrichment analysis was done according to all known horse genes (genome) applying a correction for multiple testing.

SNP panel statistics and general genetic differentiation of the studied breeds
The data filtering allowed obtaining a common set of 52,023 SNPs polymorphic across the whole population with a mean inter-marker distance of 43.0 kb (±45.1). The number of polymorphic SNPs (MAF>0.01) per breed ranged from 47,495 to 50,775 in HC and MLP breeds, respectively. The average MAF across all SNPs was the lowest in KN (0.214) and the highest in MLP (0.262) breed. The averaged observed heterozygosity per breed ranged from 0.296 in KN to 0.353 in MLP (Table 1). Mean and weighted overall pairwise F ST distances were the lowest between both draft horse breeds (0.012 and 0.014) and the highest level of genetic differentiation was found between the Arabians and the primitive or draft horses ( Table 2). The breeds differentiation was additionally visualized using the principal component analysis based on SNP genotypes and a cladogram of mean pairwise F ST distances created using the neighbor joining method (NJ) (Fig 1). The PCA analysis showed clear separation of genetic profiles of   Table 2. Neighbor Joining method was used to order the breeds according to their relative genetic distances.
https://doi.org/10.1371/journal.pone.0210751.g001 both light horses from remaining breeds and a clearly distinct genetic profile of Hucul horses, while the NJ method showed the visible similarity of genetic profiles within major horse types with an exception of primitive horses, which were clearly separated (Fig 1).

Breed-specific selection signatures
Signatures of diversifying selection among the studied horse breeds were detected based on the breed-normalized pairwise F ST distances (di) (S1 File). After smoothing of the obtained d i values by moving average, top 0.1% of the observations were considered as the most pronounced selection signals associated with breed specific features. The merging of overlapping regions allowed for detection from 10 (MLP, SOK) to 15 (KN) genomic loci with strong selection signals per breed with a size ranging from 163.9 kb to 4.4 Mb ( Table 3). The highest number of the strongest selection signals across all breeds was detected on ECA1 and ECA11 and only single regions were detected on ECA12, 14, 16, 19, 21 and 24. Several genomic regions with strong section signals overlapped between different breeds and these regions were located on chromosomes 1, 2, 3, 7, 8, 11, 15 and 22 (Table 3, Fig 2). The most commonly observed selection signal common for different breeds (HC, KN, SOK, SZTUM) was detected on ECA11 between 22.9 and 23.7 Mb of the chromosome sequence.
To analyze gene content of the genome regions spanning the strongest detected selection signals (top 0.1% of di values), each region was expanded by 25 kb on up-and downstream directions to account for potentially linked genes. This allowed for detection from 65 (SOK) to 169 (MLP) ENSEMBL genes per breed (S2 File). The analysis of genes common for different breeds showed that as many as 18 genes were common for HC, KP, SOK and SZTUM breeds and no common genes were observed between MLP and both the primitive and draft horses (S3 File) [25]. The functional classification of well-annotated genes using Panther software according to GO terms showed that all the genes detected across breeds (506 unique genes) were mainly involved in cellular processes (32% of genes) like e.g.: cell communication and cell cycle or metabolic processes (22%), including: primary metabolic processes (89 genes), nitrogen compound metabolic processes (50 genes) or phosphate-cogitating compound associated processes (28 genes). All the genes were connected with a large number of Panther pathways, for which the highest number of genes was connected with inflammation mediated by chemokine and cytokine (8 genes), TRH receptor signaling, TGF-beta signaling, CCRK signaling map or GRH receptor signaling pathway (5 genes in each).
When functional classification was performed for individual breeds, visible differences in the enriched GO categories were detected (S4 File). The pathway analysis performed on our dataset for genes found in the Arabian horses showed the presence of some genes connected inter alia with: ATP synthesis coupled electron transport (COX4I1), bile secretion (ABCG8, ABCG5, ADCY1), fat digestion, oocyte meiosis (ADCY1, ANAPC7), ovarian steroidogenesis (ADCY1) and insulin signaling (CBLB) or secretion (ADCY1) pathways. Genes detected in the Malopolski horse were associated with largely similar biological pathways as those found in the Arabian horses and included inter alia: progesterone-mediated oocyte maturation (PRKACA, ANAPC5, ANAPC7), oocyte meiosis (PRKACA, ANAPC5, ANAPC7), fatty acid metabolism (TECR), but also covered other processes associated with immune system functions, like e.g.: leukocyte transendothelial migration (MYL2), antigen processing and presentation (PRKACA) and inflammatory mediator regulation of TRP channels (PRKACA, PRKCD). Within selection signatures characteristic for the primitive Hucul breed, we found several genes involved inter alia in processes like: olfactory transduction (LOC100066263, LOC100066541, LOC100055475, LOC100066487, LOC100060476, LOC100060509, LOC100066238), cardiac muscle contraction (ACTC1) and alanine, aspartate and glutamate metabolism (GFPT2, CPS1). Among 100 genes detected in the strongest selection signatures found in the Polish Konik horse we detected genes connected e.g. with: inflammatory mediator regulation of TRP channels or T cell receptor signaling pathway (PLCG1), chemokine signaling pathway (GNG4) and glycerolipid metabolism (LPIN3). The genes detected in the draft Sokolski horses were connected inter alia with: cardiac muscle contraction (ATP1A1, CACNB1), bile secretion (ATP1A1, ABCG8), fat digestion/absorption (ABCG8) and insulin secretion (ATP1A1), so with pathways largely similar to those found in the light horses. Genes associated with diversifying selection signatures in the Sztumski horse were connected for example with: prolactin signaling pathway (CWC25), cytokine-cytokine receptor interaction (LASP1) or general metabolism (PCGF2) (S5 File).

Signatures of diversifying selection between major horse types
To identify genome regions differentially fixed between different major horse types, d i values were calculated between pairs of breeds assigned to three categories: light (AR, MLP),  This comparison revealed 14 genome regions strongly differing in alleles frequency between the light and draft horses, 11 regions differing between the light and primitive horses and 9 differing between the primitive and draft breeds. The highest number of such divergently selected regions was detected on ECA2 (6 regions), 8 and 22 (4 regions on each). The size of individual regions ranged from 124.2 kb to 1.1 Mb. The regions directly overlapping between at least two different comparisons were found on ECA2, 4, 8, and 19 and these overlaps were found for comparisons between the light horses and two remaining horse types ( Table 4). The comparison of selection signature plots (Fig 3) revealed similar pattern of allele frequency differences between the light and two other horse types and distinct pattern of frequency differences between the primitive and draft horses.
Within the detected selection signatures for major horse types we detected in total 220 unique genes-from 77 to 87 for separate comparisons (S7 File). The genes with differentially fixed variants between the light and draft horses were associated with several biological pathways, of which the most pronounced were those connected with progesterone-mediated oocyte maturation, oocyte meiosis (ANAPC5, ANAPC7, ADCY1), cardiac muscle contraction (MYL2, ATP1A1), adrenergic signaling in cardiomyocytes (ADCY1, MYL2, ATP1A1), salivary or bile secretion, insulin secretion and thyroid hormone synthesis (ADCY1, ATP1A1). The genes were also connected with biological processes responsible for energy homeostasis, like: ATPase activity or mitochondrial inner membrane function (ATP5F1E) (S8 File).
The genomic regions differing between the primitive and draft horses encompassed 87 different genes among which we detected ones connected with Ras signaling pathway (GNG4, FGF10), sulfur relay system (NFS1), MAPK signaling pathway (CACNB1, FGF10), cardiac muscle contraction (CACNB1) and terpenoid backbone biosynthesis (GGPS1). The genes were also annotated to a wide range of biological processes including inter alia: keratinocyte differentiation, angiogenesis and bone development (MED1) (S8 File).
Special attention was given to genes located within the strongest signals of diversifying selection between different horse types and involved an analysis of the most divergently selected regions per comparison. For comparison between the draft and light horses we analyzed in details a region on ECA7 (39.8-40.8 Mb), for comparison between the primitive and light horses a region on ECA8 (20.8-21.5 Mb) and for comparison between the draft and primitive horses a region on ECA11 (22.8-23.9 Mb) (Fig 3). Additionally, for these regions we established linkage disequilibrium and haplotype block structure to identify haplotypes potentially being under selection.
Within the region on ECA7 (39.8-40.8 Mb), apart from two pseudogenes and one uncharacterized protein coding gene, we found two genes (namely: NTM, coding for Neurotrimin and OPCML, coding for opioid binding protein) involved in central nervous system functioning [26]. The analysis of LD and haplotype structure at the analyzed locus showed the presence of two haplotype blocks with common haplotypes showing frequency above 0.7 in all analyzed breeds (S9 File). The region also neighbored with a large chromosomal area of high LD in AR and MLP horses located upstream of the analyzed selection signature (40.1-52.   haplotype blocks with frequency higher than 0.5 were found for three out of four compared breeds (S10 File).
Within the large region with the highest differences in alleles frequency between the draft and small size primitive horses on ECA11 (22.8-23.9 Mb), we detected 30 different genes (including two pseudogenes and five uncharacterized proteins). One of them, LASP1 gene, was previously proposed as a candidate gene for size in horses [20]. The analysis of LD structure at the locus showed rather fast linkage disequilibrium decay and haplotype block structure difficult to associate with the selection signal (Fig 4). Nevertheless, overlapping haplotype blocks were detected in the draft horses, spanning haplotypes with frequency exceeding 0.7.

Diversifying selection signatures at candidate gene loci for size and gene responsible for coat color dilution
In previous studies, the ECA3 locus encompassing LCORL and NCAPG genes was shown to be differentially selected between several draft and miniature horse breeds [2] and was also Diversifying selection signatures in selected horse breeds presented as explaining a significant portion of genetic variance for this trait in an acrossbreed study [20]. In our results, one of the four strongest diversifying selection signals detected between the small size primitive horses and draft horses, encompassed a previously described locus on ECA3 (104.8-105.9 Mb), spanning both mentioned genes. The analysis of the haplotypes structure at the locus showed overlapping haplotype blocks for three of the analyzed breeds (KP, HC, SOK), however, they spanned a neighboring pseudogene locus rather than LCORL/NCAPG positions (Fig 5). This observation may be connected with the possible location of functional variants in the upstream region of LCORL (NCAPG is transcribed in opposite direction), possibly influencing its promoter activity.
Another locus which was analyzed in details was a genome region on ECA8 spanning TBX3 gene. Previous studies showed that the dun phenotype (which is characteristic for the studied Polish Konik horse) is related to mutations occurring in TBX3 gene [21,27]. The analysis of diversifying selection signatures characteristic for individual breeds performed in this study showed that the strongest selection signal found in the Polish Konik directly overlapped with this locus (ECA8: 17.9-18.6 Mb). At the locus, we found the low level of LD and poor haplotype structure, however, the detected d i signal almost exactly matched the TBX3 genomic position (Fig 6).

Discussion
In this study we performed a genome-wide scan for diversifying selection signatures in horse breeds representing performance, exterior and size selected animals. As a comparative population we included primitive, extensively selected horses with well-developed primitive characteristics like: good fertility, disease resistance and adaptation to harsh environmental conditions [28]. Many genome regions divergently selected between separate breeds were identified in this study as well as selection signatures characteristic for specific horse types (light, draft and primitive). This allowed for an identification of several candidate genes and associated metabolic pathways which may be potentially responsible for the divergent phenotypes of the studied breeds. To allow comprehensive analysis of the obtained results we focused Diversifying selection signatures in selected horse breeds only on the genome regions with the strongest selection signals, presumably being near fixation in separate breeds and spanning variants responsible for the well-established (fixed) breed-specific traits. To manage a variety of genes found within the selection signals we performed a pathways analysis aimed at the identification of enriched processes. Having in mind that the detected selection signatures are associated with a variety of phenotypic features differentiating the studied breeds (which are conditioned by complex molecular mechanisms), we expected only very few genes connected with separate biological processes in the enrichment analysis. Nevertheless, this analysis allowed us to reduce the complexity of the obtained data and allowed us to search for the pathways and underlying genes potentially being the targets of diversifying selection.
The highest number of the strong diversifying selection signals across all breeds was detected in this study on ECA1 and ECA11. These autosomes were previously shown to contain loci affecting size in horses, which both were detected using population genetics [2,16] and quantitative genomics methods [20]. Several genomic regions with strong selection signals overlapped between different breeds and these regions were located on chromosomes 1, 2, 3, 7, 8, 11, 15 and 22 (Table 3, Fig 2). The most commonly observed selection signal common for different breeds was detected on ECA11 between 22.9-23.7 Mb of the chromosome sequence. This region was previously shown as overlapping with the LASP1 gene locus with a presumed effect on growth and body size traits [2,20].
The analysis of linkage disequilibrium at the most pronounced selection signals found in this study on ECA7, 8 and 11 between major horse types showed rather low level of LD and a poorly conserved haplotype structure in the analyzed breeds, suggesting that the variants selected at the regions are evolutionarily old and their frequency increased during breeds formation rather than under the influence of recent selection. This can be supported by the fact that an high LD expected in regions bearing variants under strong ongoing positive selection [29], actually decayed at the analyzed loci as a result of repeated meiosis and recombinational mechanisms acting over several generations. This is in accordance with the suggestions that different statistical methods, like e.g. extended haplotype homozygosity tests are more suitable to detect selection signatures associated with ongoing selection directed at new functional variants [30]. Nevertheless, selection signals associated with variants which are not fully fixed in the studied populations should be present among signatures that did not reach the applied in this study stringent threshold of 99.9% of the highest d i values and more in-depth analysis of our data would allow detecting variants currently segregating in the studied populations [31].

Diversifying selection signatures among major horse types
Within the region on ECA7 (39.8-40.8 Mb), divergently selected between the draft and light horses, apart from two pseudogenes and one uncharacterized protein coding gene we found two genes (namely: NTM, coding for neurotrimin and OPCML, coding for opioid binding protein) that were involved in central nervous system functioning. The study performed in humans suggested that NTM gene (coding for Neurotrimin) locus is associated with the level of IQ and two other genome wide association studies (GWAS) reported an association between NTM variation and cognitive function performances in humans [32,33]. The diversifying selection signature at this locus between the draft and light horses may potentially contribute to their differing temperaments and/or potentially, the ability to develop different gaits as a function of motor coordination managed by the brain.
The locus on ECA 8 (20.8-21.5 Mb) with clear divergences in alleles frequency between the primitive and light horses encompassed 14 genes, including one uncharacterized protein.
Among the genes as potential selection candidate we suggest MYL2 (Myosin Light Chain 2) gene (detected at the peak of the selection signature), which is coding for a contractile protein that plays a significant role in heart development and contraction [34,35]. The mutation in MYL2 was shown to be responsible for hypertrophic cardiomyopathy (HCM) in humans [36]. The association of the gene with heart contraction and development suggests its potential role in physical effort and advocates selection pressure on its locus in light horses in terms of their performance and endurance which are strongly dependent on heart and lung efficiency [37]. The MYL2 gene was also identified in a genome-wide study on quantitative trait loci affecting show-jumping performance in Hanoverian warmblood horses [38].
Interestingly, one of the genome regions on ECA5 differentiating light and primitive horses encompassed SLC16A1 (Solute Carrier Family 16 Member 1) gene. The MCT1 protein encoded by this gene catalyzes the movement of different monocarboxylates such as lactate and pyruvate across the plasma membrane [39]. In a previous study, Ropka-Molik et al. [40] showed that SLC16A1 gene expression is stimulated during training regimens preparing horses for flat racing. In the studied Arabian horses they detected that the SLC16A1 expression gradually increased in muscle tissue, starting from resting conditions up to top training form [40]. Furthermore, the performed association analysis showed that there was a significant association between g.55589063T>G SNP located in the 5'UTR of SLC16A1 gene and the selected racing results [41]. All these evidences suggest the potential significance of SLC16A1 gene and MCT1 protein for horse performance and that the detected selection signal associated with genetic differences between primitive and light horses may be a result of selection (applied especially in the Arabian and Malopolski horses) towards the improvement of racing abilities.
Within the large region on ECA11 (22.8-23.9 Mb) divergently selected between the primitive (small size) and draft horses (so presumably responsible for size), we detected 30 different genes. Among the genes we detected LASP1 gene, previously suggested as a candidate gene for size in horses [20]. However, some doubts were also associated with this gene as a candidate for size [16] probably due to the insufficient knowledge on this gene functions, hampering clear description of its role in body size regulation [42].
Apart from selection signatures representing differentiation of horse types, a detailed analysis was performed to detect genome regions divergently selected in individual horse breeds. The analysis was aimed at the detection of candidate genes affecting breed-specific characteristics.

Selection signatures found in light horses (Arabian and Malopolski)
Both the Arabian and Malopolski horses are characterized by several common features and are lightweight horses with a breeding purpose mainly directed at aesthetics, racing performance and gait quality. Our study allowed identifying in these horses selection signatures spanning genes involved in regulation of pathways associated with metabolic processes and energy production from lipids (such as: bile secretion, fat digestion and absorption, fatty acid metabolism and regulation of lipolysis in adipocyte) or from carbohydrates (insulin secretion, insulin signaling pathway) (S2 and S5 Files). Several previous studies based on animal models and humans suggest that exercise training improves lipid and cholesterol metabolism [43][44][45]. According to Meissner et al. [46], physical activity increase bile acid synthesis and as a result, increase fatty acid absorption in exercise-trained animals. Furthermore, it was also shown that exercise increases the insulin-signaling cascade activity, which can be associated with alterations in insulin sensitivity [47]. Similarly, Ropka-Molik et al. [48] pointed out the significant role of mechanisms modulating glucose uptake and lipid metabolism in maintaining body homeostasis during long-term exercise in horses, thus providing strong grounds that elements of this mechanism presumably are targets of selection in light horses. Insulin receptor signaling pathway was also shown to be enriched by genes detected in a selection sweeps found in a recent study employing next generation sequencing data from 52 horses of different breeds [49]. This suggests that insulin signaling is a biochemical cascade important for general horse functional features and their element are under selection in various types and horse breeds.
Moreover, our results obtained for the Arabian horses which dominate in endurance riding and racing [50] pinpointed genes classified as involved in vascular smooth muscle contraction (ADCY1), taurine and hypotaurine metabolism (GAD1) as well as in oxidative phosphorylation (COX4I1), that is processes with clear implications for racing and athletic performance. It is well known that blood flow through the muscle (conditioned inter alia by vascular smooth muscle contraction) can change over time and highly increase during maximal exercise [51][52][53]. This blood flow adjustment is primarily related to increased oxygen demands of the muscle tissue and it is essential for physical performance during racing. The importance of the role of taurine in exercise endurance has been highlighted by many reports. In mice, the individuals that lacked the taurine transporter (TauT) gene and contained severely reduced muscle taurine content exhibited decreased ability to perform physical exercise in both treadmill and forced swimming tests [54,55]. Furthermore, Dawson et al. [56] and Miyazaki et al. [57] proved that taurine supplementation prolongs the time to exhaustion during treadmill running, by the release of intramuscular taurine into the blood. Processes of oxidative phosphorylation are even more important for physical endurance. The ATP demand rapidly increases to meet the high rate of ATP consumption during rest-to-work transition and hence efficient processes of ATP synthesis are crucial for muscle kinetics [58]. These pieces of evidence suggest that the detected genes connected with vascular smooth muscle contraction, taurine and hypotaurine metabolism as well as oxidative phosphorylation play a significant role in horse athletic performance and are promising candidates for the endurance traits in the studied Arabian horses.

Selection signatures found in primitive horses (Hucul and Polish Konik)
Hucul and Polish Konik, apart from clear genetic differences (also demonstrated in this study) share several common features resulting from their primitive character and common phylogenic origin. Both these horse breeds are presumed to originate from Eastern European wild horse (Tarpan) and are characterized by several primary features adapting them to living in natural conditions. These features are mainly shaped by natural selection and represent adaptation to survival in harsh environment and are connected with ability for searching food, defense against predators, disease resistance and tolerance of adverse climatic conditions [28].
In Hucul horses, selection signatures overlapped with several genes classified as engaged in olfactory transduction pathway (LOC100066263, LOC100066541, LOC100055475, LOC100066487, LOC100060476, LOC100060509, LOC100066238). It is generally thought that the olfactory system is extremely important for the survival of most mammals and is essential for finding foods, avoiding dangers or identifying mates and offspring [59][60][61][62][63]. It helps to survive in harsh environmental/living conditions to which Hucul horses are well adapted.
Our analysis of selection signatures in the Polish Konik breed allowed the detection of a strong selection signal directly at TBX3 gene locus. This gene was previously reported as being causative for the dun phenotype [21] by affecting expression of KITLG gene. There is also strong evidence demonstrating that two single nucleotide polymorphisms within this gene are responsible for coat color dilution and occurrence of three alleles: dun (D), non-dun1 (d1), and non-dun2 (d2) was presented [21]. The detection of the strong selection signal unambiguously associated with this locus in the Polish Konik horse (sole breed with this coat color phenotype among the studied breeds) confirms the involvement of TBX3 gene in coat color dilution in this breed and the usefulness of the applied population-based approach for detection of functional variants loci in the horse genome.

Selection signatures detected in draft horses
Cold-blooded horses thanks to their muscling, size, strength and stamina are perfect for field work and transport. In the analyzed heavy draft horses, we detected selection signatures encompassing genes involved in pathways essential for maintaining body homeostasis like: aldosterone-regulated sodium reabsorption pathway responsible for sodium homeostasis and blood volume and blood pressure control [64], as well as mineral absorption or endocrine and other factor-regulated calcium reabsorption pathways, playing a central role in the homeostasis of ions [65]. We also identified genes involved in aforementioned pathways associated with metabolic processes and energy production (such as: bile secretion, fat digestion and absorption, protein digestion and absorption, insulin secretion or metabolic pathways) (S2 and S5 Files). All these mechanisms play a crucial role in regulation of biological and cellular functions that are necessary for maintaining body homeostasis during heavy physical effort, which is one of the most important traits selected in draft horses.
In both studied draft horse breeds, we detected a strong selection signal on ECA3, overlapping with previously described LCORL/NCAPG locus affecting size in the horse [2,20,66]. These genes were associated with size and growth traits in humans and livestock animals. It was shown that the DCAF16-NCAPG region is a QTL for the average daily gain trait in cattle [67]. Furthermore, the region including NCAPG and LCORL loci was reported to be involved in the determination of size, including both height and mass in cattle [68][69][70][71][72] as well as in pigs [10]. The recent studies confirmed also a significant link between these loci and body height in several horse breeds, including Belgian draft horses [2,20,[73][74][75][76].

Conclusions
Summarizing, in this study we applied a population differentiation-based approach to detect genomic regions divergently selected between six breeds representing light, draft and primitive horses. The analysis of the most pronounced selection signals allowed us to detect several genes connected with processes potentially important for breeds phenotypic differentiation and associated with energy homeostasis during physical effort, heart functioning, neuron development, fertility, disease resistance and motor coordination. All these processes are potentially important for the traits selected in the analyzed breeds, especially for athletic performance, health and gait quality. Our results also confirmed the previously described association of loci on ECA3 and ECA11 with the regulation of body size in our draft and primitive (small size) horses. The efficiency of the applied statistical approach was also confirmed by the identification of a strong selection signal in the blue dun Polish Konik horse at the locus of TBX3 gene, which was previously shown to be causative for dun coat color dilution.