An Overview of Ten Italian Horse Breeds through Mitochondrial DNA

Background
The climatic and cultural diversity of the Italian Peninsula has, over time, fostered the development of a great variety of horse breeds, whose origin and history are still unclear. To clarify this issue, analyses of phenotypic traits and genealogical data have recently been coupled with molecular screening.

Methodology
To provide a comprehensive overview of horse genetic variability in Italy, we produced and phylogenetically analyzed 407 mitochondrial DNA (mtDNA) control-region sequences from ten of the most important Italian riding horse and pony breeds: Bardigiano, Esperia, Giara, Lipizzan, Maremmano, Monterufolino, Murgese, Sarcidano, Sardinian Anglo-Arab, and Tolfetano. A collection of 36 Arabian horses was also evaluated to assess the genetic consequences of their common use for the improvement of some local breeds.

Conclusions
In Italian horses, all previously described domestic mtDNA haplogroups were detected, together with high haplotype diversity. These findings indicate that the ancestral local mares harbored extensive genetic diversity. Moreover, the limited haplotype sharing (11%) with the Arabian horse indicates that its impact on the autochthonous mitochondrial gene pools during the final establishment of pure breeds was marginal, if any. The only significant signs of genetic structure and differentiation were detected in the geographically most isolated contexts (i.e., the Monterufolino and Sardinian breeds). Such a geographic effect was also confirmed in a wider breed setting, where the Italian pool stands in an intermediate position together with most of the other Mediterranean stocks. However, some notable exceptions and peculiar genetic proximities lend genetic support to historical theories about the origin of specific Italian breeds.


Introduction
A great variety of horse breeds developed over time in the diverse cultural contexts and geographic habitats of Italy. Light horses (hotblood/warmblood; withers height: 148-170 cm) are typical of the drier central and southern regions, while the wetter northern regions are characterized by heavy horses (coldblood; withers height: 148-165 cm). The harsh conditions of marginal and insular areas favored smaller horses (ponies; withers height: 115-147 cm). Until the 1940s, horse breeding was mainly linked to the production of animals for military purposes, agricultural labor, forestry and local carriages. Beginning in the 1950s, the mechanization of agriculture and transportation caused a rapid decline in horse breeding; this trend has recently been mitigated by a renewed cultural interest in rural life. Most recently, the growth of leisure-time physical activities has increased interest in, and demand for, "riding horses", i.e., horses used for leisure/pleasure purposes, including competition events (jumping, driving, flat racing, etc.).
In Italy, the demand for riding horses is met by cosmopolitan breeds (Thoroughbreds and Arabs), many autochthonous Italian breeds described in Studbooks, many local Italian populations listed in the "Anagraphic Register of equine populations identifiable as local ethnic groups", and various crosses among all of them.
Phenotypic traits and genealogical data are often insufficient to ascertain the history and origin of horse breeds. Molecular analyses provide a necessary and reliable tool that can be employed along with the morphometric approach and traditional breeding strategies for the efficient management of genetic resources [1]. Owing to its high mutation rate, lack of recombination and maternal inheritance, the control region of mitochondrial DNA (mtDNA) is a powerful marker system for phylogenetic and phylogeographic studies. MtDNA studies on horses have proved capable of identifying intra- and interbreed relationships [2-9], particularly when combined with historical information [2,10,11]. Unfortunately, most previous studies have been carried out on a very short, hypervariable segment (~350 bp) of the control region (HVSI: nucleotide positions 15,469-15,834) [10,12-15]. In 2013, Khanshour and Cothran [9] showed in Arabian horse populations that informativeness can be substantially improved by increasing the length of the analyzed mtDNA control-region sequence. More recently, as for many other livestock species [16-18], the sequence variation of the entire equine mitogenome has also been investigated [19-21], contributing substantially to our current understanding of the domestication process. Seventeen different mtDNA haplogroups were identified in domestic breeds, leading to the conclusion that the domestication of the wild horse, Equus ferus, was a widespread process that persisted for several thousand years (throughout the Neolithic) and occurred in different places, mostly centered in the Western Eurasian steppes [22], as also suggested by archaeological evidence [23], but possibly also in Western Europe [19].
The spread of domestic herds across Eurasia involved extensive introgression from the wild; in particular, it has been proposed that the horse was introduced into Italy with the arrival of Indo-European populations in the Bronze Age and used for military, riding and agricultural purposes [24].
Despite the pivotal role that horses have played in the development of human societies, many aspects of the origin and history of modern breeds remain unclear. In Italy, several local breeds have gained national recognition owing to their phenotypic characteristics and to particular sociocultural and productive features (a complete list is available at http://www.fao.org/dad-is/). However, genetic studies of Italian horse breeds are still limited [25-28], and the few investigations of maternal inheritance have generally focused on a specific geographic area [14,29,30] or included a limited number of samples per breed [31,32].
To obtain a more comprehensive overview of the Italian horse mitochondrial gene pool, we determined and phylogenetically analyzed the mtDNA control-region variation of 407 horses from ten of the most important Italian riding horse breeds (including hotblood/warmblood horses and ponies): Bardigiano, Esperia, Giara, Lipizzan, Maremmano, Monterufolino, Murgese, Sarcidano, Sardinian Anglo-Arab and Tolfetano (Fig 1 and Table 1).

Results and Discussion
An overview of the mtDNA sequence variation
More than half of the mtDNA control region, precisely 610 bp (from np 15491 to np 16100), was sequenced in all 407 Italian samples. An additional collection of 36 Arabian horses, which were heavily used in the improvement of some Italian breeds, was analyzed as an external reference group. Overall, we identified from seven to 52 haplotypes in the different Italian breeds and 14 in the Arabian horses, for a total of 126 distinct haplotypes. Seventy-eight were unique (found only in a single Italian breed), while 34 were shared among different Italian breeds. Only four haplotypes were shared between Italian and Arabian horses (S1 Table), and these might represent the legacy of recent maternal gene flow from Arabian horses into Italian breeds. Given that these four haplotypes encompass only eleven horses [Maremmano (5), Lipizzan (3) and Sardinian Anglo-Arab (1) horses, and Bardigiano (1) and Esperia (1) ponies], the Arabian horse contributed at most marginally to the formation of the modern mtDNA gene pools of these breeds; this is consistent with a scenario in which introgression from the Arabian horse was stallion-mediated.
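The tally of unique, shared and Arabian-shared haplotypes can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input format (a list of (breed, haplotype) pairs), the function name `haplotype_sharing` and the choice of mutually exclusive categories are all assumptions made for the example.

```python
from collections import defaultdict

def haplotype_sharing(samples, reference="Arabian"):
    """Tally haplotypes into mutually exclusive sharing categories.

    samples: iterable of (breed, haplotype) pairs; a haplotype is any hashable
    label for a distinct control-region sequence (hypothetical input format).
    """
    breeds_by_hap = defaultdict(set)
    focal_carriers = defaultdict(int)  # horses outside the reference group
    for breed, hap in samples:
        breeds_by_hap[hap].add(breed)
        if breed != reference:
            focal_carriers[hap] += 1

    counts = {"unique": 0, "shared": 0, "with_reference": 0, "reference_only": 0}
    focal_horses_with_reference_haps = 0
    for hap, breeds in breeds_by_hap.items():
        focal = breeds - {reference}
        if not focal:
            counts["reference_only"] += 1          # seen only in the reference group
        elif reference in breeds:
            counts["with_reference"] += 1          # shared with the reference group
            focal_horses_with_reference_haps += focal_carriers[hap]
        elif len(focal) == 1:
            counts["unique"] += 1                  # a single focal breed
        else:
            counts["shared"] += 1                  # several focal breeds
    counts["total"] = len(breeds_by_hap)
    counts["focal_horses_with_reference_haps"] = focal_horses_with_reference_haps
    return counts

demo = [("Giara", "h1"), ("Sarcidano", "h1"), ("Giara", "h2"),
        ("Maremmano", "h3"), ("Maremmano", "h3"), ("Arabian", "h3"), ("Arabian", "h4")]
print(haplotype_sharing(demo))
```

Keeping the categories disjoint makes the total equal to the sum of the four counts; other tallying conventions (for example, counting an Arabian-shared haplotype also as "unique") are equally possible and would change the individual numbers.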
The overall sequence alignment of the Italian samples revealed 91 polymorphic sites (S), represented by 90 transitions and three indels (two deletions at nps 15532 and 15868, and one insertion at np 16063) (Table 2). Because nps 15868 and 16063 also carry a transition, these 93 mutational events fall at 91 distinct sites.
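A minimal sketch of how variable alignment columns can be sorted into transitions, transversions and indels, and why a column carrying both a transition and a gap counts as a single polymorphic site. The function name `classify_sites` and its inputs are hypothetical: uppercase A/C/G/T sequences with '-' for gaps, and alignment columns assumed to map one-to-one onto reference nucleotide positions (no insertion columns shifting the numbering).

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def classify_sites(aligned_seqs, start_np):
    """Sort variable alignment columns into transitions, transversions and indels.

    aligned_seqs: equal-length strings over A, C, G, T and '-' (gap).
    start_np: reference position of the first column (columns assumed to map
    one-to-one onto reference positions).
    """
    transitions, transversions, indels = [], [], []
    for offset, column in enumerate(zip(*aligned_seqs)):
        states = set(column)
        if len(states) == 1:
            continue  # invariant column
        pos = start_np + offset
        bases = states - {"-"}
        if "-" in states:
            indels.append(pos)
        if len(bases) > 1:
            # A<->G or C<->T only: transition; any purine/pyrimidine mix: transversion
            if bases <= PURINES or bases <= PYRIMIDINES:
                transitions.append(pos)
            else:
                transversions.append(pos)
    sites = sorted(set(transitions) | set(transversions) | set(indels))
    return {"transitions": transitions, "transversions": transversions,
            "indels": indels, "polymorphic_sites": sites}

print(classify_sites(["ACGTA", "GCGTA", "AC-TA", "ACGCA"], 100))
```

With the counts reported above, 90 transition sites plus three indel sites, two of which coincide with transition sites, give 90 + 3 - 2 = 91 polymorphic sites.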
The analysis of molecular variance (AMOVA) showed that most of the observed variance is attributable to differences among samples within breeds (93.57%). However, the remaining among-breed component of genetic variation (6.43%) corresponds to a significant fixation index (FST = 0.064, p-value < 0.001). We examined different possible structures by establishing and comparing population groups that were artificially created by considering various features in turn, such as breeding conditions (semiferal vs controlled), height at the withers (ponies vs others) and geographic prevalence (e.g., indigenous to Sardinia vs others). In fact, the only significant sign of genetic differentiation was found between the two local Sardinian breeds (Giara and Sarcidano) and the other breeds (Table 3), particularly when Monterufolino was considered as a third independent group (FCT = 0.063, p-value < 0.001). This is consistent with the genetic distances between populations: Monterufolino is genetically the most distant breed, while Giara and Sarcidano are confirmed as the most closely related (S1 Fig; pairwise distances above the diagonal and Nei's distances below the diagonal).
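The relationship between the AMOVA variance components and the fixation index can be illustrated with a one-level AMOVA (populations only) computed from pairwise differences. This is a minimal sketch, not Arlequin's implementation: the function names are hypothetical, the number of pairwise differences is assumed to serve as the squared distance between haplotypes, and no permutation test or hierarchical grouping (as used for FCT) is included.

```python
from itertools import combinations

def n_differences(a, b):
    """Number of pairwise nucleotide differences between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def amova_phi_st(groups):
    """One-level AMOVA (populations only), using pairwise differences as the
    squared distance between haplotypes.

    groups: dict mapping population name -> list of aligned sequences.
    """
    pops = [list(seqs) for seqs in groups.values() if seqs]
    all_seqs = [s for seqs in pops for s in seqs]
    n_total, n_pops = len(all_seqs), len(pops)

    def ssd(seqs):
        # Sum of squared deviations: sum over unordered pairs divided by sample size
        return sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)

    ssd_total = ssd(all_seqs)
    ssd_within = sum(ssd(seqs) for seqs in pops)
    ssd_among = ssd_total - ssd_within

    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average sample-size coefficient for unbalanced designs
    n_coef = (n_total - sum(len(s) ** 2 for s in pops) / n_total) / (n_pops - 1)

    var_within = msd_within
    var_among = (msd_among - msd_within) / n_coef
    return {"var_among": var_among, "var_within": var_within,
            "phi_st": var_among / (var_among + var_within)}

print(amova_phi_st({"A": ["AA", "AA"], "B": ["GG", "GG"]}))
```

Since the fixation index is the among-population component divided by the total, the percentages reported above give 6.43 / (6.43 + 93.57) ≈ 0.064, matching the stated FST. Note that the estimate can be negative when populations differ less than expected by chance.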

Phylogenetic analyses and haplogroup classification
The reconstructed network of the control-region sequences (Fig 2) clearly defines some major branches corresponding to the horse haplogroups identified so far [19].
The haplogroup classification was confirmed and refined through a careful analysis of the diagnostic mutational motifs identified in the control-region haplotypes (S1 Table). As expected, the Przewalski's horse-specific haplogroup F was absent from our batch of domestic horses. The stochastic distribution of our haplotypes among the remaining 17 haplogroups confirms that it is not possible to identify breed-specific mitochondrial clades, at least at this level of resolution. About one fourth (N = 109) of the 407 Italian samples carry the haplogroup L mutational motif (nps 15494, 15495 and 15496), which has often been reported as the most common in a wide range of Italian (Bardigiano, Giara, Haflinger, Italian Heavy Draught, Italian Trotter, Lipizzan, Maremmano, Murgese, Sanfratellano, Sarcidano, Sicilian Indigenous and Ventasso horse) and Western Eurasian breeds [6,8,19,29-32,36-38]. Haplogroup L is also the most common in seven of the Italian breeds analyzed in this work, while it is absent among the Arabian samples (Table 4). The second most common haplogroup was G (19.4%), with the highest values in Giara (56.4%) and Sarcidano (26.7%), followed by I (11.3%), which peaks in Sarcidano (40.0%), followed by Esperia ponies (21.4%) and Giara (20.5%). According to the literature, haplogroups G and I are more common in Asia and the Middle East, respectively [19]. The highest number of haplogroups was identified in the Maremmano breed (N = 16), followed by Bardigiano (N = 10) and Murgese (N = 10).

[Table 3 notes: ΦCT, variation among groups divided by total variation; ΦSC, variation among sub-groups divided by the sum of the variation among sub-groups within groups and the variation within sub-groups; ΦST, the sum of the variation among groups and among sub-groups within groups, divided by total variation; ns, P > 0.05.]
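Motif-based classification and per-breed haplogroup frequencies can be sketched as below. This is a deliberately simplified illustration: only the haplogroup L motif (nps 15494, 15495 and 15496) is taken from the text, the dictionary `MOTIFS` and both function names are hypothetical, and real classification follows the hierarchical phylogeny and must handle back-mutations and recurrent sites, which this subset test ignores.

```python
from collections import Counter, defaultdict

# Diagnostic motifs as sets of variant positions relative to the reference.
# Only the haplogroup L motif comes from the text; other entries are left
# to the user (hypothetical extension point).
MOTIFS = {"L": frozenset({15494, 15495, 15496})}

def assign_haplogroup(variants, motifs=MOTIFS):
    """Return the haplogroup whose complete motif is contained in `variants`."""
    matches = [hg for hg, motif in motifs.items() if motif <= variants]
    if not matches:
        return "unassigned"
    # When several motifs match, prefer the most specific (longest) one
    return max(matches, key=lambda hg: len(motifs[hg]))

def haplogroup_frequencies(samples, motifs=MOTIFS):
    """samples: (breed, set_of_variant_positions) pairs -> percentages per breed."""
    per_breed = defaultdict(Counter)
    for breed, variants in samples:
        per_breed[breed][assign_haplogroup(set(variants), motifs)] += 1
    return {breed: {hg: 100 * n / sum(c.values()) for hg, n in c.items()}
            for breed, c in per_breed.items()}

print(assign_haplogroup({15494, 15495, 15496, 15600}))
```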
As for the "insular" stocks, Giara and Sarcidano present only the major haplogroups (G, I, L and M), while the Sardinian Anglo-Arab displays a wider range of haplogroups, including A (1.9%), B (3.7%), E (11.1%) and N (1.9%). These data confirm the close genetic relationships among the Sardinian horse populations, especially between the Sarcidano and Giara breeds, which share the same haplogroups and often the same haplotypes, as displayed in the network (Fig 2). This reconstructed network, based only on local Italian breeds and control-region data, allowed us to date the mtDNA haplogroups to very ancient times (Table 5).
To graphically display (and summarize) the mitochondrial relationships among the analyzed breeds, we performed a principal component analysis (PCA), a method that treats each haplogroup as a separate variable and summarizes the initial dataset into principal components (PCs). After reducing the variables to PCs (haplogroup frequencies based on different haplotypes, S2 Table), the coordinates of the eleven populations were plotted in a two-dimensional graph representing the horse genetic landscape of Italy (Fig 3).
The outlier position of Monterufolino is confirmed, particularly along the first PC, while the second PC separates the Arabian horses from the other breeds. Moreover, the Sardinian breeds clearly separate from the other Italian breeds, as also shown by the centroids (here, the centroid of a macro-geographic area is the arithmetic mean of the PC coordinates of the breeds typical of that area). It is well known that mtDNA inheritance can be strongly influenced by stochastic processes, which in turn can be amplified by local bottlenecks and founder effects. Indeed, the gene pools of geographically isolated populations are dramatically shaped by initial founding events (particularly in a uniparental system such as mtDNA), which usually lead to low within-population genetic distances, such as those reported for Giara and Sarcidano by both the PCA and the AMOVA (Table 3), in agreement with some previous studies [31]. The apparent partial disagreement with the results of Morelli et al. [29], who considered Giara and Sarcidano to be two distinct gene pools, may be explained by the absence in their samples of two of the four haplogroups (I and M) shared by our Giara and Sarcidano samples. Moreover, we identified six different haplotypes shared by Giara and Sarcidano horses (one restricted to these two breeds), which together account for 84% of the samples (58 out of 69; S1 Table and Fig 2). To determine whether the overall haplogroup frequencies in the Italian horse populations indeed differ from those of other populations worldwide, we repeated the PCA including other GenBank data (S3 and S4 Tables). The overall plot of PCs 1 and 2 (Fig 4) confirms the outlier position of Monterufolino and the Sardinian horses, but at the same time highlights an overall geographic pattern from Northern Europe to Eastern Asia, as shown by the centroid positions of the macro-geographic areas.
The Italian breeds stand in an intermediate position together with most of the other Mediterranean stocks. The only notable exceptions are the Bardigiano, which shows possible influences from Northern Europe, and particularly the Murgese, which seems to be closely related to the Asian breeds.
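The PCA on haplogroup frequencies and the centroids of macro-geographic groups can be sketched with NumPy as below. This is an illustration under stated assumptions, not the XLSTAT procedure used in the study: the function names are hypothetical, whether the original analysis used the correlation or the covariance matrix is not stated here (this sketch standardizes columns by default), and the sign of each PC is arbitrary, as in any SVD-based PCA.

```python
import numpy as np

def pca_scores(freqs, n_components=2, standardize=True):
    """PCA of a breeds x haplogroups frequency matrix via SVD.

    Returns the breed coordinates on the first PCs and the share of total
    variance each of those PCs explains.
    """
    X = np.asarray(freqs, dtype=float)
    X = X - X.mean(axis=0)  # center each haplogroup column
    if standardize:
        sd = X.std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # leave constant columns unscaled
        X = X / sd
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]

def centroids(scores, labels):
    """Mean PC coordinates of the breeds assigned to each geographic group."""
    labels = np.asarray(labels)
    return {str(g): scores[labels == g].mean(axis=0) for g in np.unique(labels)}

freqs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]
scores, explained = pca_scores(freqs)
print(centroids(scores, ["Italy", "Italy", "Sardinia", "Sardinia"]))
```

Computing the scores from the SVD of the centered matrix avoids forming the covariance matrix explicitly; the centroid of a group is simply the mean of its rows in score space.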

The mtDNA peculiarities of some Italian breeds
A strong founder effect is evident in Monterufolino, the only Italian breed with a haplotype diversity lower than 0.8, which occupies an outlier position in both the Italian and the Eurasian population contexts (Figs 3 and 4). Such a peculiar gene pool can easily be connected to the breed's history: in the nineties, its total population counted fewer than ten individuals [34], yet we were able to identify as many as seven distinct founding mares.

Fig 3. [Caption fragment] PCA of haplogroup frequencies (S2 Table) from the eleven breeds analyzed in this study. The rarest haplogroups, H and K, were phylogenetically grouped with their sister clades I and J, respectively. The geographic labels, in bold, represent the centroids of breeds typical of Italy (in blue) and Sardinia (in green).

The PCA also revealed a peculiar position of the Bardigiano pony within a Northern European genetic context, which had not been reported in previous analyses (Sabbioni et al. 2005) [39]. This uniqueness among the Italian breeds could be explained by both its phenotype and its history. The Bardigiano is considered indigenous to Italy [34], but its origin could be traced back to the horses ridden by northern invaders during their incursions into the Italian Peninsula in the 5th century [40]. This original maternal legacy survived the recent dilution caused by the introduction, after World War II, of stallions from a range of breeds, especially the Franches-Montagnes.
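The haplotype-diversity criterion used above for Monterufolino can be made concrete with Nei's unbiased estimator, Hd = n/(n-1) × (1 - Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences. The sketch below assumes this standard estimator (the text reports the value but not the formula), and the function name is hypothetical.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity (Nei): n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

print(haplotype_diversity(["a", "a", "b", "b"]))
```

Hd ranges from 0 (all sequences identical) to 1 (all sequences different), so a value below 0.8 means that a few haplotypes dominate the sample, as expected after a strong founder effect.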
Another peculiar position, among the Western Asian breeds, is occupied by the Murgese horse, an ancient breed that originated in Apulia during the Spanish domination (16th-18th centuries). The breed is thought to have been developed by crossing a Spanish stock (partially Arab) with native horses, which shared a common origin with the Neapolitan horse. Strict selection began later, in the early nineties, and some matrilines from abroad were probably introduced. We identified 21 different haplotypes among the 46 presumed founding mares, which, according to our data, were mostly of Asian origin.
A further interesting finding is the clear separation between the Lipizzan horses from Italy and those from abroad (Fig 4). The Lipizzan breed dates back to the 16th century, when it was bred at Lipica (now in Slovenia). In the following centuries, several maternal lines were developed from eight traditional Lipizzan studs [4,41]. Strict breeding rules were followed to keep different genetic reserves separate, as shown by the peculiar PCA position of the Lipizzan horses from the Italian breeding farm of Monterotondo, whose eleven founding maternal lines are fully represented by the eleven different haplotypes reported in S1 Table.

Conclusion
Besides confirming a widespread mitochondrial variability in Italy, as already reported [29,31,32], this study provides a more comprehensive reassessment of the mitochondrial genetic relationships among ten typical Italian hotblood/warmblood horse and pony breeds. The different mtDNA haplotypes are not preferentially distributed among breeds. The only significant haplotype-based population structure emerged when the (geographic) isolation of the Monterufolino and Sardinian breeds was considered as a possible differentiation factor. The same four haplogroups were identified in the Giara and Sarcidano breeds (often with the same haplotypes), and their mitochondrial similarity was confirmed in a wider Eurasian context through the PCA. The resulting mtDNA genetic landscape of Eurasia shows a clear geographic pattern and highlights a group of closely related intermediate breeds, mostly from the Italian Peninsula. This genetic feature likely reflects the geographic position of Italy, in the center of the Mediterranean Sea, and its cultural and economic past as a crossroads of migratory waves from the Western Asian coasts to Continental Europe.
It is worth noting that Italian breeds show a frequency of haplogroup L (23.9%) intermediate between those recorded in Western Asia (18.1%) and in Continental Europe (31.1%) (S5 Table). Moreover, an additional clue to a putative east-to-west direction of gene flow is given by the overall haplogroup frequencies of Italian horses, which are somewhat more similar to those of breeds from South-West Asia (χ² = 27.5; p-value = 0.006) than to those from Continental Europe (χ² = 74.8; p-value < 0.001), as already indicated [32]. These findings probably reflect the overall mtDNA legacy of ancestral mares (of eastern origin) that, long ago (see age estimates in Table 5), were probably used in the initial stages of breed selection. These mitochondrial lineages were also preserved during the final establishment of pure breeds, which was mainly achieved through sex-biased breeding practices [42] that often involved the intensive use of a few selected external stallions [43,44]. Thus, the impact on the original mtDNA gene pool could have been marginal, as also indicated by the only four haplotypes shared between the Arabian horses and the ten Italian breeds analyzed here, despite the well-recognized use of Arabian stallions to revitalize some Italian breeds. As for more recent times, our mtDNA data also lend genetic support to some historical theories about the origin of some Italian breeds.
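A comparison of haplogroup frequencies between two groups of breeds can be sketched as a Pearson chi-square test of homogeneity on a 2 × k table of counts. The exact test setup behind the χ² values above is not described in this section, so this is an assumed stand-in; the function names are hypothetical, and the p-value uses a plain series for the regularized incomplete gamma function instead of a statistics library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with `df` degrees of
    freedom, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-square test that two samples share haplogroup frequencies.

    counts_a, counts_b: haplogroup counts listed in the same category order.
    """
    # Drop categories absent from both samples (their expected count is zero)
    cols = [j for j in range(len(counts_a)) if counts_a[j] + counts_b[j] > 0]
    rows = (counts_a, counts_b)
    row_totals = [sum(row[j] for j in cols) for row in rows]
    grand = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for j in cols:
            expected = row_total * (counts_a[j] + counts_b[j]) / grand
            stat += (row[j] - expected) ** 2 / expected
    df = len(cols) - 1
    return stat, df, chi2_sf(stat, df)

print(chi2_homogeneity([10, 0], [0, 10]))
```

With many rare haplogroups, several expected counts fall below 5 and the chi-square approximation becomes unreliable; pooling rare haplogroups or using a permutation-based test are common alternatives.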
In conclusion, we confirm that the mitogenome is an appropriate resource in studies aiming to reconstruct the maternal ancestral origins of local breeds and to evaluate genetic continuity with the original stocks.

Ethics statement
All experimental procedures were reviewed and approved by the Animal Research Ethics Committee of the Universities of Perugia and Pavia in accordance with the European Union Directive 86/609.
Total DNA was extracted from blood samples with the Mag-Core® Automated Nucleic Acid Extractor, following the manufacturer's protocol.

Mitochondrial DNA sequence analyses
Sequences (610 bp, from np 15491 to np 16100) were assembled and aligned to the ERS using Sequencher™ 5.10 (Gene Codes Corporation). Whenever electropherograms showed ambiguities, new PCR amplifications and sequencing reactions were performed. All mtDNA D-loop sequences determined in this study have been deposited in GenBank under accession numbers KU711082-KU711507.
Several mtDNA sequence variation parameters were estimated using DnaSP 5.1 [46]. Analysis of MOlecular VAriance (AMOVA) and pairwise FST calculations were performed with the Arlequin v. 3.5 software package [47]. The statistical significance of the values was estimated by permutation analysis using 100 replications. Intra- and inter-population comparisons were based on the number of pairwise differences between sequences and plotted using an R script integrated in Arlequin (http://www.r-project.org/).
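The logic of a permutation test for a pairwise fixation index can be sketched as follows. Arlequin derives pairwise FST from AMOVA; to keep the example short, this sketch swaps in a Hudson-type estimator, FST = 1 - H_within / H_between, computed from mean pairwise differences. The function names, the seeded random generator and the (hits + 1) / (n_perm + 1) convention are assumptions of the example, not the software's documented behavior.

```python
import random
from itertools import combinations, product

def mean_differences(pairs):
    """Average number of nucleotide differences over sequence pairs."""
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in pairs]
    return sum(diffs) / len(diffs)

def hudson_fst(pop1, pop2):
    """Hudson-type FST = 1 - H_within / H_between from pairwise differences."""
    h_within = (mean_differences(combinations(pop1, 2)) +
                mean_differences(combinations(pop2, 2))) / 2
    h_between = mean_differences(product(pop1, pop2))
    return 1 - h_within / h_between

def permutation_test(pop1, pop2, n_perm=100, seed=1):
    """Shuffle population labels and count FST values >= the observed one."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled, k = list(pop1) + list(pop2), len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # (hits + 1) / (n_perm + 1) counts the observed labelling as one permutation
    return observed, (hits + 1) / (n_perm + 1)

print(permutation_test(["AAAA"] * 5, ["GGGG"] * 5))
```

Under this convention the smallest attainable p-value with 100 permutations is 1/101 ≈ 0.0099; some programs instead report the plain fraction hits / n_perm, which can reach zero.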
The evolutionary relationships among haplotypes were visualized by constructing separate median-joining networks with Network 4.6 (www.fluxus-engineering.com), one for each haplogroup (C, D, E, G, L, Q and R) and macro-haplogroup (A'B, H'I, J'K, M'N and O'P), which were then parsimoniously connected by hand according to the diagnostic mutational motifs identified by Achilli et al. [19]. Evolutionary distances were computed as the averaged distance (ρ) of the haplotypes within a clade from the respective root haplotype, accompanied by a heuristic estimate of its standard error (σ). All positions containing gaps and ambiguous data were eliminated from the dataset. The time to the most recent common ancestor of each cluster was estimated using a corrected mutation rate of about 2.96 × 10⁻⁷ substitutions per nucleotide per year for the whole control region [19], which corresponds to one substitution every 5,540 years over the sequenced region of 610 bp.
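The ρ-based dating can be sketched as below. The conversion factor follows directly from the text: 1 / (2.96 × 10⁻⁷ × 610) ≈ 5,540 years per substitution. The tree representation (one descendant count per mutation below the root) and the variance heuristic σ² = Σ nᵢ² / n², where nᵢ is the number of sampled sequences carrying mutation i, are assumptions of this sketch; the section only states that a heuristic SE was used.

```python
import math

# One substitution over 610 bp at 2.96e-7 substitutions/nucleotide/year
YEARS_PER_SUBSTITUTION = 1 / (2.96e-7 * 610)  # about 5,540 years

def rho_age(mutation_descendants, n_samples, years_per_sub=YEARS_PER_SUBSTITUTION):
    """Rho statistic, heuristic SE and the corresponding ages for one clade.

    mutation_descendants: for each mutation in the clade's tree below the
    root haplotype, the number of sampled sequences carrying it.
    n_samples: number of sampled sequences in the clade (root ones included).
    """
    # Summing descendants over mutations equals summing each sample's
    # distance from the root, so rho is the mean root-to-sample distance.
    rho = sum(mutation_descendants) / n_samples
    # Assumed heuristic: sigma^2 = sum of squared descendant counts / n^2
    sigma = math.sqrt(sum(d * d for d in mutation_descendants)) / n_samples
    return rho, sigma, rho * years_per_sub, sigma * years_per_sub

# Hypothetical star-shaped clade: three samples, each with one private mutation
print(rho_age([1, 1, 1], 3))
```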
Principal component analyses (PCA) were performed in Excel with the XLSTAT add-in, as described elsewhere [48]. Two PCAs were carried out: one considering only our samples, the other also including the horse mtDNA records available in GenBank. PCA is a widely used dimension-reduction method that seeks to explain the variance of multivariate data through a smaller number of variables (the principal components, PCs), which are linear functions of the original variables, in this case the haplogroup frequencies. Considering the high degree of inbreeding that characterizes most common selection strategies, the haplogroup frequencies used as source data for the PCA were calculated by counting only different haplotypes within the same breed. The rarest haplogroups were phylogenetically grouped and, among the large number of available data, only breeds represented by at least 15 different haplotypes were included in the analysis, to increase its statistical robustness. After reducing the variables (haplogroups) to PCs, we plotted the coordinates of the observations (the breeds analyzed here and elsewhere) in two-dimensional graphs representing the genetic landscape of Italy and West Eurasia.
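The construction of the PCA input described above can be sketched as follows: each distinct haplotype is counted once per breed, rare haplogroups are merged into their sister clades (H into I and K into J, as in the Fig 3 caption), and breeds with too few distinct haplotypes are skipped. The input tuple format and names are hypothetical. The text applies the 15-haplotype threshold to the GenBank records, so the threshold is a parameter here with a permissive default.

```python
from collections import Counter, defaultdict

# Rare haplogroups merged into sister clades (pairs from the Fig 3 caption)
MERGE_INTO_SISTER = {"H": "I", "K": "J"}

def distinct_haplotype_frequencies(samples, min_haplotypes=1,
                                   merge=MERGE_INTO_SISTER):
    """Haplogroup frequencies per breed, counting each haplotype only once.

    samples: iterable of (breed, haplotype_id, haplogroup) tuples.
    Breeds with fewer than `min_haplotypes` distinct haplotypes are skipped.
    """
    distinct = defaultdict(dict)  # breed -> {haplotype_id: haplogroup}
    for breed, hap, hg in samples:
        distinct[breed][hap] = merge.get(hg, hg)
    table = {}
    for breed, haps in distinct.items():
        if len(haps) < min_haplotypes:
            continue
        counts = Counter(haps.values())
        table[breed] = {hg: n / len(haps) for hg, n in counts.items()}
    return table

data = [("X", "h1", "L"), ("X", "h1", "L"), ("X", "h1", "L"),
        ("X", "h2", "H"), ("X", "h3", "I")]
print(distinct_haplotype_frequencies(data))
```

Counting distinct haplotypes rather than individuals keeps heavily inbred matrilines, represented by many identical sequences, from dominating a breed's frequency profile.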