Genetic Structure and Gene Flows within Horses: A Genealogical Study at the French Population Scale

Since horse breeds constitute populations subject to variable and multiple outcrossing events, we analyzed the genetic structure and gene flows considering horses raised in France. We used genealogical data, with a reference population of 547,620 horses born in France between 2002 and 2011, grouped according to 55 breed origins. On average, individuals had 6.3 equivalent generations known. Considering different population levels, the fixation index decreased from an overall species F_IT of 1.37% to an average of −0.07% when considering the 55 origins, showing that most horse breeds constitute populations without genetic structure. We illustrate the complexity of gene flows existing among horse breeds, a few populations being closed to foreign influence, most, however, being subject to various levels of introgression. In particular, Thoroughbred and Arab breeds are largely used as introgression sources, since those two populations together explain 26% of founder origins within the overall horse population. When compared with molecular data, pairs of breeds sharing even a small level of coancestry also showed low genetic distance; the gene pool of the breeds was probably impacted by their exchanges of reproducers.


Introduction
The horse is a species raised for very diverse purposes. During the last 100 years, in the industrialized world, strong changes occurred in the ways of using horses: up to World War I, horses were mainly used for war, carriage and agricultural work; now, horses are mainly used for sport, leisure, hobby and even as companion animals. These changes had two main consequences: first, the actual population size of racing and riding breeds has largely increased, whereas many draught breeds are now endangered; second, horse breeders have strongly modified their breeding goals. Another consequence is that outcrossing was commonly practiced for some breeds, and is still practiced, to improve the performance of international and local populations. A distinction has to be made in relation to studbook regulations. Indeed, some studbooks are closed (e.g., French Trotter, Arab, and Thoroughbred breeds), whereas others allow the introduction of new gene lines and stallions from other breeds. As an example, the Anglo-Arab breed is a former cross between Arab and Thoroughbred breeds, used here because of their high performance in endurance and speed, respectively [1].
Several studies have been performed to assess the impact of outcrossing on a specific or a limited number of horse breeds, based on genealogical [2][3][4][5] or molecular approaches [2][6][7][8][9][10]. Molecular tools are especially useful to measure genetic differentiation and distance between breeds, and to assess a theoretical amount of admixture within a given population related to some gene flows [11]. However, they may still lack precision when considering the exact amount of gene flow at a given time scale, in comparison to a documented pedigree database. Yet completeness and correctness of genealogical information constitute the main limitations of pedigree approaches [12]. Indeed, it is difficult to study past gene flows among a large number of breeds, since studbooks are generally established independently from one breed to another, even if several indicators are available for that purpose, such as probability of gene origins [13] or approximations of Wright's F-statistics [14].
This multi-breed study was aimed at studying gene flows considering the whole French horse population, using the database of the French Horse Institute (Institut Français du Cheval et de l'Equitation, IFCE), which registers all the horses raised in France and some of their ancestors of foreign origin (between 2 and 3 generations on average). Among others, our goal was to explain how a breed can contribute to these flows or be affected by them. A comparison between breed genealogical and molecular distance indicators was also conducted.

Genealogical database
The entire French horse database SIRE (French Equine Information System), which includes, according to IFCE, between 90 and 95% of horses raised in France, was analyzed in this study. It includes 139 studbook designations, corresponding to breeds or breed subpopulations (varieties) defined according to national or international studbook rules. Those designations are categorized by the IFCE in three different breed groups: (1) Race and riding horses, (2) Pony breeds and (3) Draught horses.
To define a "reference population", we chose the group of animals born in France from 2002 to 2011, which corresponded to a total of 732,176 animals, all breeds and designations considered. Based on equivalent complete generations (EqG) [4], we removed from this group animals without origins provided (183,366 horses), as well as 13 studbook designations with average EqG lower than 2 generations (1,190 horses). The genealogical database then consisted of the reference population as defined above (547,620 horses born in France, for 97 studbook designations) plus all known ancestors of this population (360,862 horses, 71,547 of them being born outside France).
For simplicity's sake, studbook designations with fewer than 200 individuals registered over the 2002-2011 period (corresponding to foreign breeds) were grouped together into three foreign "origins" according to their respective group: (i) Other foreign race or riding breeds (23 designations), (ii) Other foreign pony (3 designations) and (iii) Other foreign draught horse (1 designation). Studbook designations corresponding to the same breed were grouped together, with only two exceptions. The first exception was the case of three Anglo-Arab designations, differentiated in their studbook rule according to the percentage of Arab genes within individuals. The second one was the case of two subpopulations of the Welsh Pony breed, merging 2 and 4 Welsh designations according to their type (Pony or Cob/crossed individual). The French designation "AQPS" ("Autre Que Pur-Sang", literally "other than Thoroughbred"), which denotes racing horses related to Thoroughbred, but not recognized due to regulation reasons (Artificial Insemination, non-pure Thoroughbred…), was also considered as an independent origin. Finally, the reference population corresponded to 55 breeds, varieties and groups of breeds defined here as "origins".
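The EqG criterion used for this filtering can be illustrated with a minimal sketch. It assumes the usual definition of equivalent complete generations, i.e. the sum of (1/2)^n over all known ancestors, n being the number of generations separating the ancestor from the individual. The toy pedigree and the names `TOY_PEDIGREE` and `eqg` are hypothetical and are not taken from the SIRE database.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (sire, dam); None marks an unknown parent.
TOY_PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", None),
    "E": ("C", "D"),
}


@lru_cache(maxsize=None)
def eqg(ind):
    """Equivalent complete generations: sum of (1/2)^n over all known ancestors.

    Each known parent counts for 1/2, and its own ancestors are shifted
    one generation further back, hence the factor 0.5 * (1 + eqg(parent)).
    """
    total = 0.0
    for parent in TOY_PEDIGREE.get(ind, (None, None)):
        if parent is not None:
            total += 0.5 * (1.0 + eqg(parent))
    return total


if __name__ == "__main__":
    for name in TOY_PEDIGREE:
        print(name, eqg(name))
```

In this toy case, individual E has two known parents, three known grandparent slots out of four and no earlier ancestors, so its EqG is 2 × 1/2 + 3 × 1/4 = 1.75. Animals with an EqG of 0 correspond to horses without origins provided.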

Probability of gene identity and gathering
We analyzed genetic structure first by computing average inbreeding F_I and coancestry C_IJ coefficients [15] for each subpopulation. Due to computing constraints for populations with a large actual population size, C_IJ was estimated by averaging coancestry coefficients over 100,000 pairs of individuals randomly sampled within subpopulations I and J, respectively. In order to characterize genetic structure within a subpopulation I, we computed the fixation index F_IS-I using the following equation [16]:

$$F_{IS\text{-}I} = \frac{F_I - C_{II}}{1 - C_{II}}$$

We also distinguished $\tilde{F}$ and $\tilde{C}$, averaged within all subpopulations, and $\bar{C}$, the coancestry averaged over the entire metapopulation [16], using the following equations, N_I and N_T being the population size of subpopulation I and of the entire metapopulation, respectively:

$$\tilde{F} = \frac{\sum_{I} N_I F_I}{N_T}, \qquad \tilde{C} = \frac{\sum_{I} N_I C_{II}}{N_T}, \qquad \bar{C} = \frac{\sum_{I}\sum_{J} N_I N_J C_{IJ}}{N_T^{2}}$$

In order to compute F-statistics, we used the following equations:

$$F_{IS} = \frac{\tilde{F} - \tilde{C}}{1 - \tilde{C}}, \qquad F_{ST} = \frac{\tilde{C} - \bar{C}}{1 - \bar{C}}, \qquad F_{IT} = \frac{\tilde{F} - \bar{C}}{1 - \bar{C}}$$

F-statistics were calculated considering either the 3 horse breed groups (Race and riding horses, Pony and Draught horses), or the 55 breed origins as subpopulations.
Identity By Descent (IBD) coefficients such as F and C are considered to be very sensitive to incomplete pedigree information (e.g., [12]). In order to study the relationships between breed origins, taking into account possible differences in pedigree knowledge, we used equivalent complete generations EqG to adjust coancestries between each pair of origins, according to the method developed by Cervantes et al. [17] to compute coancestry rates. Considering two origins I and J, two individuals i and j sampled within each one, EqG_i and EqG_j their respective equivalent complete generations and C_ij their coancestry, the coancestry rate ΔC_ij can be computed using the following equation:

$$\Delta C_{ij} = 1 - \left(1 - C_{ij}\right)^{\frac{2}{EqG_i + EqG_j}}$$

For each pair of origins, the average coancestry rate ΔC_IJ was computed by averaging coancestry rates over 100,000 individual pairs randomly sampled within both origins.
A hierarchical clustering was carried out on the basis of the average of these coancestry rates computed among the 55 origins, using the Ward method, distances being determined on the basis of the coancestry rates (1 − ΔC_IJ), and the phenogram of relations being produced using the R hclust function.
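The quantities described in this section can be sketched in a few lines. This assumes the usual size-weighted definitions of average inbreeding and coancestry from which F_IS, F_ST and F_IT are derived, and a coancestry rate computed as 1 − (1 − C_ij) raised to the inverse of the mean EqG of the two individuals. The function names and the toy values are hypothetical and do not come from the study.

```python
def f_statistics(sizes, inbreeding, coancestry):
    """Return (F_IS, F_ST, F_IT) for a metapopulation.

    sizes[I]         = N_I, size of subpopulation I
    inbreeding[I]    = F_I, average inbreeding in subpopulation I
    coancestry[I][J] = C_IJ, average coancestry between I and J
    """
    k = len(sizes)
    n_total = sum(sizes)
    # Size-weighted averages within subpopulations
    f_tilde = sum(sizes[i] * inbreeding[i] for i in range(k)) / n_total
    c_tilde = sum(sizes[i] * coancestry[i][i] for i in range(k)) / n_total
    # Average coancestry over the whole metapopulation
    c_bar = sum(
        sizes[i] * sizes[j] * coancestry[i][j]
        for i in range(k)
        for j in range(k)
    ) / n_total ** 2
    f_is = (f_tilde - c_tilde) / (1 - c_tilde)
    f_st = (c_tilde - c_bar) / (1 - c_bar)
    f_it = (f_tilde - c_bar) / (1 - c_bar)
    return f_is, f_st, f_it


def coancestry_rate(c_ij, eqg_i, eqg_j):
    """Coancestry rate per generation, adjusted for pedigree depth."""
    mean_eqg = (eqg_i + eqg_j) / 2
    return 1 - (1 - c_ij) ** (1 / mean_eqg)


if __name__ == "__main__":
    # Hypothetical two-subpopulation example
    sizes = [100, 300]
    inbreeding = [0.02, 0.01]
    coancestry = [[0.015, 0.001], [0.001, 0.012]]
    print(f_statistics(sizes, inbreeding, coancestry))
    print(coancestry_rate(0.19, 2.0, 2.0))
```

In this toy example the average inbreeding is slightly lower than the average within-subpopulation coancestry, so F_IS comes out negative, the situation reported here for most breed origins.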
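The study used the R hclust function. For readers without R, the agglomerative procedure can be sketched in plain Python. This is only an illustration: it applies the Ward Lance-Williams update directly to the supplied dissimilarities (one minus the coancestry rate), which is an assumption, since the exact hclust variant and any squaring of distances are not stated. The function name `ward_linkage` and the toy matrix are hypothetical.

```python
def ward_linkage(dist):
    """Agglomerative clustering with the Ward Lance-Williams update.

    dist: symmetric list of lists of dissimilarities between origins.
    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    clusters being frozensets of the original indices.
    """
    n = len(dist)
    clusters = {i: frozenset([i]) for i in range(n)}
    # Pairwise dissimilarities, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], height))
        for k in clusters:
            if k in (a, b):
                continue
            nk = len(clusters[k])
            dka = d[(min(k, a), max(k, a))]
            dkb = d[(min(k, b), max(k, b))]
            # Ward update of the dissimilarity between k and the merged cluster
            d[(k, next_id)] = (
                (na + nk) * dka + (nb + nk) * dkb - nk * height
            ) / (na + nb + nk)
        # Drop every entry involving the two clusters that were merged
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        merged = clusters[a] | clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges


if __name__ == "__main__":
    # Hypothetical dissimilarities for four origins
    toy = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    for left, right, height in ward_linkage(toy):
        print(sorted(left), sorted(right), round(height, 4))
```

The merge history plays the role of the phenogram: origins joined at low heights share the highest coancestry rates.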
Considering 32 horse breeds with genotype available from the Leroy et al. [6] study, we compared coancestry rates with Reynolds et al. [18] molecular distances computed for these 32 horse breeds.

Probability of genes origin
On the basis of the hierarchical clustering results, origins were grouped in order to focus on gene flows existing among Race and riding horse populations. Pony and Draught horse origins were grouped into their respective breed group. The 15 Race and riding horse origins with a reference population containing fewer than 5,000 horses and the Certified race and riding origins were gathered into a single group (OTHERS), as were the three American breeds (Quarter Horse, Paint Horse and Appaloosa). Finally, 11 Race and riding breeds and groups of breeds were studied, in relation with the other two horse groups (Pony and Draught horses).
Ancestral gene flows (parental and founder) were studied considering either the three horse groups (Race and riding horses, Pony and Draught horses) or the 13 groups, using probability of gene origins. The probability of gene origin is the probability for a gene taken at random within the reference population to come from an ancestor or founder [12]. We consider here a founder as an ancestor of the reference population without any known parent.
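The founder side of this definition can be sketched as follows: each parent transmits half of an individual's genes, so founder contributions are obtained by halving at each generation and summing over all pedigree paths. The pedigree, the origin labels and the convention of assigning an unknown parent's half to an "unknown" pseudo-founder are hypothetical choices made for this illustration, not the procedure of the study.

```python
from collections import defaultdict


def founder_contributions(ind, pedigree, memo=None):
    """Return {founder: probability that a random gene of ind comes from it}."""
    memo = {} if memo is None else memo
    if ind in memo:
        return memo[ind]
    sire, dam = pedigree.get(ind, (None, None))
    if sire is None and dam is None:
        # A founder is an ancestor without any known parent
        result = {ind: 1.0}
    else:
        acc = defaultdict(float)
        for parent in (sire, dam):
            if parent is None:
                acc["unknown"] += 0.5  # illustrative convention
            else:
                contribs = founder_contributions(parent, pedigree, memo)
                for founder, p in contribs.items():
                    acc[founder] += 0.5 * p
        result = dict(acc)
    memo[ind] = result
    return result


def origin_shares(reference, pedigree, origin_of):
    """Average founder contributions over a reference population, by origin."""
    shares = defaultdict(float)
    for ind in reference:
        for founder, p in founder_contributions(ind, pedigree).items():
            shares[origin_of.get(founder, "unknown")] += p / len(reference)
    return dict(shares)


if __name__ == "__main__":
    # Hypothetical pedigree: individual -> (sire, dam)
    ped = {
        "TB1": (None, None),
        "AR1": (None, None),
        "SF1": (None, None),
        "X": ("TB1", "AR1"),
        "Y": ("X", "SF1"),
        "Z": ("TB1", "SF1"),
    }
    origins = {"TB1": "Thoroughbred", "AR1": "Arab", "SF1": "Selle Français"}
    print(origin_shares(["Y", "Z"], ped, origins))
```

Parental gene flows correspond to the same calculation stopped at the first generation, i.e. the share of sires and dams of the reference population belonging to each origin.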

Demographical parameters and pedigree completeness
The 55 origins under study had different reference population sizes, ranging from 10 (Other foreign draught group breeds) to 109,551 individuals (French Trotter breed). If we consider the three breed groups, the Race and riding horse group had the largest reference population size with 339,574 horses. For the

Inbreeding, coancestry and F-statistics
Within each breed group (Race and riding horses, Pony, Draught horses), average inbreeding was found to be equal to 1.79, 1.41 and 1.26% respectively, while coancestry was found to be equal to 0.49, 0.30 and 0.41% respectively. However, according to Table 2, average coancestry coefficients between horse groups were smaller than within-group coancestry coefficients (under 0.02%), underlining differentiation between horse groups.
By contrast, inbreeding and coancestry coefficients were not found to be so well differentiated when considering breed origins. F ranged from 0.24 (Half Bred Arab) to 6.13% (Poitevin Mulassier breed) and C ranged from 0.14 (Certified race and riding origin) to 8.25% (Other foreign pony). In general, higher values were found for breeds with small actual population size and higher EqG. The high coancestry level found for the Other foreign pony origin was, however, due to a sampling effect, with 20 of the 35 individuals sharing a common sire.
These contrasts between inbreeding and coancestry can be well illustrated when considering average F-statistics (Table 3). The highest value was found for the overall species F_IT (1.37%), a slightly lower value being found for subpopulation F_IS when considering breed groups (1.16%). This indicates that a relative genetic structure remains within breed groups. By contrast, at the origin (i.e. breed) level, F_IS was found to be slightly negative (−0.07%), underlining that breeds constitute in general populations with almost no genetic structure. This was, however, not always the case, since a few origins showed F_IS larger than 1% (Camargue and Arab breeds, the two Welsh origins and the Lusitano breed).
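As a consistency check on this hierarchy of values, and assuming the F-statistics satisfy Wright's usual identity (1 − F_IT) = (1 − F_IS)(1 − F_ST), the between-subpopulation component implied by the rounded figures quoted above can be worked out. The resulting F_ST values are derived here for illustration and are not quoted from Table 3.

```latex
\begin{align*}
F_{ST} &= 1 - \frac{1 - F_{IT}}{1 - F_{IS}} \\
\text{3 breed groups:}\quad F_{ST} &\approx 1 - \frac{1 - 0.0137}{1 - 0.0116} \approx 0.0021 \quad (0.21\%) \\
\text{55 origins:}\quad F_{ST} &\approx 1 - \frac{1 - 0.0137}{1 + 0.0007} \approx 0.0144 \quad (1.44\%)
\end{align*}
```

Under this reading, the overall F_IT would be almost entirely attributable to differentiation between origins, with close to none coming from structure within them.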

Genetic relations and gene flows within and between horse populations
Of the 1,485 average coefficients of coancestry computed across the 55 breed origins (55 × 54 / 2 pairs), 691 were different from zero, ranging from 10^-7 to 1.44% (AQPS and Thoroughbred). Each origin showed at least 2 non-zero coefficients of coancestry with other breed origins (Table S3).
The phenogram based on those coancestry rates allowed empirically assigning most of the origins to their respective breed group (Figure 1), some pony origins (Henson, Fjord), however, being found with Race and riding horse origins. Race and riding horse origins showed more contrasted relations than other breed groups, probably due to larger coancestry coefficients among origins, in relation to larger amounts of gene flow. Figure 2 shows founder and parental gene flows between the 3 breed groups, origins and flows from individuals born in foreign countries being considered separately. Considering breed composition, Draught and Race and riding horses groups were found to be quite homogeneous with 99.5% of internal founder origins. By contrast, 12.4% of founder origins in the Pony breed group, were  Table S1). Regarding current gene flows, 6.1% of the ponies had parents from the Race and riding horses group. A large number of parents seem to have been born outside of France for Race and riding horses and Ponies, with animals of foreign country origin accounting for 17.9% and 16.9%, respectively.
Breeds were found to be subject to various gene flows, the percentage of genetic variability explained by external introgression ranging from almost 0 to 100% according to origins, considering either parents or founders (Table S2). In particular, 9 of the 55 origins showed no introgression, considering either parents or founders. The percentage of genetic variability explained by parents and founders born in France ranged from 16 (Franche-Montagne, a breed of Swiss origin) to 100% and from almost 0.5 (Lusitano, a breed of Portuguese origin) to 100%, respectively. The amount of parental introgression was larger when considering the sire pathway (21.1% on average over breeds) than when considering the dam pathway (14.7% on average over breeds). The Camargue breed was found to be the only origin with 100% of parents and founders born in France. Figure 3 illustrates the complexity of gene flows among Race and riding horses. Only a small number of breeds seemed closed, namely Arab, Thoroughbred, Camargue and Merens breeds. On the contrary, composite horse populations (AQPS, Half Bred Arab) were found to be more or less strongly impacted by former (i.e. founder) and current (i.e. parental) external influences. Some breeds also showed an intermediate situation considering either the intensity or the time when cross-breeding had occurred, such as the French Trotter: 17% of the founders' origins were explained by Trotters from other countries, 100% of parental origins being, however, related to the French Trotter.
Breeds also seemed to be used with a wide range of intensity as an introgression source. Thoroughbred and Arab breeds were in particular largely used for cross-breeding. While representing 4.2 and 15.1% of the Race and riding horses population, these two breeds explained, within Race and riding horses, 5.6 and 18.3% of parental origins, and 7.9 and 32.9% of founder origins, respectively. At the species scale, both breeds together explain 26% of founder origins.

Comparison between genealogical and molecular data
Based on the comparison of 32 breeds in common with the study by Leroy et al. [6], coancestry rates and Reynolds' molecular distances were found to be negatively correlated (Spearman correlation of −0.39, P<0.0001 based on the Mantel test). One hundred fifty-one of the 496 coancestry rates were found to be different from 0, ranging from less than 10^-9 to 0.2% per generation (Barb and Arab-Barb breeds). As illustrated by Figure 4, pairs of breeds sharing even a minimal coancestry rate also showed low genetic distance: for instance, each of the 22 pairs of breeds with a coancestry rate larger than 0.005% showed a genetic distance smaller than 0.05, i.e. less than half the average distance computed over all pairs (0.1).
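The kind of test reported here can be sketched as a Mantel permutation test on Spearman rank correlation: the rows and columns of one matrix are permuted jointly, and the permuted correlations are compared with the observed one. The sketch below is a plain-Python illustration under these assumptions. The function names, the number of permutations and the toy matrices are hypothetical, and the software and settings actually used in the study are not stated in this section.

```python
import random


def ranks(values):
    """Average ranks, ties sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, perm):
    """Upper-triangle values after jointly permuting rows and columns."""
    n = len(matrix)
    return [matrix[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_spearman(a, b, n_perm=999, seed=0):
    """Return (Spearman correlation, two-sided permutation P-value)."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    rank_b = ranks(upper_triangle(b, identity))
    observed = pearson(ranks(upper_triangle(a, identity)), rank_b)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        r = pearson(ranks(upper_triangle(a, perm)), rank_b)
        if abs(r) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical coancestry rates and molecular distances for four breeds
    rates = [
        [0.0, 0.6, 0.5, 0.4],
        [0.6, 0.0, 0.3, 0.2],
        [0.5, 0.3, 0.0, 0.1],
        [0.4, 0.2, 0.1, 0.0],
    ]
    distances = [[0.0 if i == j else 1 - rates[i][j] for j in range(4)] for i in range(4)]
    print(mantel_spearman(rates, distances))
```

With 32 breeds there are 32 × 31 / 2 = 496 pairs, and resolving a P-value below 0.0001 requires at least 9,999 permutations.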

Discussion
Genetic structure within the horse species
One of the aims of the study was to assess the different levels of genetic structure within the species, in a single country, on the basis of pedigree information. Computation of adapted F-statistics, from the species to the breed scale, constitutes an interesting tool for such a purpose. With a value of 1.37%, F_IT computed for the species indicated a certain amount of genetic structure, as expected for a domestic species divided into several breeds. Considering the F_IS estimated for the three breed groups, their respective values, lower than the overall F_IT but remaining larger than 0.8%, seem to confirm that they constitute a globally adequate classification, with, however, genetic structure remaining within each, as expected.
Finally, the average F_IS computed for the 55 origins (which roughly corresponds to the breed level), around −0.07%, also confirms that horse breeds constitute in general populations without genetic structure. Some exceptions were found, such as Arab, Camargue, Welsh and Lusitano breeds, with F_IS larger than 1%. For the Arab breed, this is probably related to inbreeding practices (intentional mating between close relatives) within the breed [2]. In the Spanish Arab Horse, Cervantes et al. [3] indeed found F_IS close to 2%, in relation to preferential mating between relatives. By contrast, the existence of several subpopulations within the Welsh breed (Wahlund effect) explained the large F_IS within the Welsh Cob and Welsh Pony origins (1.4 and 1.1% respectively). This structure level was, however, much smaller than when considering all Welsh studbook designations as one, F_IS then being equal to 2.3% (data not shown). Similarly, the Wahlund effect was investigated in the Norikan draught horse by Druml et al. [20], in relation with coat color differentiation. For most of the other breeds, negative values of F_IS were estimated, which could be explained by a limited population size (Highland pony and Henson breed for instance), a limited effective population size inducing a decrease of F_IS, possibly to negative values [14]. Negative F_IS is also expected for populations constituted by F1 crossed horses, with low inbreeding in comparison to coancestry (Half Bred Arab).
When regarding studies using molecular markers to compute F-statistics on horse breeds with different origins [7,8,21], molecular F_IT estimates were found to be much larger (around 12%), mainly in relation to higher F_ST values, genealogical F_ST being probably largely underestimated due to lack of pedigree information. In those three studies, breed F_IS values were significantly (P = 0.0001) larger (on average 1.5% considering 73 breeds), in comparison to those determined in our study (on average −0.4% considering our 55 origins). Since pedigree knowledge is sufficient to assess genetic structure within most of the breeds studied in the present paper, such differences may be related to bias in molecular analysis, due to sampling or to the existence of null alleles when using microsatellites for instance, leading to a possible overestimation of observed homozygosity. A more recent study using SNPs [10] showed mean F_IS close to 0 (0.7% on average), higher F_IS values being found in Arab, Shetland and Lusitano horse breeds, similarly to our study.

Gene flow among horse breeds
This study allowed precise measurement of the current, or relatively recent, gene flows existing among horse breeds. In agreement with F_IS values, these gene flows exist preferentially within breed groups (Race and riding horses, Pony, Draught horses), as investigated previously by Aberle et al. [5] in German Heavy horse breeds, or Cervantes et al. [3] within Spanish sport breeds. Yet, as illustrated by Figure 2, some exchanges of reproducers may occur, in particular from Race and riding horses to Pony breeds. Comparison between molecular and genealogical indicators confirms that breeds with regular exchanges of reproducers also show molecular similarities (Figure 4). It is therefore not surprising that studies based on molecular markers showed similarities between breeds related either to pony, draught horse, or race/riding horse breeds [7,22,23]. Those molecular studies provide, however, a more complete view of breed genetic relationships, as they are not limited by pedigree knowledge.
Breeds themselves show contrasted patterns, considering either the way they impact or are impacted by introgression, the existence of exchange of reproducers between countries, as well as the evolution of gene flows over time (see for instance [24]). In relation to their studbook regulation, some of the origins considered here (e.g., Arab and Thoroughbred breeds) constitute populations closed to foreign influence, i.e. to introgression from other breeds or possibly also from individuals belonging to the same breed but raised in another country (Camargue breed). Since no external gene pool can be used to introduce genetic variability within those populations, for breeds with limited population size, such as the Camargue breed, it appears important to limit erosion of their genetic diversity, through minimization of coancestry for instance [25]. However, according to our results, most of the horse breeds or populations seem subject to more or less regular amounts of introgression. Those introgressions may have occurred during former generations, with the founders belonging to external origins explaining on average 31% of genetic variability over the 55 origins, or on the contrary be continuous, 18% of parental origins being, on average over the 55 origins, related to individuals belonging to other breeds. Some horse origins are actually defined by the fact that they are constituted by first generation crossbred individuals, such as the Half Bred Arab origin. It also has to be underlined that a non-negligible part of horses raised in France do not belong to any specific breed. Certified origins or horses without known origins represent about 36% of horses registered in France between 2002 and 2011. This result was similar in Belgium for instance, where 36.5% of horses registered within the country are without origins (source: Belgian Horse Confederation, http://www.cbc-bcp.be, personal communication).
Results of the study show that outcrossing occurs mainly through the male pathway, in general using elite stallions which may be used in several breeds. As an example, Quidam de Revel, a stallion from the Selle Français breed, was used for breeding in different riding horse breeds, especially German and Belgian ones (French Horse Riding Federation, http://www.ffecompet.ffe.com, 2013). Yet introgression is not exclusive to the sire pathway, and in all the breeds subject to introgression, outcrossing occurred on both the sire and dam pathways.
Although pedigree analyses allow the evolution of neutral diversity to be considered for an entire population, they do have some limitations, due either to the extent of pedigree knowledge or to the existence of pedigree errors (e.g., [26]). In our case, although individuals had on average 6.3 equivalent generations known, corresponding to about 60 years (considering generation intervals equal to 9.6 years, as shown by Leroy et al. [27]), there are large differences between breeds and populations. For instance, in most French breeds, systematic registration of foals within the SIRE database (including their ancestors as far as possible) began in 1976, while for draught horses it took place in 1988, explaining the lower pedigree knowledge for those breeds. As a consequence, the time scale considered for IBD coefficients and probability of gene origins is not the same across populations (note, however, that it is corrected for in coancestry rates). This may lead to some bias [5], and the influence of foreign breeds (Arab and Thoroughbred breeds for instance) was probably underestimated, especially for breeds with poor pedigree knowledge [3], as founders considered in the study were probably already impacted by gene flows from those two breeds, due to former outcrossing events. The existence of pedigree errors, especially if linked with unofficial outcrossing events, may also have led to some bias in our results. It has, however, to be stated that paternity testing, which allows such inaccuracies to be limited, has been widely used in horses for more than twenty-five years. In France, parentage control has been developed since the mid-seventies (first through blood typing), and has been performed systematically in some breeds since 1988 [2]. Since 2001, about 33,000 horses (i.e. more than 50% of the number of foals with identified origins) have been genotyped for parentage control each year in France, drastically reducing the extent of pedigree errors.
To conclude, it has to be stated that, when pedigree knowledge is not limiting, genealogical analysis may have some advantages in comparison to molecular data, as (i) entire populations are considered, and (ii) it may give accurate results for a defined generation (such as the parental one here) and time scale.
The results of this study may lead to several observations and recommendations regarding the management of horse breeds. Among others, it clearly underlines a current and large use of the Thoroughbred gene pool (and the Arab gene pool to a lesser extent), at least at the French population scale. As an example, from a genetic point of view, AQPS constitutes a population that is difficult to distinguish from Thoroughbreds, the latter breed explaining 97.3% of its founder origins (Table S4). To a lesser extent, the Thoroughbred also constitutes by far the main origin for the Selle Français and Anglo-Arab breeds (46.7 and 54.4% of founder origins, respectively). Such a result is in agreement with a previous study showing the large use of Thoroughbreds in outcrossing [28]. These two breeds can therefore already be considered as half Thoroughbred from a genetic point of view, previous studies based on molecular results suggesting that those breeds could be genetically even closer [7]. If breeders of these populations want them to remain distinct from the Thoroughbred, better monitoring of outcrossing can be recommended in the future. Note, however, that our approach was based on the evolution of neutral variability. In relation to the wide range of selection goals, populations could be more differentiated when considering selected genome areas. Our results also allow us to underline that the different subpopulations of the Welsh breed can be classified into two categories in relation to their type.

Conclusion
Outcrossing constitutes a common practice within domestic animals, and has occasionally been investigated for a given breed [28,29]. This is the first study to carry out a pedigree analysis considering all individuals raised in a country, whatever the breed. It is therefore possible to illustrate the complexity and the diversity of gene flows existing within a given domestic species. The genealogical approach is not as accurate as molecular analysis to measure genetic similarity between breeds, yet when enough pedigree knowledge is available, it provides accurate results about breed structure and recent gene flows, providing, among others, useful information for gene association studies [30]. These approaches could be improved by gathering pedigree data from foreign studbooks, underlining possible differences in breeding practice according to countries, and allowing adequate recommendations to be given regarding the management of genetic diversity for international breeds.