Genetic analysis of the endangered Cleveland Bay horse: A century of breeding characterised by pedigree and microsatellite data

The Cleveland Bay horse is one of the oldest equines in the United Kingdom, with pedigree data going back almost 300 years. The studbook is essentially closed and because of this, there are concerns about loss of genetic variation across generations. The breed is one of five equine breeds listed as “critical” (<300 registered adult breeding females) by the UK Rare Breeds Survival Trust in their annual Watchlist. Due to their critically endangered status, the current breadth of their genetic diversity is of concern, and assessment of this can lead to improved breed management strategies. Herein, both genealogical and molecular methods are combined in order to assess founder representation, lineage, and allelic diversity. Data from 15 microsatellite loci from a reference population of 402 individuals determined a loss of 91% and 48% of stallion and dam lines, respectively. Only 3 ancestors determine 50% of the genome in the living population, with 70% of maternal lineage being derived from 3 founder females, and all paternal lineages traced back to a single founder stallion. Methods and theory are described in detail in order to demonstrate the scope of this analysis for wider conservation strategies. We quantitatively demonstrate the critical nature of the genetic resources within the breed and offer a perspective on implementing this data in considered breed management strategies.


Introduction
In recent years there has been substantial interest in quantifying the genetic diversity of equine breeds using pedigree data [1], molecular data [2] or a combination of both sources [3] in order to implement effective breed management strategies. The effectiveness of using both data types in the understanding and management of rare and native equine breeds has been investigated using both theoretical modelling and studies of closed studbooks.
The Cleveland Bay horse is a heritage British breed which has its origins in the Cleveland Hills of Northern England [4]. The first studbook was published in 1885. A more recent study [13], restricted to animals entered in the Cleveland Bay Horse Society (CBHS) studbook between 1934 and 1995, highlighted the limited genetic diversity in the breed and the increasing levels of inbreeding. It was recognised that further in-depth analysis of the status of the breed would be needed in order to aid the development of breed management plans.
The aim of this study was to develop a comparative analysis of the genetic diversity in the Cleveland Bay Horse population using both genealogical and molecular methods, and to provide recommendations in support of a global breed conservation strategy for the Cleveland Bay Horse. The theory and practice inherent in our approach are detailed sequentially, so that the approach can be applied to the in vivo conservation of other endangered breeds and species.

Pedigree data
Summary data from the CBHS studbooks, volumes one to thirty eight, were published in the Society's Centenary studbook [7]. Names and studbook numbers of all registered horses, together with date of birth, sire and dam, were listed, and this information was digitised in FileMaker™ (FileMaker Inc.) to construct an electronic pedigree database for the breed. Registrations post-1985 have been added to the database on an annual basis up to and including, for this study, Volume 38 of the studbook.
The Cleveland Bay Horse Society provided access to a total of 535 microsatellite parentage testing reports. These had been obtained by commercial analysis of hair follicle samples taken from individual animals for registration verification. Samples were tested by the Animal Health Trust (Newmarket, UK) for a panel of 16 microsatellite markers approved by the International Society for Animal Genetics (ISAG) equine genetics group. Close examination of studbook records, recent Breed Society census records and the microsatellite dataset enabled the identification of a reference population of 402 animals, registered in the 10-year period 1997 to 2006, for which both microsatellite and pedigree data were available.

Pedigree completeness
Data correction routines within the programmes Genes [14] and Eva [15] were used to identify pedigree errors and correct infinite loops. Pedigree completeness was calculated using PopRep [16], with the pedigree completeness index ($I_d$) [17] computed using Eqs 1 and 2, where $k$ represents the paternal (pat) or maternal (mat) line of an individual, $a_i$ is the proportion of known ancestors in generation $i$, and $d$ is the number of generations considered when calculating the pedigree completeness. Values for pedigree completeness range from 0 to 1. Where all of the ancestors of an individual are known to some specified generation ($d$), $I_d = 1$. However, where one of the parent animals is unknown, $I_d = 0$ [16].

Generation interval
Generation interval is defined as the average age of the parents at the birth of those offspring that subsequently produce at least one progeny themselves [18]. The generation interval was calculated for each of the four possible lines of descent: sire to son, sire to daughter, dam to son and dam to daughter. The results were averaged for each year group using PopRep [16].
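As a minimal sketch of the definition above, the four pathway intervals can be computed by grouping parent ages at the birth of offspring that went on to breed. The record layout (sex labels and birth years as plain numbers) is a hypothetical illustration, not the PopRep input format.

```python
from statistics import mean

def generation_intervals(records):
    """Mean parental age at offspring birth for each pathway.

    records: iterable of (parent_sex, offspring_sex, parent_birth_year,
    offspring_birth_year) tuples. Only offspring that later produced at
    least one progeny should be passed in (assumed pre-filtered).
    """
    ages_by_path = {}
    for parent_sex, offspring_sex, parent_birth, offspring_birth in records:
        age = offspring_birth - parent_birth
        ages_by_path.setdefault((parent_sex, offspring_sex), []).append(age)
    # One average per pathway, e.g. ("sire", "son") or ("dam", "daughter")
    return {path: mean(ages) for path, ages in ages_by_path.items()}

# Hypothetical records, for illustration only
example = [
    ("sire", "son", 1990, 2000),
    ("sire", "son", 1992, 2000),
    ("dam", "daughter", 1991, 2000),
]
intervals = generation_intervals(example)
```

Averaging the four pathway values then gives a single interval for the year group.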

Founder and ancestor representation
Stallion and dam lines, defined respectively as unbroken descent through male or female animals only from an ancestor to a descendant [3], were identified, and detailed founder and ancestor analysis was performed using ENDOG 4.6 [19], beginning with the number of founders.
We make the assumption that all animals with two unknown parents are regarded as founders in this analysis [20]. In addition, if an animal has one known and one unknown parent, the unknown parent is regarded as a founder. The total number of founders contains limited information on the genetic basis for the population. Firstly, founders are assumed to be unrelated, as their parentage is unknown. However, this is most likely not the case in practice. Secondly, some founders have been used more intensely and therefore contribute more, in terms of genetic resource, to the current population than other founders.
The effective number of founders, $f_e$, has been designed to correct for this second shortcoming [21] and is defined as the number of equally contributing founders that would be expected to produce the same genetic diversity as in the population under study. This is computed as:
$$f_e = \frac{1}{\sum_{k=1}^{N_f} q_k^2} \tag{3}$$
where $q_k$ is the probability of gene origin of the $k$th founder and $N_f$ is the real number of founders. In a scenario where every founder makes an equal contribution, the effective number of founders will equal the actual number of founders.
It is more common for founders to contribute unequally, leading to $f_e < N_f$. The genetic contributions will converge after 5 to 7 generations [22]. Once this convergence occurs, $f_e$ has limited usefulness as a measure of genetic contribution, as it will remain constant irrespective of later changes in the population. Pedigrees of more than 7 generations can be characterised by a high effective number of founders even after a severe, recent bottleneck [23]. Whilst the effective number of founders is not an absolute measure of genetic diversity, it forms a basis for comparison with the effective population size ($N_e$) and the effective number of ancestors ($f_a$). In a population with minimum inbreeding, $f_e$ would be expected to be approximately equal to $\tfrac{1}{2}N_e$ [22]. Where $f_e$ diverges from this, there is compelling evidence that the breeding structure has changed since the founder generation [24].
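The effect of unequal founder use on the effective number of founders can be illustrated with a short sketch that takes the reciprocal of the summed squared founder contributions. The contribution values are hypothetical and are not taken from the Cleveland Bay pedigree.

```python
import math

def effective_founders(contributions):
    """Effective number of founders: 1 / sum(q_k^2).

    contributions: expected genetic contribution q_k of each founder;
    the values must sum to 1.
    """
    if not math.isclose(sum(contributions), 1.0, abs_tol=1e-9):
        raise ValueError("founder contributions must sum to 1")
    return 1.0 / sum(q * q for q in contributions)

# Four founders contributing equally: f_e equals the real number of founders
equal = effective_founders([0.25, 0.25, 0.25, 0.25])

# Four founders, one heavily used (hypothetical values): f_e falls below 4
unequal = effective_founders([0.7, 0.1, 0.1, 0.1])
```

With one founder supplying 70% of the gene pool, the four founders behave like fewer than two equally used ones.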
The effective number of founder genomes ($f_g$) was proposed by Lacy (1989) to account for unequal founder contributions, random loss of alleles caused by genetic drift, and bottleneck events. It is computed by the equation:
$$f_g = \frac{1}{\sum_{i=1}^{c} p_i^2 / r_i} \tag{4}$$
where $p_i$ is the expected proportional genetic contribution of founder $i$, $r_i$ is the expected proportion of alleles from founder $i$ which remain in the current population, and $c$ is the total number of contributing founders [21]. This gives an indication of the number of equally contributing founders with no loss of founder alleles that would produce the same degree of diversity as found in a reference population [25]. The value of $f_g$ will be smaller than both $f_e$ and the effective number of ancestors ($f_a$), even under minimum inbreeding pressure, and approximately equal to $\tfrac{1}{2}N_e$. The scale of these differences is indicative of the degree of random loss of alleles. Alleles will be lost with every generation of a pedigree, and thus $f_g$ will decrease as the depth of pedigree increases [24].
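The founder genome calculation can be sketched in the same way, dividing each squared contribution by the founder's allele retention before summing. The retention values below are hypothetical; in practice they are usually estimated by gene-drop simulation, which is not shown here.

```python
def founder_genome_equivalents(contributions, retentions):
    """Founder genome equivalents: 1 / sum(p_i^2 / r_i).

    contributions: expected proportional contribution p_i of each founder.
    retentions: expected proportion r_i of each founder's alleles that
    remain in the current population (0 < r_i <= 1).
    """
    if len(contributions) != len(retentions):
        raise ValueError("one retention value is needed per founder")
    total = 0.0
    for p, r in zip(contributions, retentions):
        if p == 0:
            continue  # a founder with no contribution adds nothing
        if r <= 0:
            raise ValueError("a contributing founder needs retention > 0")
        total += p * p / r
    return 1.0 / total

# With no allele loss (r_i = 1) the value equals the effective number of founders
no_loss = founder_genome_equivalents([0.5, 0.5], [1.0, 1.0])

# Losing half of each founder's alleles halves the value
half_lost = founder_genome_equivalents([0.5, 0.5], [0.5, 0.5])
```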
The effective number of ancestors ($f_a$) supplements $f_e$ and is calculated from the marginal genetic contributions of the ancestors that contribute most to the reference population [20]. Whilst genetic contributions of founders are independent and sum to unity, this is not the case for genetic contributions of ancestors. Indeed, the dam of a highly used sire has at least 50% of the contribution of her son, as the same genes are represented in both generations. Boichard et al. (1997) therefore introduced the marginal contribution to the pedigree genetic resource. The ancestors contributing most to the reference population are considered individually in a recursive process. In each round of the recursion, the ancestor with the highest contribution is chosen, and the contributions of all others are calculated conditionally on the contribution of the chosen ancestor. The marginal contribution is the genetic contribution from an individual after correcting for the contributions of other ancestors already considered in the recursive process. The sum of the marginal contributions of all ancestors will be equal to unity. Ancestors with a large marginal contribution to the reference population will correlate with individuals having genes passed through many descendants [24].
Assessment of $f_a$ helps to account for the losses of genetic variability produced by the unbalanced reproductive use of individuals within breeding programmes, which is conventional in domestic equines, whilst also accounting for bottlenecks in the pedigree.
The parameter $f_a$ is computed as
$$f_a = \frac{1}{\sum_{j} q_j^2} \tag{5}$$
where $q_j$ is the marginal contribution of an ancestor $j$.
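Given a set of marginal contributions, both the effective number of ancestors and the number of ancestors explaining a chosen share of the gene pool follow directly. This sketch assumes the marginal contributions have already been obtained (the recursive selection step itself is not reproduced), and the values used are hypothetical.

```python
def effective_ancestors(marginal):
    """Effective number of ancestors: 1 / sum(q_j^2) over marginal contributions."""
    return 1.0 / sum(q * q for q in marginal)

def ancestors_explaining(marginal, share=0.5):
    """Smallest number of top-ranked ancestors whose marginal
    contributions add up to at least `share` of the gene pool."""
    running = 0.0
    for count, q in enumerate(sorted(marginal, reverse=True), start=1):
        running += q
        if running >= share:
            return count
    raise ValueError("contributions never reach the requested share")

# Hypothetical marginal contributions summing to 1
marginal = [0.25, 0.15, 0.12, 0.10, 0.08] + [0.03] * 10
n_half = ancestors_explaining(marginal)   # 0.25 + 0.15 + 0.12 passes 0.5
f_a = effective_ancestors(marginal)
```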

Inbreeding analysis
Inbreeding coefficients for each individual animal were calculated using ENDOG [19]. The increase in inbreeding (ΔF) was calculated for each generation using ENDOG 4.6 [19], by means of Eq 6:
$$\Delta F = \frac{F_t - F_{t-1}}{1 - F_{t-1}} \tag{6}$$
where $F_t$ and $F_{t-1}$ are the average inbreeding of offspring and their parents, respectively [18].
The Average Relatedness Coefficient (AR) [26] describes the probability that a randomly chosen allele from the whole population in the pedigree belongs to the animal under study. This parameter was calculated using ENDOG 4.6 [19].
The Additive Relationship Coefficient ($R_{yz}$) is estimated for two animals by calculating the hypothetical coefficient of inbreeding of an animal produced by mating the two individuals, irrespective of the sex of these assumed parents. The additive relationship between the two animals is then calculated as twice the coefficient of inbreeding of the hypothetical offspring: $R_{yz} = 2F_x$, where $F_x$ is the coefficient of inbreeding of the hypothetical offspring of individual Y and individual Z. This additive relationship has a minimum value of zero and a maximum value of two. The additive relationship is twice the value of the coefficient of kinship. The kinship of any two individuals is identical to the inbreeding coefficient of their progeny if they were mated. It is the probability that alleles drawn randomly from gametes of each of the two individuals are identical by descent.
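The link between additive relationship, kinship and inbreeding can be illustrated with the standard tabular method for building an additive relationship matrix. This is a minimal sketch on a hypothetical five-animal pedigree; it is not a description of the ENDOG implementation.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (animal, sire, dam) tuples ordered so that parents
    appear before their offspring; unknown parents are given as None.
    Returns (index, A) where index maps animal -> row and A is a nested list.
    """
    n = len(pedigree)
    index = {}
    A = [[0.0] * n for _ in range(n)]
    for i, (animal, sire, dam) in enumerate(pedigree):
        s = index[sire] if sire is not None else None
        d = index[dam] if dam is not None else None
        for j in range(i):
            # Relationship with an earlier animal: mean of its
            # relationships with the two parents (0 for unknown parents)
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + F, with F equal to half the parents' relationship
        a_sd = A[s][d] if (s is not None and d is not None) else 0.0
        A[i][i] = 1.0 + 0.5 * a_sd
        index[animal] = i
    return index, A

# Hypothetical pedigree: C and D are full sibs, E is their offspring
ped = [("A", None, None), ("B", None, None),
       ("C", "A", "B"), ("D", "A", "B"), ("E", "C", "D")]
idx, A = relationship_matrix(ped)
r_cd = A[idx["C"]][idx["D"]]          # additive relationship of the parents
f_e_animal = A[idx["E"]][idx["E"]] - 1.0  # inbreeding of their offspring
kinship_cd = r_cd / 2.0               # coefficient of kinship
```

In this example the full sibs have an additive relationship of 0.5, and their offspring has an inbreeding coefficient of 0.25, so that the relationship is twice the offspring's inbreeding coefficient.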

Effective population size
The effective population size from the rate of inbreeding is computed using the classic equation
$$N_e = \frac{1}{2\Delta F} \tag{7}$$
where the rate of inbreeding per generation is calculated using Eq 6. The effective population size from the number of parents is computed as
$$N_e = \frac{4 N_m N_f}{N_m + N_f} \tag{8}$$
where $N_m$ and $N_f$ are the number of male and female parents, respectively [18]. This method assumes that the ratio of breeding males to breeding females is 1:1, and that all individuals have an equal opportunity to contribute their genetic material to the next generation. This is seldom the case in managed livestock populations, and there is a tendency for this method to overestimate $N_e$ [16].
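Both estimators are simple enough to sketch directly. The functions below are illustrative helpers (the names are assumptions), and the parent counts used in the second example are hypothetical.

```python
def delta_f(f_offspring, f_parents):
    """Rate of inbreeding per generation: (F_t - F_{t-1}) / (1 - F_{t-1})."""
    return (f_offspring - f_parents) / (1.0 - f_parents)

def ne_from_inbreeding(rate):
    """Effective population size from the rate of inbreeding: 1 / (2 * dF)."""
    return 1.0 / (2.0 * rate)

def ne_from_parents(n_males, n_females):
    """Effective population size from parent numbers: 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)

# A rate of inbreeding of 0.02709 per generation gives N_e of about 18
ne_pedigree = ne_from_inbreeding(0.02709)

# Hypothetical counts: a few males dominate the result even with many females
ne_skewed = ne_from_parents(4, 100)
ne_balanced = ne_from_parents(52, 52)
```

The skewed example shows why a small number of breeding stallions limits the effective size almost regardless of the number of mares.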

Microsatellites
Total DNA was isolated from hair follicle samples at the Animal Health Trust's laboratories, following standard commercial procedures and as previously described [27]. A set of 16 microsatellites (ASB17, VHL20, HTG10, HTG4, AHT5, AHT4, HMS3, HMS6, HMS7, ASB23, LEX3, LEX33, ASB2, HTG6, HTG7, HMS2) was analysed in all the sampled individuals. The GENETIX program was used to carry out factorial correspondence analyses and associated calculations on 15 of these markers [28]. Although microsatellite LEX3 appears in the panel of markers recommended for equine parentage verification by the International Society for Animal Genetics, it was excluded from the analysis in this study because it is located on the X chromosome and as such is not appropriate for this type of analysis.
The average number of alleles per locus ($A$), corrected to account for sample size using Hurlbert's rarefaction method (1971), can be shown as:
$$A = \sum_{i} \left[ 1 - \frac{\binom{N - N_i}{g}}{\binom{N}{g}} \right] \tag{9}$$
where $g$ is the specified sample size for a collection containing $N$ individuals, numbering $N_i$ in the $i$th species. Nei's minimum distance ($D_m$) and Nei's standard distance ($D_s$) [29] are computed according to Eqs 10 and 11, respectively:
$$D_m = \frac{f_{kk} + f_{mm}}{2} - f_{km} \tag{10}$$
$$D_s = -\ln\left( \frac{f_{km}}{\sqrt{f_{kk} f_{mm}}} \right) \tag{11}$$
where $f_{kk}$ and $f_{mm}$ are the average coancestry between individuals belonging to population $k$ or $m$, and $f_{km}$ is the average coancestry between individuals belonging to populations $k$ and $m$.
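A short sketch may help to make the rarefaction and distance calculations concrete. It applies Hurlbert's formula to allele counts at a single locus (treating each allele as a "species" and each gene copy as an "individual", which is an assumption about how the method is applied here) and computes the two Nei distances from coancestry values. All input values are hypothetical.

```python
import math

def rarefied_allele_count(allele_counts, g):
    """Expected number of distinct alleles in a random sample of g gene copies.

    allele_counts: number of copies N_i observed for each allele; N = sum.
    """
    n_total = sum(allele_counts)
    if not 0 < g <= n_total:
        raise ValueError("g must be between 1 and the total number of copies")
    denom = math.comb(n_total, g)
    # math.comb returns 0 when g exceeds N - N_i, so that allele is certain
    return sum(1.0 - math.comb(n_total - n_i, g) / denom for n_i in allele_counts)

def nei_minimum_distance(f_kk, f_mm, f_km):
    """D_m = (f_kk + f_mm) / 2 - f_km."""
    return (f_kk + f_mm) / 2.0 - f_km

def nei_standard_distance(f_kk, f_mm, f_km):
    """D_s = -ln(f_km / sqrt(f_kk * f_mm))."""
    return -math.log(f_km / math.sqrt(f_kk * f_mm))

# Two alleles with five copies each, rarefied to a sample of two copies
a_small = rarefied_allele_count([5, 5], 2)
# Rarefying to the full sample returns the observed number of alleles
a_full = rarefied_allele_count([5, 5], 10)
d_m = nei_minimum_distance(0.30, 0.20, 0.15)
d_s_same = nei_standard_distance(0.30, 0.30, 0.30)
```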
Population structure F (fixation) statistics extend the study of inbreeding coefficients to the case of sub-divided populations [30]. $F_{IT}$ refers to the inbreeding of individuals in the total population. Conversely, $F_{IS}$ describes the inbreeding of individuals within sub-populations. $F_{ST}$ is not strictly a fixation index, as it represents the correlation between two gametes taken at random in two sub-populations from the total population; it measures the degree of genetic differentiation of the sub-populations. The three indices are computed as in Eqs 12, 13 and 14, respectively:
$$F_{IT} = \frac{\tilde{F} - \bar{f}}{1 - \bar{f}} \tag{12}$$
$$F_{IS} = \frac{\tilde{F} - \tilde{f}}{1 - \tilde{f}} \tag{13}$$
$$F_{ST} = \frac{\tilde{f} - \bar{f}}{1 - \bar{f}} \tag{14}$$
where $\tilde{f}$ and $\tilde{F}$ are, respectively, the mean coancestry and the inbreeding coefficient for the entire metapopulation, and $\bar{f}$ is the average coancestry for the subpopulation, so that $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$ [31]. ENDOG [19] was used to calculate F statistics and Nei's minimum distance [29], $D$, the genetic distance between subpopulations $i$ and $j$, which is given by Eq 15:
$$D_{ij} = \frac{f_{ii} + f_{jj}}{2} - f_{ij} \tag{15}$$
The programme TREX [32,33] was used to construct phylogenetic trees to illustrate the structure from the distance matrix data.
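The relationship between the three indices can be checked numerically. The sketch below computes them from the coancestry and inbreeding quantities named in the text (argument names are assumptions, and the input values are hypothetical) and confirms that $(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$ holds.

```python
def f_statistics(big_f_tilde, f_tilde, f_bar):
    """Return (F_IS, F_ST, F_IT) from the inbreeding coefficient F~
    and the two mean coancestry values f~ and f-bar used in the text."""
    f_is = (big_f_tilde - f_tilde) / (1.0 - f_tilde)
    f_st = (f_tilde - f_bar) / (1.0 - f_bar)
    f_it = (big_f_tilde - f_bar) / (1.0 - f_bar)
    return f_is, f_st, f_it

def identity_gap(f_is, f_st, f_it):
    """Difference between (1 - F_IS)(1 - F_ST) and (1 - F_IT); 0 if consistent."""
    return (1.0 - f_is) * (1.0 - f_st) - (1.0 - f_it)

# Hypothetical inputs
f_is, f_st, f_it = f_statistics(big_f_tilde=0.20, f_tilde=0.18, f_bar=0.15)
gap = identity_gap(f_is, f_st, f_it)
```

The same check can be applied to any published triplet of values as a quick consistency test, allowing for rounding.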
Bayesian model-based clustering was conducted using the programme STRUCTURE v2.1 [34] to assign individuals to homogeneous clusters or populations, K, from a user-defined range. An admixture model was adopted, with a burn-in of $10^4$ and $10^4$ iterations for each value of K from 2 to 25.

Pedigree completeness
The pedigree file included a total of 5422 animals, of which 2661 were male and 2761 were female. The reference population of 402 individual animals consisted of 193 males and 209 females for which microsatellite data as well as pedigree data were available.
The pedigree file was analysed to assess the number of fully traced generations for each individual, the maximum number of generations traced and the equivalent complete generations for each animal. The maximum number of traced generations was 36. Percentage average population completeness for each year of birth, considering 1 through 6 generations, is shown in Fig 1, with percentage population completeness for the reference population up to 6 generations being high (Table 1).

Average generation interval
Generation intervals for each of the four pathways (Table 2) ranged from 9.2 years to 10.0 years (sire-son and sire-daughter, respectively). The average generation interval for each breeding year (Fig 2) was found to range between 5.5 and 13 years, being at a minimum in the immediate post WW2 period 1946 to 1950, which coincides with the genetic bottleneck previously identified by Walling (1994).

Founder and ancestor representation
A total of 11 stallion lines were identified in the pedigree. A single paternal ancestry line is present in the reference (living) population.
Analysis of the female members of the studbook identified a total of 17 dam lines. Nine of these maternal ancestry lines are present in direct descent in the living population. Three of these lines (2, 4 and 9) are represented, in direct female descent, by only one or two individual animals (Table 3). The three most common maternal lines constitute 70% of the present female population. However, analysis of the relative contributions of the most influential maternal ancestry lines to the genome of the reference population reveals that some of the lines least well represented in direct descent in fact continue to make a substantial genetic contribution, as shown in Table 3.
Analysis identified 194 founders in total, of which 28 were represented in the reference population. The mean retention was 0.035. The number of founder genomes surviving was 6.285. Calculations on the same population show the founder genome equivalent to be 2.366, with the effective number of non-founders only 2.379. The proportion of ancestry known was 0.330, reflecting the fact that in early volumes of the studbook only the sire of an individual animal was recorded. The number of ancestors contributing to the population was 424, and the number of ancestors describing 50% of the genome was 7 animals.
The number of Ancestors contributing to the Reference Population was calculated as 31 animals. The Effective Number of Founders/Ancestors [20] for the Reference Population were 40 and 9, respectively. The number of ancestors describing 50% of the genome of the living population was 3. Ancestors were selected following Boichard et al. (1997), while founders were selected by their individual Average Relatedness coefficient (AR).

Inbreeding analysis and effective population size
Across the whole analysed dataset, F = 7.8% with an associated mean average relatedness of 8.3% (Table 4). The rate of change of the average inbreeding coefficients, based on the slope of the regression between 1901 and 2009, was 0.00214, which represents a ΔF per generation of 0.02709. The effective population sizes for the Cleveland Bay Horse breed, based on Δf and ΔF, were 19 and 18, respectively (Fig 4). The pattern of inbreeding during the period in which the reference population was foaled, and the effective population size calculated from both the rate of inbreeding and the number of parents, are tabulated in Table 5 for the period 1997 to 2006, with data calculated using PopRep [16].

Microsatellite variation
The total number of alleles found for 15 microsatellite loci within the reference population was 93. The mean number of alleles per locus was 6.2, ranging from 4 to 10. Observed heterozygosity ($H_o$) ranged between 0.052 (HTG7) and 0.716 (VHL20), with a mean of 0.4486, whilst the mean expected heterozygosity ($H_e$) was 0.5341. The highest values for $H_e$ were found for microsatellite LEX33, whilst the lowest were found for microsatellite HTG6 (Table 6).
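For readers less familiar with these measures, observed and expected heterozygosity at a single locus can be sketched as follows. The sketch uses the uncorrected gene diversity ($1 - \sum p_i^2$); whether a sample-size correction was applied in the reported values is not stated here, so this is an assumption, and the genotypes are hypothetical.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (H_o, H_e) for one locus.

    genotypes: list of (allele_1, allele_2) pairs, one per diploid animal.
    H_o is the share of heterozygous animals; H_e is 1 - sum(p_i^2),
    with p_i the frequency of allele i among all gene copies.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n = len(genotypes)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return h_obs, h_exp

# Hypothetical genotypes (allele sizes in base pairs)
polymorphic = heterozygosity([(101, 101), (101, 103), (103, 103), (101, 103)])
fixed = heterozygosity([(105, 105), (105, 105)])
```

A locus fixed for one allele, as reported for some female lines, returns zero for both measures.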
Across the reference population there is complete heterozygosity. However, at subpopulation level (Table 7), 3 groups show homozygosity at multiple loci. Female Line 2 is 62.5% polymorphic with fixation at HMS3 and LEX3. Female Line 4 is 62.5% polymorphic with fixation of alleles at HMS3, ASB23, HTG4, HTG10 and LEX3. Female Line 8 is 93.75% polymorphic with fixation at LEX3. Allele frequencies are more restricted in populations 2, 4 and 9 (Fig 5), as is the expected heterozygosity. This will be influenced by the smaller membership and corresponding sample size for these subpopulations.
The analysis of allele frequencies identifies a significant number of gaps in the distribution of allele length or number of repeats. It has been reported that populations that have experienced genetic bottlenecks tend to exhibit less cohesive distributions than stable populations [35].

Bottleneck analysis
The microsatellite allele frequency data were tested for departure from mutation-drift equilibrium with the software BOTTLENECK 1.2 [23]. The results of the three tests of heterozygosity excess (Infinite Allele Model, IAM; Stepwise Mutation Model, SMM; and Two-Phase Mutation Model, TPM) are shown in Table 8, and the results of the tests of the null hypothesis under the Sign Test, Standard Difference Test and Wilcoxon Test in Table 9.
Under the Sign Test, the expected number of loci with heterozygosity excess was 8.93 (p = 0.00120) under IAM, 9.40 (p = 0.29262) under TPM, and 9.43 (p = 0.06923) under SMM. This suggests that the null hypothesis is rejected under IAM, but with p > 0.05 would appear to

Mode shift indicator
The Bottleneck software [23] provides an alternative method for detecting potential genetic bottleneck events in the Mode Shift Indicator. Populations that have not experienced a bottleneck will be at or near mutation-drift equilibrium and will be expected to have a large proportion of alleles with low frequency [36]. This pattern will show as a normal, L-shaped distribution when displayed graphically. Fig 6 shows that the Cleveland Bay data display a normal L-shaped distribution at low allele size class, but deviate from it in the latter quartiles. This would suggest a population not completely at mutation-drift equilibrium, showing evidence of having experienced a genetic bottleneck in the recent past. As both the data plot and the trend show some departure from the normal L-shaped distribution at the higher size classes, outright acceptance of the null hypothesis should be treated with caution.
Indeed, on initial examination, the results of the analysis with Bottleneck [23] appear far from conclusive. Initial assessment suggests that under the IAM all of the tests provide evidence of a recent bottleneck event. However, under TPM and SMM the evidence is somewhat contradictory, which calls for some reservation in accepting the suggested recent bottleneck. The deviation from the normal L-shaped distribution supports the above assumption; however, this conflicting evidence suggests that the reduction in population size in the 1950s was perhaps not as significant a bottleneck event as previously reported [9]. When the theory behind the various models is re-examined [36], it becomes evident that gene diversity excess has only been demonstrated for loci evolving under the Infinite Allele Model.
Given that there is very strong evidence to support a recent bottleneck event under this model, which is supported by testing of microsatellite allele frequency data herein, it is likely that the Cleveland Bay horse has indeed experienced a recent genetic bottleneck.

Population structure
Wright's F parameters [37], reflecting departure from Hardy-Weinberg equilibrium, were calculated from the pedigree analysis for the reference population in terms of $F_{IS}$ (-0.006677), $F_{ST}$ (0.040230) and $F_{IT}$ (0.033821). Multilocus estimations of Wright's F statistics [38] from the microsatellite data showed an across-population distribution of the following: $F_{IS}$ (0.011362), $F_{IT}$ (0.029308) and $F_{ST}$ (0.018153). Distance matrices [39] were constructed from both pedigree and molecular analysis, and phylogenetic trees were constructed using TREX [33], showing the relative positions of each female ancestry line (Figs 7 and 8).
Both the pedigree distance analysis (Fig 7) and the molecular analysis (Fig 8) are suggestive of a population structure rooted on three sub-divisions, or clades. However, neither analysis provides conclusive evidence of the causes or nature of this division. In addition to the pairwise distance matrices constructed assuming 9 subgroups within the population, GENALEX 6.4 [40] was also used to construct the much larger matrix of Nei distance between individuals [39]. This matrix, in Phylip format, was imported into the cluster drawing programme SplitsTree4 [41] to construct a Neighbour-Net diagram. The Neighbour-Net diagram indicating a 2 clade subdivision of the population is shown in Fig 9, whilst a 3 clade subdivision is shown in Fig 10. Examination of this net immediately suggests that the structure of the reference population could be explained by two broad groups or clades, as shown in Fig 9. However, an alternative model with three clades, shown in Fig 10, is also possible.
Principal co-ordinate analysis via covariance matrix was conducted using Genalex 6.5 [40], with sub-populations assigned by both modern female and modern male ancestry lines, in order to examine alternative possible structuring of the reference population. Fig 11 presents the PCoA with subpopulations assigned by female ancestry and Fig 12 by male ancestry.
The PCoA analysis shows both male and female sub-populations distributed widely across principal axes, with little suggestion of structuring by sex group being the driving process of population sub-division in the microsatellite data. Variational Bayesian analysis of the microsatellite dataset, using the programme STRUCTURE [34], was carried out in order to further investigate breed structure. $10^4$ runs of the analysis were carried out for potential populations, K, numbering 2 to 25. The best fit of K appears at K = 3. Fig 13 provides a visual representation of this analysis for K = 2 to K = 4. There is a substantial increase in background noise in the display at K = 4, indicative that the number of clusters or sub-populations is below this level.

Table 8 note. The IAM, SMM and TPM mutation models simulate the coalescent processes of n genes. H exp is the average heterozygosity and is compared with the observed value in determining a heterozygosity excess or deficit at each locus. The standardised difference for each locus is estimated based on the inverse product of the Nei gene diversity and the standard deviation (SD) of the mutation-drift equilibrium.
Further analysis of the population structure was conducted using the programme BAPS [42]. Seventeen clusters within the microsatellite dataset were identified, with a highly significant probability of 0.99998.

Discussion
The results presented herein highlight the significant losses of founder representation that have occurred in the Cleveland Bay Horse population across the past century. Approximately 91% of the stallion lines and 48% of the dam lines are lost in the reference population. The unbalanced representation of the founders is illustrated by the effective number of founder animals ($f_e$) and the effective number of ancestors ($f_a$). The parameter $f_e$ constitutes over a third of the equivalent number of founder animals for the reference population, whilst the ratio $f_a/f_e$ is 22.5%. This ratio is substantially lower than that reported in other horse breeds, such as 41.7% in the Andalusian [43] or 54.4% in the Lipizzan [44]. It is also lower than the figure of 38.2% reported for the endangered Catalonian donkey [26]. The values of the generation interval presented herein (Table 2) are common in equines and identical to those observed in the literature [22,43], suggesting some lack of regularised control measures or quantitative breeding strategies on the part of breeders and a decrease in genetic gain, which is directly linked to the generation interval. Breeders should start breeding programmes with animals at a younger age and decrease the extent of breeding over time.
The average inbreeding computed for the Cleveland Bay Horse, at 20.64% in the reference population, is substantially higher than most of the values reported in the literature [43], with typical values ranging from 6.5% to 12.5%. Although most of these inbreeding values have been computed in breeds with deep pedigrees, such as the Andalusian, Lipizzan or Thoroughbred, there are significant differences in population sizes, and the accumulation of inbreeding in populations of restricted size will occur at a greater rate.
The smaller the number of individuals in a randomly mating breed, the greater will be the accumulation of inbreeding due to the restricted choice of mates. Furthermore, we see a smaller $N_e$ with increasing ΔF. The Cleveland Bay horse is therefore predisposed to inbreeding and associated loss of genetic variation. In the reference population of 402 individuals, the effective population size ($N_e$) computed via individual increase in inbreeding was 27.84. $N_e$ computed via regression on equivalent generations was 26.29. Inbreeding and genetic loss under random mating will occur at a rate of $1/(2N_e)$ per generation. In the reference population, where mean $N_e$ is 32.32, inbreeding under random mating can be expected to accumulate at 1.5% per generation. This is reflected by the genealogical $F_{IS}$ values. This parameter characterises the mating policy derived from the departure from random mating as a deviation from Hardy-Weinberg equilibrium. Positive $F_{IS}$ values indicate that the average F value within a population exceeds the between-individuals coancestry, thus suggesting that matings between relatives have taken place [26]. Moreover, the average AR values computed for nine complete generations (Table 3) are roughly equivalent to the value of F. In an ideal scenario with random matings and no population subdivision, AR would be approximately twice the F value of the next generation [26].
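The expected per-generation rate quoted above follows directly from $1/(2N_e)$, and the standard recurrence $F_t = 1 - (1 - \Delta F)^t$ shows how it compounds over generations. The sketch below assumes an idealised, randomly mating population of constant effective size, which is a simplification rather than a model of the breed.

```python
def inbreeding_rate(ne):
    """Expected rate of inbreeding per generation under random mating: 1 / (2 * Ne)."""
    return 1.0 / (2.0 * ne)

def expected_inbreeding(ne, generations):
    """Expected inbreeding after t generations: 1 - (1 - dF)^t,
    starting from a non-inbred base population."""
    return 1.0 - (1.0 - inbreeding_rate(ne)) ** generations

# Mean effective population size reported for the reference population
rate_percent = inbreeding_rate(32.32) * 100      # about 1.5% per generation
after_five = expected_inbreeding(32.32, 5)       # cumulative over 5 generations
```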
Molecular information obtained in this study using microsatellite analysis suggests that genetic diversity within the breed, assessed as the tendency of genetic characteristics to vary, is more restricted than has been reported in many other horse breeds (Table 10).
Populations that have experienced a recent reduction in their $N_e$ exhibit a correlative reduction of the allele numbers ($k$) and gene diversity ($H_e$) at polymorphic loci. However, the allele numbers reduce faster than the gene diversity. Thus, in a recently bottlenecked population, the observed gene diversity is higher than the expected equilibrium gene diversity ($H_e$), which is computed from the observed number of alleles, $k$, under the assumption of a constant-size or equilibrium population [36]. The existence of a population bottleneck in the mid twentieth century, when the number of breeding age Cleveland Bay stallions was reduced to four, has previously been reported [12]. There is clear genetic evidence of this event in the excess of observed heterozygosity across subpopulations, with the exception of ancestry line nine. The latter is of more recent origin, having evolved from a grading up scheme in the latter half of the twentieth century. In all other subgroups, the excess is positive, ranging from 2.12% in Line 5 to 19.6% in Line 4. However, this investigation has revealed that lines two, four and eight are in fact not fully polymorphic. The observed heterozygosity excess amongst the five polymorphic lines peaks in line one at 6.1%.
Microsatellite multilocus estimations of Wright's F statistics [38] showed across-population $F_{IS}$, $F_{IT}$ and $F_{ST}$ values of 0.01758, 0.02490 and 0.00745, respectively. This departure from random mating will have been influenced by a number of factors common to restricted populations of domesticated equines. These include: selection by breeders for particular lines of descent; natural differences in fertility between individuals; a restricted number of male animals leaving significantly more offspring than females (disproportionate male founding); and geographic distribution of animals and breeders leading to logistical difficulties in some matings. The reduced number of alleles and fixation at certain loci in female ancestry lines is evidence of loss of founder representation from these lines. This lower heterozygosity is also indicative of the typical practice of the larger studs, where breeding tends to be carried out in pasture by free live cover, with the use of only one stallion per year, per herd, and where the same stallion may be retained for several breeding years. This strategy is compounded by breeders with only a small number of breeding females sending their animals to these groups or to be covered in hand by the same stallion.
This strategy has different implications for the genetic diversity of the Cleveland Bay Horse compared with that of mares travelling to stud to be covered in hand by a greater range of stallions that do not have their own herds of mares [55], or of animals moving through trade or exchange, which changes their geographic location, albeit on an irregular basis. Although this latter practice has clear benefits in conservation programmes, there is the danger of inappropriate matings in which the more common alleles supplant the less frequent ones. Whilst such matings increase the frequency of the rarer alleles, they simultaneously increase the frequency of those more common [56], highlighting the need for in-depth understanding of the genetic diversity of any rare breed, and for an effective management plan for conservation maintenance.
There has been considerable debate about the most effective methods of conserving and managing endangered populations [55]. Before the advent of mitochondrial and microsatellite DNA analysis, the accepted strategy involved minimising inbreeding whilst managing mean kinship/average relatedness [57]. More recently, the use of molecular methods has been proposed [58,59]. Where pedigree data are robust and complete over a significant number of generations, it appears that genealogical data remain the preferred means by which to manage founder contributions, inbreeding and kinship/relatedness. Indeed, Lacy has highlighted the problems caused in conservation programmes based on private or rare alleles [56]. Variational Bayesian analysis of within-population structure using microsatellite data shows significant evidence for three main clades. Although this study has been based on the use of pedigree and microsatellite marker data for the Cleveland Bay horse, there is now firm evidence of the value of mitochondrial DNA for such investigations, and an increasing number of investigations consider the origins and relatedness of modern equines (Table 10).
The Cleveland Bay horse has been reported to belong to haplotype C [48], which is common amongst older northern European breeds such as the Exmoor, Icelandic, Fjord, Connemara and Scottish Highland. This correlates with the assertion that in the matriline the Cleveland Bay has evolved from the Chapman, an ancient Northern European breed. The comparative studies have been based on five Cleveland Bay mtDNA sequences deposited in GenBank by Cothran and Frankham, within which there are three haplotypes. There is scope for further sampling of all of the existing matrilines to determine the number of haplotypes present in the reference population and the level of correlation with the three clades identified herein.

Conclusion
We have reported an in-depth genetic analysis of the Cleveland Bay Horse, using both pedigree and microsatellite data. It reveals substantial loss of genetic diversity and high levels of relatedness and inbreeding. The results of this study highlight the importance of the Cleveland Bay Horse community implementing an effective and sustainable breed management plan, such as management of mean kinship and inbreeding coefficients.