The distribution of runs of homozygosity and selection signatures in six commercial meat sheep breeds

Domestication and the subsequent selection of animals for either economic or morphological features can leave a variety of imprints on the genome of a population. Genomic regions subjected to high selective pressures often show reduced genetic diversity and frequent runs of homozygosity (ROH). Therefore, the objective of the present study was to use 42,182 autosomal SNPs to identify genomic regions in 3,191 sheep from six commercial breeds subjected to selection pressure and to quantify the genetic diversity within each breed using ROH. In addition, the historical effective population size of each breed was also estimated and, in conjunction with ROH, was used to elucidate the demographic history of the six breeds. ROH were common in the autosomes of animals in the present study, but the observed breed differences in patterns of ROH length and burden suggested differences in breed effective population size and recent management. ROH provided a sufficient predictor of the pedigree inbreeding coefficient, with an estimated correlation between both measures of 0.62. Genomic regions under putative selection were identified using two complementary algorithms; the fixation index and hapFLK. The identified regions under putative selection included candidate genes associated with skin pigmentation, body size and muscle formation; such characteristics are often sought after in modern-day breeding programs. These regions of selection frequently overlapped with high ROH regions both within and across breeds. Multiple yet uncharacterised genes also resided within putative regions of selection. This further substantiates the need for a more comprehensive annotation of the sheep genome as these uncharacterised genes may contribute to traits of interest in the animal sciences. Despite this, the regions identified as under putative selection in the current study provide an insight into the mechanisms leading to breed differentiation and genetic variation in meat production.


Introduction
Domestication and the subsequent selection of animals for either economic or morphological features can leave a variety of imprints on the genome of a population. This selection, a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 combined with the natural adaptation to local environments, has resulted in over one thousand different sheep breeds that vary phenotypically [1]. Understanding the genetic diversity among these sheep breeds can contribute to the success of many genomic analyses including genomic selection and QTL detection through genome-wide association studies [2,3].
Genomic regions subjected to selection frequently show signatures such as reduced nucleotide diversity, stretches of homozygous loci (i.e. runs of homozygosity; ROH), shifted site frequency spectrum and reduced recombination rate. The presence of continuous lengths of homozygous genotypes in an animal can be attributed to the inheritance of identical haplotypes from both parents [4]. The extent and frequency of these ROH can inform on both the ancestry of an animal itself, as well as of the population as a whole. Particularly, consanguinity may be indicated from the presence of long ROH; the longer the ROH the more likely that recent inbreeding occurred within a pedigree, as limited opportunity existed for recombination to break up these haplotype segments [4]. As a result, ROH are widely used as a predictor of whole genome inbreeding levels [5][6][7]. Moreover, as selection is often characterised by local reductions in haplotype diversity, the distribution of ROH patterns across the genome can inform on genomic regions that have potentially been subjected to recent and/or ancient selective pressure [8,9].
The reduction in the genetic variation surrounding a beneficial mutation is known as a "selective sweep" and occurs due the positive selection pressure altering the frequency of the favourable allele(s) over time [10]. If a population undergoes recent intensive selection pressure, extended linkage disequilibrium (LD) patterns between the mutation and neighbouring SNPs are observed [11,12]. This often leads to the emergence of different haplotypes in populations that have been subjected to varying selection pressures [13]. Several methods exist to detect regions of selection, and one such commonly used measure is Wright's fixation index (F ST ) [14]. The fixation index is a single SNP test that is routinely used to identify highly differentiated alleles that have undergone divergent selection among populations [15][16][17]. However, one major concern highlighted by Fariello et al., [18] is that the F ST approach assumes that all populations have the same effective population size and were derived independently from the same ancestral population; if this is not the case, false positive F ST signals could be detected. Therefore, Fariello et al., [18] proposed the hapFLK statistic, which is a haplotype-based extension of the FLK statistic developed by Bonhomme et al., [19], that can account for both population structure and haplotype information. In contrast, the FLK test, which is an extension of the Lewontin and Krakauer (LK) test, is a single SNP test that accounts for population size heterogeneity and structure to compute a global F ST for each SNP [19].
Both hapFLK and F ST have been previously applied to varying sheep populations to identify regions of the genome under selection [1,16,18,[20][21][22][23]. These studies have successfully identified several genomic regions associated with morphological traits, reproductive performance, nematode resistance, body size and skeletal morphology, which have been targeted by both natural and artificial selection during domestication. However, detecting regions of selection associated with quantitative polygenic traits such as growth and muscularity, is hampered by the standing variation existing at many loci in these traits [24]. As a result, selection for quantitative traits is often driven by polygenic adaptation i.e. shifts but not fixation in the allele frequencies of thousands of loci that have small effects on a trait [24,25]. Therefore, hard selective sweeps are rarely detected for such quantitative traits. The breeds used in the current study, with the exception of the Belclare breed where emphasis remains on prolificacy [26], have all been subjected to selection for meat and growth related traits in recent years. Despite the emphasis on terminal related traits within these breeds, substantial differences in phenotype and morphology exist, and they provide a considerable resource for deciphering the genetic variation that exists between terminally selected breeds. Therefore, the objective of the present study was to quantify the genetic diversity in six commercial sheep breeds using both ROH and selection signatures with the aim of identifying genomic regions that have been subjected to selection. In addition, the detected ROH will be assessed for their predictive ability of inbreeding through comparison with the traditional pedigree inbreeding coefficient and alternative genomic inbreeding coefficients. Results from this study will be useful in identifying genomic regions that differentiate among breeds and, through their biological annotation, provide insights into the mechanisms underlying past selection practices.

Methods
Animal Care and Use Committee approval was not obtained for this study because the data were from an existing database.

Population structure analyses
To understand population structure within and between breeds principal component analysis (PCA) using EigenStrat [27] and ancestry models implemented in ADMIXTURE 1.2.3 [28] were performed. To ensure uncorrected LD did not distort the results, pairwise SNP pruning was completed using PLINK [29] prior to analyses. This involved removing one locus from each SNP pair where LD (r 2 ) exceeded 0.1 within 50-SNP blocks. The cross validation procedure in ADMIXTURE was used to estimate the most likely number of genetic populations (clusters of K) between the breeds, considering values of K from 2 to 8. PCA plots were constructed using the first four components from the analysis.

Effective population size
The historical and current effective population size of each of the six breeds was estimated using the SNeP tool as described by Barbato et al. [30]. This approach is based on the relationship between the variance in LD between adjacent SNPs and the effective population size in the presence of a mutation to infer ancestral and recent effective population sizes [31]; where N T(t) is the effective population size t generations ago calculated as t = (2f(c t )) −1 [20], c t is the recombination rate for a specific physical distance between SNPs estimated using Sved & Feldman [32], r 2 adj is the LD value adjusted for sample size and α is a correction for the occurrence of mutations. Only SNPs with a MAF >0.05 were used to estimate the effective population size.

Runs of homozygosity
Runs of homozygosity were defined in each of the six populations of sheep using a sliding window approach of 50 SNPs in PLINK v1.09 [29], as previously described for cattle by Purfield et al. [5]. A maximum of two SNPs with a missing genotype, and up to one possible heterozygous genotype was permitted per ROH window. To minimize the detection of ROH that could occur by chance, the minimum number of SNPs needed to constitute a ROH (l) was estimated using the method proposed by Lencz et al., [33]; l ¼ log e a n s :n i log e ð1 À hetÞ where n s is the number of SNPs per individual, n i is the number of individuals, α is the percentage of false positive ROH (set to 0.05 in the present study), het is the mean SNP heterozygosity across all SNPs. Finally, to ensure low SNP density did not falsify ROH length, the minimum SNP density per ROH was set to 1 SNP every 100 kb and the maximum gap permitted between consecutive homozygous SNPs was set to 250 kb. A minimum ROH length of 1 Mb was set.
Runs of homozygosity were estimated for each individual separately. Each ROH was categorised based on their physical length into 1 to <5 Mb, 5 to <10 Mb, 10 to <15 Mb, 15 to < 20 Mb and !20 Mb. For each of the aforementioned ROH length categories, the mean sum of ROH per breed was calculated by summing all ROH per animal in that category and averaging this per breed population. The percentage of SNP residing within an ROH for a given breed, or in the population as a whole, was also calculated by counting the amount of times a SNP appeared in a ROH within the given breed or population whole. To identify the genomic regions most commonly associated with ROH, the top 1% of SNPs observed in an ROH in each breed and across all breeds were selected and adjacent SNPs over this threshold were merged into genomic regions corresponding to ROH hotspots. In addition, the fraction of chromosome residing in ROH was estimated as the mean ROH sum per individual for each chromosome divided by the chromosomal length, as estimated from the SNP coverage.
To determine if the variation in recombination rate across the genome impacted ROH length, ROH were also mapped using the genetic SNP coordinates (i.e. position in the linkage map) available from Johnson et al., [34]. The average recombination rate (cM/Mb) was estimated in 500kb intervals across the genome and also within each ROH hotspot. The percentage of occurrences of a SNP in a ROH was plotted against recombination rate for each chromosome identified as containing a ROH hotspot. In addition, the genetic mapping of ROH length was also used to infer demography using the method proposed by Thompson et al., [35] whereby the map length of a ROH (l) = 100/2g cM, were g is the number of generations of interest. Four ROH length categories were determined so that the analysis would provide information on the effective population size during four different time spans; up to 5 generations ago, 5 to 10 generations ago, 10 to 20 generations ago, and >20 generations ago. The mean sum of ROH per breed was calculated as above and breeds with a larger average abundance of ROH in a particular length class were inferred to have a smaller effective population size during that time span.

Inbreeding coefficient vs. Runs of homozygosity
The inbreeding coefficient based on ROH (F ROH ; previously described by McQuillan et al. [4] for cattle), was calculated as the sum of the length of all ROH per animal as a proportion of the total autosomal SNP coverage (2.44 Gb). F ROH was calculated separately as the sum of the lengths of all ROH !1 Mb (F ROH1Mb ), the sum of the lengths of all ROH !5 Mb (F ROH5MB ) and finally as the sum of all ROH !10 Mb (F ROH10Mb ). Pedigree-based inbreeding coefficients (F PED ) for all animals were calculated using the Meuwissen and Luo [36] algorithm. Depth of pedigree known was measured in complete generation equivalents (CGE) for all animals as described in McParland et al. [37] and Pearson's correlations between all measures of inbreeding were calculated only for 843 animals with a CGE value !6. Each ROH measure of inbreeding was also separately regressed on the pedigree-based inbreeding coefficient for all 843 animals. In addition, two other estimates of inbreeding were calculated (F GRM and F HOM ) using GCTA [38]. The F GRM was estimated using the VanRaden method [39] based on the variance of the additive genotypes, whereas F HOM was estimated based on the excess of homozygosity following Wright [40].

Signatures of selection
Fst. Global F ST was calculated per SNP using the HierFstat R package [41] with the unbiased estimator proposed by Weir and Cockerham [42] across all breeds. In addition, pairwise F ST was calculated for each pair-wise breed combination (i.e. 6 breeds = 15 comparisons). To reduce noise and identify regions of strong signatures of selection, a sliding window of five SNPs was used to compute an average F ST value of the middle SNP; only the average F ST value is discussed hereon in. Over 90% of all SNPs in a window were within 300 kb of each other and the average length of a window was 231 kb. The empirical P-value for each SNP was then estimated and only the top 0.1% of F ST values (n = 42) were considered to represent a selection signature. To define the boundaries of the identified genomic regions under selection, neighbouring SNPs of the top 0.1% F ST SNPs were included in the selection signature until two consecutive SNPs ranked outside of the top 5% of F ST values [1]. The second SNP that ranked outside of the top 5% of F ST values was not considered in the reported selection signature.
HapFLK. To account for the haplotype structure of the populations, as well as varying population effective sizes, the hapFLK statistic [18] was also used to identify possible regions under selection in all breeds. This required the estimation of a neighbour joining tree and a kinship matrix based on a matrix of Reynold's genetic distances between breeds [19]. The kinship matrix captured the population structure, which was used to model the covariance matrix of the population allele frequencies whereas a multi-point linkage disequilibrium model was used to create local haplotype clusters on each chromosome [43]. The number of haplotype clusters per chromosome was set to 50, which was determined using cross-validation based estimation in fastPHASE [43]. The hapFLK statistic was calculated as the average of 20 expected maximisation iterations. Once hapFLK values were generated for each SNP, P-values were computed based on a chi-square distribution with the python script provided in the hapFLK webpage (https://forge-dga.jouy.inra.fr/projects/hapflk). To limit the number of false positives, a q-value threshold of 0.01 was applied. Local population trees were then re-estimated using only SNPs mapping to the putative selective sweep to identify the population under selection.

Population structure
The principal component analysis in the present study was successful in separating out breed clusters based on genotypic data. The first and second principal components (PC) accounted 37.49% and 32.15% of the variation, respectively (S1 Fig). The Belclare, Beltex and Texel breeds had overlapping clusters, whereas the Charollais, Suffolk, and Vendeen formed distinct separate clusters (S1 Fig). The largest PC separated the Suffolk from the remaining European breeds, whereas the second PC separated the French Charollais and Vendeen breeds into distinct but adjacent clusters. The formation of two clear, non-overlapping clusters for the Suffolk breed is an artefact of the importation of New Zealand Suffolk into the Irish population in recent years. The smaller of the Suffolk clusters represents Suffolk of New Zealand ancestry. The cross-validation error estimates from ADMIXTURE plateaued at K = 5 (S2 Fig), therefore K = 5 was taken as the most probable number of inferred populations. The Belclare was the most admixed of all populations whilst the Suffolk was the least (Fig 1). Evidence of Texel admixture was found in both the Belclare and Beltex populations, with greater evidence detected in the latter.

Effective population size
The effective population size of all six breeds declined over time (Fig 2). Based on the sample population used in the present study, the Charollais breed had the largest effective population size across all generations, whereas the Beltex had the smallest. Assuming a generation interval of four years, the estimated effective population size in the last 50 years ranged from 115 in the Beltex breed to as high as 357 in the Charollais breed.

Runs of homozygosity
ROH were common across all breeds, although the length and frequency of ROH often differed per breed. ROH were identified in all animals with the exception of one Vendeen animal. The Suffolk and Beltex breeds had a greater mean proportion of their autosome, 0.053 (128.31 Mb) and 0.045 (110.47 Mb), respectively, covered in shorter ROH (1 -<5 Mb) in comparison to the other four breeds; mean ROH autosomal coverage per breed ranged from 39.94 to 92.61 Mb in the remaining breeds (Fig 3). For all breeds, the majority of detected ROH were less than 10 Mb in length, with relatively few long ROH !20 Mb detected within each breed (mean ROH coverage per breed for ROH !20 Mb ranged from 0.83 to 3.7 Mb). In fact, only 8.21% of the individuals had at least one ROH !20 Mb in length and these were primarily in the Belclare breed; 13.99% of the Belclare individuals had at least one ROH !20 Mb in length.
ROH were also mapped using their genetic positions and the abundance of ROH in different length classes was used to qualitatively evaluate the historical demography of each of the breeds. The time to the most recent common ancestor (TMRCA) was estimated for four different categories for each breed in S3 Fig. The ability to infer demography for > 20 generations ago was limited by the density of the SNP panel. ROH were more abundant in all TMRCA categories in the Suffolk population. This suggests that the effective population size in the Suffolk was small both in recent and past generations. The substantial increase in the abundance of ROH in the Belclare breed from 10 to 20 generations ago to <5 generations ago, suggests a recent decrease in the effective population size. The lower ROH abundances found in the Charollais population suggests a relatively large effective population size has been maintained across generations.
The proportion of the autosome covered in ROH varied both within and across breeds ( Fig  4). The Charollais breed had a tendency towards fewer ROH, whereas large inter-animal variability existed within the Suffolk breed; individual ROH autosome coverage ranged from 0.025 (50.53 Mb) to 0.319 (778.69 Mb) within the Suffolk population. The three most homozygous animals in the sample population used in the present study had, on average, 0.315 (768.65 Mb) of their autosome covered in ROH, equivalent to almost a third of their genome (Fig 4).
Moderate to weak correlations per breed existed between the pedigree inbreeding coefficient and the varying ROH inbreeding measures ( Table 1). The lowest correlations between F PED and F ROH were found in the Vendeen population whilst the strongest existed in the Belclare population. The Pearson correlations between F PED and F GRM were low in the Vendeen    The correlations between F GRM and F ROH, and F HOM and F ROH were higher than those between F ROH and F PED. The intercept of the regression of all ROH inbreeding measures on F PED was greater than zero, suggesting that the F PED may underestimate genome homozygosity (Fig 5). The smaller intercept of F ROH10Mb is consistent with longer ROH arising from more recent inbreeding that is more likely to be captured by pedigree recording.
The percentage of the autosome residing in a ROH varied by chromosome and by breed, ranging from as low as 1.64% of OAR24 in the Charollais population to as high as 14.21% of OAR15 in the Suffolk population (S4 Fig). Several genomic regions were identified that frequently appeared in a ROH within individual animals (Fig 6A), although the region harbouring ROH often differed per breed (S5 Fig). The top 1% of SNPs with the highest occurrences in a ROH across and within all breeds, were identified as candidate SNPs under directional selection. All adjacent SNPs over this threshold were merged to form ROH islands and in total, 11 genomic regions under putative directional selection across all breeds were identified on OAR 2, 4, 5, 17 and 22 ( Table 2). The ROH hotspot with the highest occurrences was located on OAR2 (115.48-126.34 Mb) and likely candidate genes within this region include MSTN, ITGAV, BIN1 and NUP35, all of which are involved in muscle differentiation. Within breed, this region on OAR2 (115.48-126.34 Mb) was identified as under putative selection in the Belclare, Beltex and Texel populations (S1 Table). In the Charollais population, several regions under putative selection were identified on OAR 2, 4, 9 and 23, and plausible candidate genes within these regions included the fertility related genes NTRK2, HECW2, STK17B and ITGB8. Although more regions were identified as under putative selection in the Vendeen population, the occurrence of a SNP in a ROH was much lower in the Vendeen population in comparison to the other populations (S5 Fig), with the strongest signal on OAR2 only detected in 37.52%  6B). The mean genomic F ST value across all SNPs was 0.127, indicating moderate genetic differentiation according to Wright's classification. In total, 11 different selection signatures were identified on OAR2, OAR13 and OAR17. The highest ranked SNPs were located on OAR2 in two genomic regions between 121,455 and 123,362 kb and between 115,476 and 117,369 kb ( Fig 5B); several genes were identified within each of these regions on OAR2 (Table 1) including ITGAV and BIN1, respectively. Of the 42 significant F ST values, 24 of these were located directly within genes and a total of 144 genes were identified within the selection signatures  (Table 3). Gene ontology (GO) terms associated with the 144 genes were tested for evidence of functional enrichment. This revealed enrichment for GO terms associated with negative regulation of the JNK cascade (GO:0446329; P-value = 7.57x10 -4 ) and the opioid receptor signalling pathway (GO:0038003; P-value = 9.13x10 -4 ). Likely candidate genes identified with the putative selection signatures included those involved in skin pigmentation and coat colour (ASIP, EDN3, HERC2), body size and muscle formation (KSR2, NUP35, BIN1, ITGAV). In addition, non-coding DNA sequences and multiple uncharacterised genes were identified within these regions of selection, including LOC101106402 and LOC101115300 which may be the putative genes under selection within the selective sweeps on OAR2 (113,014 and 114,763 kb) and OAR13 (52,652 and 53, 111 kb). Orthologues of LOC101106402 and LOC101115300 include the human ARPC4 (ENSG00000241553) and bovine SIRPB1 (ENSBTAG00000039520), respectively. Pairwise breed F ST analyses also identified several genomic regions that were highly differentiated between pairs of breeds. The Beltex and Suffolk breeds had the largest number of putative selective sweeps (18) (Fig 6C). Of these five selection signatures, only the region on OAR2 was consistent with the F ST analysis. Multiple genes were identified within each of these five selection signatures (Table 4) including MSTN on OAR2 and MC4R on OAR23, both of which enhance growth performance. The breed(s) under selection for each of the four selection signatures were identified through comparison of the local population trees to those estimated from whole genome data (Fig 7). The Suffolk, Vendeen and Charollais breeds were significantly differentiated from the Belclare, Beltex and Texel breeds for the selection signature identified on OAR2, presumably due to the selection for mutations within the myostatin gene that contribute to muscle hypertrophy within the latter breeds. The SNPs within this region on OAR2 were almost fixed in the Belclare, Beltex and Texel populations in the present study, which could be indicative of a hard sweep signal. In contrast, no evidence for a hard selective sweep was found for the selection signature on OAR23 (54,419-61,155 kb), suggesting this genomic region has been subjected to a recent selection pressure. The reduced, albeit different, haplotype diversity for this selection signature on OAR23 (Fig 7) in both the Beltex and Charollais populations, suggests that selection for this genomic region started on different haplotype backgrounds. Selection for this genomic region on OAR23 may be for variants within the melanocortin-4 receptor (MC4R) which has been associated with body weight in sheep. Runs of homozygosity and selection signatures in sheep

Discussion
The identification of genomic regions that are under the influence of both natural and artificial selection can help determine the genetic basis of economically important traits that are segregating within or between breeds. Several genomic regions under putative selection were identified in the present study across six commercial sheep breeds and these signatures provide an insight into the genes contributing to their diverse phenotypes. However, it is important to acknowledge that regions identified as putative selective sweeps should be interpreted cautiously as differences in demographic history such as genetic drift, effective population size, inbreeding and population bottlenecks can also result in false positive signatures of selection [23].

Demographic history
The observed decline in the effective population size over time in the breeds considered in the present study coincides with human-mediated specialisation for wool and milk traits 4,000-5,000 years ago [58]. The reduction in the effective population size to hundreds in recent years is consistent with population subdivision and selection, but also the relatively limited use of artificial insemination [1]. However, unlike cattle [37], sheep have retained a relatively high level of genetic diversity. In general, our estimates of effective population size in the present study are similar to those previously reported in comparable breeds of sheep [1][2][3]. The smallest effective population size in the Beltex breed may reflect the small founding population. The Beltex breed, which was developed in Belgium by selectively breeding Texel sheep for doublemuscling, was only introduced into the United Kingdom in 1989, where they acquired their name and have since been refined to the modern form. The genetic similarity between the Beltex and Texel populations is evident from the ADMIXTURE and PCA results (Fig 1 and S1  Fig). The high abundance of ROH in the Beltex breed with a TMRCA between 5 to 10 and 10 to 20 generations is consistent with that of a small effective population size during breed formation [59], and suggest the presence of ancient relatedness possibly occurring from a population bottleneck. Similarly, the large abundance of ROH across all TMRCA categories in the Suffolk population is consistent with the low effective population size we detected in this breed. Kijas et al., [1] previously reported a mean genomic inbreeding coefficient of 0.22 in the Irish Suffolk breed suggesting a high level of relatedness among this breed. The consistent overlap between the estimated effective population size and the abundance of ROH within each TMRCA category across all breeds suggests that ROH can be successfully applied to infer demography within sheep populations.

ROH as a predictor of ovine inbreeding
In the absence of pedigree information, previous studies have documented the usefulness of the sum of an individual's ROH coverage to infer the inbreeding level of an individual [5,7,9,59]. The moderate correlation in the present study between F PED and F ROH , with the exception of the Vendeen population , further substantiates the usefulness of ROH as a measure of inbreeding in a population. The moderate correlations in the present study may be partly explained by the relatively shallow depth of the pedigree records for all breeds (Mean CGE = 6.5). Similar correlations between F PED and F ROH were previously reported in various cattle populations [5][6][7]60]. However, it is important to acknowledge that F PED is often an imprecise measure of the proportion of the genome that is identical-by-descent (IBD) as it is limited by pedigree depth, pedigree errors and linkage [61], and fails to account for the variability that can exist in IBD estimates between individuals of the same pedigree [62]. Therefore two further estimates of genomic inbreeding, F GRM and F HOM, were used to evaluate the efficacy of F ROH as a measure of inbreeding in sheep populations. The strong correlation between F ROH1Mb and F HOM in the present study corroborates previous results found in cattle [7,63], although the correlation between F ROH and F GRM was greater than those previously estimated [7,60,63]. Both measures, F GRM and F HOM, have been previously shown to be strongly dependent on allele frequencies, particularly for populations with divergent allele frequencies, which can lead to misleading IBD results [7]. The moderate to strong correlations between F ROH and all inbreeding measures in the present study suggests the extent of a genome under ROH can be used to accurately predict the proportion of the genome that is IBD in sheep populations. Nevertheless, it must be acknowledged that these correlations are likely to underestimate the precision of each of these inbreeding estimators due to their joint correlation with IBD (e.g. r(F ROH , F GRM ) % r(F ROH , F IBD ) Ã r(F GRM ,F IBD )). In addition, it should be underlined that not every ROH is attributable to IBD and instead possibly originated from identity-by-state due to localised low levels of recombination and high levels of linkage disequilibrium in unrelated ancestors [4].

Genomic regions of selection
The geographic adaptation and selection for specialised production traits has resulted in many shared but also breed-specific phenotypes in sheep. Here we used two different, yet complementary, statistical approaches (i.e., F ST and HapFLK) to identify putative selection signatures across six phenotypically different commercial breeds. The hapFLK analysis identified fewer selection signatures in the present study than F ST , but five out of the seven selection signatures identified on OAR2 using F ST , were located within the 18.36 Mb region identified as under selection using hapFLK. In addition, hapFLK is also expected to be more stringent than F ST , which typically suffers from bias and false positives; this is because hapFLK accounts for the haplotype and structure of the population [18]. To limit the number of false positives identified using F ST in the present study, only the top 0.1% of F ST values were considered as representing a signature of selection which is consistent with studies undertaken elsewhere [1,16,17]. Global F ST was used in the current study to identify selection signatures that were differentially fixed across breeds and to determine how selection altered the allele frequency patterns between these breeds. The content of these differentiated regions strongly suggest selection for genes that are associated with skin pigmentation, body size and muscle formation. Skin pigmentation type has been a selection criterion of sheep breeders since ancient times [64]. Candidate genes identified in selection signatures in the present study that are involved in the development and migration of melanocytes in skin pigmentation included EDN, HERC2 and OCA2 [47,52]. In addition, ASIP whose duplication has also been previously shown to control a series of alleles for black and white coat colour in sheep [46] was also identified. Although none of the breeds used in the present study traditionally have the black coat phenotype, black skin colour is an important characteristic of the Suffolk breed. Indeed the allele frequency between the Suffolk and the other breeds differed substantially within these selected regions. Moreover, positive selection for regions harbouring these genes has been previously reported in sheep by Kijas et al., [1] and Fariello et al., [20].
The objective of selective breeding programs to increase the productivity and profitability of the sheep meat industry, has contributed to the differentiation of several genomic regions that enhance muscling and weight-gain across breeds [65]. Several candidate genes associated with such traits were identified within selection signatures in the present study, including NPR2 on OAR2 which is involved in skeletal morphology and body size, and has been previously identified by both Kijas et al., [1] and Moradi et al., [21] to reside in a selection signature in sheep. The integrin subunit, ITGAV, identified within the selection signature on OAR2 from 121,455 to 123,259 kb, has also been reported to play a key role in adipogenic differentiation of human adipose tissue stem cells [44]. The accumulation of adipose tissue is often a characteristic of increased body weight in sheep. Furthermore, KSR2, commonly known as "The fat gene", was also identified within a selection signature on OAR17. Variants within KSR2 have been shown to play an important role in energy homeostasis and obesity in humans [51]. The hapFLK analysis also identified a putative selection signature on OAR17 in close proximity to KSR2 (<800 kb). This region included the SH2B2 gene which has been previously associated with growth performance in cattle [56]. The identification of two selection signatures by two different methods within close proximity, suggests that the 53.94-56.86 Mb region on OAR17 may be under putative selection for growth related genes.
The intense selection for enhanced muscle development within the Beltex and Texel breeds may be why several of the selection signatures included candidate genes that are essential to muscle differentiation. Two such candidate genes involved in muscle differentiation include BIN1 and NUP35. Variants within BIN1 and NUP35 on OAR2 have been previously associated with muscle myopathy in humans and mice, although their role in sheep is unknown [66,67]. However, it most likely that the nucleoporin NUP35 plays an important role in myogenic differentiation through the formation of the nuclear pore complex [49]. Similarly, BIN1 expression, structure and localisation is known to be tightly controlled during muscle differentiation, suggesting BIN1 is a key regulator in the formation of muscular tissue [45]. Selection surrounding myostatin, the causative gene for the characteristic double muscling of the Texel and Beltex breeds [53], was not detected in the present study to be in a selection signature based on the global F ST values but was significantly differentiated in the pairwise F ST comparison between the Texel population and both the Suffolk and Vendeen populations and between the Belclare and Charollais populations. Similarly, the hapFLK analysis identified a hard selective sweep on OAR2 (108,265-126,623 kb) within the Belclare, Beltex and Texel populations, most likely acting on the myostatin gene MSTN. The intense selective breeding for muscle hypertrophy within the Texel breed has resulted in the fixation of multiple alleles within MSTN in that breed [53]. The origin of the Beltex from the Texel breed, and indeed the infusion of Texel blood into the Belclare population in recent years, has most likely resulted in the continued selection of the favourable mutations of MSTN within these breeds. The overlap of the five global F ST selection signatures and the hapFLK selection signature on OAR2, suggests that this 18.36 Mb genomic region has been selectively targetted for the many genes within this region that are impact sheep growth and muscle formation, and has been previously identified as a selection signature by Fariello et al., [18].
Although plausible candidate genes were identified in several selection signatures in the present study, many non-coding regions and uncharacterised genes also resided within these regions that cannot be dismissed. Further annotation and investigation of the functional properties of these uncharacterised genes is necessary, as they may contribute to phenotypic variability in performance traits, or traits associated with disease resistance or environmental adaptations. The identification of two selection signatures on OAR2 and OAR13 in the present study were the most likely candidate gene is uncharacterised, further substantiates the need for more comprehensive annotation of the sheep genome. Animal QTLdb identified several QTL for carcass muscle weight, parasite resistance and meat quality that overlapped these signatures of selection [68][69][70].
Overlapping ROH are often identified across individuals due to the selection of common ancestors that carried superior alleles at specific locations [9]. This is evident in the Belclare, Beltex and Texel populations in the present study, where more than 50% of the individuals within each of these populations contain a ROH overlapping MSTN. The identification of regional selection for adaptive variants using the distribution of ROH has been successfully applied elsewhere [8,9,71]. Indeed, Kim et al., [9] identified that two-thirds of the selection signatures identified in a German Holstein population overlapped with high ROH regions in U.S. Holsteins. In the present study, 9 of the 11 selection signatures identified using global F ST contained SNPs that appeared in a ROH in more than 15% of the animals in the study ( Fig  5A). However, when focusing on the top 1% of SNPs with the highest occurrence in a ROH as ROH hotspots across all breeds, only two of the ROH hotspots identified (both on OAR2) overlapped with those identified using global F ST and one with the hapFLK method. Previous work by Pemberton et al., [72], and Bosse et al., [73] have demonstrated that ROH distributions are not uniform and instead have distinctive continental patterns. The existence of ROH hotspots and coldspots therefore has been partly attributed to the variation in recombination events and GC content across the genome and not solely selection. A similar trend was detected in the present study whereby ROH hotspots frequently coincided with regions of low recombination rate. This high ROH abundance in low recombination regions may have been partially attributed to by selection, however decreased SNP density and increased nucleotide diversity in regions with high recombination may have attributed to this abundance [74]. Despite this, the significant correlation between the occurrence of ROH in a SNP (S7 Fig) and the global F ST per SNP in the present study and elsewhere [75], supports the hypothesis that the observed ROH patterns are not solely the result of demography and instead harbour targets of positive selection. Therefore it may be possible to use the distribution of ROH across the autosome to limit the number of false positives identified using the global F ST method.

Conclusion
In conclusion, changes in autosome autozygosity, allele frequency patterns, and the extent of recombination across the autosome can inform on past selection pressures individuals have been subjected to. Patterns of ROH across the autosome were consistent with estimates of the effective population size; many short ROH were routinely detected in breeds estimated to have a smaller effective population size whereas long ROH, indicative of recent inbreeding were less frequently found across all breeds. Several signatures of selection were also successfully identified, although further annotation is needed to deduce the functions of the uncharacterised genes within these regions. Despite this, the regions identified as under selection in the current study provide an insight into the mechanisms leading to breed differentiation and variation in meat production.