Selection Signatures in Worldwide Sheep Populations

The diversity of populations in domestic species offers great opportunities to study genome response to selection. The recently published Sheep HapMap dataset is a great example of the characterization of worldwide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we made use of statistical methods that (i) take account of the hierarchical structure of sheep populations, (ii) make use of linkage disequilibrium information and (iii) focus specifically on either recent or older selection signatures. We show that this allows pinpointing several new selection signatures in the sheep genome and distinguishing those related to modern breeding objectives from those related to earlier post-domestication constraints. The newly identified regions, together with the ones previously identified, reveal the extensive genome response to selection on morphology, color and adaptation to new environments.


Introduction
Domestication of animals and plants has played a major role in human history. With the advance of high-throughput genotyping and sequencing technologies, the analysis of large datasets in domesticated species offers great opportunities to study genome evolution in response to phenotypic selection [1]. The sheep was one of the first grazing animals to be domesticated [2], in part due to its manageable size and its ability to adapt to different climates and to nutritionally poor diets. A large variety of breeds with distinct morphology, coat color or specialized production (meat, milk or wool) were subsequently shaped by artificial selection. Since the release of the 50K SNP array [3], it has been possible to scan genetic diversity in sheep in order to detect loci that have been involved in these various adaptive selection events. The Sheep HapMap dataset, which includes 50K genotypes for 3000 animals from 74 breeds with diverse worldwide origins, provides a considerable resource for deciphering the genetic bases of phenotype diversification in sheep. In the first analysis of this dataset [4], the authors looked for selection by computing a global F_ST among the 74 breeds at all SNP in the genome. They identified 31 genome regions with extreme differentiation between breeds, which included candidate genes related to coat pigmentation, skeletal morphology, body size, growth, and reproduction. Further studies took advantage of the Sheep HapMap resource to detect genetic variants associated with pigmentation [5], fat deposition [6], or microphthalmia [7]. Another study [8] performed a genome scan for selection focused on American synthetic breeds, using an F_ST approach similar to that in [4].
The 74 breeds of the Sheep HapMap dataset have a strong hierarchical structure, with at least 3 distinct differentiation levels: an inter-continental level (e.g. European breeds vs Asian breeds), an intra-continental level (e.g. Texel vs Suffolk European breeds), and an intra-breed level (e.g. German Texel vs Scottish Texel flocks). Recent studies [9][10][11][12] showed that, when applied to hierarchically structured data sets, F_ST-based genome scans for selection may lead to a large proportion of false positives (neutral loci wrongly detected as under selection) and false negatives (undetected loci under selection). Besides, the heterogeneity of effective population size among breeds implies that some breeds are more prone to contribute large locus-specific F_ST values than others [10]. Apart from these statistical considerations, merging populations with various degrees of shared ancestry can limit our understanding of the selective process at detected loci. Indeed, the regions pointed out in [4] can be related to either ancient selection, such as the poll locus, which has likely been under selection for thousands of years, or fairly recent selection, such as the myostatin locus, which has been specifically selected in the Texel breed. But in most situations the time scale of adaptation cannot be easily determined.
Another limitation of genome scans for selection based on single SNP F_ST computations is that they do not sufficiently account for the very rich linkage disequilibrium information, even when the single SNP statistics are combined into windowed statistics. Recently, we proposed a new strategy to evaluate the haplotype differentiation between populations [13]. We showed that using this approach greatly increases the power to detect selective sweeps from SNP chip data, and also enables the detection of soft or incomplete sweeps. These latter selection scenarios are particularly relevant in breeding populations, where selection objectives have likely varied over time and where the traits under selection are often polygenic.
In this study we provide a new genome scan for selection based on the Sheep HapMap dataset, where we distinguish selective sweeps within and between 7 broad geographical groups. The within group analysis aims at detecting recent selection events related to the diversification of modern breeds. It is based on the single marker FLK test [10] and on its haplotypic extension hapFLK [13]. The FLK test is an extension of the Lewontin and Krakauer (LK) test [14] that accounts for population size heterogeneity and for the hierarchical structure between populations. Like the LK test, the FLK test computes a global F_ST for each SNP, but allele frequencies are first rescaled using a population kinship matrix F. This matrix, which is estimated from the observed genome wide data, measures the amount of genetic drift that can be expected, under neutral evolution, along all branches of the population tree. With this rescaling, allele frequency differences are typically down-weighted if they are obtained with small populations, or populations that diverged a long time ago. The between group analysis focuses on older selection events and is only based on FLK. Overall, we confirmed 19 of the 31 sweeps discovered in [4], while providing more details about the past selection process at these loci. We also identified 71 new selection signatures, with candidate genes related to coloration, morphology or production traits.

Results and Discussion
We detected selection signatures using methods that aim at identifying regions of outstanding genetic differentiation between populations, based either on single SNP, FLK [10], or haplotype, hapFLK [13], information. These methods have optimal power when working on closely related populations, so we separately analyzed seven groups of breeds, previously identified as sharing recent common ancestry [4] and corresponding to geographical origins of breeds. Before performing genome scans for selection signatures, we studied the population structure of each group to identify outlier animals as well as admixed and strongly bottlenecked populations, using both PCA and model-based approaches [15,16]. hapFLK was found to be robust to bottlenecks or moderate levels of admixture, but these phenomena may affect the detection power, so we preferred to minimize their influence by removing suspect animals or populations. Details of these corrections are provided in the Methods section. The final composition of the population groups is given in Table 1.

Overview of selected regions
An overview of selection signatures on the genome across the different groups is plotted in Figure 1 and a detailed description is provided in Table 2. Detected regions were typically a few megabases long and included from 1 to 196 genes, with a median of 15 genes. However, in many regions strong functional candidate genes were found very close to the position with lowest p-value, typically among the two closest genes from this position. These genes are reported in Table 2, as well as a few other functional candidates with less statistical evidence but strong prior knowledge from the literature. We found 41 selection signatures with hapFLK and 26 with FLK, although we allowed a slightly higher false discovery rate for FLK than hapFLK (10% vs 5%). This result was consistent with a higher power for hapFLK than FLK, as already shown in [13].
Four regions were found with both the single SNP and the haplotype test and harbor strong candidate genes: NPR2, KIT, RXFP2 and EDN3 (Table 2). The overlap was thus small, illustrating that the two tests tend to capture different signals. In particular, hapFLK will fail to detect ancient selective sweeps, for which the mutation-carrying haplotype is small and not associated with many SNP on the chip. Conversely, single SNP tests will fail to capture selective sweeps when no single SNP is in high LD with the causal mutation. Unlike hapFLK, they will also fail if the selected mutation is only at intermediate frequency but is associated with a long haplotype.
Six regions were detected in more than one group of breeds. They all contained strong candidate genes (Table 2). Three of these genes are related to coat color (KIT, KITLG and MC1R), and could correspond to independent selection events (see discussion below). One region harbors a gene (RXFP2) for which polymorphisms have been shown to affect horn size and polledness in the Soay [17] and Australian Merino [18]. We detected this region in 4 different groups and in all of them the highest FLK value was found to be very close to RXFP2 (Figure S8 in File S1). This provides clear indication that selection in this region is related to RXFP2, consistent with previous selection signatures detected by comparing specifically horned and polled breeds (Figure 6 in [4]). However, we note that the signatures of selection in this region exhibit different patterns among groups. The signal is very narrow in the SWE and SWA groups, and is in fact not detected by the hapFLK test, whereas it affects a large genome region in the CEU group where it is detected by hapFLK. In the ITA group, the FLK statistics do not reach significance, and the hapFLK signal is not high (minimum q-value of 0.04). Overall, the selection signatures suggest that selection on RXFP2, most likely due to selection on horn phenotypes, was carried out worldwide at different times and intensities. Another region harbors the HMGA2 gene, involved in selection for stature in dogs [19] and associated with body size in horses [20] and height in humans [21]. The last region includes two interesting candidate genes: ABCG2, which has been associated with a strong QTL for milk production in cattle [22], and NCAPG, which has been associated with fetal growth [23] and calving ease [24] in cattle and which is located in several selection signatures in this species [25][26][27][28]. In our analysis, populations with a selection signature in this region belong to three European groups (SWE, ITA and CEU) and our results suggest that selection in these different groups might involve distinct genes (Table 2).
In the paper presenting the Sheep HapMap dataset [4], 31 selection signatures were found, corresponding to the 0.1% highest single SNP F_ST values. Using FLK and hapFLK, we confirmed signatures of selection for 10 of these regions. Considering that the two analyses were performed on the same dataset, this overlap can be considered rather small. Two reasons can explain this.
First, the previous analysis was based on the F_ST statistic. Although this statistic is commonly used for selection scans, it is prone to produce false positives when the population tree harbors unequal branch lengths (i.e. unequal effective population sizes) [10]. In particular, strongly bottlenecked breeds will contribute high F_ST values preferentially even under neutral evolution, because their smaller effective population size implies a larger variance of allele frequencies. With FLK and hapFLK, F_ST values between populations are rescaled using branch lengths, so populations with long branch lengths will not contribute more than others [13]. In fact they will tend to contribute less, as the statistical power to distinguish selective effects from drift effects is naturally lower in populations where drift is larger.
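The variance argument above can be made explicit with the standard pure-drift result for a Wright-Fisher population. The notation is assumed here for illustration and is not taken from [4] or [10]: p_0 is the ancestral allele frequency, N_i the effective size of breed i, t the number of generations since divergence and F_i the resulting drift coefficient.

\[
\mathrm{Var}(p_i \mid p_0) = F_i \, p_0 (1 - p_0), \qquad F_i = 1 - \left(1 - \frac{1}{2 N_i}\right)^{t} \approx \frac{t}{2 N_i} \quad \text{for } t \ll 2 N_i .
\]

Under this sketch, a breed with half the effective size shows roughly twice the allele frequency variance at a neutral locus, which is why it tends to produce large unscaled F_ST values.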
Second, the previous analysis was performed using all breeds at the same time. It is therefore possible that some of these regions correspond to differentiation between groups of breeds rather than within groups. To investigate this question, we performed a genome scan for selection between seven virtual populations corresponding to the ancestors of the seven population groups. Allele frequencies in each of these ancestral populations were estimated from those observed in modern breeds, and regions with outlying genetic differentiation between ancestral populations were detected using the FLK statistic [10]. For this analysis, we did not include SNP lying in regions detected within groups, since selection biases their estimated ancestral allele frequencies. The ancestral population tree was reconstructed using SNP for which we have unambiguous ancestral allele information (Figure S9 in File S1). This tree is decomposed into two main lineages, one for European breeds and one for Asian and African breeds. The African group exhibits a slightly longer branch. We note, however, that this could be due to ascertainment bias of the SNP on the array.
This led to the identification of 23 new selection signatures (Figure 2 and Table 3), 9 of them being common to the analysis of [4]. Overall, combining the scans for recent and ancestral selection, we failed to replicate 12 of the regions in [4].

Selection Signatures within population groups
Coloration. Many selection signatures are located around genes that have been shown to be involved in hair, eye or skin color. In particular, several detected regions include candidate genes that are involved in the development and migration of melanocytes and in pigmentation: EDN3, KIT, KITLG, MC1R and MITF. For all these genes except MITF, we have quite strong evidence that they are the genes targeted by selection in the detected region. In the SWA group, EDN3 was included in the detected region for both FLK and hapFLK, and in both cases it was the closest gene to the highest test value. KIT and KITLG were both included in a detected region (with relatively few genes) for two different geographical groups, and were very close to the position with the smallest p-value in one of those. MC1R was also in a detected region for two different groups, NEU and ITA. In both cases it was not very close to the maximum of the signal, but we note that black skin or coat color is an important characteristic of the two populations that have been found under selection in this region, the Irish Suffolk and Sardinian Ancestral Black. This observation, together with the fact that MC1R mutations are responsible for coat color patterns in mammals (e.g. in cattle [29]), supports the hypothesis that MC1R is a good candidate for the signatures we observed.
Although not listed in Table 2, SOX10 and ASIP, two other genes involved in pigmentation, also show some evidence of selection. In the ITA group, the q-value of hapFLK near SOX10 is 6.2% and almost reaches the significance threshold of 5%. Similarly, the two closest SNP to ASIP (s66432 and s12884) present suggestive FLK p-values of respectively 7.5 × 10^-4 and 6.8 × 10^-5 in the ASI group, and one (s12884) is significantly differentiated between the ancestral groups. All these genes have previously been reported as being likely selection targets and/or associated with color patterns in different mammalian species.
Finally, we found a signal for selection centered on the BNC2 gene, that has recently been associated with skin pigmentation in humans [30]. All population groups present at least one selection signature which is very likely related to one of the above genes, reflecting the widespread importance of color patterns to define sheep breeds.
Inferring a precise history of underlying causal mutations for color patterns in this dataset is hard for several reasons: precise phenotypic characterizations of coat color patterns in the Sheep HapMap breeds are not available; the 50K SNP array used does not offer sufficient density to associate a given selection signature with a specific set of polymorphisms; finally, from the literature it appears that a large number of genes and mutations can be considered a priori as potentially causal for a given pigmentation pattern. In particular, mutations in different genes can give rise to the same phenotype (e.g. in horses [31]). Also, within a gene, different mutations can give rise to different phenotypes, e.g. mutations in the MC1R gene (also named the extension locus) have been associated with a large panel of skin or coat colors [29,32,33]. Deciphering selection signatures related to coat color in sheep, and in particular identifying the causal variants under selection, will require sequencing these genes for individuals from several breeds with diverging color patterns. This in turn will help to understand the evolutionary history of the breeds and the effect of selection [34]. To potentially help in this task, in Table S1 in File S1 we list, for each ''color gene'', the populations that have likely been selected for.
Morphology. Another group of genes that are found within selection signatures have known effects on body morphology and development. NPR2, HMGA2 and BMP2, pointed out previously [4], are confirmed as good positional candidates by our study. We also found strong evidence for selection on WNT5A, ALX4 or EXT2, and two HOX gene clusters (HOXA and HOXC). WNT5A and ALX4 are two genes involved in the development of the limbs and skeleton. Mutations in WNT5A cause the dominant human Robinow syndrome, characterized by short stature, limb shortening, genital hypoplasia and craniofacial abnormalities [35]. ALX4 loss of function mutations cause polydactyly in the mouse, through dysregulation of the sonic hedgehog (SHH) signaling factor [36,37]. Moreover, the ALX4 protein has been shown to bind proteins from the HOXA (HOXA11 and HOXA3) and HOXC (HOXC4 and HOXC5) clusters [38]. Located just beside ALX4 and corresponding to the same selection signature, EXT2 is responsible for the development of exostoses in the mouse [39]. HOX genes are responsible for antero-posterior development and skeletal morphology along the anterior-posterior axis in vertebrates. The selection signature around HOXA is a recent selection signature in the SWA group, while that around HOXC is an ancestral signature with a high differentiation of the ASI ancestor compared to AFR and SWA (Table 3).
Finally, we note that an ancestral selection signature is found near the ACAN gene, whose expression was shown to be upregulated by BMP2 [40], another candidate gene for selection. Three genes within the selection signature are found closer to the maximum test value than ACAN, but these are in silico predicted genes, whose protein coding function has not been confirmed, so ACAN seems to be overall a better candidate for explaining selection in the region. Mutations in the ACAN gene have been shown to induce osteochondrosis [41] and skeletal dysplasia [42]. The ACAN region has also been shown to be associated with height in humans [43].
Traits of agronomic importance. Sheep have been raised for meat, milk and wool production. Within selection signatures, we found several genes associated with these production traits. In addition to the selection signature in Texels on the MSTN gene for increased muscularity [44], discussed in [13], we detected a selection signature centered on HDAC9 and including a few other genes, which could also be linked to muscling. HDAC9 is a known transcriptional repressor of myogenesis. Its expression has been shown to be affected by the callipyge mutation in the sheep at the DLK1-DIO3 locus [45]. The signature around HDAC9 corresponds to a selection signature in the Garut breed from Indonesia, a breed used in ram fights. As already discussed, one selection signature contains ABCG2, a gene underlying a QTL with large effects on milk production (yield and composition) in cattle [22]. Also, one of the ancestral selection signatures reaches its maximum value close to the INSIG2 gene, recently shown to be associated with milk fatty acid composition in Holstein cattle [46]. Two selection signatures could be related to wool characteristics, one in the CEU group including the FGF5 gene, partly responsible for hair type in the domestic dog [47,48], and an ancestral selection signature on chromosome 25 in a QTL region associated with wool quality traits in the sheep [49,50].
One of the strong outlying regions in the selection scan contains the PITX3 gene. Further analysis revealed that this signature was due to the German Texel population haplotype diversity differing from the other Texel samples (results not shown). It turns out that the German Texel sample consisted of a case/control study for microphthalmia [7], although the case/control status information in this sample is not given in the Sheep HapMap dataset. The consequence of such a recruitment is to bias haplotype frequencies in the region associated with the disease, which produces a very strong differentiation signal between the German Texel and the other Texel populations. Although not related to artificial or natural selection in sheep, this signature illustrates that our method for detecting selection has the potential to identify causal variants in case/control studies, while using haplotype information.
Table 3. Selection signatures in ancestral populations.

Ancestral signatures of selection
For ancestral selection signatures, i.e. the regions showing outlying genetic differentiation between population groups, it is difficult to estimate how far back in time selection occurred. In particular, it would be interesting to place the divergences shown by the ancestral population tree with respect to sheep domestication. Two interesting candidate genes for ancestral selection signatures might indicate that the selection signatures captured could be rather old. First, we found selection near the TRPM8 gene, which has been shown to be a major determinant of cold perception in the mouse [51]. The pattern of allele frequency at the significant SNP (Table 3) is consistent with the climate in the geographical origins of the population groups. AFR, ASI and ITA, living in warm climates, have low frequency (0.04-0.16) of the A allele, while NEU and CEU, from colder regions, have higher frequencies (0.55-0.7), the SWE group having an intermediate frequency of 0.38. Overall, this selection signature might be due to an adaptation to cold climate through selection on a TRPM8 variant. Another selection signature lies close to a potential chicken domestication gene, TSHR [52], whose signaling regulates photoperiodic control of reproduction [53]. This selection signature was identified before [4] and our analysis indicates that selection happened before the divergence of breeds within geographic groups, consistent with an early selection event. Given its role, we can speculate that selection on the TSHR gene is related to seasonality of reproduction. Under temperate climates, sheep experience a reproductive cycle under photoperiodic control. Furthermore, there is evidence that this control was altered during domestication [54] so our analysis suggests genetic mutations in TSHR may have contributed to this alteration.
As discussed above, some of the genes found underlying ancestral selection signatures can be related to production or morphological traits (e.g. ASIP, INSIG2, ACAN, wool QTL), indicating that these traits have likely been important at the beginning of sheep history. The other genes that we could identify as likely selection targets in the ancestral population tree relate to immune response (GATA3) and in particular to antiviral response (TMEM154 [55], TRAF3 [56]). The most significant ancestral selection signature is centered around the NF1 gene, encoding neurofibromin. This gene is a negative regulator of the ras signal transduction pathway, therefore involved in cell proliferation and cancer, in particular neurofibromatosis. Due to this central role in intra-cellular signaling, mutations affecting this gene can have many phenotypic consequences so that its potential role in the adaptation of sheep breeds remains unclear.

Conclusions
The Sheep HapMap dataset is an exceptional resource for sheep genetics studies. In a population genomics context, our study shows that the rich information contained in these data makes it possible to start unraveling the genetic history of sheep populations worldwide. In order to fully exploit this information, we used recent statistical approaches that account for the relationship between populations and the linkage disequilibrium patterns (haplotype diversity). This allowed us to detect more selection signatures with confidence and to identify the selected populations for most of them. Among the new selection signatures detected by our study, several result from recent selection and include good positional candidate genes with functions related to pigmentation (KITLG, EDN3), morphology (WNT5A, ALX4, EXT2, HOXA cluster) or production traits (HDAC9). Two ancestral selection signatures are also of particular interest as they harbor genes (TRPM8 and TSHR) whose functions (cold and photoperiodic perception respectively) seem highly relevant to the selection response during the early history of sheep domestication.
With information on adaptive genome regions and selected populations, we hope that our work will foster new studies to unravel the underlying biological mechanisms involved. To this aim, it is likely that further phenotypic and genetic data are required. On the genetics side, even though the SNP array used in this study was sufficient to localize genome regions harboring adaptive mutations, its density and the SNP ascertainment bias resulting from its design did not allow the causative mutation to be tagged precisely. Elucidating the causal variation underlying selection signatures will thus most likely require large scale sequencing data.
Genome scans for selection, including this one, identify regions that are outliers from a statistical model and do not require specifying an alternative hypothesis based on phenotypic records. While this can be seen as an advantage for the initial localization of genome regions, it is a limitation for the identification of the biological processes involved. Gathering phenotypic records in specific populations, in particular for color and morphology traits, will be needed to go further.

Methods
Selecting populations and animals. Seventy-four breeds are represented in the Sheep HapMap data set, but we only used a subset of these breeds in our genome scan. We removed the breeds with small sample size (< 20 animals), for which haplotype diversity cannot be determined with sufficient precision. Based on historical information, we also removed all breeds resulting from a recent admixture or having experienced a severe recent bottleneck. Focusing on the remaining breeds, we then studied the genetic structure within each population group, in order to detect further admixture events. We performed a standardized PCA of individual-based genotype data and applied the admixture software [16].
In two population groups (AFR and NEU) the different breeds were clearly separated into distinct clusters of the PCA and showed no evidence of recent admixture (Figures S1 and S2 in File S1). These samples were left unchanged for the genome scan for selection. A similar pattern was observed in three other groups (ITA, SWA, ASI), except for a few outlier animals that had to be re-attributed to a different breed or simply removed (Figures S3, S4 and S5 in File S1). In the last two groups (CEU and SWE), several admixed breeds were found and were consequently removed from the genome scan analysis (Figures S6 and S7 in File S1).
We performed a genome scan within each group of populations listed in Table 1, with a single SNP statistic FLK [10] and its haplotype version hapFLK [13].
Population trees. Both statistics require estimating the population tree, with a procedure described in detail in [10]. Briefly, we built a population tree for each group by first calculating Reynolds' distances between each population pair, and then applying the Neighbor Joining algorithm to the distance matrix. For each group, we rooted the tree using the Soay sheep as an outgroup. This breed has been isolated on an island for many generations and exhibits a very strong differentiation from all the breeds of the Sheep HapMap dataset, making it well suited to be used as an outgroup.
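As an illustration of the first step of this procedure, the sketch below computes a Reynolds-type distance matrix from per-population allele frequencies. It is a minimal sketch under stated assumptions: it uses the simplified ratio often quoted for Reynolds' distance on biallelic markers, without the sample-size corrections of the estimator described in [10]. The function names and the toy breed labels are hypothetical, and the Neighbor Joining step is not reproduced.

```python
from itertools import combinations


def reynolds_like_distance(p1, p2):
    """Simplified Reynolds-type distance between two populations.

    p1, p2 -- reference-allele frequencies, one value per biallelic SNP.
    Assumption: no sample-size correction is applied.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared difference summed over both alleles: 2 * (a - b)^2
        num += 2.0 * (a - b) ** 2
        # One minus the probability of drawing the same allele in both populations
        den += 1.0 - (a * b + (1.0 - a) * (1.0 - b))
    if den == 0.0:
        raise ValueError("populations are fixed for the same allele at every SNP")
    return num / (2.0 * den)


def distance_matrix(freqs):
    """Pairwise distances for a dict {population name: list of frequencies}."""
    return {
        (x, y): reynolds_like_distance(freqs[x], freqs[y])
        for x, y in combinations(sorted(freqs), 2)
    }


# Toy example with hypothetical populations and two SNP
toy = {"breedA": [0.2, 0.6], "breedB": [0.4, 0.6], "outgroup": [0.9, 0.1]}
matrix = distance_matrix(toy)
```

The resulting matrix would then be passed to a Neighbor Joining implementation, and the tree rooted with the outgroup, as described above.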
FLK and hapFLK genome scans. The FLK statistic was computed for each SNP within each group. The evolutionary model underlying the FLK statistic assumes that SNP were already polymorphic in the ancestral population. To consider only loci that most likely match this hypothesis, we restricted our analysis within each group to SNP for which the estimated ancestral minor allele frequency p_0 was above 5%. Under neutrality, the FLK statistic should follow a χ² distribution with n − 1 degrees of freedom (DF), where n is the number of populations in the group. Overall, the fit of the theoretical distribution to the observed distribution was very good (Text S1 in File S1), with the mean of the observed FLK distribution being very close to n − 1 (Table S3 in File S1). Using this observed mean as DF for the χ² distribution provided a better fit to the observed data than the theoretical value n − 1. We thus computed FLK p-values using a χ² distribution whose DF equals the observed mean FLK. To compute the hapFLK statistic, we used the Scheet and Stephens LD model [57], a mixture model for haplotypes which requires specifying the number of haplotype clusters to be used. To choose this number, for each group, we used the fastPHASE cross-validation based estimation of the optimal number of clusters. The results of this estimation are given in Table S2 in File S1. The LD model was estimated on unphased genotype data. The hapFLK statistic was computed as an average over 20 runs of the EM algorithm used to fit the LD model. As in [13], we found that the hapFLK distribution could be modeled relatively well with a normal distribution (corresponding to non-outlying regions) and a few outliers; we used robust estimation of the mean and standard deviation of the hapFLK statistic to eliminate the influence of outlying (i.e. potentially selected) regions. This procedure was done within each group, and the resulting mean and standard deviation values are given in Table S2 in File S1. Finally, we computed at each SNP a p-value for the null hypothesis from the normal distribution.
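To make the single SNP computation concrete, the sketch below evaluates an FLK-type statistic for one SNP. It assumes the usual form of the statistic, a quadratic form of the deviations of population allele frequencies from the estimated ancestral frequency, weighted by the inverse of the kinship matrix F and scaled by p_0(1 − p_0). The exact estimator is the one described in [10]; the function names and the small linear solver here are illustrative only, and the conversion to a χ² p-value is left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def flk(freqs, kinship):
    """Return (statistic, estimated ancestral frequency) for one SNP.

    freqs   -- allele frequency in each of the n populations
    kinship -- n x n population kinship matrix F, as a list of rows
    """
    ones = [1.0] * len(freqs)
    w = solve(kinship, ones)  # F^-1 1 (F is symmetric)
    # Generalized least squares estimate of the ancestral frequency
    p0 = sum(wi * pi for wi, pi in zip(w, freqs)) / sum(w)
    if not 0.0 < p0 < 1.0:
        raise ValueError("ancestral frequency estimate must lie in (0, 1)")
    dev = [pi - p0 for pi in freqs]
    u = solve(kinship, dev)  # F^-1 (p - p0 1)
    quad = sum(di * ui for di, ui in zip(dev, u))
    return quad / (p0 * (1.0 - p0)), p0


# Toy example: three populations with equal, independent drift (F = 0.1)
equal_f = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
stat, p0 = flk([0.2, 0.4, 0.6], equal_f)

# Unequal drift: the low-drift population dominates the ancestral estimate
_, p0_unequal = flk([0.3, 0.7], [[0.05, 0.0], [0.0, 0.5]])
```

In the second toy call, the population with the larger drift coefficient pulls the ancestral estimate far less than the other one, which is the down-weighting behavior described for FLK.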
Selection in ancestral groups. The within-group FLK analysis provides for each SNP an estimate of the allele frequency p_0 in the population ancestral to all populations of the group. We used this information to test SNP for selection using between group differentiation, with some adjustments. First, the FLK model assumes that tested polymorphisms are present in the ancestral population. SNP for which the alternate allele has been seen in only one population group are likely to have appeared after divergence (within the ancestral tree) and were therefore removed from the analysis. Second, regions selected within groups affect allele frequency in some breeds and therefore bias our estimation of the ancestral allele frequency in this group. We therefore removed all SNP that were included in within-group selection signatures. Finally, the FLK test requires a rooted population tree. For the within group analysis, we could use a population very distant from the current breeds (the Soay sheep). For the ancestral tree, we created an outgroup homozygous for ancestral alleles at all SNP.
Identifying selected regions and candidate genes. We defined significant regions for each statistic and within each group of populations. Using the neutral distribution (χ² for FLK and normal for hapFLK), we computed the p-value of each statistic at each SNP. To identify selected regions, we estimated the corresponding q-values [58] to control the FDR. For FLK, SNP with a q-value below 0.1 were considered significant, which by definition implies that we expect 10% of false positives among our detected SNP. Since the power of hapFLK is greater than that of FLK [13], we used a q-value threshold of 0.05, therefore controlling FDR at the 5% level.
For the FLK analysis in ancestral populations, we used an FDR threshold of 5%.
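The path from a hapFLK-like statistic to an FDR-controlled list of SNP can be sketched as follows. Two substitutions are made here and should be read as assumptions, not as the procedure used in this study: the robust mean and standard deviation are approximated by the median and the scaled median absolute deviation, and q-values are replaced by Benjamini-Hochberg adjusted p-values, which correspond to the conservative case where the proportion of true null hypotheses is set to 1 instead of being estimated as in [58]. All function names are hypothetical.

```python
from statistics import NormalDist, median


def robust_upper_tail_pvalues(values):
    """Standardize with median and scaled MAD, then take normal upper-tail p-values."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0.0:
        raise ValueError("median absolute deviation is zero; cannot standardize")
    scale = 1.4826 * mad  # makes the MAD consistent with a normal standard deviation
    standard_normal = NormalDist()
    return [1.0 - standard_normal.cdf((v - center) / scale) for v in values]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_indices(pvalues, threshold):
    """Indices of tests whose adjusted p-value is below the FDR threshold."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q < threshold]


# Toy example: one clear outlier among otherwise unremarkable statistic values
toy_values = [1.0, 2.0, 3.0, 4.0, 100.0]
toy_pvalues = robust_upper_tail_pvalues(toy_values)
hits = significant_indices(toy_pvalues, threshold=0.05)
```

With these toy values only the last entry is flagged, since the median and MAD are barely affected by the outlier, which is the reason for using robust estimates of location and scale.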
We then aimed at identifying genes that seem good candidates for explaining selection signatures. We proceeded differently for the single SNP FLK and hapFLK. For FLK, we considered that significant SNP less than 500 kb apart were capturing the same selection signal. Then, we considered as potential candidate genes any gene lying within 1 Mb of any significant SNP. For hapFLK, the genome signal is much more continuous than for single SNP tests, because the statistic captures multipoint LD with the selected mutations. A consequence is that the significant regions can span large chromosome intervals. To restrict the list of potential candidate genes, and target only the ones closest to the most significant SNP, we restricted our search to the part of the signal where the difference in hapFLK value with the most significant SNP was less than 0.5σ, where σ is the standard deviation of the hapFLK statistic. This allowed taking into consideration the profile of the hapFLK signal, i.e. if the profile resembles a plateau, the candidate region will be rather broad, while very sharp hapFLK peaks will provide a narrower candidate region. We extracted all protein coding genes present in the significant regions using the Ensembl Biomart tool (http://www.ensembl.org/biomart/) for the Ovis aries 3.1 genome assembly. These full lists are provided as Supporting Information (Dataset S1 and Dataset S2). Within each candidate region, genes were ranked according to their distance from the most significant position of the region (the larger the rank, the larger the distance). The functional candidate genes shown in Table 2 and discussed in the manuscript were chosen based on this rank and/or on their implication in previous association or sweep detection studies.
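The two region definitions above can be sketched in a few lines. This is an illustration under stated assumptions: positions are base-pair coordinates on a single chromosome, the 500 kb rule is applied to consecutive significant SNP, and the hapFLK interval is taken as the contiguous stretch around the peak, which is one possible reading of the 0.5σ rule. Function names and toy values are hypothetical.

```python
def merge_significant_snps(positions, max_gap=500_000):
    """Group significant SNP positions into regions.

    Consecutive SNP less than max_gap apart are assigned to the same region.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] < max_gap:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos])  # start a new region
    return [tuple(region) for region in regions]


def candidate_window(region, flank=1_000_000):
    """Interval searched for candidate genes around an FLK region."""
    start, end = region
    return max(0, start - flank), end + flank


def hapflk_core_interval(positions, values, sigma, tolerance=0.5):
    """Contiguous interval around the peak where peak - value < tolerance * sigma."""
    peak = max(range(len(values)), key=lambda i: values[i])
    threshold = values[peak] - tolerance * sigma
    left = peak
    while left > 0 and values[left - 1] > threshold:
        left -= 1
    right = peak
    while right < len(values) - 1 and values[right + 1] > threshold:
        right += 1
    return positions[left], positions[right]


# Toy FLK example: five significant SNP forming three regions
flk_regions = merge_significant_snps(
    [100_000, 400_000, 1_200_000, 1_500_000, 3_000_000]
)
first_window = candidate_window(flk_regions[0])

# Toy hapFLK example: a short plateau around the peak at the third position
core = hapflk_core_interval(
    [1, 2, 3, 4, 5], [1.0, 4.8, 5.0, 4.7, 2.0], sigma=1.0
)
```

A plateau-shaped profile keeps several neighboring positions inside the interval, while a sharp peak would return an interval reduced to the peak position itself.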