Genome-Wide Genetic Diversity and Differentially Selected Regions among Suffolk, Rambouillet, Columbia, Polypay, and Targhee Sheep

Sheep are among the major economically important livestock species worldwide because the animals produce milk, wool, skin, and meat. In the present study, the Illumina OvineSNP50 BeadChip was used to investigate genetic diversity and genome selection among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds from the United States. After quality-control filtering of SNPs (single nucleotide polymorphisms), we used 48,026 SNPs for analysis, including 46,850 SNPs on autosomes that were in Hardy-Weinberg equilibrium and 1,176 SNPs on chromosome X. Phylogenetic analysis based on all 46,850 autosomal SNPs clearly separated Suffolk from Rambouillet, Columbia, Polypay, and Targhee, which was not surprising as Rambouillet contributed to the synthesis of the latter three breeds. Based on pair-wise estimates of F ST, significant genetic differentiation appeared between Suffolk and Rambouillet (F ST = 0.1621), while Rambouillet and Targhee had the closest relationship (F ST = 0.0681). A scan of the genome revealed 45 and 41 differentially selected regions (DSRs) between Suffolk and Rambouillet and among Rambouillet-related breed populations, respectively. Our data indicated that regions 13 and 24 between Suffolk and Rambouillet might be good candidates for evaluating breed differences. Furthermore, the ovine genome v3.1 assembly was used as a reference to link functionally known homologous genes to economically important traits covered by these differentially selected regions. In brief, our present study provides a comprehensive genome-wide view of within- and between-breed genetic differentiation, biodiversity, and evolution among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds. These results may provide new guidance for the synthesis of new breeds with different breeding objectives.


Introduction
During the last five years, the animal genome community has made significant progress in mapping, sequencing, assembly, and annotation of the ovine genome. Based on BAC (bacterial artificial chromosome) end sequences, Dalrymple and colleagues [1] first reported a virtual sheep genome by painting a total of 84,624 sheep BACs (about 5.4-fold genome coverage) to orthologous regions in the human genome, which were assembled into 1,172 sheep BAC comparative genome contigs that covered 91.2% of the human genome. In 2009, Goldammer and coworkers [2] constructed a cytogenetic map of the sheep genome with 566 loci, which helped link and order genome regions, such as sequence contigs, genes, and polymorphic DNA markers, to ovine chromosomes. Approximately two years ago, the International Sheep Genomics Consortium (ISGC) began assembly of a draft reference genome of sheep (Ovis aries) using both Sanger sequencing and next-generation sequencing platforms [3]. This large-scale sequencing of the ovine genome led to the discovery of more than 2.8 million ovine single nucleotide polymorphisms (SNPs; http://www.ncbi.nlm.nih.gov/SNP/). In collaboration with the ISGC, Illumina developed the OvineSNP50 Genotyping BeadChip that contains a total of 54,241 SNPs with a marker placed approximately every 46 Kb along the sheep genome (www.illumina.com).
The Illumina OvineSNP50 Genotyping BeadChip has been successfully used in sheep and goat genome research. For example, BeadChip analysis revealed that the PITX3 gene is responsible for microphthalmia [4]. A similar approach also helped identify the dentin matrix protein 1 gene (DMP1) as responsible for inherited rickets in Corriedale sheep [5] and the solute carrier family 13 (sodium/sulphate symporters), member 1 (SLC13A1) gene for chondrodysplasia in Texel sheep [6]. Both OvineSNP50 BeadChips and microsatellite markers were used to refine two quantitative trait loci (QTL) mapped on OAR5 and 13 for resistance to Haemonchus contortus in sheep [7]. Other applications of the OvineSNP50 BeadChip include investigating gene drivers of pigmentation in Merino sheep [8], long-range linkage disequilibrium analysis in wild sheep [9], inbreeding coefficient and pairwise relatedness in Finnsheep [10], and genomic selection in different sheep breeds from around the world by the ISGC [11].
It is well known that Suffolk and Rambouillet were developed in England and France, respectively, but the breeding history of American synthetic breeds may be unfamiliar to readers. In brief, Columbia was one of the first breeds of sheep developed in the United States. In 1912, rams of the long wool breeds were crossed with high quality Rambouillet ewes to produce large ewes yielding more pounds of wool and more pounds of lamb. The original cross was made at Laramie, Wyoming, and then moved to the Sheep Experiment Station, Dubois, Idaho, in 1918. Subsequently, Columbia sheep were released to the public [12]. Polypay sheep were developed at the U.S. Sheep Experiment Station starting in 1968. The objective was to develop a breed with a reproductive capacity markedly superior to that of domestic Western range breeds. The final composition of the Polypay is 1/4 Dorset × 1/4 Finnsheep × 1/4 Targhee × 1/4 Rambouillet. The first "Polypay" ewes and rams were sold 1975-1977 [13]. Targhee sheep were developed at the U.S. Sheep Experiment Station, Dubois, Idaho in 1926. A group of cross-bred ewes, consisting of Rambouillet, Lincoln, and Corriedale blood, was bred to USSES Rambouillet rams. After three years, first generation ewes were carefully selected and bred intensely. The U.S. Targhee Sheep Association was founded in 1951 (http://www.ustargheesheep.org/).
Generally speaking, sheep breeds can be classified into three groups based on their breeding objectives: meat, wool, or dual-purpose breeds. For example, Suffolk is a typical meat breed as the animals possess large body size, rapid growth rate, and high cutability carcasses (http://u-s-s-a.org/). On the other hand, Rambouillet sheep represent a fine wool breed with a well-developed flocking instinct, an extended breeding season, and high-quality fleece (http://www.countrylovin.com/ARSBA/facts.htm). Columbia, Targhee, and Polypay are, however, considered dual-purpose breeds, because they produce fast-growing, high-quality market lambs and also yield heavy, medium-wool fleeces with good staple length [12][13] (http://www.sheepusa.org/).
Previously, microsatellite markers were the main source of markers used to investigate genetic diversity of sheep breeds. For instance, Bayesian cluster analysis on microsatellite genotypes of 666 animals for 28 U.S. sheep breeds derived from 222 producers located in 38 states was able to distinguish meat vs. wool producers due to physiological differences rather than geographic origin [14]. In the present study, our goal was to test the power of the Illumina OvineSNP50 Genotyping BeadChip in evaluating genetic diversity, genome selection, and breed differentiation among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds. These results may provide new guidance for the synthesis of new breeds with different breeding objectives.

Illumina OvineSNP50 BeadChip Genotyping Basics
Among the 54,241 SNPs on the Illumina OvineSNP50 BeadChip that were genotyped on the 94 sheep DNA samples, we observed that 695 SNPs had no calls, 1,019 SNPs were not genotyped for at least 95% of all the individuals, 1,235 SNPs were monomorphic in all breeds, 350 SNPs could not be assigned to chromosome locations, and 2,057 SNPs had MAF ≤ 0.05 for the whole dataset. By excluding the SNPs described above, the remaining 48,885 SNPs, including 47,597 autosomal SNPs and 1,288 SNPs on chromosome X, were used for further analysis. Of the 1,288 SNPs on chromosome X, 1,176 SNPs showed no heterozygous males, while 112 SNPs were heterozygous in some rams. This might be related to a homologous region between chromosomes X and Y because our data do not show that the heterozygous regions are random. As suggested by Gautier et al. [15], we further excluded a total of 747 autosomal SNPs that showed significant (P < 0.01) deviations from Hardy-Weinberg equilibrium (HWE), using this threshold because of the small number of samples. We did not test chromosome X because males carry only one copy of the SNPs on chromosome X. As a consequence, 46,850 autosomal SNPs were included in linkage disequilibrium, genetic diversity, and DSR analyses, while the 1,176 SNPs with non-heterozygous males on chromosome X were used for DSR analysis only. As shown in Figure S1, the final 46,850 and 1,176 SNPs were uniformly distributed on the different autosomes and the X chromosome, and their distribution was comparable to the initial distribution of the 54,241 SNPs on these chromosomes, although the numbers of SNPs on each chromosome differed.
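The filtering steps above can be illustrated with a minimal Python sketch. It assumes genotypes coded as 0, 1, or 2 copies of one allele with None for a missing call; the function names and the example data are hypothetical and are not taken from the pipeline used in this study. The thresholds (call rate of at least 95%, MAF above 0.05) follow the text.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 copies of allele B; None = no call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a genotype call at this SNP."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """MAF computed from called genotypes only (0.0 if nothing was called)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq_b = sum(called) / (2 * len(called))
    return min(freq_b, 1.0 - freq_b)


def passes_qc(genotypes: Sequence[Genotype],
              min_call_rate: float = 0.95,
              maf_cutoff: float = 0.05) -> bool:
    """Keep a SNP if call rate >= 95% and MAF > 0.05.

    Monomorphic SNPs have MAF = 0 and are therefore removed as well.
    """
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) > maf_cutoff)


# Tally reported in the text: SNPs remaining after the first five filters.
remaining = 54241 - (695 + 1019 + 1235 + 350 + 2057)  # 48,885
autosomal_after_hwe = 47597 - 747                     # 46,850
```

The HWE filter and the chromosome-assignment filter are not implemented here; they would be applied as additional per-SNP checks in the same way.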

r 2 Measurements by Chromosomes
The r 2 values for pairs of loci were measured along with the physical distance separating the loci and averaged within each breed. As the sheep genome is currently estimated to be 2.86 Gb in size (http://genome.ucsc.edu/), the 46,850 SNPs used in linkage disequilibrium (LD) analysis would have an average inter-marker distance of approximately 60 Kb. As shown in Figure S2, the average within-population pairwise r 2 dropped quickly toward its asymptotic value when physical distances reached 200 Kb. More interestingly, the decreasing trends of the average r 2 values remained similar among these five breeds, but the Suffolk breed had the highest r 2, followed by Columbia, Rambouillet, Targhee, and Polypay, respectively.
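As a rough illustration of the r 2 measure, the following Python sketch computes the squared Pearson correlation between genotype dosages (0, 1, 2) at two SNPs. This is a common approximation that needs no phasing; it is an assumption made for illustration and is not the haplotype-based estimate produced by Haploview. The function name and inputs are hypothetical.

```python
from typing import Optional, Sequence


def genotype_r2(x: Sequence[Optional[int]], y: Sequence[Optional[int]]) -> float:
    """Squared correlation between dosages at two SNPs.

    Individuals with a missing call (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals with both calls")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return sxy * sxy / (sxx * syy)


# Average spacing implied by the text: 2.86 Gb spread over 46,850 SNPs.
avg_spacing_kb = 2.86e9 / 46850 / 1000  # roughly 61 Kb
```

Averaging such pairwise values within bins of physical distance, separately for each breed, would give a decay curve of the kind summarized in Figure S2.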
The Genetic Structure at the Individual Level
The results clearly showed that there were no conflicts about the origin of individuals assigned to each breed. Also, the individuals from different sheep breeds were clearly clustered, with closer genetic distances observed among Targhee, Columbia, Rambouillet, and Polypay breeds as compared to the Suffolk population.

Assessing the Genetic Structure at the Population Level
As shown in Figure S3, the gene diversity, heterozygosity, and polymorphism information content (PIC) among the five sheep populations were 0.3291-0.3576, 0.3496-0.3722, and 0.2619-0.2837, respectively. Polypay and Targhee populations had the highest gene diversity, heterozygosity, and PIC, while the Suffolk population had the lowest values for these indexes. Classical F-statistics showed that most variation originated from individuals within a breed, while only 11% of the variation resulted from differences between breeds (Table S1). In particular, Targhee sheep had the highest within-breed variation.
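For readers unfamiliar with these indexes, a minimal Python sketch of the textbook formulas for a biallelic SNP is given below. These are the plain definitions, stated here as an assumption; the software used in this study may apply different estimators or sample-size corrections. The function names are hypothetical.

```python
from typing import Optional, Sequence


def gene_diversity(p: float) -> float:
    """Expected heterozygosity 2pq for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pic(p: float) -> float:
    """Polymorphism information content for a biallelic SNP.

    PIC = 1 - (p^2 + q^2) - 2 * p^2 * q^2, with q = 1 - p.
    """
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def observed_heterozygosity(genotypes: Sequence[Optional[int]]) -> float:
    """Share of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)
```

Both gene diversity and PIC peak at p = 0.5 (0.5 and 0.375, respectively), so breed-level averages in the ranges reported above indicate many SNPs at intermediate frequencies.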
Furthermore, a multidimensional scaling plot clearly showed the genetic separation between Suffolk and Rambouillet and the relationships among Rambouillet-related sheep breeds (Figure 2). Based on pair-wise estimates of F ST, significant genetic differentiation appeared between Suffolk (meat breed) and Rambouillet (fine wool breed) (F ST = 0.1621). In comparison, Rambouillet-related breeds were not significantly separated (F ST = 0.0681-0.0952). In particular, Rambouillet and Targhee had the closest relationship (F ST = 0.0681) (Figure 2).
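To give an intuition for what a pairwise F ST value measures, the following Python sketch uses the simple heterozygosity-based definition F ST = (H T - H S) / H T for one biallelic SNP and two equally weighted populations. This is only an illustrative simplification and an assumption of this sketch; the estimates reported in this study were based on a different, model-based approach. The function name is hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Heterozygosity-based F ST for one biallelic SNP in two populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    # Mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # SNP fixed for the same allele in both populations
    return (h_t - h_s) / h_t
```

Under this definition, identical allele frequencies give 0 and alternative fixation gives 1, so a genome-wide value near 0.16 reflects moderate but clear divergence in allele frequencies.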

Population Differences in Minor Allele Frequencies
Based on minor allele frequency of SNPs, different levels of variation across breeds were observed. As shown in Figure S4, over 80% of the SNPs with one allele in all breeds had a MAF > 0.10. In all MAF ranges, the proportions of loci were significantly different among all sheep breeds (χ 2 = 204.1084-1510.9140, P < 0.0001), indicating that each sheep breed had different numbers of SNPs in each MAF range.
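The comparison of MAF ranges among breeds rests on a chi-squared test of a contingency table. A minimal Python sketch of the Pearson statistic is shown below, assuming a table with one row per breed and one column per category (for example, SNPs inside versus outside a given MAF range). The function name is hypothetical, the sketch is not the SAS procedure used in this study, and it returns the statistic and degrees of freedom without computing a P value.

```python
from typing import List, Sequence, Tuple


def chi_square_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive.
    """
    row_totals: List[float] = [sum(row) for row in table]
    col_totals: List[float] = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For five breeds and a two-column table the test has four degrees of freedom, so statistics in the hundreds, as reported above, correspond to very small P values.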

Characterization of DSRs between Suffolk and Rambouillet
Based on the SNP F ST estimates, a total of 45 DSRs were identified between the Suffolk and Rambouillet genomes, which contained the top 0.1% of markers (48 SNPs on autosomes and 6 SNPs on chromosome X; Table 1). Further examination of these DSRs identified 608 unique known genes, including 507 from autosomal DSRs and 101 from X chromosome DSRs (Table S2). The GO analysis revealed pathways enriched for a wide range of biological processes, such as regulation of organelle/cytoskeleton organization, translational elongation, protein catabolic processes, and cilium morphogenesis (Table S3). Among these 45 DSRs between Suffolk and Rambouillet, 13 also appeared as DSRs in cattle (Table S4). [...] of this region, is an evolutionarily conserved protein implicated in cell division and morphology [16]. Additionally, selection signals were detected for genes associated with economically important traits, i.e., MITF and GHR.

Characterization of DSRs among Rambouillet-related Breeds
The genome-wide distribution of F ST among Rambouillet, Columbia, Polypay, and Targhee is shown in Figure 4. Among these four Rambouillet-related breeds, 41 DSRs were identified with the top 0.1% of markers ranked by SNP F ST (46 on autosomes and 1 on chromosome X; Table 2). These DSRs harbor a total of 526 unique genes, including 524 from autosomal DSRs and 2 from chromosome X DSRs (Table S5). Interestingly, GO analysis revealed that the enriched pathways were mainly related to cell adhesion processes (Table S6). Among these four sheep breeds, the OAR21_19719146 SNP (F ST = 0.65, region 37; Table 2), which lies in the potassium channel tetramerisation domain containing 14 (KCTD14) gene, ranked highest, but unfortunately, little is known about this gene. Our data also show that sheep and cattle may share eight of the DSRs identified among Rambouillet, Columbia, Polypay, and Targhee (Table S7).

Discussion
In the present study, we used the Illumina OvineSNP50 Genotyping BeadChip to analyze the genetic diversity and genome selection among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds from the USDA, ARS, U.S. Sheep Experiment Station. Our present study determined that close genetic relationships exist within Rambouillet-related breeds: Rambouillet, Columbia, Polypay, and Targhee, while Suffolk sheep are well separated from the Rambouillet-related breeds. The F ST results showed significant genetic difference between Suffolk and Rambouillet (F ST = 0.1621). Between these two distinct breeds, 45 DSRs and 608 candidate genes were identified using the Ovine Genome v3.1 Assembly as a reference. On the other hand, 41 DSRs and 526 genes were also determined among the four Rambouillet-related breeds.
Polypay (Finn × Targhee × Rambouillet × Dorset) [13], Targhee (Rambouillet × Lincoln × Corriedale), and Columbia (Lincoln × Rambouillet) are three breeds that were originally developed at the U.S. Sheep Experiment Station decades ago. Rambouillet and Suffolk were developed in France and England, respectively. In the present study, the highest gene diversity, heterozygosity, and PIC were shown in Polypay and Targhee. This is not surprising because these two sheep breeds are the most recently developed breeds and we expect them to retain greater heterozygosity than the three other sheep breeds. Also, we found that F ST averaged 0.1140 but Rambouillet-related breeds were not significantly separated, suggesting that the genetic differentiation is mainly between Suffolk and Rambouillet-related breeds. Not unexpectedly, cluster analysis also clearly showed that Suffolk is genetically distant from the other four sheep breeds (Figure 1). These results are reasonable because Columbia, Targhee, and Polypay are Rambouillet-related breeds. In particular, pair-wise F ST estimation suggested that Targhee should be considered genetically most similar to Rambouillet (Figure 2). These results are supported by our records and the breed selection history of Columbia, Targhee, and Rambouillet. Recently, the same SNP chip was used to assign population of origin between wild sheep breeds, including bighorn and thinhorn sheep [9], and to determine the historic selection of 74 sheep breeds [11]. In cattle, the bovine SNP chip has also been used to reveal genetic history or population diversity [17][18]. Our results further confirm that the SNP chip is a powerful tool to discover population genetic diversity in livestock, and these data provide strong evidence of the genetic structure in these five sheep breeds.
Meat, wool, and dual-purpose breeds of sheep were developed because these are highly valued traits in sheep production. Based on genetic distance, our results indicated that the meat breed (Suffolk) is very distant from the fine wool breed (Rambouillet), which is very much in line with the functional purpose of the different breeds. For example, sheep selected for meat production generally have greater body weights. Mature weights of Suffolk rams, which have been historically selected for meat production, range from 113 to 159 kg, and the fleece is considered a medium wool type with a staple length of 5 to 8.75 cm (http://u-s-s-a.org/). In comparison, mature Rambouillet rams, which have been bred to produce high quality wool, are smaller and weigh between 113 and 135 kg, while the fleece staple length varies from 5 to 10 cm and fiber diameter ranges from 18.5 to 24.5 microns (http://www.countrylovin.com/ARSBA/facts.htm). In the present study, a genome-wide scan or differentiation analysis using F ST revealed 45 chromosomal regions with evidence for selection. Interestingly, three regions, 19, 24, and 37, are almost identical to the regions identified in the 74 sheep breeds examined in [11], implying that important genomic selection might have occurred in these regions. Region 24, which includes the RXFP2 gene involved in horn morphology and whose selection signal was reconstituted only when comparing horned with polled populations [11,19], was also discovered in this study. Kijas et al. (2012) [11] indicated that this gene had the strongest selection signal due to the long-standing nature of selection. In our study, however, only two Rambouillet rams had horns; the sample size was therefore most likely too small to detect a difference. Not unexpectedly, GHR, an important growth-related gene, was identified (region 32 on OAR 16).
It is well known that this gene affects body growth and decreases fatness [20], and its genetic variations are associated with growth traits in sheep or cattle [20][21]. Therefore, our study provides additional information for interpreting the difference in growth ability between Suffolk and Rambouillet. The highest ranked SNPs (F ST > 0.90) were located in glutamate receptor interacting protein 1 (GRIP1) and ankyrin repeat and sterile alpha motif domain containing 1B (ANKS1B). Many studies have found that GRIP1 plays an important role in receptor trafficking, synaptic organization, transmission in glutamatergic and GABAergic synapses, and modulating autistic phenotype [22][23][24]. Unfortunately, little is known about the function of GRIP1 in livestock. Here our results might provide a new clue for its role in sheep production. Recently, the ANKS1B gene has been shown to be associated with body weight index and waist circumference in human GWAS studies [25], and Parker et al. [26] indicated that this gene may underlie the QTL associated with body weight in mice. Our study also suggests that the ANKS1B gene might be a good growth trait candidate. However, additional studies are required to confirm this speculation. Interestingly, we discovered the MITF gene in DSRs (region 37 on OAR 19). This gene accounts for pigmentation phenotypes in cattle [27]. However, ASIP, which controls a series of alleles of black and white coat color [28], was not included in our DSRs. In sheep, gene duplications might also cause black fleece [28]. In this study, the Suffolk has a black head and legs, while the Rambouillet does have recessive black. It appears as though this key pigmentation gene may provide evidence for selection between the two sheep breeds. In this study we also identified FRY, a gene involved in growing wing hairs [29] and bristles [30] in Drosophila.
Mutations in FRY resulted in the formation of a strong multiple hair cell phenotype that consisted of clusters of epidermal hairs and branched hairs [31]. However, little is known about the role of FRY in livestock. Rambouillet is often considered a fine wool breed, while Suffolk has rather poor quality wool. Therefore, our results provide strong evidence for the role of FRY in sheep wool development. Additionally, 13 of the 45 DSRs identified in sheep correspond to DSRs in cattle, suggesting that the genes in these regions are targets for selection across multiple species.
As described above, Columbia, Polypay, and Targhee are related to Rambouillet sheep. Among these four sheep breeds, a total of 41 DSRs were identified (Table 2). Interestingly, GO term analyses of functionally known genes in these regions discovered pathways related to homophilic cell adhesion, translational elongation, germ cell development, sexual reproduction, and macromolecule biosynthetic processes. Some signature genes suggested strong selection given their roles, such as CNTNAP5, ADAM23, and PCDHB4 in cell adhesion, and ME3, RIMS2, and TDRD7 in cellular respiration, cellular macromolecule/protein localization, and multicellular organism reproduction, respectively. These might result from long-term selection for improved reproduction and wool traits in these four sheep breeds [32][33].
In summary, we revealed the genetic diversity among Suffolk, Rambouillet, Columbia, Polypay, and Targhee sheep breeds of the United States using the Illumina OvineSNP50 BeadChip. DSRs between Suffolk and Rambouillet and among Rambouillet-related sheep breeds were also identified, producing a list of candidate genes in these regions based on the Ovine Genome v3.1 Assembly. Estimation of genome-wide diversity and identification of DSRs provide a powerful method to identify economically important trait-related genes that have been enriched during long-term selection for different breeding objectives. Furthermore, our results also provide a foundation to further investigate sheep evolution and gene functions in the near future.

Ethics Statement
The U.S. Sheep Experiment Station Animal Institutional Care and Use Committee specifically approved this study (Protocol number: 11-01). All efforts were made to minimize any discomfort during blood collection.

Sheep, DNA Preparation, and Genotyping on Illumina OvineSNP50 BeadChips
In the present study, blood samples were collected from 19 Columbia, 19 Polypay, 16 Rambouillet, 18 Suffolk, and 22 Targhee rams at the U.S. Sheep Experiment Station in Dubois, Idaho. Rams were produced from unique dams and 12, 12, 9, 10, and 17 unique sires of the Columbia, Polypay, Rambouillet, Suffolk, and Targhee breeds, respectively. The number of sheep per breed in this study is similar to the average number of sheep per breed that Kijas and coworkers [11] used to quantify breed mixture and selection using the OvineSNP50 BeadChip. Blood was collected via jugular venipuncture into EDTA-coated vacutainer tubes. Thereafter, DNA was extracted from 200 µL of whole blood with the GenElute Blood Genomic DNA extraction kit (Sigma, St. Louis, MO) according to the manufacturer's instructions. All DNA samples were genotyped with standard procedures at GeneSeek (Lincoln, NE, US) on the OvineSNP50 genotyping BeadChip. Basic information on the 54,241 SNPs on the BeadChip, including SNP name, chromosome, and map location, was provided by the service provider. The genotype quality control process was as previously described [34].

Population Genetic Basics Analysis
Analysis of minor allele frequencies (MAF) was conducted with the chi-squared test using SAS Software for Windows v9.2 (SAS Institute Inc., Cary, NC). An exact test for Hardy-Weinberg Equilibrium (HWE) [35] of polymorphic SNPs was further carried out within each breed separately. We also computed the r 2 measure between each marker pair within each breed separately using Haploview 4.1 [36]. Allele sharing distances among individual sheep were computed with PowerMarker V3.25 software [37], and the neighbor-joining tree was then constructed with MEGA 5 [38].
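As an illustration of the allele sharing distance that underlies the neighbor-joining tree, the following Python sketch uses one common definition: at each SNP two individuals share 2 - |a - b| alleles identical by state, and the distance is one minus the average proportion shared. This definition is an assumption of the sketch and may differ in detail from the PowerMarker implementation; the function name is hypothetical.

```python
from typing import Optional, Sequence


def allele_sharing_distance(a: Sequence[Optional[int]],
                            b: Sequence[Optional[int]]) -> float:
    """Distance between two individuals from dosages (0, 1, 2) across SNPs.

    SNPs with a missing call (None) in either individual are skipped.
    """
    diffs = [abs(x - y) for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNP is called in both individuals")
    # Mean |difference| / 2 equals 1 - (mean shared alleles / 2).
    return sum(diffs) / (2 * len(diffs))
```

A matrix of these pairwise distances over all 94 rams would be the input expected by a neighbor-joining routine.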

Detection of Differentially Selected Regions (DSRs)
Fisher's exact test was first performed in R 2.14.0 to compare the allele frequencies between Suffolk and Rambouillet and among Rambouillet-related breed populations. A SNP with a P value < 0.05 after Bonferroni correction was considered statistically significant. Then, estimation of SNP- and population-specific F ST was based on the model proposed by Nicholson et al. [42] and Flori et al. [43]. The DSR algorithm was described previously [11], but with slight modifications: 1) raw values were ranked and used to identify regions; 2) the significant SNPs with the 0.1% or 5% highest F ST values were selected as the top significant SNPs; 3) centered on the top significant SNP (0.1%), neighboring markers were included until more than three consecutive SNPs ranking outside of the top significant 5% were encountered. If a candidate region was longer than 3 Mb, we considered only the range from 1.5 Mb upstream to 1.5 Mb downstream of the top SNP (0.1%), and any two regions that overlapped were combined into one region. SNP-specific F ST values were smoothed over each chromosome with a local variable bandwidth kernel estimator [44].
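The region-growing steps described above can be sketched in Python for a single chromosome. This is only one possible reading of the procedure, under stated assumptions: the F ST cutoffs for the top 0.1% and top 5% are passed in as precomputed values, a region ends at the last top-5% marker before a run of more than three consecutive markers outside the top 5%, and regions longer than 3 Mb are trimmed to lie within 1.5 Mb of the seed SNP. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def grow_region(seed: int, in_top5: Sequence[bool], max_gap: int = 3) -> Tuple[int, int]:
    """Extend from a seed SNP in both directions; return (left, right) indices."""
    def walk(step: int) -> int:
        edge, misses, i = seed, 0, seed + step
        while 0 <= i < len(in_top5):
            if in_top5[i]:
                edge, misses = i, 0
            else:
                misses += 1
                if misses > max_gap:  # more than three consecutive misses
                    break
            i += step
        return edge
    return walk(-1), walk(+1)


def find_dsrs(positions: Sequence[int], fst: Sequence[float],
              seed_cut: float, support_cut: float,
              max_gap: int = 3, half_window: int = 1_500_000) -> List[Tuple[int, int]]:
    """Return merged (start, end) regions in bp; positions must be sorted."""
    in_top5 = [f >= support_cut for f in fst]
    regions = []
    for s, f in enumerate(fst):
        if f < seed_cut:
            continue  # only top 0.1% SNPs seed a region
        left, right = grow_region(s, in_top5, max_gap)
        start, end = positions[left], positions[right]
        if end - start > 2 * half_window:
            # Trim long regions to within 1.5 Mb either side of the seed.
            start = max(start, positions[s] - half_window)
            end = min(end, positions[s] + half_window)
        regions.append((start, end))
    regions.sort()
    merged: List[Tuple[int, int]] = []
    for start, end in regions:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

In a genome-wide run the two cutoffs would be taken from the ranking of all SNP F ST values, and the function would be applied to each chromosome in turn.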
Genes in these DSRs were examined for potential involvement in phenotypes using the Ovine Genome v3.1 Assembly (http://www.livestockgenomics.csiro.au/cgi-bin/gbrowse/oarv3.1/). The functional annotation of target genes for gene ontology was performed using DAVID bioinformatics resources [45]. Allele frequencies per breed for all DSRs can be found at www.animalgenome.org/repository/pub/USDA2013.0411/.