Abstract
Linkage disequilibrium (LD), genetic structure and genetic diversity are key parameters for studying the breeding history of indigenous small ruminants. In this study, the OvineSNP50 Bead Chip array was used to estimate and compare LD, genetic diversity, effective population size (Ne) and genomic inbreeding in 190 individuals from three Iranian indigenous sheep breeds: Baluchi (n = 96), Lori-Bakhtiari (n = 47) and Zel (n = 47). Principal component analysis (PCA) assigned all animals to the groups from which they were sampled, and the admixture analysis showed that the structure within the populations is best explained by three groups (K = 3). The average r2 values estimated between adjacent single nucleotide polymorphisms (SNPs) at distances up to 10 Kb were 0.388±0.324, 0.353±0.311, and 0.333±0.309 for Baluchi, Lori-Bakhtiari and Zel, respectively. Estimates of genetic diversity and effective population size (Ne) showed that the Zel breed had the highest heterozygosity and Ne, whereas the lowest values were found in the Baluchi breed. Genomic inbreeding estimated using FROH (based on long stretches of consecutive homozygous genotypes) was highest in Baluchi and lowest in Zel, which could be due to stronger artificial selection pressure on the Baluchi breed. The genomic inbreeding and Ne results indicated increased haplotype sharing in Baluchi, leading to more extensive LD, and the LD and haplotype block results confirmed this. The persistence of LD phase was highest between Zel and Lori-Bakhtiari, indicating that these two breeds could be combined in a multi-breed training population in genomic selection studies.
Citation: Barani S, Nejati-Javaremi A, Moradi MH, Moradi-Sharbabak M, Gholizadeh M, Esfandyari H (2023) Genome-wide study of linkage disequilibrium, population structure, and inbreeding in Iranian indigenous sheep breeds. PLoS ONE 18(6): e0286463. https://doi.org/10.1371/journal.pone.0286463
Editor: Shamik Polley, West Bengal University of Animal and Fishery Sciences, INDIA
Received: December 20, 2022; Accepted: May 16, 2023; Published: June 2, 2023
Copyright: © 2023 Barani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are included within the paper.
Funding: This study was funded by Animal Science Research Institute of Iran, Mobarakandish Institute and AgResearch, New Zealand, Project number: PRJ-2016/11547. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Archaeological and genetic evidence suggests that sheep were domesticated around 9000 BC in a region in the west of the Zagros Mountains of Iran and Iraq [1]. According to the FAO, in 2020 Iran ranked sixth in the world for live sheep, with around 47 million animals (https://www.fao.org/faostat/en/#data/QCL). Iran has several distinct climates, which have given rise to various sheep breeds and accelerated diversification of the genetic structure of Iranian sheep. Under global climate change, keeping animals that are adapted to harsh environmental conditions becomes increasingly important. Local sheep breeds therefore constitute an important genetic resource because of their rusticity and adaptability to various agroecological environments. The Iranian sheep breeds used in this study are among the most common indigenous breeds reared in various parts of Iran and were sampled at or near the center of domestication. The Baluchi is a fat-tailed, medium-sized breed that is well adapted to warm and dry environmental conditions and accounts for over 29% of the Iranian sheep population [2]. The Lori-Bakhtiari is one of the most widespread native breeds in the southwestern Zagros mountains and has the fattest tail of all Iranian sheep breeds adapted to the cold, mountainous regions of western Iran. The Zel, in contrast, is the only thin-tailed Iranian breed and is adapted to the lush, wet conditions of the northern slopes of the Elburz mountain range near the Caspian Sea. This breed is also known as the Aryan breed, since historical evidence shows that the Aryans, who were living in these areas, attempted to domesticate these animals [3].
Knowledge of genetic structure is crucial for genomic prediction within and among populations, genome-wide association studies and the conservation of local sheep breeds. Rapid progress in genomic selection (GS) will facilitate its implementation in small ruminants [4]. The basic assumption in GS is that marker haplotypes are in linkage disequilibrium (LD) with the quantitative trait loci (QTLs) located between the markers, and the useful threshold for LD in GS and association studies should be higher than 0.3 [5]. LD is defined as the non-random association between two loci within a population [6]. Currently, the genetic relationship between breeds is assessed by analyzing LD, consistency of gametic phase, and haplotype block structure across breeds. When markers and QTLs have a similar LD phase in two breeds, informative markers from both breeds can be used to construct a multi-breed training population [7].
Among the important challenges of genome-wide studies in small ruminants is the limited use of artificial insemination: pedigree information has to be derived from multiple-sire natural mating groups of rams and ewes and is therefore unreliable. Knowledge of inbreeding, genetic diversity, and the effective population size (Ne) of livestock populations is thus crucial for the success of breeding programs. The development of genome-wide information in small ruminants has made it possible to measure genomic inbreeding and diversity by identifying runs of homozygosity (ROH) and heterozygosity. In livestock genetics, ROH regions, which consist of continuous homozygous loci assumed to originate from the same ancestor, are commonly used to detect inbreeding [8]. Furthermore, estimation of historical effective population size (Ne) is a widely used method for modelling the evolution of genetic diversity in populations and is very useful for designing conservation strategies for indigenous breeds [9].
The uneven distribution of LD along the genome has an important effect on genomic prediction: genetic variance is overestimated for causal variants in regions of high LD and underestimated in regions of low LD [10]. The first objective of this study was therefore to investigate the genetic structure of Iranian indigenous sheep breeds using LD, and the second objective was to estimate observed and expected heterozygosity, ROH, and Ne.
Material and methods
Ethics statement
All methods and animal care and handling procedures were approved by the University of Tehran Animal Care and Use Committee (No. 2016/11547). All procedures were carried out in accordance with relevant regulations, and every effort was made to minimize discomfort during blood collection. The authors also complied with the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines.
DNA samples and SNP genotyping
Three sources of genomic data, Zel (n = 47), Lori-Bakhtiari (n = 47), and Baluchi (n = 96), were used in the current study. The Zel and Lori-Bakhtiari datasets have been described in detail by Moradi et al. [11] and the Baluchi dataset by Gholizadeh et al. [12]. Zel animals were sampled in the northern region, Lori-Bakhtiari animals in the western part of Iran close to the Zagros Mountains, and Baluchi animals at the Abbas-Abad sheep breeding station in north-eastern Iran. DNA was extracted from blood samples of all animals using a salting-out method [11], and genotyping was then performed using the Illumina Ovine SNP50 Bead Chip.
Quality control (QC) and genetic diversity analyses
Quality control of the genomic data was performed using PLINK v1.9 software [13] as follows. First, animals with more than 10% missing genotypes were removed. Then, SNPs with minor allele frequency (MAF) <0.05 or call rate <95% over all samples, and SNPs deviating strongly from Hardy-Weinberg equilibrium within breed (P-value < 10^-6), were excluded. Deviation from Hardy-Weinberg equilibrium was used as an indicator of genotyping errors, since technical problems are the most likely explanation for such deviations [14]. To set the significance level for this test, the Bonferroni correction (β = α/n) was used to address the problem of multiple comparisons. The number of tests was taken to be the number of SNPs (n = 50,000), giving β = 10^-6, which corresponds to an experiment-wise error of α = 0.05 [15].
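The filters above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the text, not the PLINK implementation used in the study; the genotype coding (0, 1, 2 copies of one allele, `None` for a missing call) and all function names are assumptions made for the example.

```python
# Sketch of the per-SNP QC filters described in the text.
# Genotypes are coded as allele counts (0, 1, 2); None marks a missing call.
# All function and variable names are hypothetical.

def call_rate(genotypes):
    """Fraction of non-missing genotype calls for one SNP."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for an experiment-wise error of alpha."""
    return alpha / n_tests

def passes_qc(genotypes, hwe_p, min_maf=0.05, min_call_rate=0.95,
              hwe_threshold=1e-6):
    """Keep a SNP only if it passes the MAF, call-rate and HWE filters."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf
            and hwe_p >= hwe_threshold)

# Bonferroni arithmetic from the text: 0.05 / 50,000 tests.
hwe_cutoff = bonferroni_threshold(0.05, 50_000)

# One made-up SNP typed in 20 animals, with a single missing call.
snp = [0, 1, 2, 1, 0, 1, None, 2, 1, 0, 1, 1, 2, 0, 1, 1, 0, 2, 1, 1]
print(hwe_cutoff, call_rate(snp), minor_allele_frequency(snp))
print(passes_qc(snp, hwe_p=0.5))
```

The HWE P-value is passed in rather than computed, since the exact test used by PLINK is outside the scope of this sketch.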
For the SNPs remaining after QC, genomic locations were mapped to the ovine genome assembly OAR_V4.0 using information from the Sheep HapMap dataset (http://www.sheephapmap.org). Finally, SNP markers with an unknown genomic location or located on the X chromosome were removed. Expected heterozygosity (He) and observed heterozygosity (Ho) were calculated for each SNP that passed quality control following the methods suggested by Al-Mamun [16], and then averaged over all SNPs.
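As an illustration of the per-SNP heterozygosity calculation, the sketch below uses the textbook definitions: Ho is the proportion of heterozygous genotypes and He = 2pq under Hardy-Weinberg proportions. Whether the method of reference [16] applies a further small-sample correction is not stated here, so the formulas, the genotype coding and the function names should be read as assumptions.

```python
# Sketch: observed and expected heterozygosity per SNP, averaged over SNPs.
# Genotypes are allele counts (0, 1, 2); None marks a missing call.
# He = 2pq is the textbook form and is an assumption about the method used.

def observed_heterozygosity(genotypes):
    """Proportion of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)

def expected_heterozygosity(genotypes):
    """Expected heterozygosity 2pq from the estimated allele frequency p."""
    called = [g for g in genotypes if g is not None]
    p = sum(called) / (2 * len(called))
    return 2 * p * (1 - p)

def mean_heterozygosity(snps):
    """Average Ho and He over a list of SNPs (each a list of genotypes)."""
    ho = [observed_heterozygosity(s) for s in snps]
    he = [expected_heterozygosity(s) for s in snps]
    return sum(ho) / len(ho), sum(he) / len(he)

# Two made-up SNPs typed in four animals.
example_snps = [[0, 1, 2, 1], [1, 1, 0, 0]]
mean_ho, mean_he = mean_heterozygosity(example_snps)
print(mean_ho, mean_he)
```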
Principal component analysis (PCA) and population analyses
Principal component analysis (PCA) was performed using the prcomp function in R version 4.0.3 (http://cran.r-project.org), which considers the total variance in the data and transforms the original variables into a smaller set of linear combinations. For further insight and confirmation, population clustering and admixture analysis were then carried out using the program ADMIXTURE 1.23 [17]. ADMIXTURE is distributed as a pre-compiled binary executable; it uses a maximum-likelihood-based "hill-climbing" optimization algorithm and estimates FST values based on the inferred allele frequencies between each of the ancestral populations [18]. We ran ADMIXTURE for 10 to 20 iterations, increasing K (the assumed number of ancestral populations [17]) from 2 to 4. Cross-validation (CV) error was estimated for each K to determine the optimal number of clusters.
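To make the PCA step concrete, the following sketch computes scores on the first principal component and the share of variance it explains for a small genotype matrix. It is a pure-Python illustration using power iteration on the individual-by-individual cross-product matrix, not the prcomp routine used in the study; the toy data and function names are hypothetical.

```python
# Sketch: first principal component of a genotype matrix (rows = animals,
# columns = SNPs coded 0/1/2) by power iteration. Illustrative only.
from math import sqrt

def center_columns(matrix):
    """Subtract each SNP's mean genotype from its column."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[g - m for g, m in zip(row, means)] for row in matrix]

def first_pc_scores(genotypes, iterations=200):
    """Return (PC1 scores per animal, fraction of variance explained)."""
    x = center_columns(genotypes)
    n = len(x)
    # Cross-product matrix between animals; its leading eigenvector gives PC1.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(c * c for c in w))  # zero only if all animals are identical
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n))
                     for i in range(n))
    explained = eigenvalue / sum(gram[i][i] for i in range(n))
    scores = [c * sqrt(eigenvalue) for c in v]
    return scores, explained

# Toy data: animals 0-1 and animals 2-3 come from two divergent groups.
toy = [[0, 0, 2, 2, 1],
       [0, 1, 2, 2, 1],
       [2, 2, 0, 0, 1],
       [2, 1, 0, 0, 1]]
scores, explained = first_pc_scores(toy)
print(scores, explained)
```

In this toy example the two groups receive PC1 scores of opposite sign, which is the kind of separation that Fig 1 reports between breeds.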
Measures of average LD and persistence of phase
Haplotype reconstruction and phasing of the genotypes by chromosome were carried out using BEAGLE 3.3.1 [19]. A haplotype is a physical grouping of genomic variants from multiple genetic loci on the same chromosome that are inherited as a unit and can encompass two or more SNP alleles [20]. The linkage disequilibrium between adjacent SNPs was measured by the squared correlation coefficient (r2) [21]. The r2 was computed with Haploview v4.2 software [22] according to the following equation:
$$r^2 = \frac{\left(\mathrm{freq}_{AB}\,\mathrm{freq}_{ab} - \mathrm{freq}_{Ab}\,\mathrm{freq}_{aB}\right)^2}{\mathrm{freq}_{A}\,\mathrm{freq}_{a}\,\mathrm{freq}_{B}\,\mathrm{freq}_{b}} \qquad \text{(Eq 1)}$$
Where freqA, freqa, freqB and freqb are the frequencies of alleles A, a, B, b, respectively, and freqAB, freqab, freqAb and freqaB are the frequencies of haplotypes AB, ab, Ab and aB, respectively. Then, based on estimated r2 values, sample size correction was performed by the following equation [23]:
Eq 2
Where n is the number of haplotypes in the sample. The average r2 was calculated for the chromosomes of each breed between pairwise SNPs in different distance categories (<0.01, 0.01–0.02, 0.02–0.04, 0.04–0.06, 0.06–0.08, 0.08–0.1, 0.1–0.2, 0.2–0.5, 0.5–1, 1–2, 2–5, 5–10 and 10–20 Mb).
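The r2 calculation can be illustrated from phased haplotype counts as follows. The r2 function follows the definitions of allele and haplotype frequencies given above. The sample-size correction shown, (r2 - 1/n) / (1 - 1/n), is one commonly used form and is an assumption here; it may differ from the exact expression in Eq 2. Function names and the example counts are hypothetical.

```python
# Sketch: r2 between two biallelic SNPs from phased haplotype counts,
# followed by an ASSUMED sample-size correction (may differ from Eq 2).

def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r2 from counts of the four haplotypes AB, Ab, aB and ab."""
    total = n_AB + n_Ab + n_aB + n_ab
    f_AB, f_Ab, f_aB, f_ab = (c / total for c in (n_AB, n_Ab, n_aB, n_ab))
    f_A, f_B = f_AB + f_Ab, f_AB + f_aB
    f_a, f_b = 1 - f_A, 1 - f_B
    d = f_AB * f_ab - f_Ab * f_aB
    # Both SNPs must be polymorphic (guaranteed here by the MAF filter),
    # otherwise the denominator is zero.
    return d * d / (f_A * f_a * f_B * f_b)

def corrected_r_squared(r2, n_haplotypes):
    """Assumed correction for finite sample size: (r2 - 1/n) / (1 - 1/n)."""
    inv_n = 1 / n_haplotypes
    return (r2 - inv_n) / (1 - inv_n)

# Made-up counts for 100 haplotypes (50 diploid animals).
raw = r_squared(40, 10, 10, 40)
adjusted = corrected_r_squared(raw, 100)
print(raw, adjusted)
```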
To estimate persistence of LD phase between two breeds, only SNPs segregating in both breeds were included in the analysis. Persistence of LD phase was estimated for intervals of 100 kb following Badke et al. [24] as:
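$$R_{k,k'} = \frac{\sum_{(i,j) \in p} \left(r_{ij(k)} - \bar{r}_{(k)}\right)\left(r_{ij(k')} - \bar{r}_{(k')}\right)}{S_{(k)}\, S_{(k')}} \qquad \text{(Eq 3)}$$
Where $R_{k,k'}$ is the correlation of phase between $r_{ij(k)}$ in population k and $r_{ij(k')}$ in population k', $S_{(k)}$ and $S_{(k')}$ are the standard deviations of $r_{ij(k)}$ and $r_{ij(k')}$, and $\bar{r}_{(k)}$ and $\bar{r}_{(k')}$ are the average $r_{ij}$ across all SNPs i and j within interval p for populations k and k', respectively.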
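Persistence of phase as described above is a Pearson correlation of signed r values between two breeds, computed within distance intervals. A minimal sketch, assuming the signed r values for each SNP pair have already been computed in both breeds, is given below; the data layout, function names and example values are hypothetical.

```python
# Sketch: persistence of LD phase as the Pearson correlation of signed r
# between two breeds, computed separately for each distance interval.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return cov / sqrt(var_x * var_y)

def persistence_by_interval(pairs, width_kb=100):
    """Correlate signed r between two breeds within distance intervals.

    pairs: iterable of (distance_kb, r_in_breed_k, r_in_breed_k2) for SNP
    pairs segregating in both breeds.
    Returns {interval_start_kb: correlation}.
    """
    bins = {}
    for distance_kb, r_k, r_k2 in pairs:
        start = int(distance_kb // width_kb) * width_kb
        bins.setdefault(start, ([], []))
        bins[start][0].append(r_k)
        bins[start][1].append(r_k2)
    # A correlation needs at least two SNP pairs per interval.
    return {start: pearson(xs, ys)
            for start, (xs, ys) in sorted(bins.items()) if len(xs) >= 2}

# Made-up signed r values for five SNP pairs in two breeds.
example_pairs = [(12, 0.60, 0.50), (45, -0.20, -0.10), (80, 0.40, 0.30),
                 (130, 0.10, 0.20), (170, -0.30, 0.10)]
persistence = persistence_by_interval(example_pairs)
print(persistence)
```

Values close to 1 indicate that the same alleles tend to be associated in both breeds, which is the property that matters when pooling breeds into one training population.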
Haplotype blocks
The haplotypes were phased using BEAGLE v3.3.1 software [19], and the haplotype blocks were then determined for each chromosome using Haploview v4.2 software [22] based on the method suggested by Gabriel et al. [25]. A pair of markers was defined to be in strong LD if the upper bound of the one-sided 95% confidence interval of D’ was higher than 0.98 and the lower bound was above 0.7.
Inbreeding coefficients and effective population size
Inbreeding coefficients based on genotype data for each breed were calculated by GCTA software [26]. The three estimates of the inbreeding coefficient (F) calculated by this program are FGRM, based on the variance of the additive genotype; FHOM, based on the excess of homozygotes; and FUNI, based on the correlation between uniting gametes. The inbreeding coefficient of each breed was taken as the average of the inbreeding coefficients of all individuals of that breed. Runs of homozygosity (ROH) were identified using PLINK v1.9 software [13] with adjusted parameters (--homozyg-density 1000, --homozyg-kb 10, --homozyg-window-het 1, and --homozyg-window-snp 20). The minimum number of SNPs needed to constitute an ROH (l) was estimated using the method proposed by Lencz et al. [27]:
$$l = \frac{\log_e\left(\dfrac{\alpha}{n_s \cdot n_i}\right)}{\log_e\left(1 - \overline{het}\right)} \qquad \text{(Eq 4)}$$
Where ns is the number of genotyped SNPs per individual, ni is the number of individuals, α is the percentage of false-positive ROH (0.05), and het is the average of SNP heterozygosity across all SNPs. Finally, the measurement of inbreeding based on ROH (FROH) was calculated by the following equation [27]:
$$F_{ROH} = \frac{L_{ROH}}{L_{AUTO}} \qquad \text{(Eq 5)}$$
Where LROH is the sum of ROH lengths and LAUTO is the total length of autosomes.
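The two ROH quantities defined above can be sketched as follows. The expression for the minimum number of SNPs is the commonly cited form of the Lencz et al. criterion, l = ln(α / (ns · ni)) / ln(1 − het), and is stated here as an assumption about Eq 4. The input values (45,000 SNPs, 178 animals, mean heterozygosity of 0.38, and an autosomal length of 2,430,000 Kb) are illustrative placeholders, not figures taken from the study's tables.

```python
# Sketch: minimum SNPs per ROH (assumed Lencz-type formula) and FROH.
# All input values below are illustrative placeholders.
from math import log

def min_snps_per_roh(n_snps, n_individuals, het, alpha=0.05):
    """l = ln(alpha / (n_snps * n_individuals)) / ln(1 - het)."""
    return log(alpha / (n_snps * n_individuals)) / log(1 - het)

def f_roh(roh_lengths_kb, autosome_length_kb):
    """Genomic inbreeding: total ROH length over total autosomal length."""
    return sum(roh_lengths_kb) / autosome_length_kb

l_min = min_snps_per_roh(n_snps=45_000, n_individuals=178, het=0.38)
print(l_min)  # in practice rounded up to a whole number of SNPs

# One hypothetical animal with three ROH segments (lengths in Kb).
example_f = f_roh([5_000, 12_000, 3_000], autosome_length_kb=2_430_000)
print(example_f)
```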
Effective population size (Ne) is a key population genetic parameter for assessing genetic diversity and can be used to estimate inbreeding coefficients. We estimated effective population sizes from LD using SNeP software [28]. SNeP estimates Ne trends across generations from SNP data and corrects for sample size, phasing, and recombination rate:
$$N_{T(t)} = \left(4 f(c_t)\right)^{-1}\left(E\left[r^2_{adj} \mid c_t\right]^{-1} - \alpha\right) \qquad \text{(Eq 6)}$$
Where $N_{T(t)}$ is the effective population size t generations ago, with t calculated as $t = (2 f(c_t))^{-1}$ [29], $c_t$ is the recombination rate defined for a specific physical distance between markers, $r^2_{adj}$ is the LD value adjusted for sample size, and α (= 1) is a correction for the occurrence of mutations [30].
According to the relationship between r2 and Ne, effective population size can be calculated from LD data for each autosomal chromosome at distance bins of <0.01, 0.01–0.02, 0.02–0.05, 0.05–0.1, 0.1–0.2, 0.2–0.5, 0.5–1, 1–2, 2–5, 5–10 and 10–20 cM by the following equations [31]:
Eq 7
Eq 8
Where Ne is the effective population size, r2 is a measure of LD among SNP alleles per chromosome, and c is the recombination rate in Morgans corresponding to the average distance between two markers. We assumed 1 Mb = 1 cM, and the number of generations in the past to which each Ne estimate applies, T, is equal to 1/(2c) [29].
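The relationship between r2, recombination rate and Ne can be illustrated with a short sketch. Eq 7 and Eq 8 are not reproduced in this text, so the code assumes two widely used forms: Sved's approximation E[r2] = 1 / (1 + 4·Ne·c) and its mutation-adjusted variant E[r2] = 1 / (2 + 4·Ne·c). These may not be the exact expressions the authors used. The conversion 1 Mb = 1 cM and T = 1/(2c) follow the text; function names and example values are hypothetical.

```python
# Sketch: Ne from r2 and recombination rate c (in Morgans).
# The two formulas are ASSUMED common forms, not confirmed as Eq 7 and Eq 8.

def morgans_from_mb(distance_mb):
    """Recombination rate in Morgans, assuming 1 Mb = 1 cM as in the text."""
    return distance_mb / 100

def ne_sved(r2, c):
    """Ne = (1 / (4c)) * (1 / r2 - 1), from E[r2] = 1 / (1 + 4*Ne*c)."""
    return (1 / (4 * c)) * (1 / r2 - 1)

def ne_with_mutation(r2, c, alpha=2):
    """Ne = (1 / (4c)) * (1 / r2 - alpha), from E[r2] = 1 / (alpha + 4*Ne*c)."""
    return (1 / (4 * c)) * (1 / r2 - alpha)

def generations_ago(c):
    """Generation in the past to which the estimate refers: T = 1 / (2c)."""
    return 1 / (2 * c)

# Made-up example: mean r2 of 0.1 for SNP pairs about 0.5 Mb apart.
c = morgans_from_mb(0.5)
print(ne_sved(0.1, c), ne_with_mutation(0.1, c), generations_ago(c))
```

Because T = 1/(2c), closely spaced markers inform Ne in the distant past and widely spaced markers inform recent Ne, which is why the distance bins listed above span several orders of magnitude.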
Results and discussion
Quality control (QC) and summary statistics for the SNPs that passed QC
Out of 190 animals from three sheep breeds, 178 animals passed the quality control. Table 1 shows the results of quality control for each breed.
The average distance between SNPs on autosomal chromosomes after filtering was 57.92, 58.78 and 58 Kb in the Zel, Lori-Bakhtiari and Baluchi breeds, respectively. The average marker distances for Zel and Lori-Bakhtiari in the current study differed from the 60 Kb reported by Moradi et al. [15]. This may be due to the use of the Ovine Genome Assembly V4.0 instead of the v1.1 used by Moradi et al. [15], and to the exclusion of sex-chromosome SNPs in the current study.
A summary of the distribution of the SNPs remaining after quality control on each chromosome, together with the average r2 on each chromosome, is given in Table 2.
Chromosome 21 had the largest distance between adjacent SNPs in all breeds (Table 2). In all breeds, the largest and smallest numbers of SNPs over all genotyped animals were observed on chromosomes 1 and 24, respectively. Minor allele frequency (MAF) is an important genetic diversity parameter and affects inferred population structure [32, 33]. It also influences the r2 value: r2 decreases significantly as the difference in MAF between loci increases [34]. O’Brien et al. [35] proposed MAF>0.05 as the optimal QC threshold for obtaining unbiased LD estimates.
In our study, the average MAF of the SNPs before quality control (QC) was 0.27, 0.27, and 0.25 in Zel, Lori-Bakhtiari and Baluchi, respectively, and these values changed to 0.29, 0.29, and 0.28 after filtering at MAF>0.05. McRae et al. [36] reported average MAF values between 0.24 and 0.26 using the same genotyping array in New Zealand sheep. The distribution of MAF is affected by the long-term demography of the population it represents [37]. New Zealand sheep appear to have been under more intense selection in recent years and are thus more inbred [38], resulting in slightly lower MAF. The average MAF for Iranian sheep breeds was similar to that reported by García-Gámez et al. [39] in the Spanish Churra sheep breed (average MAF = 0.288). The distribution of MAF across chromosomes was almost uniform in the different breeds. The proportion of SNPs with MAF higher than 0.3 was 48.3%, 51.36%, and 47.8% in Baluchi, Lori-Bakhtiari, and Zel, respectively.
The average r2 on each chromosome showed that OAR 24 and 25 in Baluchi, OAR 9 and 21 in Lori-Bakhtiari and OAR 23 and 24 in Zel had higher LD values than other chromosomes, whereas in Australian sheep breeds OAR 10, 22 and 23 had the highest r2 and haplotype blocks [16]. Liu et al. [40] also showed that OAR 24, 25, 18 and 10 had the highest average r2 values at distances of 0–10 kb, which is consistent with our results. Mateescu & Thonney [41] identified regions on seven chromosomes (1, 3, 12, 17, 19, 20 and 24) in Dorset×East Friesian sheep that are associated with a seasonal reproduction trait. The differences in the extent of LD on chromosomes among studies probably arise because LD measures correlation between alleles, not physical association of loci. Factors other than physical distance on chromosomes, such as mutation, genetic drift, epistasis and admixture of two populations with different allele frequencies, can therefore cause disequilibrium even between unlinked markers [42].
Population structure and genetic diversity
We performed principal component analysis (PCA) to assess whether animals were assigned to their true populations. The results showed three distinct clusters corresponding to geographical origin and breed type (Fig 1). The first and second PCs (PC1 and PC2) accounted for 7.26% and 1.85% of the total variation, respectively. PC2 separated the thin-tailed (Zel) from the fat-tailed (Lori-Bakhtiari and Baluchi) sheep breeds, while the two fat-tailed breeds were separated along PC1 (Fig 1).
Green, purple and blue colors show the individuals of the Zel, Baluchi and Lori-Bakhtiari sheep breeds, respectively.
In the Baluchi population, some outliers were observed away from the main cluster. The Lori-Bakhtiari and Zel populations are kept on smallholder sheep farms with extensive selection, incomplete pedigrees, and uncontrolled mating, whereas the Baluchi population has been isolated at the Abbas-Abad sheep breeding station. The genetic links between the Lori-Bakhtiari and Zel populations most likely arise from the absence of full pedigree information, co-ancestry, or gene flow. Until recently, the movement of livestock without animal identification was not prohibited; gene flow could therefore have occurred through animal migration with nomadic tribes.
Population structure was analyzed for different values of K (2–4) based on autosomal chromosomes. With K = 2, the structure analysis clustered the Lori-Bakhtiari and Zel populations into the same group (Fig 2). This is compatible with the view that sheep were domesticated in the Zagros mountains and then spread to other regions [1]. It could also reflect migration of individuals between these two populations and possible common ancestry. These results indicate genetic closeness between Zel and Lori-Bakhtiari and confirm the PCA findings. Setting K = 3 assigned all populations to distinct clusters. At K = 4, the structure analysis indicated introgression between the Zel and Baluchi populations, and the cross-validation error was lowest at this value, although it differed only slightly from that at K = 3. An influence of the Great Silk Road on the gene pool of local sheep seems plausible [43]. Cross-validation errors for K = 2, 3 and 4 were 0.559, 0.554, and 0.553, respectively.
Each thin vertical line represents one individual and each color shows one inferred ancestral population.
Various methods have been suggested for evaluating genetic diversity, but Ho and He are the most widely used measures of genetic diversity in a population [44]. The average He ± SD, calculated from autosomal chromosomes, was 0.375 ± 0.117 for Baluchi and 0.382 ± 0.113 for both Zel and Lori-Bakhtiari, and the average Ho for Baluchi, Zel, and Lori-Bakhtiari was 0.382, 0.383, and 0.388, respectively. The low heterozygosity observed in Baluchi may be due to ascertainment bias, since the samples of this breed were collected from the Abbas-Abad station in north-eastern Iran. Different animal breeding programs have been applied at this station in recent years, and the lower genetic diversity of this breed is consistent with this history. In contrast, in the rural livestock systems of Zel and Lori-Bakhtiari, rams and ewes are housed and grazed together and there is no control over mating and inbreeding. Eydivandi et al. [45] reported that Ho in Iranian domestic sheep breeds ranged from 0.343 to 0.389. Al-Mamun et al. [16] reported similar He values in Australian sheep populations: 0.38, 0.31 and 0.34 in Merino (MER), Border Leicester (BL), and Poll Dorset (PD), respectively. Deniskova et al. [43] investigated the genetic diversity of 25 Russian sheep breeds using whole-genome information and reported that the Romanov breed had the lowest level of genetic diversity, with He = 0.354. Dávila et al. [46] suggested that an He greater than 0.5 is suitable for the genetic diversity of a breed.
Linkage disequilibrium and persistence of LD phase
Linkage disequilibrium was calculated separately for each breed using r2. The average r2 values between adjacent SNPs across autosomal chromosomes differed among breeds. The Baluchi breed had the highest level of LD and the Zel breed the lowest across all distances. At distances up to 10 Kb, the mean r2 ± SD between adjacent SNPs was 0.388 ± 0.324, 0.353 ± 0.311, and 0.333 ± 0.309 for Baluchi, Lori-Bakhtiari and Zel, respectively. The average r2 values differed only slightly among autosomal chromosomes within each breed. LD decay over distances from 0 to 50 Mb is shown in Fig 3; r2 values decreased rapidly with increasing distance between markers.
The average r2 values were estimated at distances up to 50 Mb. Red, green and blue colors show the Baluchi, Lori-Bakhtiari and Zel sheep breeds, respectively.
The lowest average r2 values were obtained at distances of 10 to 50 Mb in all breeds. Previous studies suggested an r2 higher than 0.3 for GWAS [47], while an LD above 0.2 is considered essential for estimating genomic breeding values with an accuracy of around 0.85 [5]. In the current study, the average r2 reached the 0.2 threshold at a distance of 27 Kb in the Zel and Lori-Bakhtiari breeds and 41 Kb in the Baluchi breed, so SNP chips with a marker density higher than 90K for the Zel and Lori-Bakhtiari breeds and 60K for the Baluchi breed would be needed for GS studies. At distances up to 10 Kb, the percentage of marker pairs with r2>0.3 was 48.4% (Baluchi), 44.3% (Lori-Bakhtiari), and 41.1% (Zel). Differences in the LD pattern can be due to the selection process, population structure, Ne, marker allele frequencies, and the average distance between SNPs [48, 49]. In this study, LD was measured by r2 because, compared with D’, it is less influenced by sample size and allele frequency; using D’ with small sample sizes can overestimate LD [50]. Khatkar et al. [6] suggested a minimum sample size of 75 for r2, while Bohmanova et al. [49] showed that a sample size of 22 tended to overestimate r2. For this reason, the estimated r2 values were corrected for sample size. Baluchi showed higher levels of LD than the other breeds, probably because of the higher selection intensity and the small sample size of the breeding station. Selection over generations on a favorable allele increases the frequency of adjacent alleles, a process known as a "selective sweep", which increases linkage disequilibrium and decreases diversity in these regions; the extent of linkage disequilibrium therefore depends on the selection intensity in the breeding program [51].
Persistence of LD phase is important for GS in multiple breeds and for genome-wide association studies because it can be used to determine the required marker density and to build a multi-breed training population. Our results showed that persistence of LD phase was highest between Zel and Lori-Bakhtiari, followed by Zel and Baluchi, and finally Lori-Bakhtiari and Baluchi (Fig 4). The high persistence of LD phase between Zel and Lori-Bakhtiari indicates genetic closeness between these two breeds, which is consistent with the population structure results described above. As expected, persistence of phase decreased with increasing distance between SNPs. However, the decay was not monotonic: persistence of phase increased at two intervals between Lori-Bakhtiari and Zel (40–60 and 80–100 Kb) and at one interval between Lori-Bakhtiari and Baluchi (10–20 Kb) (Fig 4).
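The marker densities quoted above follow from dividing the autosomal genome length by the marker spacing at which the average r2 falls to 0.2. The sketch below reproduces this arithmetic. The genome length is not stated directly in the text, so it is derived here from the haplotype-block summary reported in this study (46,086 Kb of blocks covering 1.90% of the genome), which is an assumption made for illustration; function names are hypothetical.

```python
# Sketch: number of evenly spaced markers needed for a target spacing.
# Genome length is inferred from figures in the text (an assumption).

def markers_needed(genome_length_kb, spacing_kb):
    """Markers required to cover the genome at the given average spacing."""
    return genome_length_kb / spacing_kb

# 46,086 Kb of haplotype blocks are reported to span 1.90% of the genome.
genome_kb = 46_086 / 0.0190

needed_zel_lori = markers_needed(genome_kb, 27)  # r2 = 0.2 reached at 27 Kb
needed_baluchi = markers_needed(genome_kb, 41)   # r2 = 0.2 reached at 41 Kb
print(round(needed_zel_lori), round(needed_baluchi))
```

Under these assumptions the results are close to 90,000 and 60,000 markers, in line with the 90K and 60K densities stated in the text.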
Blue, green and red colors show the persistence of phase between Zel and Lori-Bakhtiari, Baluchi and Zel, and Baluchi and Lori-Bakhtiari sheep breeds, respectively.
Haplotype block structure
A summary of the haplotype block analysis is given in Table 3. Haplotype blocks are defined as long stretches of a chromosome that have low recombination rates [52].
Knowledge of haplotype block structure provides useful information for GS studies. The pattern of haplotype blocks differs among chromosomes because of factors such as heterogeneous recombination, population bottlenecks, marker density, mating among populations with different allele frequencies, and selection intensity on particular regions of the genome [53]. The distribution of haplotype blocks over the autosomal chromosomes shows that the total block length was 46086 Kb (spanning 1.90% of the genome), 14871 Kb (0.61%), and 14633 Kb (0.60%) for Baluchi, Zel, and Lori-Bakhtiari, respectively (Table 3). We found 1446 haplotype blocks in Baluchi, 604 in Zel, and 636 in Lori-Bakhtiari (each consisting of 2 or more SNPs). The percentage of autosomal SNPs covered by blocks was 8.194%, 3.195%, and 3.371% in Baluchi, Zel, and Lori-Bakhtiari, respectively. Baluchi therefore seems to have experienced more intense selection than the other breeds. Chromosome 2 had the largest number of blocks and the greatest total block length in all breeds (Table 3), and marker coverage on chromosome 2 was higher than on other chromosomes. Al-Mamun et al. [16] reported that the longest block was detected on OAR10 for Merino (MER), Poll Dorset (PD), and two crossbred populations (F1 crosses of Merino and Border Leicester (M×B) and M×B crossed to Poll Dorset (M×B×P)).
Effective population size (Ne)
Ne was estimated for all three breeds using Eq 7 and Eq 8 and using SNeP. The two approaches gave similar results and showed that Ne decreased as haplotype sharing increased. Both approaches use r2 values combined with marker distances and recombination rate to estimate Ne, but SNeP additionally adjusts r2 using the recombination rate and corrects for the occurrence of mutations. In this study, the Ne estimates showed a downward trend (Fig 5), and the decline was steeper in Baluchi than in Zel and Lori-Bakhtiari.
Evolutionary processes such as the rate of genetic drift, loss of genetic variability, the effectiveness of selection, and gene flow depend on Ne [54]. Kimura and Ohta [55] showed that the time required for fixation of an allele depends on Ne and allele frequency. Based on the decline in Ne observed in the present study (Fig 5) and the results reported by Moradi et al. [15], Zel and Lori-Bakhtiari appear to have diverged from each other around 1100–1300 generations ago (~5000–6000 years ago, assuming a generation interval of ~4.5 years).
The estimated Ne four generations ago for Baluchi, Lori-Bakhtiari, and Zel was 57, 65, and 66, respectively, which is within the range of 50 to 100 recommended for conservation [56]. Mastrangelo et al. [57] estimated a contemporary Ne of 25 in the Barbaresca sheep breed in southern Italy. The reduction in Ne is probably due to an increased inbreeding rate and reduced genetic diversity caused by domestication, breed formation, and artificial breeding technologies [29]. The main reason for the reduction in Ne in recent generations in the Iranian breeds observed in the current study would be the low efficiency of production. Well-designed breeding programs are therefore necessary to control the loss of genetic diversity, limit the rate of inbreeding and reduce the risk of extinction.
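The conversion from generations to years in the preceding paragraph is a simple product, assuming the generation interval of about 4.5 years stated in the text:

```latex
\begin{align*}
1100 \text{ generations} \times 4.5 \text{ years per generation} &= 4950 \text{ years} \approx 5000 \text{ years},\\
1300 \text{ generations} \times 4.5 \text{ years per generation} &= 5850 \text{ years} \approx 6000 \text{ years}.
\end{align*}
```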
Inbreeding coefficient
The inbreeding coefficients estimated by four different methods including FGRM, FHOM, FUNI, and FROH are presented in Table 4.
Due to incomplete and careless recorded pedigree for the studied sheep breeds, the use of the genome-wide data for the estimation of inbreeding, and assessing their accuracy is important. The main advantage of genomic information is to realize true proportion of genome-wide relationship between two individuals and inbreeding for specific regions of the genome [58]. While the pedigree-based inbreeding assumes an equal chance for the two alleles at the same locus on two homologous chromosomes [59], in many loci the two alleles may have different chances for selection, as a result the assumption will be true only under an infinitesimal model [60].
The estimates from the first three methods were almost similar in all breeds. The results revealed that the average of inbreeding calculated by the genomic relationship matrix (FGRM), the excess of homozygosity (FHOM) and the correlation between uniting gametes (FUNI) are higher in Zel than other two breeds. The ROH is defined as the lengths of homozygous genotypes above 1Mb that contain only up to one heterozygous genotype [61]. FROH is the inbreeding coefficient derived from ROH based on molecular approaches that allow for recombination and mutation to apprehend relatedness among founders [62]. Estimation of inbreeding by FGRM, FHOM, and FUNI methods are sensitive to allelic frequencies and the number of copies of reference alleles for ith SNP [63]. Also, these methods cannot distinguish alleles that are IBD or IBS [64]. The use of ROH leads to accurately estimated levels of autozygosity among individuals because this method has better accuracy in distinguishing between IBD and IBS [65]. Zanella et al. [65] studied a comparison of different methods for estimating inbreeding values using genomic (FROH) and pedigree data in commercial pigs indicated that the use of FROH has been more reliable. Their study showed that the estimation of inbreeding using the SNP-by-SNP method overestimates the levels of inbreeding compared to FROH because it uses the frequency of homozygous genotypes, including both IBD and IBS alleles. Comparison of ROH and other methods, derived from GCTA software, demonstrate that ROH has the lowest sensitivity to allelic frequencies, and this method is very capable of distinguishing between IBD and IBS. In this study, the lowest FROH was observed in Zel breed and this is in constant with the results obtained for Ne, as described previously, the largest Ne was observed in Zel than other breeds in this study.
While the previous methods display the inbreeding coefficient for recent generations and do not provide any information about population history, long ROH indicate recent inbreeding, which may be due to the mating of close relatives and a decrease in Ne, whereas shorter ROH suggest reduced genetic diversity, bottlenecks and founder effects in the initial population [16, 66]. The ROH measured in the studied breeds indicate that Zel and Lori-Bakhtiari have the longest ROH, with lengths above 20 Mb, which may be due to selection intensity, reduced effective population size and increased inbreeding in recent generations (Fig 6). Among the breeds, Baluchi had the shortest ROH, probably because of inbreeding in ancestral generations and a small ancestral population.
The Baluchi ROH lengths fall in the first three categories, between 1–15 Mb, and are not long enough to fall in the >15 Mb categories. Red, green and blue indicate individuals of the Baluchi, Lori-Bakhtiari and Zel sheep breeds, respectively.
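The length classes referred to in Fig 6 can be tabulated with a short Python sketch. The class boundaries used here (1-5, 5-10, 10-15, 15-20 and at least 20 Mb) are an assumption inferred from the caption and the text, and the segment lengths are made-up values, not data from this study.

```python
from bisect import bisect_right
from collections import Counter

# Assumed upper edges (Mb) of the first four classes; the last class is open-ended.
CLASS_EDGES_MB = (5.0, 10.0, 15.0, 20.0)
CLASS_LABELS = ("1-5", "5-10", "10-15", "15-20", ">=20")
MIN_ROH_MB = 1.0  # segments shorter than 1 Mb are not counted as ROH


def roh_class(length_mb):
    """Return the label of the length class that contains length_mb.

    Classes are half-open: a 5 Mb segment falls in "5-10", not "1-5".
    """
    if length_mb < MIN_ROH_MB:
        raise ValueError("segment is shorter than the minimum ROH length")
    return CLASS_LABELS[bisect_right(CLASS_EDGES_MB, length_mb)]


def count_by_class(lengths_mb):
    """Count ROH per length class, skipping segments below the minimum."""
    counts = Counter(roh_class(x) for x in lengths_mb if x >= MIN_ROH_MB)
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


# Hypothetical ROH lengths (Mb) for one animal.
print(count_by_class([1.2, 3.8, 7.5, 12.0, 0.6, 22.4]))
```

Counting segments per class for each animal gives the kind of summary that separates a breed whose ROH stay below 15 Mb from breeds carrying segments above 20 Mb.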
Chromosomal inbreeding coefficients are shown in Fig 7. Among the breeds, Baluchi had the largest summed length of ROH per chromosome and the highest inbreeding coefficient for all chromosomes. The largest contiguous homozygous stretches in the Baluchi and Lori-Bakhtiari breeds were on OAR 2, 1, 3 and 10. The highest chromosomal inbreeding coefficients in Zel were on OAR 2 and 3. This may be caused by lower recombination, which increases homozygosity and the inbreeding coefficient.
The FROH in Baluchi was higher than in the other breeds for all autosomes, and the highest value was on chromosome 2. The FROH for Baluchi, Lori-Bakhtiari and Zel is shown by red, green and blue lines, respectively.
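As a minimal sketch of how chromosomal and genome-wide FROH values such as those in Fig 7 can be obtained, the Python code below divides the summed ROH length by the length of the chromosome or of the autosomal genome, following the usual definition of FROH [66]. The function names, segment coordinates and chromosome lengths are hypothetical placeholders, not values from this study.

```python
from collections import defaultdict


def froh_by_chromosome(roh_segments, chrom_lengths_bp, min_length_bp=1_000_000):
    """Return {chromosome: F_ROH} for one animal.

    roh_segments: iterable of (chromosome, start_bp, end_bp)
    chrom_lengths_bp: {chromosome: length covered by SNPs, in bp}
    """
    roh_total = defaultdict(int)
    for chrom, start, end in roh_segments:
        if chrom not in chrom_lengths_bp:
            raise KeyError(f"no length given for chromosome {chrom}")
        length = end - start
        if length >= min_length_bp:  # keep only ROH of at least 1 Mb
            roh_total[chrom] += length
    return {c: roh_total[c] / n for c, n in chrom_lengths_bp.items()}


def froh_genome(roh_segments, chrom_lengths_bp, min_length_bp=1_000_000):
    """Return genome-wide F_ROH: total ROH length over total autosome length."""
    total_roh = sum(
        end - start
        for _chrom, start, end in roh_segments
        if end - start >= min_length_bp
    )
    return total_roh / sum(chrom_lengths_bp.values())


# Toy example with two chromosomes (hypothetical lengths and segments).
lengths = {"OAR1": 100_000_000, "OAR2": 50_000_000}
segments = [
    ("OAR1", 5_000_000, 9_000_000),
    ("OAR2", 1_000_000, 6_000_000),
    ("OAR2", 20_000_000, 20_500_000),  # 0.5 Mb, below the 1 Mb threshold
]
print(froh_by_chromosome(segments, lengths))
print(froh_genome(segments, lengths))
```

Averaging the per-chromosome values over the animals of a breed would give one line per breed, comparable in form to the curves described for Fig 7.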
Conclusion
This study provides a comprehensive assessment of genetic structure, linkage disequilibrium (LD) and several other genetic diversity parameters, including gene diversity (He), Ne and genomic inbreeding coefficients, in Iranian sheep breeds. The PCA and admixture analyses showed a clear genetic differentiation of the breeds. The genome-wide study of LD showed that the Baluchi breed has higher levels of LD and haplotype blocks than the other breeds, which agrees with the results of the genetic diversity, Ne and inbreeding coefficient analyses in this breed. We found that the amount of LD between adjacent SNPs was relatively small and that it decreased rapidly with increasing distance between markers in the Baluchi breed. Also, the persistence of LD phase was highest between Zel and Lori-Bakhtiari, which is consistent with the existence of a common ancestor in the past. These results provide insights into the influence of selection within these breeds and useful knowledge that will contribute to the design of appropriate and successful genomic selection and conservation programs. They can also be useful for constructing a multi-breed training population and for SNP array design; however, further investigation with higher marker density and more animals is required to confirm our results.
Acknowledgments
The authors gratefully acknowledge the Animal Breeding Center of Iran (ABCI) for access to the records and animals of the Iranian breeds. Thanks to the staff of the University of Tehran, Animal Science Research Institute of Iran and AgResearch, who helped and supported this research. The authors also acknowledge the financial contributions of Animal Science Research Institute of Iran, Mobarakandish Institute and AgResearch, New Zealand.
References
- 1. Zygoyiannis D. Sheep production in the world and in Greece. Small ruminant research. 2006;62(1–2):143–7.
- 2. Valizadeh R, editor Iranian sheep and goat industry at a glance. stress management in small ruminant production and product processing; 2010.
- 3. Moradi M, Phua S, Hedayat N, Khodaei MM, Razmkabir M. Haplotype and genetic diversity of mtDNA in indigenous Iranian sheep and an insight into the history of sheep domestication. 2017.
- 4. Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: Genomic selection in dairy cattle: Progress and challenges. Journal of dairy science. 2009;92(2):433–43. pmid:19164653
- 5. Meuwissen TH, Hayes BJ, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. genetics. 2001;157(4):1819–29. pmid:11290733
- 6. Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, et al. Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC genomics. 2008;9(1):1–18. pmid:18435834
- 7. Mokry FB, Buzanskas ME, de Alvarenga Mudadu M, do Amaral Grossi D, Higa RH, Ventura RV, et al. Linkage disequilibrium and haplotype block structure in a composite beef cattle breed. BMC genomics. 2014;15(7):1–9.
- 8. Purfield DC, Berry DP, McParland S, Bradley DG. Runs of homozygosity and population history in cattle. BMC genetics. 2012;13(1):1–11. pmid:22888858
- 9. Machová K, Marina H, Arranz JJ, Pelayo R, Rychtářová J, Milerski M, et al. Genetic diversity of two native sheep breeds by genome-wide analysis of single nucleotide polymorphisms. animal. 2023;17(1):100690. pmid:36566708
- 10. Ren D, Cai X, Lin Q, Ye H, Teng J, Li J, et al. Impact of linkage disequilibrium heterogeneity along the genome on genomic prediction and heritability estimation. Genetics Selection Evolution. 2022;54(1):1–12. pmid:35761182
- 11. Moradi MH, Nejati-Javaremi A, Moradi-Shahrbabak M, Dodds KG, Brauning R, McEwan JC. Hitchhiking Mapping of Candidate Regions Associated with Fat Deposition in Iranian Thin and Fat Tail Sheep Breeds Suggests New Insights into Molecular Aspects of Fat Tail Selection. Animals. 2022;12(11):1423. pmid:35681887
- 12. Gholizadeh M, Rahimi-Mianji G, Nejati-Javaremi A, De Koning DJ, Jonas E. Genomewide association study to detect QTL for twinning rate in Baluchi sheep. Journal of genetics. 2014;93(2):489–93. pmid:25189245
- 13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics. 2007;81(3):559–75. pmid:17701901
- 14. Teo YY, Fry AE, Clark TG, Tai E, Seielstad M. On the usage of HWE for identifying genotyping errors. Annals of Human genetics. 2007;71(5):701–3. pmid:17388941
- 15. Moradi MH, Nejati-Javaremi A, Moradi-Shahrbabak M, Dodds KG, McEwan JC. Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition. BMC genetics. 2012;13(1):1–15. pmid:22364287
- 16. Al-Mamun HA, Clark SA, Kwan P, Gondro C. Genome-wide linkage disequilibrium and genetic diversity in five populations of Australian domestic sheep. Genetics Selection Evolution. 2015;47(1):1–14. pmid:26602211
- 17. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome research. 2009;19(9):1655–64. pmid:19648217
- 18. Liu C-C, Shringarpure S, Lange K, Novembre J. Exploring population structure with admixture models and principal component analysis. Statistical population genomics: Humana, New York, NY; 2020. p. 67–86.
- 19. Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. The American Journal of Human Genetics. 2007;81(5):1084–97. pmid:17924348
- 20. Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome biology. 2021;22(1):1–24.
- 21. Hill W, Robertson A. Linkage disequilibrium in finite populations. Theoretical and applied genetics. 1968;38(6):226–31. pmid:24442307
- 22. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5. pmid:15297300
- 23. Villa-Angulo R, Matukumalli LK, Gill CA, Choi J, Van Tassell CP, Grefenstette JJ. High-resolution haplotype block structure in the cattle genome. BMC genetics. 2009;10(1):1–13. pmid:19393054
- 24. Badke YM, Bates RO, Ernst CW, Schwab C, Steibel JP. Estimation of linkage disequilibrium in four US pig breeds. BMC genomics. 2012;13(1):1–10. pmid:22252454
- 25. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. science. 2002;296(5576):2225–9. pmid:12029063
- 26. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. The American Journal of Human Genetics. 2011;88(1):76–82. pmid:21167468
- 27. Lencz T, Lambert C, DeRosse P, Burdick KE, Morgan TV, Kane JM, et al. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proceedings of the National Academy of Sciences. 2007;104(50):19942–7. pmid:18077426
- 28. Barbato M, Orozco-terWengel P, Tapio M, Bruford MW. SNeP: a tool to estimate trends in recent effective population size trajectories using genome-wide SNP data. Frontiers in genetics. 2015;6:109. pmid:25852748
- 29. Hayes BJ, Visscher PM, McPartlan HC, Goddard ME. Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome research. 2003;13(4):635–43. pmid:12654718
- 30. Ohta T, Kimura M. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics. 1971;68(4):571. pmid:5120656
- 31. Sved J. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theoretical population biology. 1971;2(2):125–41. pmid:5170716
- 32. Plomion C, Chancerel E, Endelman J, Lamy J-B, Mandrou E, Lesur I, et al. Genome-wide distribution of genetic diversity and linkage disequilibrium in a mass-selected population of maritime pine. BMC genomics. 2014;15(1):1–17.
- 33. Linck E, Battey C. Minor allele frequency thresholds dramatically affect population structure inference with genomic datasets. Biorxiv. 2018:188623.
- 34. Wray NR. Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Research and Human Genetics. 2005;8(2):87–94. pmid:15901470
- 35. O’Brien AMP, Mészáros G, Utsunomiya YT, Sonstegard TS, Garcia JF, Van Tassell CP, et al. Linkage disequilibrium levels in Bos indicus and Bos taurus cattle using medium and high density SNP chip data and different minor allele frequency distributions. Livestock Science. 2014;166:121–32.
- 36. McRae KM, McEwan JC, Dodds KG, Gemmell NJ. Signatures of selection in sheep bred for resistance or susceptibility to gastrointestinal nematodes. BMC genomics. 2014;15(1):1–13. pmid:25074012
- 37. Falconer DS, Mackay TF, Frankham R. Introduction to quantitative genetics (4th edn). Trends in Genetics. 1996;12(7):280.
- 38. Morris C, Vlassoff A, Bisset S, Baker R, Watson T, West C, et al. Continued selection of Romney sheep for resistance or susceptibility to nematode infection: estimates of direct and correlated responses. Animal Science. 2000;70(1):17–27.
- 39. García-Gámez E, Sahana G, Gutiérrez-Gil B, Arranz J-J. Linkage disequilibrium and inbreeding estimation in Spanish Churra sheep. BMC genetics. 2012;13(1):1–11. pmid:22691044
- 40. Liu S, He S, Chen L, Li W, Di J, Liu M. Estimates of linkage disequilibrium and effective population sizes in Chinese Merino (Xinjiang type) sheep by genome-wide SNPs. Genes & Genomics. 2017;39(7):733–45. pmid:28706593
- 41. Mateescu R, Thonney M. Genetic mapping of quantitative trait loci for aseasonal reproduction in sheep. Animal Genetics. 2010;41(5):454–9. pmid:20219065
- 42. Qanbari S. On the extent of linkage disequilibrium in the genome of farm animals. Frontiers in Genetics. 2020;10:1304. pmid:32010183
- 43. Deniskova T, Dotsev A, Lushihina E, Shakhin A, Kunz E, Medugorac I, et al. Population structure and genetic diversity of sheep breeds in the Kyrgyzstan. Frontiers in genetics. 2019;10:1311. pmid:31921318
- 44. Toro MA, Fernández J, Caballero A. Molecular characterization of breeds and its use in conservation. Livestock Science. 2009;120(3):174–95.
- 45. Eydivandi S, Sahana G, Momen M, Moradi M, Schönherz A. Genetic diversity in Iranian indigenous sheep vis‐à‐vis selected exogenous sheep breeds and wild mouflon. Animal Genetics. 2020;51(5):772–87. pmid:32729152
- 46. Dávila S, Gil M, Resino-Talaván P, Campo J. Evaluation of diversity between different Spanish chicken breeds, a tester line, and a White Leghorn population based on microsatellite markers. Poultry Science. 2009;88(12):2518–25. pmid:19903949
- 47. Ardlie KG, Kruglyak L, Seielstad M. Patterns of linkage disequilibrium in the human genome. Nature Reviews Genetics. 2002;3(4):299–309. pmid:11967554
- 48. Qanbari S, Pimentel E, Tetens J, Thaller G, Lichtner P, Sharifi A, et al. The pattern of linkage disequilibrium in German Holstein cattle. Animal genetics. 2010;41(4):346–56. pmid:20055813
- 49. Bohmanova J, Sargolzaei M, Schenkel FS. Characteristics of linkage disequilibrium in North American Holsteins. BMC genomics. 2010;11(1):1–11. pmid:20609259
- 50. McRae A, McEwan J, Dodds K, Wilson T, Crawford A, Slate J. Linkage disequilibrium in domestic sheep. Genetics. 2002;160(3):1113–22. pmid:11901127
- 51. Rafalski A, Morgante M. Corn and humans: recombination and linkage disequilibrium in two genomes of similar size. TRENDS in Genetics. 2004;20(2):103–11. pmid:14746992
- 52. Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. The power and promise of population genomics: from genotyping to genome typing. Nature reviews genetics. 2003;4(12):981–94. pmid:14631358
- 53. Phillips M, Lawrence R, Sachidanandam R, Morris A, Balding D, Donaldson M, et al. Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. nature genetics. 2003;33(3):382–7. pmid:12590262
- 54. Wang J, Santiago E, Caballero A. Prediction and estimation of effective population size. Heredity. 2016;117(4):193–206. pmid:27353047
- 55. Kimura M, Ohta T. The average number of generations until fixation of a mutant gene in a finite population. Genetics. 1969;61(3):763. pmid:17248440
- 56. Sørensen AC, Sørensen MK, Berg P. Inbreeding in Danish dairy cattle breeds. Journal of dairy science. 2005;88(5):1865–72. pmid:15829680
- 57. Mastrangelo S, Portolano B, Di Gerlando R, Ciampolini R, Tolone M, Sardina M, et al. Genome-wide analysis in endangered populations: a case study in Barbaresca sheep. Animal. 2017;11(7):1107–16. pmid:28077191
- 58. Howard JT, Pryce JE, Baes C, Maltecca C. Invited review: Inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability. Journal of dairy science. 2017;100(8):6009–24. pmid:28601448
- 59. Santiago E, Caballero A. Effective size and polymorphism of linked neutral loci in populations under directional selection. Genetics. 1998;149(4):2105–17. pmid:9691062
- 60. Villanueva B, Pong-Wong R, Fernandez J, Toro M. Benefits from marker-assisted selection under an additive polygenic genetic model. Journal of animal science. 2005;83(8):1747–52. pmid:16024693
- 61. Ghoreishifar SM, Moradi-Shahrbabak H, Fallahi MH, Jalil Sarghale A, Moradi-Shahrbabak M, Abdollahi-Arpanahi R, et al. Genomic measures of inbreeding coefficients and genome-wide scan for runs of homozygosity islands in Iranian river buffalo, Bubalus bubalis. BMC genetics. 2020;21(1):1–12.
- 62. Peripolli E, Stafuzza NB, Munari DP, Lima ALF, Irgang R, Machado MA, et al. Assessment of runs of homozygosity islands and estimates of genomic inbreeding in Gyr (Bos indicus) dairy cattle. BMC genomics. 2018;19(1):1–13.
- 63. Zhang Q, Calus MP, Guldbrandtsen B, Lund MS, Sahana G. Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC genetics. 2015;16(1):1–11. pmid:26195126
- 64. Bjelland D, Weigel K, Vukasinovic N, Nkrumah J. Evaluation of inbreeding depression in Holstein cattle using whole-genome SNP markers and alternative measures of genomic inbreeding. Journal of dairy science. 2013;96(7):4697–706. pmid:23684028
- 65. Zanella R, Peixoto JO, Cardoso FF, Cardoso LL, Biegelmeyer P, Cantão ME, et al. Genetic diversity analysis of two commercial breeds of pigs using genomic and pedigree data. Genetics Selection Evolution. 2016;48(1):1–10. pmid:27029213
- 66. McQuillan R, Leutenegger A-L, Abdel-Rahman R, Franklin CS, Pericic M, Barac-Lauc L, et al. Runs of homozygosity in European populations. The American Journal of Human Genetics. 2008;83(3):359–72. pmid:18760389