Skip to main content
Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Estimation of linkage disequilibrium levels and allele frequency distribution in crossbred Vrindavani cattle using 50K SNP data

  • Akansha Singh,

    Roles Data curation, Formal analysis, Investigation, Writing – original draft

    Affiliation Animal Genetics Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • Amit Kumar ,

    Roles Conceptualization, Writing – original draft, Writing – review & editing

    vetamitchandan07@gmail.com

    Affiliation Animal Genetics Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • Arnav Mehrotra,

    Roles Formal analysis

    Affiliation Animal Genetics Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • Karthikeyan A.,

    Roles Methodology, Software

    Affiliation Animal Genetics Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • Ashwni Kumar Pandey,

    Roles Data curation, Resources

    Affiliation Animal Genetics Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • B. P. Mishra,

    Roles Supervision, Writing – review & editing

    Affiliation Animal Biotechnology Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

  • Triveni Dutt

    Roles Project administration, Supervision, Visualization

    Affiliation Livestock Production and Management Division, ICAR–Indian Veterinary Research Institute, Bareilly, Uttar Pradesh, India

Abstract

The objective of this study was to calculate the extent and decay of linkage disequilibrium (LD) in 96 crossbred Vrindavani cattle genotyped with Bovine SNP50K Bead Chip. After filtering, 43,821 SNPs were retained for final analysis, across 2500.3 Mb of autosome. A significant percentage of SNPs was having minor allele frequency of less than 0.20. The extent of LD between autosomal SNPs up to 10 Mb apart across the genome was measured using r2 statistic. The mean r2 value was 0.43, if pairwise distance of marker was less than10 kb and it decreased further to 0.21 for 25–50 kb markers distance. Further, the effect of minor allele frequency and sample size on LD estimate was investigated. The LD value decreased with the increase in inter-marker distance, and increased with the increase of minor allelic frequency. The estimated inbreeding coefficient and effective population size were 0.04, and 46 for present generation, which indicated small and unstable population of Vrindavani cattle. These findings suggested that a denser or breed specific SNP panel would be required to cover all genome of Vrindavani cattle for genome wide association studies (GWAS).

Introduction

Linkage disequilibrium (LD), defined as the degree of non-random association of allele between loci or correlation between genotypes of markers, is an important concept in the understanding of gene mapping and its application in genetic studies. The LD between markers provides insight in exploring the level of diversity between different breeds, inferring the frequency of recombination events, investigating the change in effective population size across generations, and identifying genomic regions to improve economically important traits [13]. The pattern of LD is mainly determined by the physical distance between markers, although several other demographic and evolutionary factors including population stratification, inbreeding, effective population size, genetic bottleneck, genetic drift, migration, mutation, selection, and recombination rate may also influence the extent and pattern of LD [46].

The declining cost of high throughput genotyping provides a new opportunity to explore genome-scale studies, especially Genome Wide Association Studies (GWAS) and Genomic Selection (GS). The accuracy and precision of the genomic studies rely on the level and pattern of linkage disequilibrium (LD) between markers across genome [7]. The LD between markers has been studied in the genome of several taurine and indicine cattle breeds [2, 8]. Results from their study revealed that moderate LD (r2 = 0.20) were extended to <100 kb, which indicated that 50K SNPs captures most of the LD information necessary for GWAS in taurine breeds [1, 2]. However, the extent for of LD (r2 = 0.20 to 0.34) was observed lower (<30kb) within indicine cattle, which indicated use of High density chip for genomic studies in these cattle [8]. The reduced extent of LD detected in indicine cattle breeds can be attributed to the ascertainment bias of the SNP Chips.

Vrindavani is four breed synthetic crossbred cattle strain developed in India by crossing between Bos taurus cattle (Holstein, Brown Swiss, Jersey) and Bos indicus cattle (Hariana) [9]. Recently, the ROH fragments, haplotype blocks, inbreeding coefficient and effective population size in Vrindavani cattle has been evaluated [10]. Further, the admixture pattern and signatures of selection in Vrindavani was identified using 50K SNP chip [9]. However, the extent and pattern of genome-wide LD using 50K SNP Chip in Vrindavani cattle remains unexplored till date. This study was indented to build on the previous work of genetic characterization [9], and gives insight to estimate marker density for genomic studies in Vrindavani cattle. The objective of present study was to investigate allelic frequency distribution, extent of linkage disequilibrium level (r2) and estimate the effective population size in crossbred Vrindavani cattle using a 50K Bovine chip.

Material and methods

Animal sample and genotype quality control

A total of 96 female Vrindavani crossbred cattle were selected from ICAR-Indian Veterinary Research Institute (IVRI), Izatnagar, Bareilly (Uttar Pradesh), India. The experiment was approved by the Institutional Animal Ethics Committee (IAEC) of ICAR-Indian Veterinary Research Institute (IVRI). DNA was extracted from blood samples and genotypic data were generated using the Bovine 50K SNP Bead Chip v2 (Agri Genome Labs Pvt. Ltd., India).

The HiScan images and genotypes were analysed using Genome Studio software (Illumina). The genotyping rate was 0.99, and a total of 53,218 SNPs were identified across the genome at a mean distance of 37.4 kb between markers. Quality control (QC) was performed using the PLINK v1.9 software [11]. The SNPs with call rate of less than 95%, MAF of less than 0.02 and HWE less than 10E x 10−05 were removed. Only SNPs that were uniquely mapped to autosomes were included in the analysis. A total of 43,821 markers were retained for further analysis. The length of chromosome (Mb), number of markers on individual autosome and the mean distance between markers were estimated using R software [12].

Minor allele frequency

The MAF for all autosomal SNPs was calculated using PLINK v1.07 as default settings using command—file data—freq [11]. The distribution of allele frequencies across different chromosome was analysed, and proportion of SNPs in different frequency category (0.02–0.10, 0.10–0.20, 0.20–0.30, 0.30–0.40, and 0.40–0.50) were plotted using R software [12].

Inbreeding coefficient and effective population size

Inbreeding coefficient (F) was calculated as a function of the expected and observed homozygote difference using PLINK v1.07 [11]. The equation is given as follow: where Fi is the estimated inbreeding coefficient of the iih animal; Oi is the number of observed homozygous loci in the iih animal, Ei is the number of expected homozygous loci and Li is the number of genotyped autosomal loci.

The Effective population size (Ne) was estimated using the SNeP tool [13], based on the relationship between N e, linkage disequilibrium represented by r2, and recombination rate (c). This is given by: where, NT is the effective population size t generations ago, ct is the recombination rate, r2adj is the LD value adjusted for sample size, and α is he correction for occurrence of mutation.

Linkage disequilibrium

The square of the correlation coefficient between two loci (r2) was used to evaluate the extent of LD measure, as r2 was considered is considered as more robust and not influenced by change in allele frequency and population size [14]. The r2 estimate is an important estimator to measure loci required for genome wide association study and QTL mapping [8]. The equation for LD (r2) estimate is represented as follows: where, PA1, PA2, PB1 and PB2 are frequencies of each allele at loci A and B, and P11, P12, P21 and P22 are the frequencies of haplotype A1B1, A1B2, A2B1, A2B2, respectively.

The r2 between SNP pairs with physical distances between 0 to 10 Mb for all autosome was used to estimate the extent and pattern of LD using the default command of PLINK v1.07 using command–ld-snp-list mysnplist–ld-window-kb 10000 –ld-window 99999 –ld-window r2 0 [11]. The decay of LD value, were then calculated for distances of pair-wise SNPs were interval into eight categories (0–10 kb, 10–25 kb, 25–50 kb, 50–100 kb, 100–500 kb, 0.5–1 Mb, 1–5 Mb and 5–10 Mb) along the first 10 Mb of each autosome. LD was also estimated as the mean r2 value for each autosomal chromosome according to these intervals and plotted against distance range.

Further the effect of minimum allelic frequency (MAF) and sample size on the extent of LD was investigated. The LD was calculated as previously described with four different MAF thresholds (0.05, 0.10, 0.15 and 0.2) for physical distance of 10 Mb of genome. In order to determine the effect of sample size on LD(r2), five different subset of population (N = 10, 25, 50, 75 and, 90) were randomly selected from the total population and the extent of LD was explored for each sub-population. To visualize the effects of MAF and sample size on the genome-wide LD (r2) levels was also plotted accordingly.

Results

Marker statistics

After quality control, a total 43,821 autosomal SNPs were available for downstream analysis. About 6734 SNPs having MAF less than 0.02 were removed, and 723 SNPs were removed on the basis Hardy-Weinberg equilibrium and call rate threshold criteria. The SNPs retained after quality control spanned across total of 2500.30 Mb of region of Vrindavani genome, with a mean chromosome length of 86.21Mb. The BTA1(158.03Mb) was found to be longest, while the BTA25 (42.80Mb) was shortest across the genome. The distribution of SNPs was proportional to length of chromosome; the highest number of SNPs were on BTA1 (2798), and the lowest on BTA25 (792). The average distance between SNPs was 57.24 kb, where the longest distance between SNPs was 3.26 Mb on BTA10 and shortest distance was 0.01 kb on BTA15. The descriptive statistical results of the SNP marker for each autosome are shown in Table 1.

thumbnail
Table 1. Summary of the SNP markers analyzed and minor allele frequency for each autosomal chromosome (BTA).

https://doi.org/10.1371/journal.pone.0259572.t001

Minor allele frequency

The mean minor allele frequency (MAF) across all autosomes was 0.26 (Table 1). After filtering, 36% of the SNPs (15,805) the MAF was lower than 0.20. The proportion of SNPs on each chromosome at different MAF threshold is presented in Fig 1. While the MAF was lower than 0.10 (0.2–0.10), the chromosome BTA 4, BTA16, BTA21 and BTA24 had a higher SNP proportion, whereas BTA6, BTA7 and BTA27 presented a lower proportion of SNPs.

thumbnail
Fig 1. Mean proportion of SNPs for various minor allele frequencies (MAF) calculated for each autosomal chromosome.

https://doi.org/10.1371/journal.pone.0259572.g001

Effective population size and inbreeding coefficient

The effective population size was estimated over the past 50 generations from the average r2 (Table 2). The estimate of Ne decreased from 245 (50 generations ago) to 46 (5 generations ago), which reflect a declining trend in Ne for Vrindavani. The estimate of mean inbreeding coefficient (F) was 0.04.

thumbnail
Table 2. Effective population size (Ne) over generations based on linkage disequilibrium calculations.

https://doi.org/10.1371/journal.pone.0259572.t002

Extent of LD across the genome

The level of LD among SNPs was estimated by r2 method. The free recombination generally occurs at a physical distance of more than 10 Mb [15] thus, the range between SNPs was set at 0–10 Mb to estimate LD between SNP markers. In order to consider all possible pairs of SNPs with a distance of less than or equal to 10 Mb, combination pairs of 7,378,918 SNPs were obtained across autosomes to estimate LD. The mean values of r2 of markers for distance of 10 Mb was 0.07. The mean LD (r2) for physical distance intervals is presented in Table 3. The autosomes BTA16, BTA24, and BTA 20 had higher level of r2. For the physical distance of less than 10 kb, the average r2 was 0.43 and it decreased to 0.21 for the distances of 25 to 50 kb. Results revealed the decline in mean r2 value with the increase in physical distance between markers.

thumbnail
Table 3. Statistical summary of linkage disequilibrium (r2) over various distances.

https://doi.org/10.1371/journal.pone.0259572.t003

The extent of LD (r2) was found different across each autosomes according to the physical distance. The mean r2 was estimated for markers physical distance separated by intervals of 0–10 kb, 10–25 kb, 50-100kb, 100-500kb, 0.5-1Mb, 1-5Mb, and 5–10 Mb per autosome (Table 4). The mean r2 revealed smaller reduction across different autosomes, when the physical distance between markers exceeded 100 kb (Fig 2).

thumbnail
Fig 2. The mean r2 as distance between pairs of SNP up to 10 Mb for the genome across autosomes.

https://doi.org/10.1371/journal.pone.0259572.g002

thumbnail
Table 4. Statistical information for mean r2 as physical distance between SNP pairs up to 10 Mb for the genome.

https://doi.org/10.1371/journal.pone.0259572.t004

MAF and LD estimates

The effect of MAF on LD extent was estimated using four different threshold level of MAF: >0.05, >0.10, >0.15 and 0.20 for physical distances up to 100kb between the SNP pairs (Fig 3). Results showed significant effect of MAF threshold on the average r2, particularly for less distances. The r2 between SNPs was smaller when the threshold of MAF was low (0.05), and increases significantly at higher MAF threshold. The value of mean r2 ranged from 0.45 to 0.06 for MAF > 0.05, 0.49 to 0.07 for MAF> 0.10, 0.53 to 0.07 for MAF> 0.15, and 0.57 to 0.08 for MAF> 0.20, respectively.

thumbnail
Fig 3. Mean r2 estimates at different physical distances for different minor allelic frequency (MAF) thresholds.

https://doi.org/10.1371/journal.pone.0259572.g003

Sample size and LD estimates

In this investigation, sample size (n) of 10, 25, 50, 75, and 90 were selected at random from the total population to assess the effect of sample size on LD (r2) extent (Fig 4). The average r2 was greater for smaller sample size, and this trend was more evident when physical distance intervals were large (more than 50kb). These findings indicated that the size of samples should be at least 50 animals for precise estimation of r2.

thumbnail
Fig 4. Mean r2 estimates at different physical distances for different sample size.

https://doi.org/10.1371/journal.pone.0259572.g004

Discussion

After quality control, a total of 43,821 autosomal SNPs retained, which were distributed across 2500.30 Mb region of Vrindavani genome. The average MAF value was 0.26, which is comparable with the value reported for different cattle breeds [8]. The distribution of MAF directly influenced the LD, as low MAF could correspond to greater difference in the frequency of allelic pairs, resulted in underestimation of LD [16]. Therefore, four different level of MAF were selected to estimate the effect of MAF on LD extent. The result showed that the average LD (r2) between SNPs was high, when the threshold of MAF was high (>0.20), particularly at smaller distance. Similar effect of allele frequency on r2 estimate was also observed in other cattle breeds [17, 18], suggesting that the 50K chip can be used for genetic studies in crossbred Vrindavani cattle.

The LD between markers was estimated using r2 method as it is less influenced by low allele frequency [5] and small sample size [15]. At inter-marker distance of 10kb, the mean r2 value was 0.43 which was lower as compared to the LD estimates previously reported for taurine breed, Angus (0.46) and Hereford (0.49), but higher than the value reported for indicine breeds of Brahman (0.25), Nellore (0.27) cattle [2, 3]. The mean LD between adjacent value was estimated for different intervals across autosomes. The mean r2 for was highest for BTA16 and BTA24 and lowest BTA18 and BTA15. However, no relationship was observed between chromosomal size and r2 estimate [19]. To explore the dependency of sample size on the extent of LD (r2), different sample size was used to calculate r2. Khatkar et al. [18] has reported that small sample size leads to overestimation of LD. In the current study, a minimal sample size of 55 animal had no influence on r2, which is consistent with the previous study [19].

To analyse the LD decay, the physical distance between markers were classified in different interval. The results showed rapid decay of r2 at distance more than 100kb for all autosomes. Moreover, the LD estimate decreased from 0.43 to 0.21 at 10kb and 50kb marker distance, respectively. Similar level of LD estimates at 40-60kb have been reported for taurine cattle breeds [18, 20]. However, low level of LD estimate had previously been reported in indicine cattle, which may be attributed to ascertainment bias of SNPs in the current chip [2].

The reliability and accuracy of genomic association studies and genomic prediction is dependent on the amount of LD between marker. Mckay et al. [1] reported that a total of 28,700 (2.87GB/100kb at r2 = 0.2, where 2.87GB is the bovine genome size) SNPs were required to capture LD information for genomic studies. In our study, the LD (r2) estimate was 0.21 at distance of 50 kb between SNPs. This indicates that a minimum of 57,400 (2.87 GB/ 50kb at r2 = 0.2) SNPs is required for genomic association studies in Vrindavani cattle. Previous studies have also indicated the requirement of large number of SNPs to cover the genome for genomic studies when analysing data from indicine, crossbred and multi-breed populations [2, 8]. Our data suggest that a higher density SNP array were required for reliable genome-wide association mapping and genomic prediction in crossbred Vrindavani population.

For better understanding of the population structure, the inbreeding coefficient and the effective population size were estimated. The inbreeding coefficient was 0.04, which was comparable to previously reported 3% and 6% inbreeding coefficient in Vrindavani population using ROH and FHOM analysis [10]. The current study observed decrease in effective population size from 5 generations ago to 50 generations ago. Further, the estimate of Ne was 40 in Vrindavani, and a decreasing trend of Ne was reported for over five generations. The FAO has recommended Ne should be at least 50 for maintaining proper breeding plan. In the current study, 5 generations ago the estimated Ne (46), was below the recommended number (50) of FAO. The low Ne estimate may be attributed to the small well protected population size and fluctuating breed ratio in the admixed Vrindavani population [10].

Conclusion

This study reported the magnitude of LD between markers crossbred Vrindavani cattle using 50k Bovine Chip. The estimated r2> 0.2 extended up to 50 kb indicates requirement of high density SNP panel for precise and accurate estimation of whole genome association studies in Vrindavani cattle. Furthermore, declining trend of Ne estimates was observed in Vrindavani population indicates the requirement of breeding plan that could maintain the sufficiently large Ne. However, further confirmatory investigating for the extent of LD and effective population is required in larger population using high density array of SNPs.

References

  1. 1. McKay SD, Schnabel RD, Murdoch BM, Matukumalli LK, Aerts J, Coppieters W, et al. Whole genome linkage disequilibrium maps in cattle. BMC genetics. 2004; 8: 74. pmid:17961247
  2. 2. Espigolan R, Baldi F, Boligon AA, Souza FR, Gordo DG, Tonussi RL, et al. (2013) Study of whole genome linkage disequilibrium in Nellore cattle. BMC genomics 14: 305. pmid:23642139
  3. 3. Porto-Neto LR, Kijas JW, Reverter A (2014) The extent of linkage disequilibrium in beef cattle breeds using high-density SNP genotypes. Genetics Selection Evolution 46: 22. pmid:24661366
  4. 4. Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, et al. (2001) Linkage disequilibrium in the human genome. Nature 411: 199–204. pmid:11346797
  5. 5. Ardlie KG, Kruglyak L, Seielstad M (2002) Patterns of linkage disequilibrium in the human genome. Nature Reviews Genetics 3: 299–309. pmid:11967554
  6. 6. Karimi K, Koshkoiyeh AE, Gondro C (2015) Comparison of linkage disequilibrium levels in Iranian indigenous cattle using whole genome SNPs data. Journal of animal science and technology 57: 47. pmid:26705480
  7. 7. Goddard ME, Hayes BJ (2012) Genome-wide association studies and linkage disequilibrium in cattle. Bovine genomics Oxford, UK: Wiley-Blackwell 1: 192–210.
  8. 8. Makina SO, Taylor JF, van Marle-Köster E, Muchadeyi FC, Makgahlela ML, MacNeil MD, et al. (2015) Extent of linkage disequilibrium and effective population size in four South African Sanga cattle breeds. Frontiers in genetics 6: 337. pmid:26648975
  9. 9. Singh A, Mehrotra A, Gondro C, da Silva Romero AR, Pandey AK, Karthikeyan A, et al. (2020) Signatures of Selection in Composite Vrindavani Cattle of India. Frontiers in genetics. 11: 589496. pmid:33391343
  10. 10. Chhotaray S, Panigrahi M, Pal D, Ahmad SF, Bhanuprakash V, Kumar H, et al. (2019) Genome-wide estimation of inbreeding coefficient, effective population size and haplotype blocks in Vrindavani crossbred cattle strain of India. Biological Rhythm Research 5:1–4.
  11. 11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics. 81: 559–575. pmid:17701901
  12. 12. Computing R (2013) R: A language and environment for statistical computing. Vienna: R Core Team.
  13. 13. Barbato M, Orozco-terWengel P, Tapio M, Bruford MW (2015) SNeP: a tool to estimate trends in recent effective population size trajectories using genome-wide SNP data. Frontiers in Genetics. 6: 109. pmid:25852748
  14. 14. Zhao H, Nettleton D, Soller M, Dekkers JC (2005) Evaluation of linkage disequilibrium measures between multi-allelic markers as predictors of linkage disequilibrium between markers and QTL. Genetics Research. 86: 77–87. pmid:16181525
  15. 15. Zhao F, Wang G, Zeng T, Wei C, Zhang L, Wang H, et al. (2014) Estimations of genomic linkage disequilibrium and effective population sizes in three sheep populations. Livestock Science 170: 22–29.
  16. 16. Wray NR, (2005) Allele frequencies and the r2 measure of linkage disequilibrium: impact on design and interpretation of association studies. Twin Research and Human Genetics. 8: 87–94. pmid:15901470
  17. 17. Sargolzaei M, Schenkel FS, Jansen GB, Schaeffer LR (2008) Extent of linkage disequilibrium in Holstein cattle in North America. Journal of Dairy Science. 91:2106–17. pmid:18420642
  18. 18. Khatkar MS, Nicholas FW, Collins AR, Zenger KR, Cavanagh JA, Barris W, et al. (2008) Extent of genome-wide linkage disequilibrium in Australian Holstein-Friesian cattle based on a high-density SNP panel. BMC genomics 9: 187. pmid:18435834
  19. 19. Bohmanova J, Sargolzaei M, Schenkel FS, (2010) Characteristics of linkage disequilibrium in North American Holsteins. BMC genomics 11: 421. pmid:20609259
  20. 20. Lu D, Sargolzaei M, Kelly M, Li C, Vander Voort G, Wang Z, et al. (2012) Linkage disequilibrium in Angus, Charolais, and Crossbred beef cattle. Frontiers in Genetics 3:152. pmid:22912646