Abstract
Copy number variations (CNVs) are increasingly understood to affect phenotypic variation. This study uses SNP genotyping of trios of mixed breed swine to add to the catalog of known genotypic variation in an important agricultural animal. PorcineSNP60 BeadChip genotypes were collected from 1802 pigs that combined to form 1621 trios. These trios were from the crosses of 50 boars with 525 sows producing 1621 piglets. The pigs were part of a population that was a mix of ¼ Duroc, ½ Landrace and ¼ Yorkshire breeds. Merging the overlapping CNVs that were observed in two or more individuals to form CNV regions (CNVRs) yielded 502 CNVRs across the autosomes. The CNVRs intersected genes, as defined by RefSeq, 84% of the time – 420 out of 502. The results of this study are compared and contrasted to other swine studies using similar and different methods of detecting CNVR. While progress is being made in this field, more work needs to be done to improve consistency and confidence in CNVR results.
Citation: Wiedmann RT, Nonneman DJ, Rohrer GA (2015) Genome-Wide Copy Number Variations Using SNP Genotyping in a Mixed Breed Swine Population. PLoS ONE 10(7): e0133529. https://doi.org/10.1371/journal.pone.0133529
Editor: Shuhong Zhao, Huazhong Agricultural University, CHINA
Received: January 6, 2015; Accepted: June 27, 2015; Published: July 14, 2015
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was funded by Current Research Information System #5438-31000-083-00D of the Agricultural Research Service, a division of the US Department of Agriculture. The funder did not contribute to the study design, data collation and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Copy number variation (CNV) refers to segments of DNA, typically larger than 1 kb, that exist in variable numbers of copies among members of a species. CNVs are a form of genetic variation distinct from the more commonly studied single nucleotide polymorphisms (SNPs), and they have been shown to affect a larger number of nucleotides than SNPs [1]. Many studies have identified CNVs in humans [2–4], other model organisms [5,6] and agricultural animals (reviewed in Clop et al. [7]), including pigs [8–21], the focus of this study. CNVs can affect gene dosage and disrupt normal gene regulation, leading to complex disease traits in humans (reviewed by Stankiewicz and Lupski [22]). In humans, some of the missing heritability in SNP-based GWAS of complex traits has been attributed to CNVs [23,24]. The most commonly discussed example of a CNV affecting pigs is the white coat phenotype caused by copy number variation of the KIT gene [25,26].
CNVs are typically detected using either array comparative genomic hybridization (aCGH) or an SNP genotyping array, although high-throughput sequencing is increasingly being used (reviewed by Alkan et al. [27]). The main advantage of aCGH is its higher signal-to-noise ratio. However, SNP genotyping chips use less DNA, are less expensive and provide genotypes for the population of animals, so that SNP and CNV contributions to heritability can be determined simultaneously. High-throughput sequencing, given sufficient investment, has superior resolution across the genome but requires greater computational resources.
Recently published results for detection of CNVs in pigs cover all three methods of detection: aCGH [8, 9, 20], SNP array both with [11,12] and without [13–15, 21] pedigree information, and high-throughput sequencing [16–18]. One study used the SNP array method on 217 highly inbred Iberian pigs and then used high-throughput sequencing on four of those pigs for validation [19]. Most of the pigs studied were either pure or half Chinese breeds, in contrast to the present study which utilizes composite pigs from Landrace, Duroc and Yorkshire lines. Thus, current results may be more relevant to the commercial swine industry. This study uses the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA) coupled with the PennCNV algorithm [28]. PennCNV was chosen for this study in part due to its success when compared to competing algorithms [29] and due to its ability to effectively integrate pedigree relationships of boar-sow-offspring trios.
Results
Every pig had at least one CNV called; the mean was 19.9 and the median was 14 CNV calls per animal. CNV regions (CNVRs) were determined for the population by merging CNV that overlapped between animals. Including singletons, the full set of 949 CNVR covered 28.8% of the genome. Filtering out the singleton CNV reduced the results to 502 CNVR that cover 19.1% of the genome. The latter number is more consistent with other studies, and requiring more than one observation should also eliminate any non-germline CNV as well as many false positives. S1 Table lists the chromosomal position of each of the 502 CNVR along with its length and the number of pigs that contributed to it. The median number of pigs per CNVR was 8, with a range from 2 to 1129. The lengths of the CNVR ranged from 933 to 31,727,386 bp, with a median value of 147,171 bp. The total length of all 502 CNVR is 495.29 Mb.
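The merging step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the tuple layout, the function name `merge_cnvs_into_cnvrs` and the choice to merge first and then drop regions supported by a single animal are assumptions made for illustration.

```python
from collections import defaultdict

def merge_cnvs_into_cnvrs(cnv_calls, min_animals=2):
    """Merge overlapping CNV calls into regions and keep well-supported ones.

    cnv_calls: iterable of (animal_id, chrom, start, end), inclusive coordinates.
    Returns a list of (chrom, start, end, n_animals).
    """
    by_chrom = defaultdict(list)
    for animal, chrom, start, end in cnv_calls:
        by_chrom[chrom].append((start, end, animal))

    cnvrs = []
    for chrom in sorted(by_chrom):
        calls = sorted(by_chrom[chrom])
        cur_start, cur_end, animals = calls[0][0], calls[0][1], {calls[0][2]}
        for start, end, animal in calls[1:]:
            if start <= cur_end:
                # Call overlaps the growing region: extend it.
                cur_end = max(cur_end, end)
                animals.add(animal)
            else:
                # Gap found: close the current region and start a new one.
                cnvrs.append((chrom, cur_start, cur_end, len(animals)))
                cur_start, cur_end, animals = start, end, {animal}
        cnvrs.append((chrom, cur_start, cur_end, len(animals)))

    # Drop regions seen in fewer than min_animals distinct animals (singletons).
    return [r for r in cnvrs if r[3] >= min_animals]

# Hypothetical toy calls, not data from the study.
toy_calls = [
    ("pigA", "1", 100, 500),
    ("pigB", "1", 400, 900),
    ("pigC", "1", 2000, 2500),
    ("pigA", "2", 50, 80),
    ("pigD", "2", 70, 120),
]
filtered = merge_cnvs_into_cnvrs(toy_calls)
unfiltered = merge_cnvs_into_cnvrs(toy_calls, min_animals=1)
total_bp = sum(end - start + 1 for _, start, end, _ in filtered)
```

In the toy input, the singleton call from pigC forms its own region and is removed by the default filter, mirroring the reduction from the full CNVR set to the filtered set reported above.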
Table 1 shows the coverage of each chromosome by CNVR, from the low of 3% in chromosome 7 to the high of 61% in chromosome 11. It also lists the total number of CNVR, their average length and the number that intersects known genes as reported by RefSeq [30]. Chromosome 8 exhibited the lowest percentage of CNVR that overlapped genes at 70%, while chromosome 12 had the highest rate of gene overlap at 100%. On an absolute basis, Chromosome 13 had the most CNVR with 63 and the most CNVR that overlapped known genes with 52, slightly ahead of chromosome 1 with 59 and 44, respectively. The total number of RefSeq genes that intersect the CNVRs in this study is 5422, with 1418 being characterized well enough to be assigned gene symbols.
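Counting how many CNVR intersect annotated genes reduces to an interval-overlap test. The sketch below is one simple way to do it, assuming inclusive coordinates and hypothetical tuple layouts; it does not reproduce the RefSeq-based pipeline used for Table 1.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_cnvr_with_genes(cnvrs, genes):
    """Return (n_hits, n_cnvr, fraction) of CNVR that overlap at least one gene.

    cnvrs: list of (chrom, start, end)
    genes: list of (chrom, start, end, symbol)
    """
    hits = 0
    for chrom, start, end in cnvrs:
        if any(g_chrom == chrom and overlaps(start, end, g_start, g_end)
               for g_chrom, g_start, g_end, _symbol in genes):
            hits += 1
    return hits, len(cnvrs), hits / len(cnvrs)

# Hypothetical toy annotation, not RefSeq data.
toy_cnvrs = [("1", 100, 900), ("1", 2000, 2500), ("2", 50, 120)]
toy_genes = [("1", 850, 1500, "GENE_A"), ("2", 121, 400, "GENE_B")]
hits, total, fraction = count_cnvr_with_genes(toy_cnvrs, toy_genes)

# The rate reported in the abstract follows from the same arithmetic.
reported_pct = round(100 * 420 / 502)
```

The linear scan over genes is adequate for a few hundred CNVR; an interval tree or a sorted sweep would be the usual choice for larger inputs.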
Discussion
CNVR have been detected in many species and clearly are important components contributing to the missing heritability of complex traits. This study employed an SNP genotyping BeadChip containing 49,208 usable elements spread throughout the genome. Unfortunately, the broad and uneven spacing severely limits the accuracy of predicting the end positions of the CNVR, while filtering results to regions spanning at least three consecutive SNP, done to minimize false positives, prevents the identification of many small CNVR. Selection of predominantly single-locus SNP for inclusion on BeadChips limits the use of this technology to discover CNVR that have copy numbers greater than two. In addition to these technological limits, prior studies in cattle and swine have shown great variation between breeds in CNVR content and a sizable increase in CNVR detection rate for crossbred animals [11, 31].
This study uses a mixed breed population with SNP array detection and pedigree information to produce its results. The most similar published studies are those of Wang et al. [15], whose population consisted of 585 pigs that were a cross of Large White and Minzhu, and Chen et al. [12], who tested 752 pigs that were an F2 cross of White Duroc and Erhualian. In the same study, Chen et al. also reported results for 941 additional pigs covering 17 other populations. In an attempt to find the most robust CNVR that could be used for future investigations, the intersection of CNVR among this study and those of Wang et al. [15] and Chen et al. [12] was determined (Fig 1). Of the 502 CNVR reported in the present study, 237 (47%) overlapped at least one CNVR in the previous studies. There were 48 CNVR (9.6%), some very large, common to both Wang et al. [15] and Chen et al. [12] that overlapped a total of 77 CNVR reported in the present study. The intersection of all three sets of CNVR resulted in 77 regions spanning 12.51 Mb, as listed in Table 2. Included in Table 2 is a list of 52 RefSeq genes with a defined gene symbol that intersect the CNVRs.
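A three-way intersection of CNVR sets such as the one described above can be computed by intersecting sorted interval lists two at a time. The following is a minimal sketch for a single chromosome, assuming inclusive coordinates and that each input list contains no self-overlapping intervals; the coordinates are invented and the function name is hypothetical.

```python
def intersect_intervals(a, b):
    """Intersect two lists of inclusive (start, end) intervals.

    Each list is assumed to be free of internal overlaps (already merged).
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared

# Invented coordinates standing in for three studies on one chromosome.
this_study = [(100, 900), (2000, 2500)]
study_b = [(300, 1200), (2400, 2600)]
study_c = [(50, 400), (850, 2450)]

common = intersect_intervals(intersect_intervals(this_study, study_b), study_c)
common_bp = sum(end - start + 1 for start, end in common)
```

Note that one CNVR from a single study can contribute several shared segments, which is one reason the number of intersected regions need not match the number of CNVR in any one study.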
Fig 1. Comparison of CNVR discovered with the Illumina SNP60 BeadChip in the current study (USMARC_2015, black) with the results of Chen et al. [12] (Chen_2012, green) and Wang et al. [15] (Wang_2012, blue). In addition, the results of Li et al. [9], which used CGH arrays (Li_2012, red), are also displayed. Diagram was generated using PhenoGram (http://visualization.ritchielab.psu.edu/phenograms/document).
Different statistical methods to discover CNVR from SNP BeadChip data are available and each method produces a unique set of CNVR. Winchester et al. [29] conducted an objective evaluation of different methods using human HapMap data and concluded that the statistical method used should be one developed for the type of data to be analyzed. In addition, they indicated that inclusion of pedigree information in the analyses reduces the number of false-positives. Similarly, Wang et al. [15] analyzed their data with four different software programs and they found that PennCNV yielded the most CNVR that were discovered with at least one of the other programs. As PennCNV is the only software program that incorporates pedigree information with Illumina SNP data, it has been used in all studies with pigs when genotypic data was collected on both parents as well as progeny (trios).
High-throughput sequencing, due to its kilobase resolution, is able to discover the more abundant smaller CNVR. Over 80% of the CNVR discovered by Jiang and coworkers were smaller than the average interval between adjacent SNP on the BeadChip (50 kb), and more than half of the CNVR discovered were between 10 and 20 kb [18]. In the study of Fernández et al., in which sequencing was used on four of the pigs with SNP genotyping data available, only 16 of 65 BeadChip CNVRs with overlapping high-throughput analysis could be confirmed [19]. To illustrate the differences between BeadChip CNVR and sequencing CNVR, CNVR 32 on chromosome 10 in Table 2 of Fernández et al. [19] is 268 kb long by BeadChip analysis and is overlapped by 51 smaller CNV found through sequencing. The large spacing of SNP on the Illumina PorcineSNP60 BeadChip, together with the filtering of single-SNP CNVR, creates low-resolution CNVR that may be an aggregate of multiple smaller CNVR. The low confirmation rate of BeadChip CNVRs is not due to low resolution, but may be a technical issue related to the design and chemistry of this system. Therefore, stringent criteria need to be applied to limit the number of false positives reported. Inclusion of pedigree information from genotyped trios and the use of PennCNV reduce the number of false positives. Each study likely finds only a fraction of the CNVR in its population. Poor overlap between swine studies may be due to a high rate of undetected CNVR within each population as well as the dramatically different breeds used in each of the studies.
The high-throughput study of Rubin et al. reported 1928 CNVR in a population of 117 European pigs and wild boars [16]. These CNVR were found to overlap, or nearly overlap, 557 known genes. Of those, only five are in common with the genes listed in Table 2, further indicating an unfortunate lack of consensus between studies. Only 72 genes from Rubin et al. [16] were in common with the 1418 known genes that intersect CNVR observed in the present study. Although several studies have successfully reported CNVR in a wide range of swine breeds, insufficient progress has been made in determining the phenotypic effects, and in particular the economically significant effects, of these genetic variations. Rubin et al. found few CNVR within regions where signatures of selection were documented [16]. However, their study was based on a comparison between improved and unselected breeds. Two experiments were able to detect significant associations between CNVR and estimated breeding values for boars. Fowler et al. [32] conducted a GWAS for back fat thickness by genotyping boars with extremely different breeding values. Along with the GWAS, they also used two different analyses to identify CNVR. Fowler et al. [32] reported 12 different CNVR along with 32 SNP associated with back fat thickness. Revay et al. [33] genotyped boars with extremely high and extremely low breeding values for a fertility trait (direct boar effect on litter size) and reported 35 CNVR, seven of which remained significantly associated with fertility when tested in a validation set of animals. However, more detailed studies are required to identify CNVR that affect phenotypic variation within populations.
Failure to identify similar CNVR across studies is concerning. While refinement in experimental protocols is needed, the problem is amplified by variability between breeds and between detection methods. The experiment by Revay et al. [33] utilized purebred boars from the same breeds used to develop the composite population for the current study, and 40% of their CNVR associated with fertility were identified in this study. Two of the lines studied for back fat thickness by Fowler et al. [32] were similar to germplasm in this study, and 50% of the CNVR associated with back fat thickness were identified in this study. While the primary objective of these two reports was to detect associations with performance, they are the only two studies that used comparable commercially relevant germplasm. More work needs to be done to improve detection techniques for high-throughput testing of animals, which would facilitate detection of significant CNVR effects on economically important traits.
Materials and Methods
The experimental procedures were approved and performed in accordance with the U.S. Meat Animal Research Center’s (USMARC) Animal Care and Use committee and the Guide for Care and Use of Agricultural Animals in Research and Teaching (FASS, 2010).
Animals
A composite swine population was developed at the USMARC starting in 2001 by crossing mixed Landrace-Yorkshire sows with one of 24 founding boars – 12 Landrace and 12 Duroc. The second generation was produced by mating Landrace-sired animals to Duroc-sired animals. Subsequent generations were created by choosing one male and ten females produced by each founding boar then randomly mating them while avoiding full-sib and half-sib pairings [34]. This study uses trios from crosses of 50 boars with 525 sows producing 1621 piglets, all born in the years 2005–2010. The piglets were members of the 5th through 8th filial generations of this closed composite population. Animals in this population were managed under typical commercial standards and either sold or slaughtered at the USMARC abattoir using conventional humane stunning methods followed by exsanguination.
DNA Isolation, SNP Array Genotyping, and Quality Control
Genomic DNA was extracted from the frozen tail sections clipped at 1 day of age of each pig using the Wizard SV Genomic DNA Purification kit (Promega, Madison, WI). The DNA samples were genotyped with the Illumina PorcineSNP60 BeadChip (Illumina, San Diego, CA) [35]. Genotype reactions were completed at the USMARC (Clay Center, NE) and the chips were then scanned at the USDA-ARS Bovine Functional Genomics Laboratory (Beltsville, MD). The scan results were interpreted at the USMARC using Illumina’s BeadStudio Genotyping software.
SNP with call rates below 80% or minor allele frequencies below 0.05 were excluded from the data set, as were SNP that did not map or that mapped to multiple positions in the Sus scrofa genome assembly 10.2. A final set of 49,208 SNP was used for further analysis.
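The exclusion criteria above amount to a three-part filter on per-SNP summary statistics. The sketch below shows that logic in Python; the `SnpSummary` record, its field names and the example values are hypothetical and are not taken from BeadStudio output.

```python
from dataclasses import dataclass

@dataclass
class SnpSummary:
    name: str
    call_rate: float          # fraction of samples with a genotype call
    minor_allele_freq: float  # minor allele frequency in the population
    n_map_positions: int      # number of positions the SNP maps to (0 = unmapped)

def passes_qc(snp, min_call_rate=0.80, min_maf=0.05):
    """Keep SNP that meet the call-rate and MAF thresholds and map uniquely."""
    return (snp.call_rate >= min_call_rate
            and snp.minor_allele_freq >= min_maf
            and snp.n_map_positions == 1)

# Invented examples covering each reason for exclusion.
toy_snps = [
    SnpSummary("snp_ok", 0.99, 0.31, 1),
    SnpSummary("snp_low_call", 0.75, 0.31, 1),
    SnpSummary("snp_rare", 0.99, 0.01, 1),
    SnpSummary("snp_unmapped", 0.99, 0.31, 0),
    SnpSummary("snp_multimap", 0.99, 0.31, 2),
]
kept = [s.name for s in toy_snps if passes_qc(s)]
```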
Identification of Pig CNVs
Pig CNVs in this study were identified using PennCNV software [28]. PennCNV primarily utilizes the Log R Ratio (LRR) and the B Allele Frequency (BAF) output by BeadStudio, and the population frequency of the B allele (PFB) calculated from the genotyping results. To improve the accuracy of the calls, PennCNV was provided a gcmodel file generated by calculating the GC content of the nearest 1 Mb of sequence around each SNP. A minimum of three consecutive SNP was required to call a CNV. PennCNV also utilizes pedigree information to significantly improve the accuracy of CNV calls. This study exclusively used pig samples with full trio information. To further improve the reliability of the results, all CNVs that were called only once in the population were discarded. CNV regions (CNVRs) were created by merging overlapping CNVs.
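As an illustration of the GC-content calculation mentioned above, the following Python sketch computes the GC fraction in a window centred on an SNP position. It is not the PennCNV utility used to build the gcmodel file; the function name, the symmetric window and the handling of ambiguous bases are assumptions.

```python
def gc_fraction_around(sequence, snp_index, window=1_000_000):
    """GC fraction in a window centred on a 0-based SNP index.

    The window is clipped at the sequence ends, and bases other than
    A, C, G and T (for example N) are ignored. Returns NaN if the
    window contains no unambiguous bases.
    """
    half = window // 2
    lo = max(0, snp_index - half)
    hi = min(len(sequence), snp_index + half)
    region = sequence[lo:hi].upper()
    gc = region.count("G") + region.count("C")
    at = region.count("A") + region.count("T")
    return gc / (gc + at) if gc + at else float("nan")

# Invented toy sequence; real chromosomes would be read from a FASTA file.
toy_seq = "AAAAGGCCNNTTTT"
narrow = gc_fraction_around(toy_seq, snp_index=6, window=4)    # covers "GGCC"
wide = gc_fraction_around(toy_seq, snp_index=6, window=100)    # whole sequence
```

With the default 1 Mb window, each SNP receives a single local GC value that can be written alongside its chromosome and position.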
Mention of trade names or commercial products is solely for the purpose of providing information and does not imply recommendation, endorsement or exclusion of other suitable products by the U.S. Department of Agriculture.
Supporting Information
S1 Table. Information on all CNVR regions discovered.
Chromosome position, length, and number of pigs contributing to each of the 502 CNVR identified in the present study.
https://doi.org/10.1371/journal.pone.0133529.s001
(XLSX)
Acknowledgments
The authors thank Kris Simmerman (USMARC) for technical assistance, Linda Parnell (USMARC) for manuscript preparation and Tad Sonstegard and Steve Schroeder of the USDA, ARS, Animal Genomics and Improvement Laboratory for scanning the beadchips. USDA is an equal opportunity provider and employer.
Author Contributions
Conceived and designed the experiments: RTW. Performed the experiments: RTW DJN GAR. Analyzed the data: RTW. Contributed reagents/materials/analysis tools: RTW DJN GAR. Wrote the paper: RTW DJN GAR.
References
- 1. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712. pmid:19812545
- 2. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. (2004) Detection of large-scale variation in the human genome. Nat Genet 36: 949–951. pmid:15286789
- 3. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. (2004) Large-scale copy number polymorphism in the human genome. Science 305: 525–528. pmid:15273396
- 4. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (2006) Global variation in copy number in the human genome. Nature 444: 444–454. pmid:17122850
- 5. Graubert TA, Cahan P, Edwin D, Selzer RR, Richmond TA, Eis PS, et al. (2007) A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genet 3: e3. pmid:17206864
- 6. Guryev V, Saar K, Adamovic T, Verheul M, van Heesch SAAC, Cook S, et al. (2008) Distribution and functional impact of DNA copy number variation in the rat. Nat Genet 40: 538–545. pmid:18443591
- 7. Clop A, Vidal O, Amills M (2012) Copy number variation in the genomes of domestic animals. Anim Genet 43: 503–517. pmid:22497594
- 8. Fadista J, Nygaard M, Holm LE, Thomsen B, Bendixen C (2008) A snapshot of CNVs in the pig genome. PLoS One 3: e3916. pmid:19079605
- 9. Li Y, Mei S, Zhang X, Peng X, Liu G, Tao H, et al. (2012) Identification of genome-wide copy number variations among diverse pig breeds by array CGH. BMC Genomics 13: 725. pmid:23265576
- 10. Wang J, Jiang J, Wang H, Kang H, Zhang Q, Liu J-F (2014) Enhancing genome-wide copy number variation identification by high density array CGH using diverse resources of pig breeds. PLoS One 9: e87571. pmid:24475311
- 11. Ramayo-Caldas Y, Castelló A, Pena RN, Alves E, Mercadé A, Souza CA, et al. (2010) Copy number variation in the porcine genome inferred from a 60 k SNP BeadChip. BMC Genomics 11: 593. pmid:20969757
- 12. Chen C, Qiao R, Wei R, Guo Y, Ai H, Ma J, et al. (2012) A comprehensive survey of copy number variations in 18 diverse pig populations and identification of candidate copy number variable genes associated with complex traits. BMC Genomics 13: 733. pmid:23270433
- 13. Wang J, Jiang J, Fu W, Jiang L, Ding X, Liu J-F, et al. (2012) A genome-wide detection of copy number variations using SNP genotyping arrays in swine. BMC Genomics 13: 273. pmid:22726314
- 14. Wang J, Wang H, Jiang J, Kang H, Feng X, Zhang Q, et al. (2013) Identification of genome-wide copy number variations among diverse pig breeds using SNP genotyping arrays. PLoS One 8: e68683. pmid:23935880
- 15. Wang L, Liu X, Zhang L, Yan H, Lou W, Liang J, et al. (2013) Genome-wide copy number variations inferred from SNP genotyping arrays using a Large White and Minzhu intercross population. PLoS One 8: e74879. pmid:24098353
- 16. Rubin CJ, Megens HJ, Barrio AM, Maqbool K, Sayyab S, Schwochow D, et al. (2012) Strong signatures of selection in the domestic pig genome. Proc Natl Acad Sci U S A 109: 19529–19536. pmid:23151514
- 17. Paudel Y, Madsen O, Megens H-J, Frantz LAF, Bosse M, Bastiaansen JWM, et al. (2013) Evolutionary dynamics of copy number variation in pig genomes in the context of adaptation and domestication. BMC Genomics 14: 449. pmid:23829399
- 18. Jiang J, Wang J, Wang H, Zhang Y, Kang H, Feng X, et al. (2014) Global copy number analyses by next generation sequencing provide insight into pig genome variation. BMC Genomics 15: 593. pmid:25023178
- 19. Fernández AI, Barragán C, Fernández A, Rodríguez MC, Villanueva B (2014) Copy number variants in a highly inbred Iberian porcine strain. Anim Genet 45: 357–366. pmid:24597621
- 20. Wang J, Jiang J, Wang H, Kang H, Zhang Q, Liu J-F (2014) Enhancing genome-wide copy number variation identification by high density array CGH using diverse resources of pig breeds. PLoS One 9: e87571. pmid:24475311
- 21. Wang Y, Tang Z, Sun Y, Wang H, Wang C, Yu S, et al. (2014) Analysis of genome-wide copy number variations in Chinese indigenous and western pig breeds by 60 k SNP genotyping arrays. PLoS One 9: e106780. pmid:25198154
- 22. Stankiewicz P, Lupski JR (2010) Structural variation in the human genome and its role in disease. Ann Rev Med 61: 437–455. pmid:20059347
- 23. Henrichsen CN, Chaignat E, Reymond A (2009) Copy number variants, diseases and gene expression. Hum Mol Genet 18: R1–R8. pmid:19297395
- 24. Zhang F, Gu W, Hurles ME, Lupski JR (2009) Copy number variation in human health, disease, and evolution. Ann Rev Genomics Hum Genet 10: 451–481.
- 25. Marklund S, Kijas J, Rodriguez-Martinez H, Rönnstrand L, Funa K, Moller M, et al. (1998) Molecular basis for the dominant white phenotype in the domestic pig. Genome Res 8: 826–833. pmid:9724328
- 26. Giuffra E, Törnsten A, Marklund S, Bongcam-Rudloff E, Chardon P, Kijas JMH, et al. (2002) A large duplication associated with dominant white color in pigs originated by homologous recombination between LINE elements flanking KIT. Mamm Genome 13: 569–577. pmid:12420135
- 27. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12: 363–376. pmid:21358748
- 28. Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SFA, et al. (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17: 1665–1674. pmid:17921354
- 29. Winchester L, Yau C, Ragoussis J (2009) Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic 8: 353–356. pmid:19737800
- 30. Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. (2014) RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42: D756–D763. pmid:24259432
- 31. Liu GE, Hou Y, Zhu B, Cardone MF, Jiang L, Cellamare A, et al. (2010) Analysis of copy number variations among diverse cattle breeds. Genome Res 20: 693–703. pmid:20212021
- 32. Fowler KE, Pong-Wong R, Bauer J, Clemente EJ, Reitter CP, Affara NA, et al. (2013) Genome wide analysis reveals single nucleotide polymorphisms associated with fatness and putative novel copy number variant in three pig breeds. BMC Genomics 14:784. pmid:24225222
- 33. Revay T, Quach AT, Maignei L, Sullivan B, King AW (2015) Copy number variations in high and low fertility breeding boars. BMC Genomics 16:280. pmid:25888238
- 34. Lindholm-Perry AK, Rohrer GA, Holl JW, Shackelford SD, Wheeler TJ, Koohmaraie M, et al. (2009) Relationships among calpastatin single nucleotide polymorphisms, calpastatin expression and tenderness in pork longissimus. Anim Genet 40: 713–721. pmid:19422367
- 35. Ramos AM, Crooijmans RPMA, Affara NA, Amaral AJ, Archibald AL, Beever JE, et al. (2009) Design of a high density SNP genotyping assay in the pig using SNPs identified and characterized by next generation sequencing technology. PLoS One 4: e6524. pmid:19654876