Genome-Wide Copy Number Variations Inferred from SNP Genotyping Arrays Using a Large White and Minzhu Intercross Population

Copy number variations (CNVs) are one of the main contributors to genetic diversity in animals and are broadly distributed in the genomes of swine. Investigating the performance and evolutionary impacts of pig CNVs requires comprehensive knowledge of their structure and function within and between breeds. In the current study, 4 different programs (i.e., GADA, PennCNV, QuantiSNP, and cnvPartition) were used to analyze Porcine SNP60 genotyping data of 585 pigs from one Large White × Minzhu intercross population to detect copy number variant regions (CNVRs). Overlapping CNVRs recalled by at least 2 programs were used to construct a powerful and comprehensive CNVR map, which contained249 CNVRs (i.e., 70 gains, 43 losses, and 136 gains/losses) and covered 26.22% of the regions in the swine genome. Ten CNVRs, representing different predicted statuses, were selected for validation via quantitative real-time PCR (QPCR); 9/10 CNVRs (i.e., 90%) were validated. When being traced back to the F0 generation, 58 events were identified in only Minzhu F0 parents and 2 events were identified in only Large White F0 parents. A series of CNVR function analyses were performed. Some of the CNVRs functions were predicted, and several interesting CNVRs for meat quality traits and hematological parameters were obtained. A comprehensive and lower false rate genome-wide CNV map was constructed for Large White and Minzhu pig genomes in this study. Our results may provide an important basis for determining the relationship between CNVRs and important qualitative and quantitative traits. In addition, it can help to further understand genetic processes in pigs.


Introduction
The pig (Sus scrofa) is not only an economically important livestock worldwide but also an ideal animal model for human disease research because its genome is similar in size and organization. Copy number variations (CNVs) are global genetic structural variations in human and animal genomes, and they defined as a segment of large DNA [kilobases (Kb) to megabases (Mb) in length] presenting with copy-number differences through the comparison of 2 or more genomes [1][2][3][4][5][6]. CNVs occupy a significant portion of all pig genomic variations, CNVs can directly impact gene expression by changing gene dosage or indirectly affecting gene expression by disrupting the regulation of gene expression [1,[7][8][9][10][11]. Many studies have shown that CNVs play important roles in normal phenotypic variability and disease susceptibility [1,[12][13][14][15][16][17][18]. They are considered promising markers for identifying economic-and disease-related traits in domestic animals [19].
In the present study, weconstructed a Large White 6 Minzhu intercross population and measured various traits [24,25]. Each individual was genotyped using an Illumina PorcineSNP60 Beadchip. The goal of this study was to construct a more accurate and comprehensive map of CNVs in the pig genome in order to determine the relationship between CNVRs and some important qualitative and quantitative traits and provide useful information for understanding the genetic processes of pigs. In this study, 4 different programs (i.e., GADA, PennCNV, QuantiSNP, and cnvPartition) [26][27][28][29]were used to analyze Porcine SNP60 genotyping data of 619 pigs from one Large White 6 Minzhu intercross population to detect CNVRs. A number of integrative analyse were also conducted.

CNV detection
In this study, a total of 585 samples were processed using the Illumina Porcine SNP60 BeadChip and passed through a series of quality control measures for CNV detection. The initial number of CNVs identified by GADA, PennCNV, QuantiSNP, and cnvPartition was 4678, 1550, 3485, and 316, respectively. CNVRs that overlapped on more than one contig and contained gaps due to the high error rate of this preliminary assembly were discarded. By aggregating overlapping CNVs, a total of 660, 505, 966, and 60 CNVRs were identified by the 4 programs (Table S1 in File S1  Figure S1 and Figure S2 in File S2). The locations and characteristics of all CNVRs on the autosomal and 6chromosomes are shown in Figure 1 and the 60 unique CNVRs detected in F0 parents are shown in Table 2.
Using the online Gene Functional Classification and Annotation Tool in the database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov) [30], 7 Benjiamini correction, statistically significant Gene Ontology (GO) [31] terms (Table S5 in File S1) and 4 Benjiamini correction statistically significant Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Table S6 in File S1)were identified [32]. The detected genes in significant GO terms were mainly involved in alternative splicing, splice variants, phosphoproteins, cytoplasm RNA-binding, translation regulation, and membrane-enclosed lumen significant GO terms. The detected genes in KEGG pathways were mainly involved in axon guidance endocytosis homologous recombination and the ErbB signaling pathway. Furthermore, 116 CNVRs (46.6%) overlapped with 1345 QTLs (Table S7 in File S1) in the pig QTLdb database [33]. These overlapped QTLs were mainly related to meat quality traits (59.33%) and the remainders were related to exterior, health, meat quality, productive and reproduction traits.
In our previous studies, genome-wide association studies (GWAS) with meat quality, production and health traits were performed using the same population [24,25]. Combining analyses found that a total number of 27, 22, 4, 3, 10, 3, and 2 genomewide significant SNPs associated with intramuscular fat (IMF), marbling, moisture, color score, lean meat in ham, lean meat weight, and mean corpuscular volume (MCV), respectively (Table S8-S14 in File S1), were located in 6 CNVRs identified in this study. Moreover, most of these CNVRs (i.e., 4/6) only appeared in Minzhu pigs and not in the Large White pig F0 generation.

Discussion
In the current study, 4 different programs (i.e., GADA, PennCNV, QuantiSNP, and CnvPartition) were used to detect CNVRs. These 4 programs calculated CNVs by using different algorithms as follows: (1)  Each of these programs had their own weaknesses (i.e., GADA is weak in the accuracy of Illumina, PennCNV has no way of ranking events due to likelihood, QuantiSNP has limited support for further event analysis and cnvPartition may miss events) [29]. Therefore, following the recommendations for increasing the frequency and decreasing the rate of false positives from Winchester et al. [29], the CNVRs, which were detected by at least 2 algorithms, were selected for use in the present research. Furthermore, in this study, a 3-generation resource population was produced by intercrossing Large White boars and Minzhu pig sows from 2007 to 2011. The population size in the current study was larger (i.e., 619 individuals) than previous studies on pigs and may decrease false CNVs. As a result, better QPCR validation was obtained than that reported by Fasista et al., Ramayo-Caldas et al. and Wang et al. (50.00%, 71.43%, and 66.67%, respectively) [19,21,22].
The special genetic background also cannot be ignored. CNVs in animals have been reported to have breed-specific characteristics [5,19]. Similar to previous reports, after analyzing CNV delivery in the F0 generation, 58 and 2 CNVRs were detected only in Minzhu and Large White pigs, respectively. The use of a Minzhu 6 Large White intercross population and 4 CNVs detection programs in this research may have minimized overlapping rates (from 38.78% to 43.98%). Another reason for the lower overlapping rates could be the different platforms we used. The SNP genotyping and CGH arrays, for instance, were different in calling technology, resolution differences, and genome coverage [19]. When the PennCNV programs were used both in this study and in the study of Wang et al. [19], 207 CNVRs (54.19%) overlapped.The low overlapping rates were also encountered in the studies of pigs and other mammals [5,6,19,34,35].
CNVRs identified in unrelated pig samples from different genetic backgrounds are important criteria in retaining CNVRs for downstream analysis. As the breed-specific CNVRs may contribute to breed differences, we first analyzed the traits and CNVR differences in the F0 parents. The Minzhu pig is a breed indigenous to northeast China. Average environmental temperatures of 4uC/year are experienced in this region and, in response, the Minzhu pig breed has good stress resistance and has developed excellent characteristics of fat deposition, [i.e., back fat thickness of 5.1 cm and 5% IMF in the longissimus muscle (LM) at 240 d of age]. Compared to the Minzhu pig, the Large-White pig has a higher rate of lean meat and faster growing rates. Under the supposition that some of the CNVRs only detected in Minzhu pigs and Large-White pigs affected these traits, we selected these CNVRs for further analyse. In order to minimize the number of these CNVRs, GO, KEGG, QTL, and comparative genomic analyse were conducted simultaneously. Oure analyses identified some interesting CNVRs.
(2011) also found genome-wide significant SNPs associated with IMF in this domain [38]. Moreover, among the genes contained in this domain, spermatogenesis associated 20 (SPATA20) is one of the putative transcripts expressed in significantly different levels during bovine intramuscular adipocyte differentiation profiled [39]. We inferred that this CNVR is positively associated with meat quality by changing the gene dosage or disrupting the regulation of gene expression. In addition, the copy number   polymorphism (CNP) genotyping using next-generation sequencing [40] in this region is in the pipeline. Another interesting CNVR is CNVR31 (Chr. 2, 42783:6186192). This CNVR, also, only appeared in the F0 Minzhu pig generation and contained 62 protein-coding, 3 miRNA, 1 pseudogene, and 1 snRNA gene (Table S16 in File S1). Most of the genome-wide significant SNPs associated with lean meat in ham (10/23, 43.48%) and lean meat weight (3/14, 21.43%) were located in these domains. In this region, 4 members of the fibroblast growth factor (FGF) family (FGF3, 4, 19) genes were identified. The FGF family is involved in numerous cellular processes including growth, angiogenesis, and development [41][42][43][44]. Transgenic mice overexpressing human FGF19 have an increased metabolic rate and decreased adiposity [45,46]. There were also 5 QTLs [33,47,48] related to traits of production in this region. Therefore, we inferred that this CNVR may have effects on lean meat.
Other CNVRs, such as CNVR109 (Chr. 8, 19534783: 19709874) and CNVR110 (Chr. 8, 27976730:29061313), were also interesting. There was 1 genome-wide significant SNPs associated with MCV located in these two regions respectively. There were also 4 healthy related QTLs [49] located in these regions, which indicated the potential immune-related function of these CNVRs.

Conclusions
By using the Porcine SNP60 Genotyping BeadChip and an F2 pig resource population, we identified 249 CNVRs and generated a powerful and comprehensive CNVR map of the pig genome. Nine out of 10 CNVRs were validated by QPCR, indicating that   our detection was highly efficient. Fifty-eight potential Minzhu pig breed-specific and 2 potential Large White pig breed-specific CNVRs were also identified. In addition, we obtained several interesting CNVRs with the integration of previously gathered QTL and SNP data for the pig families, or other populations. Our work provides an important basis for understanding pig genetic processes and obtained several interesting CNVRs for meat quality traits and hematological parameters.

Ethics Statement
All animal procedures were performed according to the guidelines developed by the China Council on Animal Care, and all protocols were approved by the Animal Care and Use Committee of Beijing, China. The approval ID or permit numbers were SYXK (Beijing) 2008-007 and SYXK (Beijing) 2008-008.

Animals
In this study, an F2 resource population was produced by intercrossing Large White boars and Minzhu pig sows during the period of 2007 to 2011. Five Large White boars were mated with 19 Minzhu pig sows. The resulting F1 generation, comprising 9 sires and 46 dams were mated (avoiding full-sib mating) to produce 576 F2 animals in 3 parities. Most sows were mated to the same boar for all 3 litters to provide large, full-sib populations. Male pigs of the F2 generation were castrated. All F2 animals were reared under identical feeding conditions at the pig research station of the Institute of Animal Science at the Chinese Academy of Agricultural Sciences.

Genotyping and quality control
Genomic DNA was extracted from ear tissue according to standard protocols. Genotyping was performed using the Porci-neSNP60 Genotyping BeadChip technology (Illumina), which contained 62,163 SNPs across the whole genome. BEADSTUDIO software (Illumina) was used to call the genotypes for all samples. Data were quality controlled for sample call rate, SNP call rate, minor allele frequency (MAF) and deviations from Hardy Weinberg Equilibrium (HWE). SNPs were excluded according to the following criteria: (1) call rate,90%, (2) MAF,3%, and (3) significant divergence from HWE with P-values lower than 10 26 .
At the second step of the iterative procedure, individuals were excluded with call rates,90%.
The final data set that passed the quality control procedure and was used in the analysis contained 48,238 SNPs and 506 F2 individuals. The distribution of SNPs after quality control and the average distance between adjacent SNPs on each chromosome are shown in Table S1 in File S1.

CNV detection
Beadstudio software (Illumina) was used to export the total signal intensity (Log R Ratio, LRR) and allelic intensity ratio (B Allele Freq, BAF) to employ GADA, PennCNV, and QuantiSNP. The version of the SNPs physical position on chromosomes derived from the Ensembl website was 9.2. The cnvPartition analysis Plug-in of Beadstudio Software (Illumina) was used for CNV detection. The minimum probe count was set to 3 and all other parameters used the default settings.
We used R statistical programming language version 2.9.2 [50] and the multiple array analysis mode of GADA to perform CNV detection, with 0.8 for sparseness hyperparameter (a) of the sparse Bayesian learning (SBL) model and 4 for the critical value of backward elimination (BE). The minimum number of SNPs at each segment was 3. Except for the LRR and BAF, to launch QuantiSNP, we also needed a genderfile. We generated the genderfile following the manufacturer's instructions and used the command line to run the QuantiSNP software with the default parameters. Then, the knock-out CNVs appeared in only one individual and the ones that contained less than 3 SNPs.
The PennCNV program also needs more information, such as the population frequency of the B allele (pfb) of SNPs, the pedigree information, and the gcmodel file. The pfb file we used was calculated based on the BAF for each marker. The pedigree information used was compiled following the manufacturer's instructions. The pig gcmodel file used was generated by calculating the GC content of the 1-Mb genomic regions surrounding each marker. The CNV detection by PennCNV was performed using the default parameters. Additionally, after calling, CNVs presented in only one individual were also knocked out.
In order to balance false positives and power, we knocked out the CNVs, which were called only in one algorithm and presented in only one individual. Then, we aggregated overlapping CNVs to

CNVR analysis
Genes within the detected CNVRs were retrieved from the Ensembl Genes 64 Database using the BioMart (http://www. biomart.org) software. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyse were carried out from the database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov). A little program named overlapping was written by Visual Basic to retrieve the QTLs within the CNVRs from the pig QTLdb (http://www.animalgenome.org/cgi-bin/QTLdb/SS/index). Some of the GWAS data we used in this paper was retrieved from the paper of LUO et al. [24]; the others were calculated using the method reported in the paper of LUO et al. [24,25]. All gene positions were transformed to fit the style of Ensembl Genes 64.

Quantitative real time PCR
The Quantitative real time PCR amplification was performed using the default conditions in 384-well optical PCR plates using an ABI 7900HT instrument (Applied Biosystems, Inc., Foster City, CA). TaqMan primer/probe sets were designed to query random CNVs using the Primer 3 web tool (http://frodo.wi.mit.edu/ primer3/). For each assay, 15 ng of genomic DNA was assayed in quadruplicate in 15-mL reactions containing a 16 final concentration of the TaqMan Universal Master Mix (ABI part number 4304437), and 150 nM each for the primers and probes. The SDS 2.4 software was used to analyze the results. The glucagon gene (GCG) [51] was used as the single copy control. Copy number was calculated by the 22DDCT method [52,53], where DCT is the cycle threshold (CT) of the target region minus the CT of the control region. In addition, 22DDCT compares the DCT value of samples with the CNV to the calibrator without the CNV. The PCR cycle was as follows: 2 min at 50uC, 10 min at 95uC, and 40 cycles of 15 sec at 95uC and 1 min at 60uC. A list of the 11 probes used in the study is shown in Table S17 in File S1.