Genome-Wide Association Studies Identify the Loci for 5 Exterior Traits in a Large White × Minzhu Pig Population

As one of the main breeding selection criteria, external appearance has special economic importance in the hog industry. In this study, an Illumina Porcine SNP60 BeadChip was used to conduct a genome-wide association study (GWAS) in 605 pigs of the F2 generation derived from a Large White × Minzhu intercross. Traits under study were abdominal circumference (AC), body height (BH), body length (BL), cannon bone circumference (CBC), chest depth (CD), chest width (CW), rump circumference (RC), rump width (RW), scapula width (SW), and waist width (WW). A total of 138 SNPs (the most significant being MARC0033464) on chromosome 7 were found to be associated with BH, BL, CBC, and RC (P-value  = 4.15E-6). One SNP on chromosome 1 was found to be associated with CD at genome-wide significance levels. The percentage phenotypic variance of these significant SNPs ranged from 0.1–25.48%. Moreover, a conditional analysis revealed that the significant SNPs were derived from a single quantitative trait locus (QTL) and indicated additional chromosome-wide significant association for 25 SNPs on SSC4 (BL, CBC) and 9 SNPs on SSC7 (RC). Linkage analysis revealed two complete linkage disequilibrium haplotype blocks that contained seven and four SNPs, respectively. In block 1, the most significant SNP, MARC0033464, was present. Annotations from pig reference genome suggested six genes (GRM4, HMGA1, NUDT3, RPS10, SPDEF and PACSIN1) in block 1 (495 kb), and one gene (SCUBE3) in block 3 (124 kb). Functional analysis indicated that HMGA1 and SCUBE3 genes are the potential genes controlling BH, BL, and RC in pigs, with an application in breeding programs. We screened several candidate intervals and genes based on SNP location and gene function, and predicted their function using bioinformatics analyses.


Introduction
External appearance is a major breeding selection criterion in the hog industry. Exterior traits such as body height (BH), body length (BL), cannon bone circumference (CBC), chest depth (CD), chest width (CW), and rump circumference (RC) are closely related to body growth, food intake capability, sow reproductive efficiency and longevity, and so on. Hence, these factors has attracted extensive attention in modern pig industry [1]. In human being, additive genetic effect explains 81% of the variation in height [2], and the heritability for BL,CD, and CW in swine are 0.16-0.32,0.34, and 0.25, respectively [1,3]. Understanding the genetic mechanism of inter-individual variation in body measurements might provide new tools that can help manipulate animal growth and production [4].
In the past, advances in DNA-based marker technology made it possible to identify quantitative trait loci (QTLs) associated with complex economically important traits in pigs. To date, more than 762 QTLs for exterior traits have been identified via genome scans and included in the Pig QTLdb (http://cn.animalgenome.org/ cgi-bin/QTLdb/SS/index) [5]. However, due to the large intervals of QTLs (covering more than 20 cM [6], even a whole chromosome [7]), few quantitative trait nucleotides (QTN) have been identified in pigs by fine-scale mapping of QTLs.
Since its emergence, high-density single nucleotide polymorphism (SNP) array has been widely used and proved to be a powerful tool in identification of causal mutations associated with complex traits such as reproduction, disease susceptibility, and meat quality in livestock [8,9,10,11,12,13,14]. Although numbers of genome-wide association studies (GWAS) have been carried out on pigs, only a few GWAS focused on identifying genes associated with exterior traits [1]. Moreover, among the exterior traits studied, GWAS data on important body measurement traits such as body height, cannon bone circumference, and rump circumference are still lacking.
In order to identify novel genes and novel quantitative trait nucleotides (QTN) related to exterior traits across the Sus scrofa genome, the present GWAS was performed to detect potential genetic variants associated with 10 exterior traits in a Large White 6 Minzhu pig resource population.

Phenotype statistics
Summary statistics results of the phenotypes of the F 2 individuals are presented in Table 1 Table S1 in File S1. In the phenotype correlations analysis, CBC was highly correlated with BH and BL (correlation coefficients are 0.57 and 0.57, respectively), and was medium correlated with RC (correlation coefficients is 0.42). In the genetic correlations analysis, CBC was highly correlated with BH, BL, and RC (correlation coefficients are 0.55, 0.77 and 0.65, respectively).

GWAS and bioinformatics analysis
After the quality control procedure, 48,238 SNPs and 594 F 2 individuals were used for the genome-wide association studies. The number of effective SNPs is 12309 and the significant threshold was 4.15E-06 (0.05/12039). None of the SNPs for AC, CW, RW, SW and WW reached a genome-wide significance level. One hundred and thirty-eight significant SNPs of BH, BL, CBC, and RC occurred in an enlarged region on SSC7 (Table S2 in File S1) while only one significant SNP of CD occurred on SSC1 (Table S3 in File S1). The Manhattan and quantile-quantile (Q-Q) plots for these five significantly associated traits are shown in Figure 1 and 2, The lambda value for BH, BL, CBC, RC and CD, were 1.18, 1.18, 1.18, 0.98, and 1.00, respectively. Other Manhattan plots and Q-Q plots are displayed in Figure S1 and S2 in File S2, respectively and the lambda value for other traits were closed to 1.The Q-Q plots and the lambda value indicated a certain degree of deviation between the predicted and actual data of BH, BL, and CBC.
There are also peeks on other chromosomes, e.g. peeks on chromosome 4 for BH, BL and CBC ( Figure 1). To avoid the potential missing of important SNPs, and meanwhile to investigate the LD SNPs of the most significant SNPs, most significant SNPs (MARC0033464 of BH, BL, CBC and RC, while H3GA0005226 of CD) were used for the conditional GWAS analysis. The Manhattan plots obtained from the conditional analysis are shown in Figure S3 in File S2. No other significant association of SNPs was detected after the conditional analysis. However, 25 SNPs on SSC4 (P-value ,6.08E-05) and 9 SNPs on SSC7 (P-value , 6.16E-05) showed chromosome-wide association with BL, CBC, and RC (Table S9 in File S1).

Discussion
A GWAS using the PorcineSNP60 BeadChip was performed by Mix Model and Regression-Genomic Control (GRAMMAR-GC) methods. This pedigree-based method could exploit inter-family variation in addition to intra-family variation, and could rapidly analyze hundreds of thousands of markers [18,19]. Although the genomic control (GC) procedure was used to correct for stratification, we also could see a little population stratification of some of the traits. We think the population stratification is odd and maybe caused by cryptic relatedness.
Heavy linkage disequilibrium in intercross population is one of the well-known limitations in genetic studies, which makes it difficult to identify the causative gene and mutation. In order to decrease non-positive rates, Bonferroni corrections have been used for all multiple tests. Since the Bonferroni correction is a conservative method to determine significance thresholds, we calculated the number of effective SNPs, and used the number of effective SNPs for the correction to ensure that no QTL was missed. Anyhow, there are no SNP significant associated with AC, CW, RW, SW and WW in present study. This may suggest that single locus plays a minor role in AC, CW, RW, SW and WW variation or the variation of AC, CW, RW, SW and WW were caused by copy number variation and so on.
In our research, the genome-wide significant association of SNPs on SSC7 disappeared after conditional analysis ( Figure S1 in File S2 and Table S9 in File S1). This may suggest that these significant SNPs are derived from a single QTL.
It is known that one QTL could influence a multifactorial trait and most biological traits also have a complex inheritance influenced by numerous genes [20]. In the phenotype and genetic correlation analysis in the present study, the BH, BL, CBC and RC showed high phenotype or genetic correlations with each other suggested significant association between the major genes responsible for these traits. This GWAS also showed that all significant SNPs associated with more than one exterior trait localized in the same region, suggesting a pleiotropic effect of QTL on SSC7. Therefore, we chose the 20 most significant SNPs for further analysis. These SNPs, which were associated with every trait, are located in the proximal region from 31.24 Mb to 36.00 Mb on SSC7. SSC7 has been previously reported to be rich in QTLs influencing exterior traits [21,22]. The region from 17.05 Mb to 45.42 Mb on SSC7 has been shown to have pleiotropic and significant QTL effects on CBC and Body length, Figure 1. Manhattan plots of genome-wide association study with BH, BL, CBC, and RC. Chromosomes 1-18 and X are shown separated by color. The red horizontal lines indicate the genome-wide significance levels (-log 10 (6.08E-05)). doi:10.1371/journal.pone.0103766.g001   Erhualian F2 intercross population [21,22,23,24,25].
In previous reports, QTLs detected in F2 populations usually defined a high percentage of phenotypic variance [16,19]. Similar to these reports, the present study indicated that 18.94% and 25.48% phenotypic variance of BH and BL, respectively, were explained by QTLs, which suggests that QTLs have a major effect on BH and BL in this 4.76 Mb region.
Linkage analysis revealed two haplotype blocks in the significant region on SSC7 at complete linkage disequilibrium, which contained eight annotated genes of the pig reference genome. Of the 8 genes, HMGA1 is of particular interest because it is ubiquitous in all cells of higher eukaryotes and the HMGA1 protein plays an important role in cell growth and differentiation [26,27]. By influencing the expression of two IGFBP (insulin-like growth factor binding protein) protein species, HMGA1 serves as a IGF1 (modulator of insulin-like growth factor 1) activity [28]. Comparing to its abundance during embryonic development, the HMGA1 protein is nearly disappeared in fully differentiated adult tissues [29]. A single SNP haplotype of IGF1 is usually present in small dog breeds, but nearly vanished in giant breeds, indicating that the variant plays important role in body size [30]. Moreover, in human studies, some GWAS have reported that HMGA1 may be an ideal candidate as an anthropometric trait [31,32,33]. For instance, HMGA1 might be a prime candidate for BH and BL in pigs.
SCUBE3 (signal peptide CUB EGF-like 3) is also of particular interest. SCUBE proteins usually have three domains including multiple epidermal growth factor (EGF) domains, bone morphogenetic protein 1 (BMP1) C-terminal complement subcomponent (CUB) domain, and a huge spacer domain with several N-linked glycosylation sites [34]. In humans, SCUBE3 functions as an endogenous transforming growth factor (TGFb) receptor ligand, increasing Smad2/3 phosphorylation, and thus up-regulates target genes such as hedgehog (Hh), TGFb, and bone morphogenetic protein 2 (BMP2) [35]. In human primary osteoblasts and long bones such as humerus and femur, the expression level of SCUBE3 is usually very high [36], indicating that this gene is a major contributor to bone cell development. Thus, SCUBE3 might also be an influencing factor in determining the size of bones and is a prime candidate for BH, BL, and RC in pigs.
In summary, this study identified a total of 138 significant SNPs on SSC7 associated with one or more exterior traits and one SNP on SSC1 associated with CD. Conditional analysis indicated that significant SNPs originated from a single QTL. Furthermore, linkage analysis identified two complete linkage disequilibrium regions containing the HMGA1 and SCUBE3 gene. Bioinformatics analysis indicated that HMGA1 and SCUBE3 genes might be prime candidates for some exterior traits in pigs.

Ethics statement
All animals used in the study were housed and handled following the guidelines of experimental animals established by the Council of China. Animal experiments were approved by the Science Research Department of the Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS) (Beijing, China). The overall association between haplotypes and the response. 2 Estimated frequency of each haplotype in the population. 3 The score for the haplotype, which is the statistical measurement of association of each specific haplotype with the trait. 4 The asymptotic chi-square P-value was calculated from the square of the score statistic. doi:10.1371/journal.pone.0103766.t004 Animals and phenotypic data A 3-generation resource population constructed by intercrossing Large White boars and Minzhu sows between 2007 and 2011 was used in this study. In F1 generation, four Large White boars were mated with 16 Minzhu sows. Nine boars and 46 sows, all from the F1 population, were intercrossed to produce an F2 population consisted of 618 animals. All the animals were reared under the same nutritional and environmental conditions, and were housed at the experiment base of Institute of Animal Science, CAAS. All ten traits were measured by following standard procedures. Phenotypic and genetic correlations among the traits were calculated to investigate whether they reflect the correlation among GWAS results. Pearson and genetic correlations among the traits were performed using SAS (version 9.1) and DMU (version 4.7), respectively.
Genotyping and data quality control A very small piece of ear tissue (about 50 mg) was collected when the ear number were coded using the ear tag pliers, and the longissimus dorsi were collected when the pigs were dead, all the procedure need no anaesthesia or analgesia disposition. Genomic DNA was extracted from the ear or longissimus dorsi using the phenol-chloroform method. Quantification and qualification of DNA were performed using a NanoDrop 2000 (Thermo Fisher Scientific Inc., Wilmington, DE, USA) with the standard: DNA concentration .50 ng/ul; 260/280 ratio between 1.8 and 2.1; 260/230 ratio .1.8. Genotyping was performed using the Illumina Porcine SNP60 BeadChip containing 62,163 SNPs across the whole genome. All F2 animals were genotyped using the BEADSTUDIO software (version 2.0, Illumina). Before quality control, the maximum likelihood method was applied using the Cervus program [37] to check pedigree mismatching using SNP information. F2 animals were quality controlled by the GenABEL R package. In the quality control procedure, call rate (CR) more than 90%, genotype minor allele frequency (MAF) more than 3% and Hardy-Weinberg equilibrium (HWE) values (P,10-6) were applied.

Statistical analysis
The association analysis was conducted using the Mix Model and Regression -Genomic Control (GRAMMAR-GC) method [18,19].
In the first step, the residuals were analyzed using DMU software.
In this step, data were analyzed using the mixed model: , a is the vector of random additive genetic effects with a,N(0, As a 2 ) (where A is the relationship matrix calculated from the corrected pedigree and s a 2 is the additive genetic variance), X, T and Z are incidence matrices relating records in y to fixed and random effects, p is the regression coefficient of body weight and e is the vector of residual errors with e,N(0, Is e 2 ),where I is the identity matrix and s e 2 is the residual variance. The vector of residuals y * is estimated as where b * , p * , c * and a * are estimates and predictors for b, p, c and a, respectively.
Second, the residuals were used as the dependent traits for single locus regression analysis, and the unadjusted test statistic factor of the i th SNP T i 2 was calculated in the genomic control (GC) procedure in the R statistical environment using the GenABEL package [18,19].
The Bonferroni method was employed for the genome-wide significance threshold, in which the conventional P-value was divided by the number of effective SNPs. The number of genomewide effective SNPs, which was 12,039 (Table S2, File S1), was estimated using a simpleM method [38] following 3 steps: 1) Using the cor () function in R to derive the CLD correlation matrix from the SNP data set; 2) Using the R function eigen () to calculate the eigenvalues and 3) Inferring effective number of independent tests (Meff G) through PCA.
The significant threshold was 4.15E-06 (0.05/12039). The Manhattan plot was constructed using the GAP package and Q-Q plot was constructed within the R statistical environment. Proportion of the explained variation of each SNP is calculated using Tassel software [39].

Conditional GWAS
In the conditional GWAS, the most significant SNP for each trait was used as a additional fixed effect, and all the analysis were performed by following the state GWAS procedure. The most significant SNP, MARC0033464, was used for the conditional analysis of BH, BL, CBC, and RC, while the most significant SNP H3GA0005226 was used for conditional analysis of CD.

Bioinformatics analysis
All significant SNPs were BLASTed in the websites of Ensembl and NCBI (http://www.ncbi.nlm.nih.gov/). Genes within the detected regions were retrieved from the Ensembl Genes 64 Database using the BioMart software (http://www.biomart.org). Gene ontology and KEGG pathway analyses were performed to extract the functional annotation clustering and the pathways involved using DAVID (http://david.abcc.ncifcrf.gov) with the threshold of P,0.05.

Haplotype analysis
Haplotype block detection was performed in the region which contained the 24 overlapped SNPs that were most significantly associated with the selected traits. The HAPLOVIEW V3.31 program [40] was used to detect and visualize the haplotype blocks with default parameters. The genotypes of significant SNP loci for 594 F2 individuals and their parents were used to detect the haplotype blocks.
Association analysis of the detected haplotype blocks and traits of 594 F2 individuals were performed using the Haplo.Stats R package [41]. The global score statistic index, positive/negative score for a particular haplotype (Hap-Score) P-value for the significance of each hap-score (p-val) and the frequency of each haplotype (Hap-Freq) were calculated to test overall associations among haplotype blocks and traits.

Supporting Information
File S1 Table S1. The correlations between each trait.  File S2 Figure S1. The Manhattan plots for AC, CW, RW, SW and WW. Figure S2. The Q-Q plots for AC, CW, RW, SW and WW. Figure S3. The Manhattan plots for conditional GWAS for BH, BL, CBC, CD and RC.