Genome-Wide Detection of Selective Signatures in Chicken through High Density SNPs

Chicken is recognized as an excellent model for studies of genetic mechanism of phenotypic and genomic evolution, with large effective population size and strong human-driven selection. In the present study, we performed Extended Haplotype Homozygosity (EHH) tests to identify significant core regions employing 600K SNP Chicken chip in an F2 population of 1,534 hens, which was derived from reciprocal crosses between White Leghorn and Dongxiang chicken. Results indicated that a total of 49,151 core regions with an average length of 9.79 Kb were identified, which occupied approximately 52.15% of genome across all autosomes, and 806 significant core regions attracted us mostly. Genes in candidate regions may experience positive selection and were considered to have possible influence on beneficial economic traits. A panel of genes including AASDHPPT, GDPD5, PAR3, SOX6, GPC1 and a signal pathway of AKT1 were detected with the most extreme P-values. Further enrichment analyses indicated that these genes were associated with immune function, sensory organ development and neurogenesis, and may have experienced positive selection in chicken. Moreover, some of core regions exactly overlapped with genes excavated in our previous GWAS, suggesting that these genes have undergone positive selection may affect egg production. Findings in our study could draw a comparatively integrate genome-wide map of selection signature in the chicken genome, and would be worthy for explicating the genetic mechanisms of phenotypic diversity in poultry breeding.


Background
The chicken have gone through intensive selection because of domestication and breeding, what has gave rise to various phenotypes when compared with their wild counterparts [1]. Selection signatures are the selective footprints across the organism genome due to the effect of artificial selection, which displayed the long range linkage disequilibrium in chromosome or genetic diversity reduction [2,3]. Thus, identifying selection signatures in chicken, we could effectively and efficiently uncovered the selected genes and genomic regions, which would contribute to understand the relationships between genotype and phenotype.
Genotyping Array (Affymetrix, Inc. Santa Clara, CA, USA). All hens were caged individually and reared with feed and water ad libitum in the same environment (Farm of Jiangsu Institute of Poultry Science).
Before analyses, we first removed 7,883 SNPs with unknown chromosome and 112 SNPs with redundant genomic positions. Then we performed the quality control with Affymetrix Power Tools (APT) and R scripts according to the guidelines by Affymetrix (http://affymetrix. com/). We kept the samples with dish quality control (DQC) > 0.82 and sample call rate >99% or SNP call rate >97%, only 532,299 SNPs remained for the following analyses. In addition, we deleted 26,656 SNPs on sex chromosomes and 302 SNPs on the two linkage groups. SNPs with minor allele frequency (MAF) less than 5% and those deviating from Hardy-Weinberg equilibrium (HWE) test (P-value < 1×10 −6 ) using the PLINK package [19] were excluded. Besides, we used Beagle V.4.1 [20] to impute some sporadic missing genotypes and to reconstruct haplotypes for every chromosome with the default parameters.

Genome-wide Detection of Selection Signature
We firstly identified core regions, which were characterized by SNPs with strong linkage disequilibrium (LD) and consisted of some core haplotypes after the EHH test by the software Sweep v.1.1 (http://www.broadinstitute.org/mpg/sweep/) [21] in the whole chicken genome, in which 3 to 20 SNPs located. The plot of LD decay was available in S1 Fig. As the ICGSC (2004) reported, the recombination rate ranged from 2.5 to 21 cM/Mb among the chicken chromosomes [22,23]. And rates were much higher on microchromosomes than on macrochromosomes, where the median value were 6.6 cM/Mb and 2.8 cM/Mb, respectively. Considering that macrochromosomes occupied a large proportion in chicken, physical distance of 300Kb was chosen as the matched distance to determine the REHH value for each core region, as well as evaluating how LD decayed across the whole genome. The REHH (Relative EHH) statistic corrected EHH through eliminating the influence of variability in chicken recombination rates [13].

REHH ¼ EHH=EHH
Which the EHH was defined as the decay of the special core haplotype of EHH on all other core haplotypes.
Furthermore, we treated and ordered the frequency of all core haplotypes into 20 bins to compute the significance of the REHH values, obtaining P-values by log-transforming to reach normality and calculating the mean and standard deviation.

Annotation and Functional Analysis
After performing EHH tests, regions with extreme REHH p-values were considered as candidates for selective sweeps, as proposed by Sabeti et al. [5]. Comparing with chicken QTL database (http://www.animalgenome.org/cgi-bin/QTLdb/GG/index) [24], we firstly screened the distribution of the candidate selective signature regions located in QTL using the Perl script. Genes participated in the significant core regions were annotated using the online Genome Browser and Biomart tools by Ensembl [25,26]. We also compared these genes with what found in our previous GWAS, in order to dig interesting genes determining or affecting some important economic traits. Functional analyses were carried out for the sweeps identified in the F 2 population using the function annotation and clustering tools in the Database for annotation, Visualization and Integrated Discovery (DAVID) [27].

Descriptive Statistics for Markers and Core Haplotype
With an average neighbor marker distance of 1.79 kb, 580,961 SNPs were genotyped by using 600 K Affymetrix Axiom Chicken Genotyping Array. After the quality control and discarding the SNPs on two linkage groups and sex chromosome, 389,618 SNPs and 1,512 individuals were finally remained for the further analyses. Table 1 completely described the distribution of SNPs and haplotypes identified among the whole chicken genome.
For the SNPs analyzed in this study, a total of 49,151 core regions covering 479.51 Mbp (52.15%) of the genome were uncovered. The average length of core region was approximated as 9.76±13.11 Kb, while the maximum was of 965.16 Kb located in chromosome 5. For the every chromosome, the covered proportion of length by core regions with total length, as same as the number of SNPs, were also given in Table 1. What's more, we draw Fig 1 to depict the distribution of the size of haplotype blocks as well as the number of SNPs within core regions.
Overall, there were 265,840 SNPs (48.68%) located in core regions, and the number of them were sprayed between three to twenty in each core, since twenty was designed as upper limit for the SNPs, even if it may exceed.

Genome-wide Selection Signature
A total of 434,509 EHH tests were performed in 49,151 core regions and averaged out to 8.84 tests per core region. Core haplotypes under selection would have a relatively high frequency according to the selection signature theory. Hence, the core haplotypes with frequency below 25% were totally excluded. The distribution of remaining core haplotypes across the whole genome was visualized via a Manhattan plot in Fig 2, which displayed the P-value of REHH by minus log-transforming located in the different chromosomal position. It was evident that these selection signals mainly concentrated in macro-chromosome such as chromosomes 1, 2, and 3. Table 2 indicated that 149,662 EHH tests were remained for all core haplotypes with the frequency > 25%. There were 5,166 and 806 tests reaching significant level with the Pvalue < 0.05 and 0.01, respectively. We then examined the conformity of the distribution of Tukey's outliers with the threshold level of the P-value of 0.01 and 0.001 in core haplotypes. The-log 10 of the REHH P-value distributed within each bin, which was partitioned by core haplotype frequency, was displayed in Fig 3. As it showed that the majority of the extreme outliers appeared with small haplotype frequencies. The genome-wide map of selective signature was shown in Fig 4. The blue vertical line represented selective signature across the whole genome.

Genome Annotation within Significant Regions
Core regions owning the significant P-values (P < 0.01) of REHH were explored to identify all overlaps with published QTLs in the chicken QTL database which was available online. The current release of the Chicken QTLdb contains 4,676 QTLs (319 different traits) from 224 publications. As a consequence, the overlapping core regions were detected to be mainly associated with production traits and immune function, and the lowest P-values (top six) were displayed in S1 Table. We further annotated overlapping chicken genes located in significant core regions based on Ensembl gene database. A summary of statistics was shown in S2 Fig, and there were 232 genes found within positively selected regions and about half of them were located on chromosomes 1, 2 and 3.
The subset of genes in all core regions with extreme REHH P-values (p < 0.001) were displayed in Table 3. Genes, including AASDHPPT (aminoadipate-semialdehyde dehydrogenasephosphopantetheinyl transferase), GDPD5 (glycerophosphodiester phosphodiesterase domain containing 5), PARD3 (par-3 family cell polarity regulator) and SOX6 (SRY-box 6), were related to immune function and neurogenesis. Particularly, AKT1 and GPC1 participating in AKT signal pathway and Wnt signal pathway respectively, which played a key role in organogenesis and reproduction performance, were obtained.

Functional Enrichment Analyses
The annotations of genes and analysis of pathways were conducted using online DAVID software [27]. The genes were found to be significantly (P<0.05) enriched in 10 Go terms ( Table 4). The term of 'cell part morphogenesis' (GO: 0032990) indicates genes associated with immune function. The term of 'eye development' (GO: 0001654) and 'neuron projection morphogenesis' (GO: 0048812) suggested the distinct biological association with organ development and neurogenesis in chicken.

Discussion
Chicken are reported to have gone through strict artificial selection and breeding for multipurpose as far as thousands years ago [28] and can be served as an experimental model in studying important biological process and effective disease treatment [29,30,31]. Especially the improvement of the chicken genome sequence (ICGSC Gallus_gallus-4.0/galGal4 Nov.2011) makes it possible to reveal the genetic basis under chickens' recent evolution across the whole genome. To be specific, identifying the genes in the core regions that have experienced artificial selection would effectively explicate economic traits or biological process due to breeding. For example, Zhang et al. [32] genotyped 475 DNA samples of two divergent chicken lines that were selected for abdominal fat (AF) content. They detected 51 and 57 significant core regions in the lean and fat lines, respectively, with some important genes involved in AF deposition in chickens. In present study, EHH test was conducted to identify the selection signatures across the whole genome and bioinformatics analyses was applied to convince the biological significance for the detected core regions in F 2 chicken population. Furthermore, we took fully advantage of high density SNP chips which contain much more information, so that to improve the accuracy of detection. The 806 significant core regions (S2 Table) identified in this study was much more than previous report because of the high density SNP chip and genome recombination events [33,34]. We further compared significant core haplotypes with other haplotypes in these regions and found that the former possess the larger extent of homozygosity than the later. Therefore, we inferred that the long stretch of homozygosity were not only simply resulted from the strong selective pressure but also owed to the local recombination rate [35]. Given that the method of EHH test may failed to detect the lower-frequency alleles because of the lack of sensitivity, [36], we removed the haplotypes with frequency below 25%. It would contribute to reduce the frequency of false positives as far as possible and lead to an authentic result. We then aligned significant core regions (P < 0.01) with the chicken QTL database. These positively selected regions were illustrated to be relevant to some important economic traits, like egg production, feed efficiency, body weight and immunity. These results were in accordance with the findings reported by Li DF et al. [14].
To elaborate whether the selection signatures were correlative with phenotypic traits, we compared them with our previous genome-wide association studies. In our previous GWAS studies [37,38,39], 37 genes played a key role in egg production and feed efficiency in chicken such as egg weight (EW), egg number (EN), feed intake (FI), feed conversion ratio (FCR) and residual feed intake(RFI). Twenty-two genes overlapped with core regions had been demonstrated to be associated with economic traits. For instance, the NCAPG gene locating on chromosome 4 had been identified influencing both egg weight and body weight simultaneously in a pleiotropic manner [37]; the CLSPN gene on chromosome 23 displayed significant haplotype [38], may related with egg number via affecting the function of ovary and uterus [40,41]. These results could effectively elucidate that some important economic traits like egg production had undergone selection. As shown in S2 Fig, a total of 232 genes were screened within positively selected regions and most of them were located on macrochromosomes due to the greater length of themselves. Furthermore, a panel of genes emerging in the extreme significant core regions with P-value of REHH < 0.001 including AASDHPPT, GDPD5, PARD3, SOX6, AKT1 and GPC1, were testified to be able to exert influence on some biological process. Among them, the AASDHPPT gene was reported to have a role in either the adaptive or innate immune response [42]. The GDPD5 gene, sharing homology with glycerophosphodiester phosphodiesterase 2 (GDE2), is important to initiate neurogenesis and cellular proliferation and differentiation [43]. According to Afonso's study [44], the morphogenesis of embryonic neural tissue and the process of neurogenesis in chicken were partly regulated by PARD3. Additionally, the positive feedback loop between Sox2 and Sox6, whose functions were to inhibit premature neuronal differentiation, played a key role in maintaining the neural progenitor cells [45]. The other genes were involved in two critical signaling pathway, namely PI3K/Akt pathway and Wnt signal pathway. The PI3K-Akt pathway participated in early infection of some exogenous avian leucosis viruses [46] and could mediate IGF-1 survival during the otic neuronal progenitor phase of early inner ear development [47]. The GPC1 gene involved in Wnt pathway was of great importance in chick, which can regulate the signaling mechanisms in early formation of the trigeminal sensory system and cell proliferation hence affected reproduction performance [48,49]. The significant GO terms were shown in table and the terms of biological process appealed to us mostly. The terms including 'cell part morphogenesis' , 'eye development' , 'neuron projection morphogenesis' and 'sensory organ development' were consistent with our previous result about the function of the extreme significant genes. Among our findings, genes participated in these terms that overlapping with positive selection core regions, like VSX2 and Bmp7, which were associated with cell growth [50] and sensory organ development [51,52], played a key role in chicken. Unfortunately, no pathway achieved significant level (P-value < 0.05). The explanation may be able to account for the result was that the current annotation of the chicken genome limited availability of genes mapped in the KEGG pathways, further decreased the sensitivity of the analysis. Hence, it's in urgent need of new efficient technique to enhance the efficiency for the detection of selection signatures in different populations in the future. It might be helpful to combine the diverse test in a composite likelihood statistic, because each single test only provided partial information of selective signatures [36].
In conclusion, 806 significant core regions were detected across the whole genome in chicken applying the EHH test together with certain bioinformatics analyses. Genes in these regions related to immune function, sensory organ development and neurogenesis may experience positive selection. Moreover, our results draw a comparatively integrated genome-wide map of selection signature in chicken and yielded valuable insight into the genetic basic of artificial selection in poultry breeding.