Genomic Footprints in Selected and Unselected Beef Cattle Breeds in Korea

Korean Hanwoo cattle have been subjected to intensive artificial selection over the past four decades to improve meat production traits. Another three cattle varieties very closely related to Hanwoo reside in Korea (Jeju Black and Brindle) and in China (Yanbian). These breeds have not been part of a breeding scheme to improve production traits. Here, we compare the selected Hanwoo against these similar but presumed to be unselected populations to identify genomic regions that have been under recent selection pressure due to the breeding program. Rsb statistics were used to contrast the genomes of Hanwoo versus a pooled sample of the three unselected population (UN). We identified 37 significant SNPs (FDR corrected) in the HW/UN comparison and 21 known protein coding genes were within 1 MB to the identified SNPs. These genes were previously reported to affect traits important for meat production (14 genes), reproduction including mammary gland development (3 genes), coat color (2 genes), and genes affecting behavioral traits in a broader sense (2 genes). We subsequently sequenced (Illumina HiSeq 2000 platform) 10 individuals of the brown Hanwoo and the Chinese Yanbian to identify SNPs within the candidate genomic regions. Based on allele frequency differences, haplotype structures, and literature research, we singled out one non-synonymous SNP in the APP gene (APP: c.569C>T, Ala199Val) and predicted the mutational effect on the protein structure. We found that protein-protein interactions might be impaired due to increased exposed hydrophobic surfaces of the mutated protein. The APP gene has also been reported to affect meat tenderness in pigs and obesity in humans. Meat tenderness has been linked to intramuscular fat content, which is one of the main breeding goals for brown Hanwoo, potentially supporting a causal influence of the herein described nsSNP in the APP gene.


Introduction
individuals within the breeds. The average genomic relationship between animals within the current population was 0.003, 0.049, 0.05 and 0.037 for HW, BR, JB, and YB, respectively (S1 Fig). In addition to the four East-Asian breeds, we included a European taurine breed (Angus) in the analysis to provide an anchor point for the breed diversification. The PCA results grouped individuals of the four East-Asian and the Angus breed in agreement to their origins (Figs 1 and 2). Some of the YB cattle spread from the East-Asian cluster to the out-group of Angus cattle. This could suggest that YB either had some unaccounted intercrossing with European cattle breeds, or genetic artefacts of the same domestication origin as European taurine remain in this unselected population [20]. Further, HW and YB cattle are genetically very similar whilst BR and JB are more distinguishable ( Table 1, Fig 2), probably due to the more intense drift and founder effects in these small populations. Nevertheless, compared to other breed differences, these two cattle varieties can be classified as Hanwoo.
Based on Admixture results, the BR and JB cattle have on average 30% of their allelic frequencies in common with HW, and YB share on average 67.2% of their genetic make-up with HW ( Fig 2B). As a comparison, the East-Asian cattle breeds shared only between 0.2% and 1.5% of their genetic make-up with the European Angus cattle. Further, F ST -statistics, which is designed to describe differences between two populations based on allele frequencies, also found that the Angus cattle were more distantly related to the Korean breeds than they are amongst each other ( Table 1). Closest relationships were found between the YB and the other Korean breeds, especially with HW. Lastly, genomic relationships within the populations were <0.01 for the East-Asian breeds, indicating low relatedness and random sampling of animals. The similarity of the selected East-Asian cattle breeds makes them a good choice to detect recent selection footprints in the brown Hanwoo caused by the relatively recent breeding program.

Identification of footprints of selection
Footprints of selection were analyzed with the Rsb statistic which is based on extended haplotype homozygosities (EHH). Rsb scores are designed to detect recent selective sweep regions   Table). We further applied the Benjamini-Hochberg false discovery rate (FDR P-value < 0.05) to correct for multiple testing (Fig 3, [21]). Out of the 31 suggestive regions, 16 candidate regions containing 37 SNPs remained significant ( Table 2). All but one significant candidate region overlapped with QTLs for meat and production traits according to a search in the Animal QTLdb (Release 17; http://www.animalgenome.org/ cgi-bin/QTLdb/BT/index). Among the 16 significant candidate regions, 21 known protein coding genes were located 300kb up-or down-stream of the SNP. Most of these genes (14 genes) are known to affect meat production, obesity, or lipid metabolism such as GHRH [22], APP [23][24][25], or TRPC1 [26].
The brown Hanwoo cattle have been selected for meat production since the 1970s [2]. The breeding goal heavily emphasizes traits such as marbling score and meat tenderness, carcass weight, backfat thickness, and eye muscle area [27]. Over the past 30 years, the annual genetic gain in HW increased from 0.02 to 0.82 kg/year, with body weight (at 18 months of age) increasing from 331 to 574 kg, and intramuscular fat (final slaughter age) from 15% to 23% (NIAS, 2009). Therefore, we can assume that changes to the genetic make-up of HW cattle in favor of meat enhancing alleles had taken place which was supported by the large number of QTLs and genes affecting meat traits within the identified candidate regions (Table 2).
Other genes affected reproduction including mammary gland development (3 genes: MTMR2 [28], CWC15 [29], ACCN1 [30]), coat color (2 genes: WNT1 [31], POMC [32]), and behavioral traits, including autism, neural development defects, and maternal behavior in humans, and aggressive behavior in pigs (2 genes: RELN [33], AVPR1A [34][35][36], Table 2).   [30] calf size, body weight (mature, yearling), residual feed intake Selection footprints in regions of fertility and mammary gland traits can be explained by the need of any breeding scheme to produce healthy and fertile individuals that are capable of nurturing their offspring to produce sufficient replacement animals ( Table 2, [37]). Additionally, calm and social animals are easier to handle and keep under constricted housing conditions [38,39], which is why the behavioral genes might have been indirectly selected for in the HW ( Table 2). The coat color genes were most likely identified due to the variation in this trait between the chosen populations ( Table 2).
Only one of our identified regions (chr. 29 44.5-47Mb) was in agreement with a previous study on selection signatures in Korean beef cattle [16]. This region contained the calpain 1 (CAPN1) gene which was reported to be associated with carcass weight and marbling score in Hanwoo [40]. This lack of confirmation with previous studies might be due to a sampling bias between the studies or the nature of the comparison in this study. This study was designed to identify recent signatures of selection [27], rather than older signatures most likely due to natural selection (e.g. adaptive signatures for different environments) as reported in Porto-Neto et al. [41].

Detection of sequence variation in the selection signals
To provide further information and proof for the regions exhibiting positive selection footprints in HW, we sequenced the entire genome of the HW and YB populations (10 individuals of each group). The sequence data consisted of a total of 7,268,700,964 sequence-reads across the bovine genome. The average depth of sequence coverage for each sample was approximately 11-fold. The GATK and ANNOVAR analyses identified 103 SNP (53 nsSNP, 50 synonymous SNP) in the 16 candidate regions described in the previous section. Allele frequencies of the same allele in the HW and YB population were calculated for each nsSNP. Frequencies differed by more than 20% between the two populations for 23 nsSNP, however, significant differences were only found for 6 nsSNPs ( Table 3). Out of these 6 nsSNP, 3 nsSNP were classified by PolyPhen as benign, 2 nsSNP as possibly damaging, and 1 nsSNP as probably damaging ( Table 3).
Out of the 3 possibly or probably damaging nsSNP, i.e. protein changing, only one gene matched putative candidate genes that were found within the selective sweep regions described in Table 2: The amyloid beta (A4) precursor protein (APP) on chromosome 1 (9-10.5 Mb). The APP gene was previously associated with meat and fat traits in pigs and mice [42][43][44]. Further, analyses of haplotype structures around the five possibly or probably damaging nsSNP showed a haplotype block in the HW population only in proximity to the APP gene (located between markers ARS-BFGL-NGS-43310 and Hapmap38887-BTA-22281, S2 Fig). In theory, the linkage disequilibrium between markers weakens over time. However, artificial selection accumulates the favorable allele including a small region surrounding the mutation which manifests itself as a haplotype block. Therefore, we decided to investigate the APP gene in more detail.
The nsSNP in exon 5 of the APP gene (APP: c.569C>T, Ala199Val) had a 30% difference in allele frequencies between the HW and YB populations (p-value = 0.002, Table 3). The frequency of the Valine variant (minor T allele) was increased in the HW population compared to the YB indicating an accumulation of a possibly favorable variant under selection pressure.
The APP is a highly conserved gene and its protein is found in a variety of tissues as a cell surface receptor and transmembrane precursor. Several different splice variants are known [45][46][47][48][49]; however, the exact function of the protein remains unclear. The human APP protein consists of five domains including a transmembrane domain linking the ecto-and cytoplasm. Dawkins and Small [50] discuss several possible functions of the APP protein from trophic actions including cellular growth and neuronal recovery, to stem-cell proliferation, to blood coagulation. The APP gene is possibly best known for its involvement in Alzheimer's disease as its protein is a precursor molecule for beta-amyloid (Aβ) which is a major component of amyloid plaques [51]. Further, APP has also been associated with a common progressive muscle disease (inclusion body myopathy, IBM) in elderly people [52,53]. Lobjois et al. [42] inferred that this progressive muscle disease might explain a link to meat tenderness in livestock populations. Currently, 40 coding mutations in the human APP gene are known; however, for exon 5 only one mutation was reported without a pathogenic effect [54,55].
We modelled the structure of the bovine APP protein based on the entire human N-terminal APP-E1 domain with a sequence identity to the bovine sequence of 92%. Only four out of the five known subunits of the human APP protein were also predicted in the bovine protein structure (Fig 4, [50]). A Kunitz-type protease inhibitor domain that was reported for longer isoforms of the human APP was missing. The position of the Ala199Val amino acid exchange in exon 5 was predicted to result in exposed hydrophobic surfaces of parts of the protein interfaces. Such an exposition is energetically unfavorable especially in aqueous environments. The folding free energy of each structure is defined as the free energy difference between the folded and unfolded states of the protein. Further, the folding structure of three out of the four protein subunits was affected by this nsSNP resulting in predicted unfavorable secondary and tertiary folding energies ( Table 4). The pathogenic effect of the APP gene in humans was linked to an overexpression and thus an accumulation of Aβ peptides in brain and muscle cells [52]. Lobjois et al. [42] also reported that meat of pigs with a lower shear force had a higher expression of APP in the longissimus dorsi muscle. Possibly, our identified nsSNP could cause an accumulation of Aβ peptide in the muscle through an impaired protein-protein interaction which could inhibit post-translational protein cleavage, and thus, affect meat tenderness.

Conclusion
By comparing a selected population of brown Hanwoo with three closely related and unselected populations we identified 16 genomic regions suggesting recent selection in Hanwoo. These regions harbor 21 genes and 14 of these genes have been previously associated to production traits, especially meat traits and fat accumulation. Through the application of several analytical and comparative methods, we were able to single out the APP gene which was previously reported to be overexpressed in pigs with tender meat. A non-synonymous SNP (Ala 199 Val) that affected the protein structure was identified through sequencing and protein structure modelling. The most notable involvement of the APP gene is in its muscle degenerative function in humans; however, whether and how these human diseases and animal production traits are linked or caused through the same pathways and sequence variations remains unclear.

Ethics statement
The brown Hanwoo DNA was extracted either from AI bull semen straws or from blood samples obtained from different veterinary practitioners in the Hanwoo Improvement Center of the National Agricultural Cooperative Federation with the permission of the owners. The protocol was approved by the Committee on the Ethics of Animal Experiments of the National Institute of Animal Science (Permit Number: 2013-028). No ethics statement was required for the collection of DNA samples from the Brindle, Jeju Black Hanwoo, or Chinese Yanbian cattle because they were not sampled specifically for this study.

Animals and genotype assays
The data on brown Hanwoo (N = 100) were collected from steers of candidate bulls for progeny testing in the Hanwoo Improvement Center of the National Agricultural Cooperative  Genomic DNA for genotyping assays was extracted from semen or blood sample. The DNA was isolated from semen using the DNeasy 96 Blood and Tissue Kit (Qiagen, Valencia, CA, USA) with 100 μl of sperm added to 10 ml of Buffer 1. The mix was gently vortexed and centrifuged for 10 minutes at 4000 rmp to isolate pure cell pellets. Lysis of the cell walls was achieved by adding 300 μl Buffer 1 and 100 μl proteinase K to the pellet. The samples were incubated for 2 hours at 56°C. Purification of DNA was carried out with the DNeasy Blood & Tissue Kit; protocol 1 (Qiagen, Valencia, CA, USA) according to manufacturer's protocol.
DNA quantification was performed using a NanoDrop 1000 (Thermo Fisher Scientific Inc., Wilmington, DE, USA). DNA samples were submitted for genotyping with total DNA of 900 ng, 260/280 ratio .1.8, and DNA concentration of 20 ng/ul. The single nucleotide polymorphism (SNP) genotyping was performed using the Illumina Bovine SNP 50K Bead chip (Illumina Inc., San Diego, CA). Approximately 200 ng of genomic DNA was used to genotype each sample on the chip. Samples were processed according to the Illumina Infinium-II assay. Normalized bead intensity data for each sample were processed with the Beadstudio 3.0 software (Illumina) which converted fluorescent intensities into SNP genotypes.

Analysis of SNP statistics
Stringent filtering criteria were applied to the genotype data per breed. Briefly, SNPs were excluded from the analysis if they failed to provide genotypes in more than 5% of the cases, had median GC scores below 0.6, a GC scores under 0.6 in more than 90% of the samples, and deviated in heterozygosity by more than three standard deviations from the heterozygosity of other SNPs. Individual genotypes with GC scores under 0.6 were treated as missing. Loci out of Hardy-Weinberg equilibrium in a chi 2 -test for a cut-off p-value of 1 −15 , unmapped SNPs, SNPs on sex chromosomes, and SNPs with a minor allele frequency < 0.01 were also excluded. Finally, genotype data of 38,266 SNP remained after quality control.

Analysis of genetic structure and relationship
Population structures were assessed with a principal component analysis (PCA) based on the genomic relationship matrix (GRM, [56]). Genotype records (0, 1, 2) were used to create a GRM. Through eigendecomposition, the principal components of the GRM were factorized. As such, a PCA summarizes the genetic similarity between subjects through eigenvalues and eigenvectors. The genetic variation within and between breeds were assessed with Weir and Cockerham's F-statistics (F ST , [19]. F-statistics describe the reduction in heterozygosity in comparison to Hardy-Weinberg expectations. The F ST -statistic in particular describes the difference in allele frequencies between two independent populations with a potential value of 0 to 1, with 1 being the most different/ distantly related. F ST -distance matrices were used for hierarchical clustering using Ward's minimum variance algorithm. European Angus samples were added as an out-group to the East-Asian populations.
To provide a finer quantification of the different ancestry proportions, we performed a model-based unsupervised hierarchical clustering of the individuals using Admixture 1.22 software [57]. Admixture provides a likelihood estimate of breed proportions dependent on allele frequencies of the assumed ancestral populations. Ancestral populations in an unsupervised analysis are clustered based on allele frequency similarities. We analyzed breed proportions with K = 2 to 4 assumed ancestral populations. Cross validation showed that K = 3 provided the best fit to our data.

Detecting footprints of selection using Rsb score
Selection footprints per population were analyzed based on extended haplotype homozygosities (EHH) which is a measure for the breakdown of linkage disequilibrium with increasing distance from a SNP. Based on EHH, we computed Rsb scores according to Tang [58] using the rehh package in R [59]. Rsb is the standardized log-ratio of the integrated EHHS (iES) between pairs of populations and designed to detect sweeps that have occurred in only one population compared to another population. As our study is designed to detect sweeps specific to HW, genomic regions with extreme Rsb are indicative for signals of positive selection in HW. Rsb scores were transformed into P Rsb = -log[F(Rsb)], assuming Rsb are normally distributed (under neutrality). P Rsb can be interpreted as log10(1 ⁄ P) where P is the one-tailed hypothesis associated to the neutral hypothesis (no selection). To account for multiple testing, we applied the Benjamini-Hochberg false discovery rate correction (P-value = 0.05) [21].
Haplotypes were predicted for a pooled sample (UN) from the three unselected breeds (BR, JB, and YB) to create a larger counter population to the selected HW and a more accurate haplotype prediction. Haplotypes of the HW and UN were predicted with fastPHASE [60], respectively. We used default parameters except for the number of random starts for the EMalgorithm (-T option), which was set to 10 to reduce calculation time.

Identification and annotation of candidate regions
Candidate regions with positive selection footprints were defined as containing at least two SNPs with a significant Rsb (FDR corrected P-value < 0.05) for a 1 Mb window (with a 0.5 Mb overlap) for the HW/UN. Candidate regions were further considered those containing at least two SNPs exceeding these thresholds for at least three population comparisons (S3 Fig). In case of several continuous windows with two SNPs or more, we combined the overlapping windows into one candidate region. Genes within the candidate regions were determined with the intersectBed command from the BedTools software [61]. The BedTools software allows to identify genes that overlap by at least one base out of a 4-bp integration site within a given candidate region. We also selected genes within +/-300kb around SNPs that were identified in all four comparisons using "closetBed" in the BedTools. The genomic position of the genes was obtained from 'refGene' information on the UCSC Genome Browser (http://genome.ucsc.edu/). To identify the functions/roles of genes in the candidate regions, we used the medScan database that provides sentences from MEDLINE abstracts [62]. QTL regions were identified from information on Cattle QTLs in the Animal QTLdb (Release 17; http://www.animalgenome.org/cgi-bin/QTLdb/BT/ index). QTL locations by bp (Btau4.0) were downloaded and meat and production trait associated QTLs were selected. The associated names of these QTL types described in the Animal QTLdb were as follows: intramuscular fat, marbling score, average daily gain, yield grade, marbling score (EBV) shear force, tenderness score, body weight, height, feed conversion ratio, and average daily gain.

Detection of sequence variation in the selection signals
We sequenced the entire genome of 10 HW and 10 YB cattle by randomly shearing 3 μg of genomic DNA to generate about 90bp inserts (Covaris INC, Woburn, USA). The fragmented DNA was amplified and end-repaired using T4 DNA polymerase, which has an inactive 5'-3' exonuclease function. Illumina paired-end adaptor oligonucleotides were ligated to the sticky ends (Illumina Inc, San Diego, USA). We analyzed the ligation mixture by gel electrophoresis from which we purified fragments of 200-250bp length. These fragments were then sequenced on the HiSeq 2000 platform according to manufacturer's specifications (Illumina Inc, San Diego, USA).
Additionally, we performed SNP annotation according to their functional type such as intergenic, 5'UTR, 3'UTR, coding SNP, or non-synonymous (nsSNP) in the target region using the ANNOVAR software [66]. The sequencing processes and more detailed information can be found in Choi et al. [67].

Predicting protein structures
We used the PolyPhen program (Polymorphism Phenotype version 2.2; http://genetics.bwh. harvard.edu/pph2) to predict the effects of nsSNP (from the ANNOVAR analysis) on the protein structure. We based this analysis on the sequence data that we established in the previous section. This sequence data was translated into a protein sequence including the amino acid exchange of the nsSNP. PolyPhen provides a prediction about the effect of a SNP on the protein structure and categorizes them as 'benign' (i.e. most likely lacking any phenotypic effect) or 'damaging' (probably or possibly damaging: affecting protein function with higher or lower confidence, respectively) according a Bayesian approach. Regions with possibly or probably damaging nsSNP were further analyzed by investigating the haplotype structure surrounding these nsSNPs based on markers on the Illumina Bovine SNP 50K Bead chip. Haplotypes of the regions were generated with Haploview vs.4.2 [68] with the default algorithm described by Gabriel et al. 2002 [69]. Genes with damaging nsSNP were compared to the known genes located in our identified regions under selection.
Lastly, we predicted the bovine protein structure of the gene of interest (APP) through comparative homology modelling. We used the known human protein template structure (PDB code 3KTM) to explore mutation effects of the candidate genes due to a nsSNP. The homology modelling was performed using the MODELLER9v10 program within the Discovery Studio (DS) 3.5 molecular modelling packages (http://accelrys.com/products/discovery-studio/, Accelrys Inc, San Diego, USA). The detailed process is shown in S4