A Post-GWAS Replication Study Confirming the PTK2 Gene Associated with Milk Production Traits in Chinese Holstein

Our initial genome-wide association study (GWAS) demonstrated that two SNPs (ARS-BFGL-NGS-33248, UA-IFASA-9288) within the protein tyrosine kinase 2 (PTK2) gene were significantly associated with milk production traits in Chinese Holstein dairy cattle. To further validate if the statistical evidence provided in GWAS were true-positive findings, a replication study was performed herein through genotype-phenotype associations. The two tested SNPs were found to show significant associations with milk production traits, which confirmed the associations observed in the original study. Specifically, SNPs lying in the PTK2 gene were also detected by sequencing 14 unrelated sires in Chinese Holsteins and a total of thirty-three novel SNPs were identified. Thirteen out of these identified SNPs were genotyped and tested for association with milk production traits in an independent resource population. After Bonferroni correction for multiple testing, twelve SNPs were statistically significant for more than two milk production traits. Analyses of pairwise D’ measures of linkage disequilibrium (LD) between all SNPs were also explored. Two haplotype blocks were inferred and the association study at haplotype level revealed similar effects on milk production traits. In addition, the RNA expression analyses revealed that a non-synonymous coding SNP (g.4061098T>G) was involved in the regulation of gene expression. Thus the findings presented here provide strong evidence for associations of PTK2 variants with dairy production traits and may be applied in Chinese Holstein breeding program.


Introduction
With the maturing of genome sequencing and high throughput SNP genotyping technologies, genome-wide association studies (GWAS) have become a routine strategy for investigating mutations underlying complex traits. So far GWAS have been successfully employed in identifying genes involving human diseases [1][2][3][4], economically important traits in livestock [5][6][7][8] and various complex traits in other species [9,10]. Numerous candidate loci associated with respective target traits emerged from outcomes of these GWAS studies. However, it is widely accepted that GWAS is solely the first step in the process of gene discovering [11][12][13][14][15], and findings from GWAS still require further validation for ascertaining bona-fide causal variants via genetic replication as well as functional assessment [16].
Until recently,a large number of genome-wide association studies in dairy cattle focused on identifying genomic regions or SNPs associated with milk production traits [17][18][19][20][21][22][23]. Nevertheless, merely a few reports [24,25] concerned replicated association studies involving potential functional genes. In our initial GWAS in the Chinese Holstein population [19], in addition to some functional genes such as DGAT1 and GHR reported previously [26,27], several novel potential candidate genes, proxied by hundreds of significant SNPs within these genes or in surrounding regions, were also identified. Among these novel genes, the protein tyrosine kinase 2 (PTK2) gene, firstly identified by our GWAS, can be considered as a promising candidate gene for milk production traits. The PTK2 gene is located on bovine chromosome (BTA) 14. A substantial number of quantitative trait loci (QTLs) for milk production traits have been identified on BTA14 [28][29][30][31][32][33][34][35]. For instance, the well-known DGAT1 gene located at ,0.44 Mb has been functionally confirmed as a major gene affecting milk production traits [27]. Bennewitz et al. [36] paid special attention to the QTLs on BTA14 and declared that there should exist a second conditional QTL for these traits. In our previous GWAS results [19], a large proportion of the significant SNPs (61 out of 105) were located on BTA14, 59 of which lied in the reported QTL regions.
In our initial GWAS findings, it is a remarkable fact that two significant SNPs, ARS-BFGL-NGS-33248 (P = 1.26 E-08, n = 1815) and UA-IFASA-9288 (P = 2.19 E-12, n = 1815) harbored within the regions of introns 1 and 5 of the PTK2 gene respectively, showed powerful associations with milk fat percentage from the viewpoint of statistics. These two SNPs were located in the QTL regions for fat percentage reported in previous studies [31,34,35] and were also observed in association with milk production traits in Holstein-Friesian cattle [22]. In addition, findings from studies in humans suggested that PTK2 plays a prominent role in the mammary gland development and function [37][38][39]. According to the basic idea of comparative genomics as well as significant signals in our initial GWAS, we therefore assumed that the PTK2 gene could be a functional as well as positional candidate gene for milk production traits in dairy cattle. So far the association of PTK2 polymorphisms with milk production traits has not been reported in dairy cattle.
Motivated by searching for potential casual genetic variants associated with milk production traits, we not only conducted a replication study in an independent dairy cattle population to provide convincing statistical evidence for associations of PTK2 variants discovered in our initial GWAS study, but also performed an association study of novel SNPs within PTK2. Furthermore, we performed expression analyses and identified a potentially functional SNP that influences mRNA expression of PTK2. The results showed that the identified variants in PTK2 may be important genetic factors implicated in milk production ability in dairy cattle and have the capability to be used in marker-assisted breeding based on further validation.

Materials and Methods
All protocols for collection of the tissue samples of experimental individuals and phenotypic observations were reviewed and approved by the Institutional Animal Care and Use Committee (IACUC) at China Agricultural University.

Resource Population
A daughter design was applied in this study. A total of 638 daughters together with 14 corresponding sires were collected to construct the study population. Daughters were from 15 Holstein cattle farms in Beijing, where regular and standard performance testing (dairy herd improvement, DHI) has been implemented since 1999. Estimated breeding values (EBVs) of all individuals for five milk production traits (milk yield, MY; fat yield, FY; protein yield, PY; fat percentage, FP; protein percentage, PP) were predicted by official Dairy Data Center of China based on the genetic parameters estimated via the complete DHI (Dairy Herd Improvement) data of Chinese dairy cattle population. The EBVs were then considered as ''phenotypic'' observations used for the subsequent analyses.

DNA Extraction
Genomic DNA was extracted from whole blood of the daughters by DP (318) Blood DNA Kit (Tiangen Biotech Co., China) following the manufacturer's instructions and from semen sample of the sires using a salt-out procedure [40]. The quantity and quality of isolated DNA were measured with NanoDrop TM ND-2000c Spectrophotometer (Thermo Scientific, Inc.).

SNP Identification
A total of 32 pairs of PCR primers (File S1) were designed with Primer Premier5.0 (Premier, Canada) based on the genomic sequence of the bovine PTK2 gene referring to Bos_tauru-s_UMD_3.1 assembly (NCBI Reference Sequence: AC_000171.1) to amplify all exons and partial adjacent introns. DNA pooling strategy was used to identify potential SNPs involved in the gene. DNA samples of 14 sires were selected to construct a DNA pool with equal DNA concentration of 50 ng/ml for each individual. PCRs were performed in 25 ml volume containing 50-100 ng of genomic DNA, 10 pmol of each primer, 5 mM of dNTP mix, 2.5 ml of 106PCR buffer, 0.625 U Taq DNA polymerase (Takara Biotechnology Co. Ltd.). The PCR reaction conditions were as follows: a pre-denaturation at 95uC for 5 min, followed by 34 cycles of 30 s at 95uC, annealing from 45uC to 60uC for 35 s, 35 s at 72uC, and a final extension at 72uC for 10 min. The PCR products were sequenced using the ABI37306l DNA analyzer (Applied Biosystems, Foster City, CA, USA).
The details of all SNPs identified have been submitted to dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and will be publicly available (accession numbers from ss836185560 to ss836185594) in dbSNP version 140.

Genotyping
Among the identified SNPs within the region of PTK2, 13 SNPs was selected according to their positions for as the candidate marks. These SNPs were further genotyped for all experimental individuals using the iPLEX MassARRAY system (Sequenom Inc.).
Additionally, the same two SNPs, ARS-BFGL-NGS-33248 and UA-IFASA-9288 from the original GWAS findings were genotyped and analyzed in another independent population with the sample size of 2,284 for the sake of replication study. The information and corresponding results of these two SNPs were illustrated in File S2.

Bioinformatics Analysis of Bovine PTK2 Protein and mRNA Structure
The potential impact of the amino acid change on the structure or function of protein was predicted by applying the two web server tools SIFT (http://sift.jcvi.org/www/SIFT.html) and Poly-Phen (http://genetics.bwh.harvard.edu/pph2/). Secondary structures of the full-length bovine PTK2 mRNA were predicted by the software RNA structure4.6 [41].

Haplotype Construction and Linkage Disequilibrium (LD) Measure
To further explore the linkage disequilibrium extent between each pair of SNPs genotyped within the PTK2 gene, haplotypes were reconstructed for each individual using the software fastPHASE [42], and possible missing genotypes were also imputed where necessary.
Measure of pairwise LD for all pairs of SNPs within the region of the PTK2 gene was performed using the software Haploview [43]. Accordingly, haplotype blocks where SNPs are in high LD were also determined via Haploview based on the criterion of D'. Haplotypes within each block will be employed to test their associations with phenotypes in subsequent analyses.

Association Analyses
In the study, association analyses based on both single locus genotypes and the haplotype block were conducted to validate the effect of PTK2 variants on milk production traits.
For single locus analyses, we adopted the same analytical strategy as employed in our initial GWAS [19]. A linear mixed regression model was fitted as follows: where y is the vector of EBVs of all daughters, m is the overall mean. b is the regression coefficient of EBVs on SNP genotypes, x is the vector of the SNP genotype predictors which defined as 0,1 or 2 in correspondence with the three genotypes 11,12 and 22 (assuming 1 is the minor frequency allele), a is the vector of residual polygenetic effects with a,N (0, Ad 2 a ) (where A is the additive genetic relationship matrix based on the pedigree data regarding all individuals investigated and d 2 a is the additive variance), and e is the vector of residual errors distributed as e,N (0, Wd 2 e ), where W is a diagonal weight matrix with the diagonal elements calculated by 1=REL i (REL i is the reliability of EBV for individual i) and d 2 e is the residual error variance. As for each SNP, the estimate of b and the corresponding sampling variance Var( b ) were obtained by solving the mixed model equations. Subsequently, a Wald chi-squared statistic b 2 =Var( b ) with df~1 was used to determine if the SNP were associated with the dairy traits considered in this study.
For haplotype analyses considering multiple loci in high LD, we extended the original haplotype trend regression (HTR) approach [44] by allowing random effects in the regression model. For haplotypes within each haplotype block, the haplotype linear regression model with polygenic random effects was as follows: where y is the vector of EBV of all n experimental daughters; m is the overall mean. 1 is the n-dimensional vector with all elements equal to 1; h is the haplotype fixed effect vector with elements h i (i = 1, 2, …, k) being the effect of haplotype i of all k distinct haplotypes within the haplotype block; X is the n|k indicator matrix with the same pattern as that in [44]; a is the vector of the residual polygenetic effects with a,N (0, Ad 2 a ) (where A is the additive genetic relationship matrix based on the pedigree data regarding all individuals investigated and d 2 a is the additive variance), and e is the vector of residual errors with e,N (0, Wd 2 e ) (where d 2 e is the residual error variance and the weight matrix W is a diagonal matrix with each diagonal element equal to the reciprocal of the reliability of the estimate of the breeding value corresponding to individual i). For each haplotype block, the estimate of the haplotype effects vector h and the corresponding sampling variance Var( ĥ h) were obtained by solving the mixed model equations (MME), and a Wald chi-squared statistic with df~k{1 (k is the number of distinct haplotypes in the experimental populations within the region of haplotype block investigated) was constructed to examine whether the haplotype block was associated with the trait.
For both single locus analyses and haplotype analyses, the Bonferroni method was adopted to correct for multiple testing according to the number of SNP loci or haplotype blocks. Associations were considered significant if a raw P value ,0.05/N, where N is the number of SNP loci or haplotype blocks tested in analyses.
The effect of a SNP on a specific trait was measured as the proportion of phenotypic variance of the trait explained by the SNP. The proportion of variance explained by a SNP was calculated as 2p(1{p)â a 2 =s 2 P , where p is the allele frequency of the SNP analyzed, a is the average effect of gene substitution calculated based on the linear mixed model employed in this study, and s 2 P is the estimate of the phenotypic variance using the complete DHI (Dairy Herd Improvement) data of Chinese dairy cattle population. We employed Fortran 95 to code the computing programs and they are available upon request.

RNA Preparation, cDNA Synthesis and Quantitative Realtime PCR
To further validate the potential function of PTK2, we conducted differential expression analyses between different tissues as well as different genotypes at the mRNA level. Samples of eight different tissues, i.e., heart, liver, lung, kidney, mammary gland, ovary, uterus and skeletal muscle, were collected from eight cows in later lactation after slaughter within 30 min. All tissue samples were frozen in liquid nitrogen and stored at 280uC. Total RNA was extracted using TRIzol Reagent (Life technologies, Carlsbad, CA, USA) following the manufacturer's protocols. The quality of isolated RNA was checked by electrophoresis in 1% agarose gel and quantified with the NanoDropTM 2000 spectrophotometer (Thermo Scientific, USA). RNA was purified and reverse transcribed into cDNA using PrimerScriptH RT reagent Kit with gDNA Eraser (TaKaRa Biotechnology Dalian Co., Ltd) according to the manufacturer's instructions. Quantities of mRNA were then analyzed with real-time PCR using a LightCyclerH 480 Real-Time PCR System (Roche, Hercules, CA, USA). The real time PCR reactions were performed in triplicate with a volume of 20 ml containing 10 ml SYBRGreen Mixture, 1 ml template of cDNA, 1 ml of each primer, 7 ml deionized water. The reaction conditions were as follows: 95uC for 5 min, 45 cycles of 95uC for 10 s, 60uC for 10 s, 72uC for 10 s. Primer sequences synthesized by Sangon Biotech (Shanghai, China) for amplifying bovine PTK2 gene were: forward 59-CAAGAAGAGCGTATGAGGATGG-39, reverse 59-GAGATGCCTGACCTGGGTAGAT-39. The GAPDH (glyceraldehyde-3-phosphate dehydrogenase) gene was applied as an internal reference gene for normalization and the primers were: forward 59-GGTGCTGAGTATGTGGTGGA-39, reverse 59-GGCATTGCTGACAATCTTGA-39. The relative expression level was analyzed by the 2 2DDCt method [45]. The PTK2 gene expression levels in different tissues from three individuals were measured. To further explore the potential effects of mutations within the PTK2 gene on its expression at mRNA level, expression levels of mammary glands from eight individuals with different genotypes at some important SNP loci were also analyzed. Gene expression data were analyzed by a t-test applying the SAS9.0 program (SAS Institute, Inc., Cary, NC, USA), with a P value ,0.05 considered significant.

Analyses of SNPs
A total of thirty-three SNPs were detected in the PTK2 gene,of which eight were located in exons and the other twenty-five in introns (Table 1). Five SNPs were synonymous substitutions and one was non-synonymous substitution (g.4061098T.G) that changed amino acid Ile into Met (NP_001068718.2:p.I981M). In accordance with the positions of these polymorphisms (Figure 1), thirteen SNPs were finally selected and genotyped for the association study. Twelve out of the selected SNPs were in Hardy-Weinberg equilibrium. The SNP12 (g.4059863A.C) did not fit with Hardy-Weinberg equilibrium (P,0.0001), due to the small size of the population used here. The locations and allele frequencies of the 13 SNPs are shown in Table 2.
In addition to the above thirteen SNPs within the PTK2 gene, the details of the two replicating SNPs (ARS-BFGL-NGS-33248, UA-IFASA-9288) are presented in Table S1 in File S2.

Functional Prediction of the Non-synonymous SNP13
The bioinformatics analysis by applying the SIFT and PolyPhen web server tools was performed for the purpose of predicting the effect of the non-synonymous SNP13 (g.4061098T.G) on protein structure or function. The effect of this SNP was predicted to be ''tolerated'' and ''benign'' by the SIFT and PolyPhen programs respectively, suggesting that the protein structure and function may not be influenced. Afterwards, the alteration in the secondary structure of PTK2 mRNA caused by the T/G substitution was predicted, showing little difference in mRNA structure (Figure 2).  However, the free energy of PTK2 mRNA was predicted to be changed by this substitution (21586.5 kal/mol for T allele and 21583.1 kal/mol for G allele), indicating that the predicted mRNA structure of T allele with lower free energy might be more stable than G allele.

Association Analyses
Single locus-based regression analyses. Associations between EBVs of five milk production traits and the thirteen SNPs are shown in Table 3. Of the thirteen SNPs, after Bonferroni correction for multiple testing, eight SNPs (SNP1, SNP5, SNP7, SNP8, SNP10, SNP11, SNP12, SNP13) were significantly associated with MY, PY and FP, two SNPs (SNP6, SNP9) were significant for MY, FP and PP, and one SNP (SNP3) was strongly associated with FP. However, no significant associations were observed in other SNPs (SNP2, SNP4). The significant associations between the same two SNPs from our original GWAS findings and FP were successfully confirmed (Table S2 in File S2). Haplotype regression analyses. Two haplotype blocks were inferred (Figure 3). The block 1 consisted of 5 SNPs, which formed 4 haplotypes in the resource population. The common haplotypes TGTAAT, CGTAAT and TATGAC occurred at the frequency of 33.5%, 33.1% and 25.6% respectively. The pooled haplotypes accounted for 7.8% ( Table 4). The block 2 was composed of 2 SNPs and 3 haplotypes were formed. The frequency of the haplotypes CC, AC and CA were 44.0%, 38.8% and 17.2% respectively ( Table 4). The association study of the haplotypes with EBVs of five milk production traits showed that the haplotypes in block 1 were associated with MY, FP and PP and haplotypes in block 2 were correlated with MY and FP. In the analysis, the haplotype with frequency .5% was treated as a distinguishable haplotype, and those haplotypes each with relative frequency ,5% were pooled into a single group.
The results of the single-locus and haplotype association analyses were mostly in agreement, thus providing support to the existence of associations between these SNPs and haplotypes with milk production traits.

Expression Analysis of the Bovine PTK2 Gene
The relative mRNA expression of PTK2 in different tissues was determined by quantitative real-time PCR. The results showed that bovine PTK2 mRNA was expressed in all detected tissues, with higher expression level in mammary gland, uterus and kidney. A relatively lower expression level was found in heart and skeletal muscle tissues (Figure 4). Motivated by the results of functional prediction of the non-synonymous SNP13, the mRNA expression of PTK2 was measured in mammary glands with different genotypes of SNP13. The results showed that mRNA expression level in mammary glands (n = 4, mean relative expression = 0.496) with the TT genotype was higher than in mammary glands (n = 4, mean relative expression = 0.008) with the TG genotype (P = 0.06, Figure 5), reaching more than sixtyfold. This difference did not reach the statistical significance perhaps because of the small sample size.

Discussion
In this study, we not only replicated the associations of the same two SNPs within PTK2 that identified in our initial GWAS study [19], but also presented a link of several novel PTK2 variants with milk production traits. Moreover, the functional analysis revealed that the SNP13 was related to mRNA expression of the PTK2 gene. Thus the findings presented here provide strong evidence for associations of PTK2 variants with dairy production traits.
The EBVs of daughters herein were used as phenotypic observations in our association analyses. Although using EBVs as phenotypes may cause double-counting of relatives information, previous studies have shown that EBV-based phenotype does not significantly lower the statistical power [46,47] compared with the other two commonly used phenotypic observations, i.e., yield deviation (YD) [48] and de-regressed EBVs [49]. So far EBVs are routinely used as dependent variables in genetic association study concerning milk production traits in dairy cattle [18][19][20]. We have also compared phenotypes denoted by EBVs and de-regressed EBVs for association analyses in our initial GWAS study [19], which demonstrated that the findings based on the two different phenotypes were basically overlapped. In addition, we considered that using the same type of phenotype and analytical method employed in the initial study are very important prerequisites for exact replication [50]. Accordingly, we performed current study by applying EBVs as phenotypes to confirm the putative association from our initial GWAS study.
Two analytical methods, i.e., single locus-based regression and haplotype regression analyses, were implemented to determine whether these genetic variants were associated with milk production traits. The single marker analysis was usually thought less powerful than multi-point analysis in statistics for lack of the simultaneous use of multiple marker information [51,52]. Considering the indispensability of using the same analytical method for exact replication [50], we herein applied the same statistical method as that used in our previous GWAS study [19]. In the linear mixed model for EBVs, the polygenic random effects were treated as the random variable and the SNP genotypes as fixed effects. Under the framework of such mixed model, the biological meanings of polygenic random effects and residual errors denoted true breeding values and the sampling errors of the estimated breeding values of individuals. That is, we assumed EBVs can be explained by the SNP genotype effects investigated, the remaining polygenic effects and the residual random effects, e.g., sampling errors of EBVs. In addition, we performed multiple-point detection using haplotype regression approach to further confirm the findings from the single marker analyses. It is also notable that we succeeded in validation of significant SNPs with strong statistical signals ever after conservative Bonferroni correction for multiple testing in both single-locus and haplotype analyses, indicating the novel SNPs identified in PTK2 could be considered as convincing genetic markers for individual selection in future cattle breeding program.
It should be mentioned that joint consideration of a set of correlated traits can offer additional information in comparison with information contained in a single trait. The previous studies concerning joint genetic linkage analyses and association analyses of multiple traits have proven that the multiple-trait analyses have statistical advantage over the single-trait analyses in detection power and evaluation of genetic effects by joint analysis of suites of correlated phenotypes [53,54]. The reason we merely adopted the single-trait analyses herein is to keep the statistical method in our validation study consistent with that in our initial GWAS study   [19], since our study goal did not focus on the search for novel genes/mutations. The PTK2 gene, also known as Focal Adhesion Kinase (FAK), encodes a cytoplasmic non-receptor protein tyrosine kinase which was firstly isolated from chicken embryo fibroblasts transformed by Rous sarcoma virus v-Src [55]. PTK2 could be implicated in several signal transduction pathways such as cell motility [56,57], microtubule stability [58,59] and the regulation of cell-cell junctions [60]. As a major mediator of integrin signaling, PTK2 has been found to be important for the survival, proliferation, and differentiation of mammary epithelial cells in vitro [61,62]. In addition, PTK2 plays a prominent role in maintaining the  mammary gland development and function in vivo [63]. In this study, ubiquitous mRNA expression of PTK2 was detected in eight different tissues, with a relatively higher expression level in the mammary gland than in other tissues, indicating its important role in the mammary gland. All these findings suggested that the participation of PTK2 in biological processes may have vital implications for milk production in dairy cattle.
In addition to the two SNPs identified by our initial GWAS, twenty-five novel SNPs were discovered in introns. The functional role of SNPs within introns in altering gene transcriptional level has been clearly validated [64,65]. Seven SNPs were synonymous and these SNPs always were not expected to change the function of related proteins since the non-substitution of amino acid. With better insight into the influencing factors of protein function, synonymous SNPs have increasingly received much attention. It has been reported that synonymous SNPs could impact on protein expression and thus function via altering or increasing the mRNA stability [66,67]. In conjunction with association analyses, some associated SNPs could be selected for functional analyses in vitro to examine whether these polymorphisms are involved in the process of milk production through transcriptional alteration of the PTK2 gene.  Pooled haplotypes* The SNP13 (g.4061098T.G) was a non-synonymous mutation, leading to an amino acid substitution (p.Ile981Met). Our study showed a significant association of this SNP with MY, PY and FP. Non-synonymous SNPs are always evaluated for their effect on protein stability and/or function. Some investigations have reported the impact on the transcriptional process and mRNA stability as a result of non-synonymous SNPs [68,69]. To determine whether this SNP may modulate the expression of this gene and influence the structure or function of the related protein, real-time PCR analysis and functional prediction of non-synonymous SNP were performed. The results showed that mammary glands with the homozygous TT genotype had higher expression rates of PTK2 mRNA than the heterozygous genotype TG. On account of genetic variation influencing gene expression and the heritability of gene expression [70], it is likely that T allele was associated with increased expression of the PTK2 gene. Nonetheless, we cannot exclude the possibility that SNP13 was in strong linkage disequilibrium with known SNPs (SNP5, SNP7, SNP9) or undetected SNPs that disturbing gene expression. Although no predictive alteration of protein function was found, it was quite necessary to detect the PTK2 protein levels based on the genotype at SNP13 in a larger sample size and to conduct experiments on mRNA stability and functional analyses in vitro to precisely assess the effects of this polymorphism.
Based on our findings herein, a further step is needed for practical use of these variations in selection and breeding programs. Specifically, we can incorporate the information of the PTK2 gene carried by individuals into selection program via marker-assisted selection in Chinese Holstein breeding program. The promising mutations of the PTK2 gene can be used to increase the frequency of the marker that is positively associated with the milk production traits of interest by selecting cattles carrying two copies of the marker, and against those carrying no copies of it. As each marker in this study merely explains a small proportion of the genetic variance (Table 3), the application of marker-assisted selection may be limited. Alternatively, the marker information could be incorporated into the panel of high density SNP array in genomic selection that potentially leads to more rapid genetic gain in the dairy industry.
In conclusion, we replicated the significant associations of PTK2 variants derived from the previous GWAS findings with milk production traits and identified a non-synonymous coding SNP (g.4061098T.G) to be involved in the regulation of gene expression. These findings strongly suggest that the associated variants are either directly responsible for the QTL effect or closely related to the causal mutation and would be useful in advanced marker-assisted selection. Further functional studies will be required to validate the effects of these markers in other populations before their applications to marker-assisted breeding in the Chinese Holstein population.

Supporting Information
File S1 The detailed information of primers used for PCRs of the bovine PTK2 gene.

(DOCX)
File S2 This file contains the description of the replication study. Table S1, Information of the two SNPs used for replication study. Table S2, Associations of the same two SNPs identified via GWAS with EBVs of five milk production traits. (DOCX) Figure 5. The relative quantification of PTK2 in mammary glands with different TT and TG genotypes. The internal reference gene GAPDH was used for normalization. Bars represent mean6SE (n = 4). doi:10.1371/journal.pone.0083625.g005