Identification of Genetic Associations and Functional Polymorphisms of SAA1 Gene Affecting Milk Production Traits in Dairy Cattle

Our initial RNA sequencing (RNA-seq) revealed that the Serum amyloid A1 (SAA1) gene was differentially expressed in the mammary glands of lactating Holstein cows with extremely high versus low phenotypic values of milk protein and fat percentage. To further validate the genetic effect and potential molecular mechanisms of SAA1 gene involved in regulating milk production traits in dairy cattle, we herein performed a study through genotype-phenotype associations. Six identified SNPs were significantly associated with one or more milk production traits (0.00002< P < 0.0025), providing additional evidence for the potential role of SAA1 variants in milk production traits in dairy cows. Subsequently, both luciferase assay and electrophoretic mobility shift assay (EMSA) clearly demonstrated that the allele A of g.-963C>A increased the promoter activity by binding the PARP factor while allele C did not. Bioinformatics analysis indicated that the secondary structure of SAA protein changed by the substitution A/G in the locus c. +2510A>G. Our findings were the first to reveal the significant associations of the SAA1 gene with milk production traits, providing basis for further biological function validation, and two identified SNPs, g.-963C>A and c. +2510A>G, may be considered as genetic markers for breeding in dairy cattle.


Introduction
The dairy industry provides milk and dairy products for the human diet and it plays a key role in agricultural economy [1][2]. As the most important economic traits, an improvement in milk production traits continues to be the most profitable breeding goal [3][4]. Although classical DNA-based marker technology has expedited the genetic improvement of animal performance [5], the implementation of new strategies for investigating genes underlying complex traits could facilitate the improvement in selection accuracies.

DNA Extraction
Genomic DNAs were extracted from whole blood samples of cows by DP318 Blood DNA Kit (Tiangen Biotech Co., China) according to the manufacturer's instructions and from semen samples of the sires using the routine salt-out procedures [19]. The quantity and quality of extracted DNAs were measured by NANODROP 2000 Spectrophotometer (Thermo Scientific, DE, USA).

SNP Identification and Genotyping
Based on the genomic sequence of the bovine SAA1 (GenBank accession no.: AC_000186.1), eight pairs of primers were designed with Oligo 6.0 software to amplify the entire coding region and 1920 bp of 5' flanking sequences (Table 1). A DNA pool (50 ng/mL) was constructed with equal DNA concentration for each of the 12 sires. PCR conditions were as follows: initial denaturation at 95°C for 5 min, followed by 36 cycles of 30s at 95°C, annealing at 60°C for 30s, extension at 72°C for 30s, and a final extension at 72°C for 5 min. The PCR products were directly sequenced using ABI3730xl DNA analyzer (Applied Biosystems, CA, USA), and the sequences were compared by DNAMAN 6.0 (Lynnon Biosoft, USA) and Chromas 2.0 (Technelysium, Helensvale, Australia) to search SNPs. The identified SNPs were further genotyped for all the individuals using the iPLEX MassARRAY system (Sequenom Inc., CA, USA).

Haplotype Analysis
Sporadic missing genotype data were imputed using Beagle 3.2 program [20]. Linkage disequilibrium (LD) extent between all SNP pairs and haplotype blocks were estimated using the software Haploview 4.2 (Broad Institute of MIT and Harvard, Cambridge, MA, USA).

Association Analyses
Association analyses were conducted using SAS 9.1.3 (SAS Institute Inc., Cary, NC, USA) between the seven SNP genotypes or haplotypes and the five lactation traits, based on the following single-trait linear regression model: For each trait, y is a vector of "phenotypes" (EBVs) for all the 717 daughters, μ is the overall mean, b is a vector of SNP genotype or haplotype effects (i.e., fixed-effects), X is the design vector (0 or 1) linking a SNP genotype or haplotype to its genetic effect, a $ Nð0; As 2 a Þ is vector of (random) additive genetic value of each animal (not including the effects of SNP genotypes or haplotypes) with s 2 a being the additive genetic variance, Z is a diagonal incidence (0 or 1) matrix that links each animal additive genetic value to its "phenotype", and e $ Nð0; Ws 2 e Þ is vector of residual terms with s 2 e being the residual variance. In the above, A is an additive genetic relationship numerator matrix, which was computed by tracing the pedigree back to three generations of 1,619 involved individuals, and W is a diagonal matrix with each diagonal element being equal to the inverse of the reliability of EBV for the corresponding individual [21]. Variance components were estimated based on the data of 30,000 Chinese Holstein cows in Beijing area by using the DMU package version 6 (University of Aarhus, Foulum, Denmark).
Bonferroni correction was performed for multiple t-testing through dividing the significance level by the number of tests, which is the number of SNPs being investigated. Hence, an association was considered statistically significant from being null effect if a raw P value is less than 0.05/N, where N is the number of SNP loci.

Recombinant Plasmid Construction
To directly detect the allele-specific effects of the SNPs g.-963C>A and g.-781A>G on the promoter activity, three luciferase reporter gene fragments corresponding to the SAA1 promoter (-973 to +46, +1 represents transcription start site) were synthesized, representing three haplotypes constructed by g.-963C>A and g.-781A>G (CA; CG; AG). As designed, the three promoter fragments, including the KpnI and BglII restriction sites at the 5' and 3' termini, respectively, were synthesized (Genewiz, Suzhou, China) and cloned into the pGL4.14 Luciferase Assay Vector (Promega). The three plasmid constructs were sequenced to confirm the integrity of each construct's insertions. All plasmids were purified with the Endo-free Plasmid Maxi Kit (ComWin Biotech, Beijing, China).

Cell Culture and Luciferase Assay
Human Embryonic Kidney (HEK) -293T cells (provided by the Department of Biochemistry, Guangdong Medical, China) were maintained in Dulbecco's modified Eagle's medium (DMEM) containing 10% heat-inactivated fetal bovine serum (FBS), 100 mg streptomycin and 100 U/ml penicillin at 5% CO 2 and 37°C. Cells were seeded in 24-well plates at approximately 2×10 5 cells per /well before transfection. Cells were transiently transfected using Lipofectamine 2000 (Invitrogen, CA, USA) according to the manufacturer's protocol. For each cell, 0.9 g of the construct was co-transfected along with 0.1 g of pRL-TK renilla luciferase reporter vector (Promega) to control transfection efficiency and variation in plating. All the experiments were performed in three replicates for each construct.
Approximately 48 h after transfection, cells were harvested and the activity of both firefly and Renilla luciferases were measured using a Dual-Luciferase Reporter Assay System (Promega) on a Modulus microplate multimode reader (Turner Biosystems, CA, USA). The normalized luciferase data (firefly/renilla) were used to perform the average statistics of three replicates.

Electrophoretic Mobility-Shift Assay (EMSA)
To assess the alteration of the DNA-binding ability caused by the regulatory mutations, EMSA was performed using biotinylated oligonucleotide probes containing variant g.-963C>A (AGGATGAAGCACAAGA/CAACATTTTCTT). Nuclear extracts from the mammary gland tissue of lactating Holstein cows were prepared using a NE-PER 1 Nuclear and Cytoplasmic Extraction Reagents (Thermo Scientific).
Fifteen μg of crude nuclear protein were incubated for 20 min at room temperature in a 15μl binding reaction system, which included 1.5μl 10×binding buffer, 1.5 μl poly(dI-dC) (1.0ug/ ul), and ddH 2 O to meet a final volume of 10 μl. For each reaction, 0.6 μl Bio-wild probe or Biomutant probe (500fM) was added, and the reaction was incubated for 20 min at room temperature. As negative controls for the competition binding, 100-fold excess of non-radio labelled probes were used as competitors before the labeled probe and incubated for 20 min. Finally, the protein-DNA complexes were separated by electrophoresis in a native 6% DNA retardation gel at 150 V for 4 hours.

Bioinformatics Analysis
Firstly, putative differential transcription factor binding sites (TFBSs) of the genetic variants in promoter region were predicted with the online software TFSEARCH (http://mbs.cbrc.jp/ research/db/TFSEARCH.html). Also, secondary structures of the bovine SAA1 mRNA were predicted by the software RNA structure4.6. For the missense mutation c. +2510A>G, the potential impact on the structure alteration of the SAA protein was predicted by SOPMA SERVER (http://npsa-pbil.ibcp.fr/) while functional prediction was performed by the web server tool SIFT (http://sift.jcvi.org/www/SIFT.html).

SNP Identification
A total of 7 SNPs for SAA1 were discovered (Fig 1), including 4 novel SNPs in the promoter regions (g.-1788C>T, g.-963C>A, g.-781A>G and g.-757T>C) and 3 previously reported SNPs in coding region, c. +2510A>G (rs378661311), c.+2535C>T (rs383809871) and c.+-2565G>T (rs208890658). Of these, c. +2510 A>G was predicted to result in an amino acid replacement of Gly with Asp. All the 7 SNPs were finally individually genotyped for the association analysis. Chi-square tests showed that all these SNPs were in Hardy-Weinberg equilibrium (P >0.05) in this experimental dairy population. The genotypic and allele frequencies were summarized in Table 2.

Regulation of the Promoter SNPs on Transcriptional Activity
As shown in Figs 3 and 4, the promoter activity of all three constructs was significantly higher than that of the pGL4.14 vector (P <0.01). The transcriptional level of haplotype AG was significantly higher than haplotypes CA and CG (P <0.01). These results suggested clearly indicated that the A allele in g.-963C>A up-regulated the transcriptional activity of the SAA1 promoter, while g.-781A>G did not impact the transcription (P >0.05).  Identification of TFBSs at the g.-963 C>A by EMSA EMSA was performed to determine the above luciferase assay results. A pair of 5'-biotinylated oligonucleotide probes that contained either the wild-type sequences or the mutated bases were generated. As shown in Fig 5, the wild type A probe of g.-963 C>A visualized a DNA-protein complex while the mutated type C probe showed a weaker complex, confirming a binding of transcription factor in the allele A of g.-845 G>A. Whereas, the allele C and A of g.-963 C>A were shown to have undetectable DNA-protein binding (Fig 5).

TFBSs Prediction
It was predicted that g.-963C>A created a TFBS, Poly ADP-ribose polymerase (PARP), while its substitution by A allele did not. The other 3 SNPs (g.-1788C>T, g.-781A>G and g.-757T>C) in the promoter did not change the binding of any TFBSs. Therefore, g.-963C>A polymorphism might affect the binding affinity of the surrounding sequences with the PARP and may be responsible for the activity of SAA1 promoter.

Non-Synonymous Functional Prediction
The alterations in the secondary structure of the SAA1 mRNA caused by the non-synonymous mutation c. +2510A>G were predicted, which showed no difference in mRNA structure. Additionally, the effect of this mutation was predicted to be 'tolerated' by the SIFT programs, and the protein function might not be influenced. The results from the SOPMA server revealed that alpha helix were found to be changed predominantly from 51.54% to 53.08% by the substitution Gly/Asp (c. +2510A>G; G/A).

Discussion
This study was a follow-up work to our previous RNA-seq work, in which the SAA1 gene was detected to differentially express in the mammary gland between the high and low groups for milk protein and fat percentage of lactating Holstein cows [11]. By association analysis and function validation, we found that two SNPs, g.-963C>A changing the promoter activity and c. +2510A>G changing protein secondary structure, could be causative mutations. Our findings provided the first evidence for significant associations of the SAA1 gene with milk production traits in dairy cattle. Genetic variations in promoter regions may cause significantly potential phenotype diversity [22][23]. Coincided with these observations, we found that g.-963C>A and g.-781A>G loci in promoter region significantly associated with MY and PY. Individuals with genotypes CC and AA had higher MY or PY than those with genotypes AA and GG. Numerous studies have reported that the polymorphisms in the promoter region are associated with gene expression by altering putative transcription factor-binding sites [24][25][26][27][28]. In the present study, luciferase assay and EMSA analysis clearly demonstrated that the allele A of g.-963C>A in the promoter of the SAA1 gene was associated with increased the promoter activity possibly initiated by binding the PARP factor while allele C did not. PARP, as a transcription factor, plays important roles in regulating gene expression involved in DNA damage repair, cell cycle and metabolism [29][30][31][32]. Furthermore, PARP has been identified as a co-activator of several transcription factors to participate in the activation of NF-κB [33].
Considering the significant association effects of g.-963C>A on milk yield and milk protein yield traits and its impact on transcription activity, it is possible that g.-963C>A might be the causative mutation of SAA1, which regulates the SAA1 expression by changing the binding status of the transcription factor, PARP, to impact the formation of milk and protein traits. As for the loci g.-781A>G, its impact on transcription activity did not reach significance level. Hence, it was excluded to be a potential causative mutation. The significant association of g.-781A>G with MY and PY may be due to the very strong LD between g.-781A>G and g.-963C>A (D' = 0.97). These findings suggest that the locus g.-963C>A is directly responsible for the SAA1 expression and would be applied in marker-assisted selection.
On the other hand, the SNP c. +2510A>G was a missense mutation, leading to an amino acid substitution (p. Gly48Asp). Generally speaking, non-synonymous SNPs might be acting through influencing protein stability and/or function. Association study showed that this SNP was significantly associated with MY, FP and PY (0.00002 < P < 0.0004) Further prediction using the SOPMA server revealed that the alpha helix was changed from 51.54% to 53.08% by the substitution Gly/Asp (G/A). Because the alpha helix was preferably located at the core of the protein and had important functions in proteins for flexibility and conformational changes [34], it was presumed that the SAA protein was more stable in conformation when the base was replaced by G. Hence, the SNP c.+2510A>G might be another potential functional mutation.
Because synonymous SNPs (sSNPs) do not change the amino acids sequences, this type of genetic variations is not expected to change the structure and function of the related proteins. However, accumulating evidence indicates that sSNPs could be acting through influencing the conformation and stability of pre-mRNAs or protein expression [35][36]. Besides, sSNPs may influence gene expression by different codon usage [37][38]. In the present study, c.+2535C>T and c.+2565G>T were actually sSNPs. However, considering their significant effects on MY and PY, we inferred the most important reason was the very strong LD with the missense mutation c. +2510A>G (the values of D' was 0.95 and 0.98, respectively). Also possibly, the two sSNPs (c. +2535C>T and c. +2565G>T) identified in SAA1 also might be involved in the process of milk production through altering or increasing the mRNA stability.
In cattle, the SAA1 encodes the acute phase protein serum amyloid A (SAA) which is mainly synthesized in the liver during inflammatory response [39]. However, Mahony et al (2006) also showed that SAA did not significantly influence milk amyloid A and the latter was correlated with with cell-based indicators of intramammary inflammation [40]. In addition, previous study reported that SAA1 might be involved in the development of the mammary gland by an NF-κB-dependent signaling pathway [41]. Over-expression of SAA1 suppressed growth of mammary epithelial cells [42]. In this study, integrated analysis of differential gene expression, phenotype data, and transcriptional activity indicated that SAA1 may play a negative regulation role in the milk production traits in dairy cattle. Our results suggested that the SAA1 may be an important gene for milk production traits in dairy cattle.

Conclusion
The present study confirmed that the SAA1 gene had significant association effects on milk yield and composition traits in dairy cattle. A total of 7 SNPs were identified in SAA1 and 6 out of them were observed to be significantly associated with milk yield and composition traits. Further function analysis indicated that the loci g.-963C>A and c. +2510A>G could be the causal mutations contributed to the observed associations through altering promoter transcriptional activity and protein conformational changes. Our findings indicate that the identified variants are helpful for advanced marker-assisted selection in dairy cattle population. Further in-depth investigations are required to validate the biological mechanisms of the SAA1 gene on the formation of milk production traits in dairy cattle.
Supporting Information S1 File. This file contains associations of the haplotypes with 5 milk production traits. Table A, Associations of the haplotype combinations formed by g. -963C>A and g. -781A>G in block 1 with milk production traits in Chinese Holsteins. Table B, Associations of the haplotype combinations formed by c. +2510A>G, c. +2535C>T and c. +2565G>A in block 2 with milk production traits in Chinese Holsteins. (DOCX)