Bovine Mammary Gene Expression Profiling during the Onset of Lactation

Background Lactogenesis includes two stages. Stage I begins a few weeks before parturition. Stage II is initiated around the time of parturition and extends for several days afterwards. Methodology/Principal Findings To better understand the molecular events underlying these changes, genome-wide gene expression profiling was conducted using digital gene expression (DGE) on bovine mammary tissue at three time points (on approximately day 35 before parturition (−35 d), day 7 before parturition (−7 d) and day 3 after parturition (+3 d)). Approximately 6.2 million (M), 5.8 million (M) and 6.1 million (M) 21-nt cDNA tags were sequenced in the three cDNA libraries (−35 d, −7 d and +3 d), respectively. After aligning to the reference sequences, the three cDNA libraries included 8,662, 8,363 and 8,359 genes, respectively. With a fold change cutoff criteria of ≥2 or ≤−2 and a false discovery rate (FDR) of ≤0.001, a total of 812 genes were significantly differentially expressed at −7 d compared with −35 d (stage I). Gene ontology analysis showed that those significantly differentially expressed genes were mainly associated with cell cycle, lipid metabolism, immune response and biological adhesion. A total of 1,189 genes were significantly differentially expressed at +3 d compared with −7 d (stage II), and these genes were mainly associated with the immune response and cell cycle. Moreover, there were 1,672 genes significantly differentially expressed at +3 d compared with −35 d. Gene ontology analysis showed that the main differentially expressed genes were those associated with metabolic processes. Conclusions The results suggest that the mammary gland begins to lactate not only by a gain of function but also by a broad suppression of function to effectively push most of the cell's resources towards lactation.


Introduction
Lactating cows are generally dried off simply by stopping the milking process approximately two months before the next parturition. The mammary gland then undergoes an involution process, which is marked by the cessation of secretory activity and the reabsorption of milk residue, followed by a relatively static period. The dry period has been proven important for dairy cows. Omitting or shortening the dry period imposes negative effects on mammary health and milk yield in the next lactation [1][2][3]. The mammary gland does not resume its activity until approximately 2 to 3 weeks before the next parturition, when dramatic changes occur to prepare for profuse milk secretion after parturition. Lactogenesis is defined as the process from the resumption of mammary activity until profuse milk secretion and is divided into two stages [4]. Stage I is the period from the resumption of mammary activity to the time of parturition and is characterized by mammary differentiation, proliferation, and progressive expression of milk protein, as well as the secretion of precolostrum. Stage II is initiated around the time of parturition and extends for several days afterwards. This stage is characterized by the closure of the tight junctions between alveolar cells and the formation and secretion of colostrum and milk. The mammary gland is the only organ that experiences regular proliferation and involution cycles after maturity, which makes it an ideal model for the study of organ development. Knowledge of the molecular events driving lactogenesis in dairy cows has contributed not only to the understanding of organ development but also to the development of new technologies in the management and breeding of dairy cattle. To date, knowledge about this aspect of the mammary gland has mainly come from studies using mammary cell lines and genetically modified mice [5][6][7]. Some proteins involved in lactogenesis and their related signaling or metabolic pathways have been identified [8,9]. It has been suggested that there is no sudden transcriptional switch around the time of parturition. Preparation of the gland for lactation includes modifications to the transcriptional program, but the onset of lactation appears to be primarily controlled by post-transcriptional mechanisms [10]. Lemay et al [11] analyzed the microarray data sets of mammary gland RNA samples collected from FVB mice at 10 time points during mammary development, and the results indicated SAM68 (an RNA-binding transduction protein and a putative regulator of mRNA splicing, transduction and nuclear export) to be an important post-transcriptional regulator of both milk secretion and mammary cell survival during lactation. Finucane et al [12] studied the molecular events in stage II of  lactogenesis using Affymetrix GeneChip Bovine Genome Array, and found that genes associated with cell cycle and proliferation were downregulated. It was consistent with the result of Sorensen et al [13], which indicated that the mammary glands of cows proliferated mainly in late pregnancy and almost ceased proliferating after parturition. However, in dairy goats, mammary growth continued into early lactation, peaking at day 5 of lactation [14]. In addition, Suchyta et al [15] compared the microarray generated transcript profiles of liver, spleen, thymus, adrenal, ileum, and lymph tissues collected from a 3 month old Holstein steer and of mammary tissues collected from pre-pubertal and post-pubertal Holstein heifers, and identified a putative set of 16 genes being preferentially expressed in the developing mammary gland. In stage II of lactogenesis, the mammary gland undergoes a set of developmental processes that lead to the secretion of colostrum, and then milk. The major compositions of milk are lactose, protein and fat. Bionaz and Loor [9,16] evaluated the expression of 44 genes involved in bovine mammary milk protein synthesis and 45 genes involved in milk fat synthesis via quantitative PCR. The results of those studies supported a pivotal role for the concerted action of PPARG, PPARGC1A, and INSIG1 in the regulation of milk fat synthesis and a central role of amino acid and glucose transporters and insulin signaling through mTOR in the regulation of protein synthesis in the bovine mammary gland. Currently, little is known concerning the molecular events underlying lactogenesis. Comparisons of the transcriptomes of preand post-parturition mammary glands of dairy cows by Kiera et al [12], provided only some insights into the molecular events of stage II. To gain information on what occurs during stage I, the transcriptomes of the relatively static mammary gland and that of stage I should be compared. Therefore, mammary gland samples from Holstein cows at approximately 235 d, 27 d and +3 d relative to parturition were collected for the present study. Deep sequencing of the transcriptomes in these samples was performed using the newly developed digital gene expression (DGE) method, which was thought to be more reliable, repeatable, and precise compared to previous microarray technology [17].

Ethics Statement
Our study had been approved by the Institutional Animal Care and Use Committee of Shandong Agricultural University and performed in accordance with the ''Guidelines for Experimental Animals'' of the Ministry of Science and Technology (Beijing, China).

Mammary tissue collection
Mammary gland samples were collected from 18 Holstein cows reared in dairy farm No. 1 of the Jiabao dairy company, Jinan, China. They were all in the second parity with the same age.  There were 6 cows in each group. Mammary gland tissue samples were collected from cows at three different time-points wherein distinct differences exist in the internal status of the gland: one, 3562 days before parturition (235 d) which is a totally dry period, two, 762 days before parturition (27 d) characterized by accelerated cell proliferation and pre-colostrum secretion (stage I of lactogenesis) and three, 3 days after parturition (+3 d) characterized by colostrum production (stage II of lactogenesis). Mammary biopsy was performed using Bard Magnum biopsy system (Bard Peripheral Vascular, Inc., Tempe, AZ, US). The collected samples were immediately frozen and stored in liquid nitrogen until further analysis.

RNA isolation
To reduce the number of mammary samples needed for DGE analysis, samples from cows in the same group were pooled together for RNA isolation. Total RNA was extracted with Trizol reagent (Invitrogen, Carlsbad, CA, USA) in accordance with the manufacturer's instructions. The quantity and quality of the extracted RNA was determined using a spectrophotometer (Biophotometer Plus, Eppendorf, Germany), with RNA quality being evaluated by the absorbance ratio at 260 nm/280 nm.

High-throughput sequencing
High-throughput sequencing of mRNA was performed on an Illumina HiSeq 2000 platform. All sample preparation and sequencing procedures were executed according to the manufacturer's instructions (Illumina, Inc., San Diego, CA, USA). Briefly, mRNA from 6 mg of total RNA extracted from samples at each time point was purified by adsorbing to Oligo (dT) magnetic beads, and then use Oligo (dT) as primer to synthesize the first and second-strand cDNA. The bead-bound cDNA was digested with restriction enzyme NlaIII, which recognized and cut off the CATG sites. The fragments apart from the 39 cDNA fragments connected to Oligo (dT) beads were washed away and the Illumina adaptor 1 was ligated to the sticky 59 end of the digested beadbound cDNA fragments. The junction of Illumina adaptor 1 and CATG site was the recognition site of Mmel, which was a type of Endonuclease with separated recognition sites and digestion sites. It cut at 17 bp downstream of the CATG site, producing tags with adaptor 1. After removing 39 fragments with magnetic beads precipitation, Illumina adaptor 2 was ligated to the 39 ends of tags, acquiring tags with different adaptors of both ends to form a tag library. After 15 cycles of linear PCR amplification, 105 bp fragments are purified by 6% TBE PAGE Gel electrophoresis. After denaturation, the single-stranded cDNA was anchored on Illumina flowcells of the Illumina HiSeq 2000 system and sequenced. Raw data (tag sequences) were deposited in NCBI's Gene Expression Omnibus (GEO) database under submission number GSE44796.

Tag mapping
For the raw data, we filtered adaptor sequences, low quality tags (tags with unknown nucleotides N), empty reads and tags that were too short or too long, and tags with only one copy to get clean tags.
A virtual library containing all the possible CATG+17 bases length sequences of the bovine gene sequences were constructed by BGI-shenzhen [17][18][19], using Bos taurus UMD 3.1 [20]. All clean tags were mapped to the reference sequences, and only upto 1-bp mismatch was considered. Clean tags that mapped to reference sequences from multiple genes were filtered out. The remaining clean tags were designated as unambiguous clean tags. The number of unambiguous clean tags for each gene was

Screening of differentially expressed genes
A rigorous algorithm has been developed to identify genes expressed differentially between samples. Because every gene's expression occupies only a small part of the library, p(x) was within the Poisson distribution. The probability of a given gene being expressed equally between two samples can be calculated with the following equation [21]: where the total number of clean tags in sample 1 is N 1 , the total number of clean tags in sample 2 is N 2 , and there are x tags in sample 1 and y tags in sample 2 for gene A. False Discovery Rate (FDR) analysis was applied to determine the threshold P value in multiple tests and analyses [22]. The significance of the difference in gene expression was judged using a threshold of FDR#0.001 and fold-change value $2 (|log2 Ratio|$1).

Sequencing data analysis
The selected genes with significant modification in their expression were subjected to further analysis, which were GO enrichment analysis, pathway enrichment analysis and functional annotation clustering.
GO enrichment analysis of functional significance applies hypergeometric test to map all differentially expressed genes (DEGs) to terms in Gene Ontology (GO) database (Version 1.1.1631) [23], looking for significantly enriched GO terms in DEGs comparing to the genome background. The calculating formula is [24]: where N is the number of all genes with GO annotation; n is the number of DEGs in N; M is the number of all genes that are annotated to the certain GO terms; m is the number of DEGs in M. After Bonferroni correction, GO terms with corrected-p value #0.05 were significantly enriched in differentially expressed genes.     Pathway enrichment analysis based on Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Version 58) can identify significantly enriched biological pathways in DEGs compared with the whole genome background [25]. Enriched pathways were calculated using the same formula as that in GO analysis. Here N is the number of all genes with KEGG annotation, n is the number of DEGs in N, M is the number of all genes annotated in specific pathways, and m is the number of DEGs in M. Pathways with a Q value #0.05 were significantly enriched in differentially expressed genes.
The differentially expressed genes detected in stage I and stage II of lactogenesis were further analyzed using DAVID functional annotation clustering tool [26]. This tool can provide a look at the internal relationships of the clustered terms.

Quantitative real-time PCR (qRT-PCR)
QRT-PCR analysis was used to verify the DGE results. The RNA samples used for the qRT-PCR assay were both the same as for the DGE experiments and independent RNA extractions from biological replicates. 13 candidate genes were selected and detected using qRT-PCR, including caseinb, CIDEA, GLYCAM, BGN, ELL3, PAH, GP2, LOC525947, SLC14A1, SST, PNMT, caseina-S2 and integrinb. qRT-PCR was performed according to the TaKaRa manufacturer specifications (TaKaRa SYBRH PrimeScript TM RT-PCR Kit, Dalian, China). SYBR Green PCR cycling was denatured using a program of 95uC for 10 s, and 40 cycles of 95uC for 5 s and 60uC for 40 s, and performed on an ABI 7500 instrument (Applied Biosystems, Foster City, CA, USA). Data were reported as values normalized to the housekeeping gene b-actin, and they were subjected to one-way ANOVA analysis using Statistical Analysis Systems statistical software package (Version 8e, SAS Institute, Cary, NC, USA). Means were considered significantly different at p,0.05.

Analysis of the three cDNA libraries
Characterization of the three cDNA libraries (235 d, 27 d and +3 d) was listed in Table 1. Sequencing depths of 6,203,209, 5,857,755 and 6,115,741 tags were identified in the three libraries,   Figure 1. When the number of sequenced tags reached 2M or more, the number of detected genes almost ceased increasing. There were 6.2M, 5.8M and 6.1M tags in the three libraries; thus, all three were sequenced to saturation, producing a full representation of the transcripts present in each of the three stages.

Analysis of tag annotation
A reference database of the bovine genome includes 35,945 transcripts [20]. A total of 193,547 reference tags with 94,396 (48.77%) unambiguous reference tags were obtained. All clean tags in the three cDNA libraries were mapped to the reference tags, and the results were shown in Figure 2. There were 59,537 (30.86%), 48,944 (28.72%) and 48,742 (28.98%) distinct tags matched to the reference genes in the three libraries (235 d, 27 d and +3 d), respectively. Among these tags, there were 8.99%, 9.60% and 9.33% tags, respectively, matched to the anti-sense strand of the genes, suggesting that the antisense strand of these genes also had transcripts and that these genes might therefore have sense-antisense regulation. The unmatched tags were then mapped to the bovine genome, and 37.90%, 40.08% and 39.40% of tags, respectively, were matched to the genomic sequences in the three libraries. These tags might represent non-annotated genes or could possibly be derived from intergenic regions not encoding any transcripts.

Detection of differentially expressed genes and GO enrichment analysis
At the cutoff criteria of FDR#0.001 and |log2 Ratio|$1, many genes were differentially expressed; these results were shown in Figure 3. The details of these differentially expressed genes were supplied in Excel S1, S2 and S3. A total of 812 genes were significantly differentially expressed at 27 d compared with 235 d (stage I), accounting for 9.70% of transcripts in the 27 d cDNA library. There were 234 (28.80%) genes upregulated (in red) and 578 (71.20%) genes downregulated (in green). The number of genes having greater than a five-fold difference accounted for 0.45% of the total differentially expressed genes (Figure 4). A total of 1,189 (14.20%) genes were significantly differentially expressed, with 274 (23.00%) genes upregulated and 915 (77.00%) genes downregulated at +3 d compared with 27 d (stage II). The number of genes having greater than a five-fold difference accounted for 0.41% of the total differentially expressed genes. Moreover, there were 1,672 genes significantly differentially expressed at +3 d compared with 235 d. There were 209 (12.5%) upregulated genes, and 1,463 (87.5%) downregulated genes. Among those genes, there were 11 (0.65%) genes having greater than a five-fold difference.
The differentially expressed genes were analyzed in the context of gene ontology (GO) biological processes. This analysis revealed that the upregulated genes in DEGs were mainly associated with the cell cycle and lipid metabolism, and the downregulated genes were mainly associated with the immune response and biological adhesion at 27 d compared with 235 d. And the upregulated genes in DEGs at +3 d compared with 27 d were mainly associated with the immune response, and the downregulated genes were mainly associated with the cell cycle phase. The most significantly enriched biological processes in DEGs were metabolic processes at +3 d compared with 235 d.

Pathway enrichment analysis of differentially expressed genes
After pathway enrichment analysis, the top 10 significantly enriched pathways were listed in Tables 2, 3 and 4. The results suggested that many of the pathways enriched in DEGs were

Analysis of differentially expressed genes throughout the onset of lactation
There were 231 genes differentially expressed in the two stages of lactogenesis. Among these genes, 92 genes were downregulated in stage I and upregulated in stage II, 111 genes were upregulated in stage I and downregulated in stage II, and only 11 genes continued to increase in both stages. After analysis of functional annotation clustering, 92 genes fell into 24 clusters. Each term in Cluster 4 was associated with the immune response, and each in cluster two was associated with the extracellular matrix. The top 5 enriched clusters were listed in Table 7. After pathway enrichment analysis, the ECM-Receptor interaction was enriched ( Figure 5). In addition, functional annotation clustering of the 111 genes that were upregulated initially and then downregulated fell into 19 clusters; the top 5 enriched clusters were listed in Table 8. We can find that all terms in Cluster 1 were associated with cell cycle.

Quantitative real-time PCR (qRT-PCR) confirmation
The results of qRT-PCR were shown in Figure 6, the expression patterns of 13 genes showed a general agreement with the results of the DGE experiments, suggesting that DGE was an efficient and accurate strategy for the detection of differentially expressed genes.

Discussion
The major goal of this study was to explore genome-wide gene expression profiles during the onset of lactation and to provide information for understanding the underlying molecular mechanisms. Three libraries were constructed from mammary tissues at three different stages (235 d, 27 d and +3 d relative to parturition). Saturation analysis of the sequences indicated that the three libraries had reached saturation and were therefore complete assessments of all transcripts present in the libraries. The three libraries contained 192,928, 170,427 and 168,233 distinct clean tags, respectively. However, a reference database includes 35,945 transcripts. Theoretically, a tag should be generated by NlaIII digestion from the 39-most end of a transcript. However, many transcripts contain more than one CATG site, so tags from other NlaIII sites were also generated in our libraries. Because only one tag can be generated per transcript from a given NlaIII site in a cDNA, these extra NlaIII tags represent genes redundantly present in the expression profile. Thus, the number of unique tags generated was greater than that of the annotated bovine genome.
To validate the DGE method, the levels of 13 genes were analyzed using qRT-PCR. Although the differences in the expression of some genes did not match the magnitude of those detected using the DGE method, the trends of upregulation and downregulation were similar. Compared with microarray and qRT-PCR methods, the sequencing method has been documented to be more sensitive for the estimation of gene expression, especially for low-abundance transcripts [17].
Changes in mammary epithelial cells and the extracellular matrix at the onset of lactation During the transition from pregnancy to lactation in the dairy cow, the mammary gland undergoes dramatic functional and metabolic changes, including the morphogenesis of mammary ducts during early pregnancy and differentiation of the mammary alveolus during late pregnancy [27,28]. This study had found that many genes associated with the cell cycle were upregulated in stage I and downregulated in stage II, including cell division cycle associated 3 (CDCA3), Protein FAM83D (FAM83D), coiled-coil domain containing 99 (CCDC99), and others. These results suggested that cell proliferation occurred in late pregnancy and almost ceased after parturition, which was consistent with the early study [13]. Genes encoding extracellular matrix proteins were downregulated in stage I and upregulated in stage II. In addition, the ECM-Receptor interaction pathway was significantly enriched, and the differentially expressed genes in this pathway were downregulated and then later upregulated. These results demonstrated that, in stage I, the communication between cells and the extracellular matrix became weak with the proliferating of mammary epithelial cells and decreasing of the extracellular matrix, which recovered in stage II. A study in mice found that the communication between mammary epithelial cells and their environment became weak during the lactation period [11].
Changes in genes associated with the immune response At 27 d compared with 235 d, genes associated with the immune response accounted for 14.5% of differentially expressed genes, and genes associated with the defense response accounted for 8.20% of differentially expressed genes. Most of these genes were downregulated; for example, Chemokine (C-C motif) ligand 28 (CCL28) was completed turned off at 27 d. CCL28 is selectively expressed in certain mucosal tissues such as the exocrine glands, trachea, and colon and has a potent antimicrobial activity against Candida albicans, Gram-negative bacteria, and Grampositive bacteria [29,30]. Udder infections and mastitis are major problems for the dairy industry throughout the world, and the yearly costs are substantial. The risk of udder infection is the highest during the drying-off period and around the time of parturition [31,32]. Adequate immune function is essential for the defense against udder infections. These genes associated with the immune response might be important for the immune function of the mammary gland and thus could be candidate genes for improving immune function.

Expression profiles and regulation of milk protein genes
In this study, many milk protein genes were upregulated in stage I of lactogenesis. It has been shown previously in mouse [33], rat [34], and rabbit [35] that the expression of milk protein genes starts in early to mid-pregnancy, increases throughout pregnancy and reaches a plateau in late pregnancy and early lactation. However, Finucane et al [12] found that milk protein genes did not show significant changes in expression from late pregnancy to early lactation but that protein expression of glucose transporter GLUT1 did increase. The regulation of protein synthesis, particularly translation, in all mammalian tissues appears to be under control of the mTOR pathway [9]. Recently, studies in rodents and ruminants found that the mTOR pathway was very important for milk protein synthesis [36][37][38][39][40]. However, our results indicated that AKT (21.10*) and PI3K (21.20*) were downregulated significantly at 27 d compared with 235 d, which were upstream regulators of mTOR. In addition, some amino acid transporters were also downregulated during stage I, e.g. SLC43A2 (21.85*), SLC3A2 (21.58*) and SLC14A1 (21.04*). In the mouse oocyte, mRNAs can be stored and then activated at the proper time [41]. Further study is required to investigate whether mRNA storage occurs in the bovine mammary gland.
The main regulator of milk protein expression in non-ruminant mammary glands appears to be the Jak-Stat5 signaling pathway [42]. In addition to protein synthesis, STAT5 is important for mammary gland development [43]. In bovines, STAT5 responds to prolactin and other lactogenic growth factors, and its activity increases during lactation [44,45]. However, when compared with the rodent mammary gland, the role of bovine STAT5 in controlling milk protein expression through the Jak-Stat5 signaling pathway appears to be considerably weaker [46]. Our data appear to support a minor role of Jak-Stat5 signaling in milk protein synthesis, implied by the lack of any changes in the expression of PRLR and STAT5B during lactation. However, STAT5B activity is mostly regulated by its phosphorylation status, and this appears to be already regulated at the onset of lactation. Further study is required to investigate whether STAT5 phosphorylation increases at the onset of lactation.

Lipid metabolism and its regulation at the onset of lactation
It has been speculated that there are two modes of regulation to control fatty acid synthesis in the mammary gland of the lactating mouse: the well-known SREBF1 system and a novel mechanism that acts at the posttranscriptional level in SCAP deletion mice fed a high-fat diet, leading to alterations to enzyme protein [47]. In this study, sterol regulatory element binding transcription factor 1 (SREBF1) and cofactor insulin induced gene 1 (INSIG1) increased 2.5-fold and 2.4-fold at 27 d compared with 235 d, respectively. It is possible that SREBF1 and cofactor INSIG1 regulate fat synthesis at the onset of lactation [48]. It has been postulated that, with the activation of secretion, SREBP-1 and its congener SREBP-2, the activators of cholesterol biosynthesis, are shuttled from the endoplasmic reticulum to the Golgi apparatus. Once in the Golgi, they are activated by proteolytic cleavage of a cytoplasmic fragment, a bHLH transcription factor that travels to the nucleus, directly activating the genes required for the synthesis of fatty acids and cholesterol [49]. This mechanism of action has been well studied in liver [50] and adipose tissues [51], and the promoters of the activated genes include SREs (sterol response elements) as well as binding sites for NF-Y, USF, SP1, and SP3 [52][53][54].
It is well known that the onset of lactation includes two different stages. And in stage I of lactogenesis, the mammary epithelial cells proliferate and differentiate, the extracellular matrix diminishes, and the communication between the two becomes weak. As milk protein genes and lipogenesis genes are upregulated, genes associated with the immune response are downregulated. In stage II of lactogenesis, the mammary gland restores the communication between mammary epithelial cells and the extracellular matrix. The main result of this process is the synthesis of milk using nutrients from the blood.

Supporting Information
File S1 The details of differentially expressed genes at 27 d compared with 235 d.