Comparative transcriptome analysis of mammary epithelial cells at different stages of lactation reveals wide differences in gene expression and pathways regulating milk synthesis between Jersey and Kashmiri cattle

Jersey and Kashmiri cattle are important dairy breeds that contribute significantly to the total milk production of the Indian northern state of Jammu and Kashmir. The Kashmiri cattle germplasm has been extensively diluted through crossbreeding with Jersey cattle with the goal of enhancing its milk production ability. However, crossbred animals are prone to diseases resulting to unsustainable milk production. This study aimed to provide a comprehensive transcriptome profile of mammary gland epithelial cells at different stages of lactation and to find key differences in genes and pathways regulating milk traits between Jersey and Kashmiri cattle. Mammary epithelial cells (MEC) isolated from milk obtained from six lactating cows (three Jersey and three Kashmiri cattle) on day 15 (D15), D90 and D250 in milk, representing early, mid and late lactation, respectively were used. RNA isolated from MEC was subjected to next-generation RNA sequencing and bioinformatics processing. Casein and whey protein genes were found to be highly expressed throughout the lactation stages in both breeds. Largest differences in differentially expressed genes (DEG) were between D15 vs D90 (1,805 genes) in Kashmiri cattle and, D15 vs D250 (3,392 genes) in Jersey cattle. A total of 1,103, 1,356 and 1,397 genes were differentially expressed between Kashmiri and Jersey cattle on D15, D90 and D250, respectively. Antioxidant genes like RPLPO and RPS28 were highly expressed in Kashmiri cattle. Differentially expressed genes in both Kashmiri and Jersey were enriched for multicellular organismal process, receptor activity, catalytic activity, signal transducer activity, macromolecular complex and developmental process gene ontology terms. Whereas, biological regulation, endopeptidase activity and response to stimulus were enriched in Kashmiri cattle and, reproduction and immune system process were enriched in Jersey cattle. Most of the pathways responsible for regulation of milk production like JAK-STAT, p38 MAPK pathway, PI3 kinase pathway were enriched by DEG in Jersey cattle only. Although Kashmiri has poor milk production efficiency, the present study suggests possible physicochemical and antioxidant properties of Kashmiri cattle milk that needs to be further explored.


Introduction
Mammary gland development and the physiological control of its dynamics are a vital part of the mammalian reproduction strategy [1][2]. Milk evolved as an essential source of nutrients and immune factors including immune-modulatory, anti-inflammatory and anti-microbial agents that offer protection against infections [3][4]. Milk yield and quality are important economic traits. An increase in the efficiency of milk synthesis both in terms of quality and quantity is a highly desirable goal for the dairy industry [5]. The mammary gland displays a high level of developmental plasticity with the ability to undergo repeated cycles of growth and regression [6]. Lactation is a dynamic physiological process characterized by an initial rapid increase in milk yield during early lactation, which peaks around 6 weeks into lactation, followed by a gradual decrease until the end of lactation [7]. The knowledge of gene expression involved in lactation informs on the biological mechanisms underlying mammary morphogenesis and metabolic activities as well as enhances our understanding of milk composition [8][9]. The ability to manipulate lactation performance in less improved breeds is an area of increasing interest, and knowledge of the biological pathways and mechanisms that govern mammary gland development and lactation may help to increase the lactation performance of dairy animals. Recent developments in "omics" technologies like transcriptomics make it possible to comprehensively and systematically identify the potential factors or processes that may influence lactation [10][11]. Using high throughput RNA sequencing technique, a high number of genes were identified as differentially expressed between different stages of lactation, and the expression alterations may play crucial roles in the regulation of lactation [9][10][11][12]. Thus, a thorough and deeper understanding of the genes and biological networks that regulate bovine milk composition is required.
Cow milk contains a heterogeneous population of somatic cells consisting of lymphocytes, neutrophils, macrophages and exfoliated epithelial cells [13]. Mammary epithelial cells (MEC) are unique in that they are involved in the synthesis and secretion of milk. Although milk somatic cells have been widely used to analyse the expression of genes involved in milk synthesis in ruminants [14][15][16], it is known that some milk trait genes of interest (e.g. genes in apoptosis pathway) are not solely expressed in MEC, but also by other cell types like leucocytes [17]. Thus, compared with MEC, there is the possibility to study genes not specifically expressed in MEC when milk somatic cells are used to study the expression of genes involved in milk synthesis. Moreover, Sciascia et al. [18] reported that milk somatic cells are not suitable for measuring milk protein expression in lactating ruminants. Although many studies have examined the physicochemical properties of cow milk and the genes expressed in the bovine mammary gland [19][20][21][22], limited research has detailed the characterization of genes expressed in milk epithelial cells. Therefore, identification and characterisation of milk quality and yield related genes expressed at different stages of lactation in MEC may represent an important step towards understanding of the complex biology of the milk production process.
The Jersey breed is amongst top milk producers in the world and it is routinely used to upgrade the milk producing capacity of the Kashmiri local cattle of North India. Kashmiri cattle are poor-performing and not improved for milk production, and differ greatly from Jersey in dairy production characteristics. Given the importance of the Kashmiri cattle in crossbreeding programs for augmenting milk production, this study aimed to compare its MEC transcriptome, using RNA Seq, with that of Jersey breed to gain a better understanding of the genes and pathways underlying the different milk producing abilities of the two breeds. We therefore report for the first time the MEC transcriptome of Kashmiri cattle at different stages of lactation using RNA-seq. We also present a characterization of the gene expression profile and differences between the MEC transcriptomes of Kashmiri and Jersey cattle.

Animals and sampling
Animal care and use procedures were approved by the Institutional Animal Ethics Committee of Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir. Three healthy Kashmiri and three Jersey cows in their 3 rd lactation at the Share-Kashmir University of Agricultural Sciences and Technology dairy farm, Mountain Livestock Research Institute (MLRI), Kashmir, India were selected for the study. The animals were kept in free stall housing, fed with balanced ration and had ad libitum access to water. Fresh milk samples (1.5 L) were aseptically collected by milking equally the four quarters of the cows on day 15 (D15), D90 and D250 in milk representing early lactation, mid-lactation and late lactation stages, respectively. Under the management conditions at the MLRI dairy farm, the lactation stages for Jersey cattle are D1-D80 (early lactation), D81-D185 (mid-lactation) and D186-D300 (late lactation). The corresponding periods for Kashmiri cattle are D1-70 (early lactation), D71 to D180 (mid-lactation) and D181-D280 (late lactation). Thus D15, D90 and D250 were chosen to represent early, mid and late lactation stages, respectively in both breeds. In total, nine samples per breed were collected. The milk samples were immediately transported to the laboratory in ice cooled containers. For milk quality analysis, the different parameters like milk yield/day, fat and protein content were recorded for ±7 days relative to day of sampling (i.e. seven days before day of sampling, day of sampling and seven days after day of sampling) for each lactation stage. The fat and protein contents were determined by Milk auto-analyser (Speedy Lab, Astori, Italy).

Isolation of milk epithelial cells and purity check
Milk epithelial cells were isolated from whole fresh milk following the protocol of Boutinaud et al. [13] with some modifications. Milk sample (1.5 L/cow) was aliquoted (125 ml) into each of six 250 ml centrifuge tubes, and 100 ml cold (4˚C) diethylpyrocarbonate (DEPC) treated 1 x phosphate buffered saline (PBS) buffer added. The samples were defatted by centrifugation for 20 min at 2800 x g at 4˚C. The fat layer and whey portion were carefully removed and the remaining fraction (1 ml) at the bottom containing the cell pellet was mixed with 800 μl cold 1 x PBS and transferred into a 2 ml tube. After adding 200 μl EDTA (0.5 M pH 8.0, 4˚C), the sample was centrifuged at 14,000 x g for 1 min at 4˚C. The supernatant was discarded, and pellets of the same sample were pooled and resuspended in 200 μl cold 1 x PBS and centrifuged at 5100 x g for 5 min at 4˚C. The supernatant was discarded and the pellet was resuspended in 1.25 ml cold 1 x PBS containing 1% bovine serum albumin (BSA, Sigma, USA). A portion (500 μl) of the resuspended milk somatic cells (MSC) was used for RNA isolation while the other portion was further purified to obtain MEC. Specific anti-cytokeratin peptide 18 antibody (KRT18, Clone KS-B17.2, Sigma-Aldrich, USA) coated beads (Dynabeads Pan Mouse IgG, Invitrogen, USA) were used to separate MEC from other cell types according to Boutinaud et al., [13]. Briefly, 25 μl of Dynabeads was transferred to a 1.5 ml tube and washed twice with 1 ml 1% BSA-PBS to remove the preservative. The Dynabeads were resuspended in 1 ml 1% BSA-PBS and transferred to a 1.5 ml tube containing 3 μl of KRT18 antibodies. The suspension was incubated for 30 min at 4˚C on a Sample Mixer (Rotospin, Tarson,-India). Then, the tube was placed in the magnetic particle concentrator (Dyna Mag 5, Invitrogen, USA) for 30 sec. After another wash step and aspiration of the supernatant containing unbound antibodies, the antibody-coated Dynabeads were resuspended in 250 μl 1% BSA-PBS and transferred to the 1.25 ml cell suspension and incubated for 1 h at 4˚C on the Sample Mixer. Finally, specifically bound cells were collected by magnetic incubation for 1 min. The bead bound cell pellet was washed (1 ml 1 x PBS added followed by centrifugation at 4000g for 1 min) and immediately used for RNA extraction. Possible contamination of purified MEC was checked by quantification of the expression of marker genes for various MSC types like beta casein (CSN2, mammary epithelial cell marker), cytokeratin 18 (KRT18, epithelial cell marker), lymphocyte-specific protein one (LSP1, leucocyte specific cell marker), haemoglobin sub-unit alpha (HBA, red blood cell marker) and CD18 (macrophage cell marker) [23][24] in samples collected on D90 by real time quantitative PCR (qPCR) (S1 Table).

RNA extraction and sequencing
Total RNA extraction from MEC and MSC was accomplished by Trizol method (Ambion, USA) according to the manufacturer's instructions. RNA was quantified by spectrophotometer (ThermoFisher, USA) and the quality and integrity was assessed by Bioanalyzer (Agilent, USA). The RNA integrity number (RIN) of samples ranged from 6.5-9.3. Only those samples having RIN values above 8 were used for library construction. The RNA isolation process was repeated for samples with lower RIN values until a RIN value of �8 was achieved.
Illumina TruSeq stranded mRNA sample preparation kit was used to generate cDNA libraries according to the manufacturer's recommendations. Total RNA (4μg/sample) was used to prepare the libraries. Poly-A containing mRNA molecules were purified using poly-T oligo-attached magnetic beads. Following purification, RNA was fragmented into small pieces of 300 bp size using divalent cations under elevated temperature. The cleaved RNA fragments was used to synthesize first strand cDNA using reverse transcriptase and random primers (Illumina, USA) followed by second strand cDNA synthesis using DNA Polymerase I and RNase H. After adenylation of 3' ends of DNA fragments, hybridisation was initiated by ligating Illumina paired-end adapter and index. cDNA fragments (200bp) were generated and were selectively enriched to construct the final sequencing paired end library using Illumina PCR Primer Cocktail. Libraries were pooled in equimolar amounts and paired end sequenced (126 bp) on three lanes (6/lane) on a High Throughput Model flow cell on an Illumina HiSeq 2500 platform by SciGenom, Cochin, Kerela-India.

Sequence data processing, alignment and identification of expressed genes
Raw data (reads) in fastq format were first processed by removing adapter sequences and reads having a phred score <30 with Trimmomatic software v0.32. Clean reads were aligned to the bovine reference genome, UMD3.

Differential gene expression analysis
Aligned reads were assembled with Cufflinks and differentially expressed genes (DEG) between lactation stages and breed were detected and quantified with Cuffdiff [26]. Negative binomial distribution was used to calculate gene expression which was normalized in fragments per kilobase of transcript per million mapped reads (FPKM). T-test was used to identify significantly differentially expressed genes and gene expression differences were declared significant at p-values < 0.05 after Benjamini-Hochberg correction.

Gene ontology (GO) and KEGG analysis of differentially expressed genes
GO and pathway enrichment analysis of DEG was accomplished with Gene Ontology Consortium data base (http://www.geneontology.org) [27]. GO terms and KEGG pathways (http:// www.genome.jp/kegg/) with Benjamini and Hochberg corrected p-values < 0.05 were considered significantly enriched.

Protein-protein interaction networks of differentially expressed genes
Protein-protein interaction (PPI) networks were constructed on the basis of information from STRING v10.5 (https://string-db.org), using the Ensemble gene identifiers of DEG as input and Bos taurus as background which provides critical assessment and integration of proteinprotein interactions, including direct (physical) and indirect (functional) associations [28]. The top 20 DEG and DEG for milk traits from lactation stage comparisons in Kashmiri and Jersey cattle were used in protein-protein interaction analysis. Credible interactions (combined_score � 0.4) were further visualized using CytoScape [29].

Quantitative real time PCR
Real time quantitative PCR was performed to verify the expression levels of eight DEG (GPAM, BDH1, SLC2A1, SLC2A8, HK2, SOS2, FAS and XDH) involved in different milk synthesis pathways. cDNA was synthesized from 0.5 μg of the same total RNA used in RNA sequencing using the Revert Aid First Strand cDNA Synthesis Kit (Thermo Scientific, USA) as per the manufacturer's protocol. Primers were designed using Primer3 Plus software (https:// primer3plus.com/cgi-bin/dev/primer3plus.cgi). qPCR was performed on a Light Cycler 480 II Real-Time PCR System (Roche, Switzerland). The reaction volume of 20 μl included 10 μL of 2X SYBR Green MasterMix reagent (Thermo Scientific, USA), 1μl of cDNA and 0.2 μL of each primer (10μM). The sequences of the primers and annealing temperatures are shown in S1 Table. All reactions were conducted in triplicates and included negative controls with no template. The expression levels of genes were normalized with GAPDH and UXT. GAPDH and UXT were initially tested and shown to be stable under the experimental conditions. The relative gene expression was calculated using the 2 -ΔΔCt method [30].

Milk yield traits
Milk yield (kg/day), fat and protein contents were determined for ±7 days relative to day of sampling for each lactation stage. The milk yield per day varied significantly between lactation stages in both Kashmiri and Jersey cattle. The mid lactation was characterised by highest milk yield (p<0.05), whereas protein and fat contents were maximum (p<0.05) during initial stages of lactation in both breeds, except that early lactation protein content (3.21±0.58) was similar to late lactation protein content (3.11±0.39) in Kashmiri cattle ( Table 1). As expected, the milk yield of Jersey cattle was higher than that of Kashmiri cattle at all lactation stages. The fat and protein contents in Jersey cattle ranged from 4.10% to 4.85% and 2.91% to 3.36%, respectively. The corresponding values for Kashmiri cattle were 3.20%-3.94% and 2.81%-3.21%, respectively.

Sequencing and expressed genes in mammary epithelial cells
To test the purity of isolated MEC and validate its use in transcriptional studies of milk trait genes, the expression of marker genes for specific cell types was compared between MEC and MSC. The expression levels of marker genes were normalised against GAPDH and UXT housekeeping genes. The mRNA expression levels of KRT18 and CSN2 were significantly higher (p<0.05) in the isolated MEC as compared to MSC (Fig 1). Furthermore, the expression of LSP1, HBA and CD18, chosen as markers for cell types other than MEC, were significantly higher (p<0.05) in MSC as compared to MEC (Fig 1).
Sequencing of 18 libraries generated a total of 1.65 billion reads (range 68,43-136,83 million reads/library) (S2 Table). Out of this number, 1.47 billion reads (95.82%) passed quality control and were aligned to the bovine genome UMD3.1. A total of 1.44 billion uniquely mapped reads were further processed while reads that mapped to multiple positions, unaligned and discordant reads were discarded (S2 Table). Mapped genes with FPKM �0.01 were discarded and the remainder divided into a low expression group (< 10 FPKM), a moderate expression group (10 FPKM to 500 FPKM) and a high expression group (> 500 FPKM) (S3 Table). For both breeds, the highest number of expressed genes with FPKM >0.01 was during late lactation (D250) in Jersey (13,835 genes) and Kashmiri cattle (14,464 genes) (S3 Table).

Top most expressed genes at each stage of lactation in Kashmiri and Jersey cattle
The numbers of genes with the highest FPKM values (> 500) in each breed and lactation day are shown in S3 Table. For both breeds, 13 top expressed genes at each lactation day accounted for~70% of the total FPKM values ( Table 2). The top expressed genes at each stage of lactation were similar for both breeds, except RPS12 and CCL14, which were highly expressed in Kashmiri cattle only or Jersey cattle only, respectively.

Differentially expressed genes between lactation stages in Kashmiri and Jersey cattle
A total of 1,282, 455 and 665 genes were differentially expressed (FDR<0.05) between D15 vs D90, D90 vs D250, and D15 vs D250, respectively in Kashmiri cattle (Fig 2a, S4 Table). Likewise, 826, 418 and 1,765 genes were differentially expressed (FDR<0.05) between D15 vs D90, D90 vs D250, and D15 vs D250, respectively in Jersey cattle (Fig 2b, S4 Table). The largest number of DEG were observed between D15 vs D90 in Kashmiri (1,282 genes) and between D15 vs D250 in Jersey (1,765 genes). The number of DEG that were common to all lactation stages were 8 and 15 in Kashmiri and Jersey, respectively. The top ten DEG with highest fold changes for each breed are listed in Table 3. A comparison between Kashmiri vs Jersey cattle indicated that 1,103, 1,356 and 1,397 genes were differentially expressed between the two breeds on D15, D90 and D250, respectively (Fig 2c, S4 Table) while 334 DEG were common to all stages. The top 10 DEG at each lactation stage between the two breeds are shown in Table 4.

Candidate genes related to milk quality and yield traits
The expression levels of about 42 genes for milk traits (fat, protein and milk yield traits) in Kashmiri and Jersey cattle are shown in Table 5. It was observed that the major candidate LGB showed up-regulated expression at D250 (D15 vs D250 and D90 vs D250) in both breeds and genes responsible for milk yield like SLC2A4 was highly expressed at D90 (D15 vs D90) and D250 (D15 vs D250) in Kashmiri cattle and SLC2A1 at D250 (D15 vs D250) in Jersey cattle (Table 5).

Protein-protein interaction
We examined possible interactions between top 20 differentially expressed genes (Table 3 and  S4 Table) and candidate genes for milk yield and quality traits (Table 5) at each lactation stage comparisons (D15 vs D90, D90 vs D250 and D15 vs D250) in Kashmir and Jersey cattle using the STRING database [26]. Out of 60 DEG (20 top genes from each lactation stage comparison) in Kashmiri cattle, only ME1, B4GALT6 and ESR1 showed protein interactions with major milk candidate genes (Fig 3) with interaction confidence > 0.4. Whereas in Jersey cattle, SLC27A6, BDH1 and KMO showed multiple interactions with various milk candidate genes (Fig 4). The details of string analysis are shown in supplementary file (S5 Table).

Gene ontology and pathway enrichment analysis of differentially expressed genes
According to lactation stage comparisons, a total of 12 (localization, biological regulation, response to stimulus, multicellular organismal process, endopeptidase activity, binding, receptor activity, signal transducer activity, catalytic activity, transporter activity, cell junction and macromolecular complex), 5 (cellular process, developmental process, biological adhesion, cell junction and macromolecular complex) and 4 (biological regulation, response to stimulus, multicellular organismal process and extracellular region) GO terms were significantly enriched (FDR < 0.05) by DEG at D15 vs D90, D90 vs D250 and D15 vs D250, respectively in Kashmiri cattle (Table 6 and S6 Table). Similarly, 7 (reproduction, developmental process, multicellular organismal process, locomotion, receptor activity, signal transducer activity and catalytic activity), 3 (signal transducer activity, synapse and macromolecular complex) and 10 (localization, reproduction, multicellular organismal process, metabolic process, immune system process, receptor activity, signal transducer activity, catalytic activity, cell part and organelle) GO terms were significantly enriched (FDR < 0.05) by DEG at D15 vs D90, D90 vs D250 and D15 vs D250, respectively in Jersey cattle (Table 6 and S6 Table). Only 6 GO terms (multicellular organismal process, receptor activity, signal transducer activity, catalytic activity, macromolecular complex and multicellular organismal process) were commonly enriched by DEG in Kashmiri and Jersey cattle. Pathway enrichment analysis at different stages of lactation indicated that 5 pathways (transcription regulation by bZIP transcription factor, Toll receptor signalling pathway, VEGF signalling pathway, CCKR signalling pathway and chemokine and cytokine signalling pathway) and 11 pathways (JAK/STAT signalling pathway, p38 MAPK pathway, B cell activation, Toll receptor signalling pathway, interleukin signalling pathway, apoptosis signalling pathway, inflammation mediated by chemokine and cytokine signalling, phosphatidylinositol-3-kinases (PI3) pathway, Platelet-derived growth factor (PDGF) signalling pathway, T cell activation and cholecystokinin receptor (CCKR) signalling pathway were significantly enriched (FDR < 0.05) by DEG of D15 vs D90 and D15 vs D250, respectively in Jersey cattle while only two pathways, purine metabolism (FDR = 0.095) and p38 MAPK pathway (FDR = 0.063) (D15 vs D90) tended towards significance in Kashmiri cattle (Table 7 and S7 Table).

Real time quantitative PCR validation of the RNA-seq expression results
Results of qPCR analysis of the expression of 8 DEG involved in different milk synthesis pathways (GPAM, BDH1, SLC2A1, SLC2A8, HK-2, FAS, SOS2 and XDH) and data obtained by RNA-Seq are shown in Fig 5. The expression levels of genes by qPCR and RNA-Seq were highly correlated (Pearson's correlation coefficient = 0.87) thus validating the RNA-Seq results.

Discussion
In this study, we investigated the transcriptome profile of bovine MEC at different stages of lactation in Kashmiri and Jersey cattle by the method of high throughput RNA sequencing. Purified RNA from MEC, which represents a non-invasive source of material for assessing gene expression in mammary gland [24] was used. The quality of RNA from milk isolated MEC can be sensitive to degradation (due to several long processing steps) resulting in a wide range of RIN values (4 to 9) [24,31,32]. Using RNA from MEC with RIN of 6, Canovas et al. [24] reported discrepancies in gene expression when compared with other sources of  [24]. Consequently, Boutinaud et al. [17] advised that the quality of isolated MEC RNA should be assessed before use in gene expression analysis. In this study, the RIN values of isolated MEC RNA ranged from 7.4 to 9.1 suggesting that the RNA was of high quality with minimal degradation. Therefore, the low RIN 6 reported by Canovas et al. [24] could explain the relatively low levels of CSN2, CSN3, CSN1S1 and CSN2S2 when compared with our data, suggesting that the antibody-captured milk MEC technique used by these authors was probably not optimal. The validity of gene expression results obtained by using purified MEC has been demonstrated in cows and buffalo [31, 33-36] and supported by our data. The advantages of purified MEC as a non-invasive source of RNA for mammary gland transcriptome analysis include but not limited to possibility for repeat sampling over a Milk epithelial cell transcriptome period of time on the same animals without causing damage to mammary tissue and ability to specifically study milk secreting cells.
The qPCR results of CSN2, KRT18, LSP1, HBA and CD18 suggest that the purified MEC in this study share characteristics with typical mammary gland epithelial cells and were minimally contaminated with other cell types like macrophages, leucocytes and red blood cells as we detected very low mRNA abundance of LSP1, HBA and CD18 in isolated MEC (Fig 1). However, our results contrast findings by Cánovas et al. [24] who showed the possibility of higher contamination of purified MEC by macrophages, further supporting our suggestion that the antibody-captured milk MEC technique was not optimally applied in that study.
The RNA sequencing results were validated by real time qPCR of eight genes which showed high correlation (r = 0.87) in their expression levels with RNA sequencing data. Further validation of DEG in milk synthesis related pathways between the two breeds in a larger population is necessary. Such validation could not be performed within this study due to the limited sample size. The transcriptome results revealed that the majority of genes were lowly expressed (FPKM<10) (S3 Table). This observation is in agreement with previous reports on the mammary gland transcriptome of Canadian Holsteins and transgenic goats [10,37,38]. The read counts of highly expressed genes ( Table 2) constituted about 70% of total read counts in both breeds. The caseins (CSN1S1, CSN1S2, CSN3, CSN2) and whey protein genes (LGB, LALB) were amongst the top expressed genes as expected. Our observation is in agreement with Ibeagha-Awemu et al. [38].
The DEG profile between lactation stages in this study followed the dynamics of a typical lactation curve. It was found that the highest numbers of genes were expressed in early lactation in both breeds. A higher protein and fat content observed in this study during the initial stages of lactation and supported by a previous report [39] could be a possible reason for this observation. Casein and whey protein gene expression remained almost constant throughout lactation in both breeds. However, the expression was higher in Jersey cattle as compared to Kashmiri cattle. It is likely that the casein and whey protein genes have been fixed through long term genetic selection for increased milk production and consequently accounted for  their increased frequencies in Jersey cattle. Apart from the casein and whey protein genes, other genes like RPLP1, RPS28, RPLPO and B2M were also highly expressed in the two breeds. RPLP1 gene encodes a ribosomal phosphoprotein that is a component of the 60S ribosomal subunit and plays an important role in the synthesis of proteins, protein folding and transport [40]. Deficiency of RPS28 and RPLPO proteins has been shown to cause cell death through reactive oxygen species (ROS) accumulation and MAPK1/ERK2 signalling pathway activation [41]. In this study, RPS28 was highly expressed during all three lactation stages in Kashmiri cattle while in Jersey, it was expressed only at early lactation. Similarly, RPLPO was highly expressed during early and late lactation in Kashmiri cattle and during early lactation in Jersey cows. The expression patterns of RPS28 and RPLPO suggests a possible higher antioxidant activity of milk from Kashmiri cattle as compared to Jersey cattle. B2M gene is among highly expressed genes (higher expression in Jersey cattle) and encodes for the beta-2-microglobulin protein, an integral component of the Fc receptor heterodimer involved in transferring IgG from blood into milk across mammary epithelial cells [42]. Some B2M haplotypes have been reported to be related to higher concentrations of IgG1 in bovine milk [43]. Increasing IgG    levels in milk could become important as IgG enhanced dairy products are in demand by consumers to obtain protective immunity [44]. CCL14, a small cytokine belonging to the CC chemokine family was highly expressed during mid and late lactation stages in Jersey cows. CCL14 is involved in cellular calcium homeostasis, immune response and positive regulation of cell proliferation [45]. Differential gene expression analysis indicated that more genes were differentially expressed in Jersey as compared to Kashmiri cattle. In Kashmiri cattle, only three genes, SNORD50, PEAR1 and TMEM232, were amongst top DEG between lactation stages. The membrane protein, PEAR1, shows specific expression in endothelial and platelets cells and plays significant roles in platelet activity and cardiovascular disease [46,47]. TMEM232 has been reported to be co-expressed with SLC25A46 and NAA10. Through genome wide association studies, TMEM232 along with SLC25A46 have been found to dysregulate IgE concentration [48] and are associated with various allergic conditions, while NAA10 was found to regulate mTOR pathway which regulates milk protein synthesis [49].

Table 6. Significantly enriched gene ontology (GO) terms associated with identified differentially expressed genes in mammary epithelial cells in Kashmiri and
In Jersey, TMSB4X, which was highly up-regulated at D90, is involved in T-cell activation [50], pathogen clearance and anti-inflammatory effects [51]. SLC27A6 was found to be highly up-regulated during early and late lactation in Jersey and is involved in fatty acid uptake by mammary epithelial cells. SLC27A6 is the major isoform of solute carrier group of genes expressed in bovine MEC and its expression was highly up-regulated with the onset of lactation [52,53].
The expression of all the top DEG between Kashmiri and Jersey cattle (Table 4) were higher in Jersey except SNORD50 and TMEM237, which were expressed at a higher rate in Kashmiri cattle during early lactation. Bos taurus microRNA-223 (bta-miR-223) was differentially expressed between Kashmiri and Jersey and has known roles in immunity, immune cell lineage differentiation, and granulopoiesis [53]. Bta-miR-223 has been implicated in cancer progression, HIV-1 infection, IL-17-induced inflammation [54,55], general delay in alternative NF-κB activation in innate immune cells [56] and negative regulation of proliferation and differentiation of neutrophils through down-regulation of the transcription factor, Mef2c, as well as up-regulation during bovine mammary gland infections like mastitis [57]. SNORD50 is involved in maturation of ribosomal RNAs (rRNAs) within the nucleolus [58,59] and its overexpression inhibits colony formation of human breast and prostate cancer cell lines in vitro, suggesting its role as a tumor suppressor [60,61]. PTX3 play roles in the regulation of resistance to pathogens, inflammatory reaction, clearance of self-components and female fertility [62,63,64]. GPR84 is expressed mainly in immune-related tissues and plays significant roles in inflammatory processes [65]. It is also expressed in adipose tissue [66]. It enhances insulin secretion from pancreatic β-cells [67,68] and increases the release of gut peptides, glucagonlike peptide 1 from intestinal neuroendocrine L-cells [69].
The candidate genes for milk quality and yield traits a were expressed at higher rates in Jersey cattle as compared to Kashmiri cattle (Tables 4 and 5). This observation is supported by milk yield, and milk fat and protein contents which were higher in Jersey at all stages of lactation as compared to Kashmiri cattle. Similar differential expression patterns were also reported by Lee et al., [70] in lactating yaks.
The relative expression of the major milk protein genes (CSN1S1, CSN1S2, LALBA CSN3, CSN2 and LGB) showed highest fold changes at D250 (late-lactation) in both Kashmiri and Jersey cattle. Our findings are in agreement with Sigl et al. [33] and Colitti and Farinacci [71]. Similar to our results, Colitti and Pulina [72] recorded higher expression for CSN1S1, CSN3 and CSN2 during late lactation in dairy ewes. Amongst the glucose transporter genes (SLC2A1, SLC2A4 and SLC2A8, and HK2), only SLC2A1 was up-regulated at mid-lactation in Kashmiri cattle which could be attributed to increased demand for glucose during mid-lactation period [73]. SLC2A8, another major solute carrier that plays a significant role in glucose transport in mammary gland indicated higher expression during mid-lactation (D90). SLC2A8 mRNA in mammary gland is developmentally regulated during lactation in both mouse and cows [74]. SLC2A4 expression, unlike SLC2A1 expression was found to be higher during late lactation and could influence the insulin action on mammary tissue during involution [75].
Data on protein-protein interaction indicated that ME1 protein had varying degrees of interactions with milk candidate genes (ACSS1, ACSS2, PPARGC1A, LPL and FSN) with critical roles in metabolic pathways, insulin signalling, fatty acid synthesis and metabolism. ME1 gene encodes a cytosolic NADP-dependent enzyme that generates NADPH used by FASN for long chain fatty acid synthesis [76]. B4GALT6 interacted (confidence score = 0.905) with UGCG milk candidate gene. B4GALT6, a galactosyltransferase gene of the sphingolipid metabolic pathway encodes for a lactosyl ceramide synthase enzyme that is required for cell apoptosis [77] and lactose biosynthesis (occurs exclusively in the mammary gland) [78]. ESR1 protein interacted with lipogenic proteins (LPL, ABCA1, FASN and PPARGC1A) with confidence scores varying from 0.5 to 0.61. ESR1 plays a key regulatory role in lipid biosynthesis in mammary epithelial cells through activation of various lipogenic genes (SREBF1, SREBF2, PPARG, INSIG1, and PPARGC1A) [79]. SLC27A6 interacted with ACSL1 and LPIN1 with confidence score of 0.63 and 0.66, respectively. SLC27A6 is an integral transmembrane protein that enhance the uptake of long-chain and very long chain fatty acids into cells by activating LCacyl-CoA primarily via ACSL1 [80]. BDH1 showed protein-protein interaction only with OXCT1 (confidence score > 0.9). Both BDH1 and OXCT1 are involved in ketone body utilization through synthesis and degradation of ketone bodies pathway, which utilize β-hydroxybutyrate for de novo fatty acid synthesis in mammary epithelial cells [81].
In mammals, the maintenance of lactogenic process is because of the balance between different processes like mammary development and involution pathways [82]. In relation to these physiological processes, many of the enriched GO terms in our data are related to developmental processes (GO:0032502). With respect to involution, this mechanism produces changes in mammary gland architecture through extra cellular matrix remodelling, collapse of alveoli and differentiation of adipocytes [83]. Cell junction (GO:0030054) and synapse (GO:0045202) which are related to cellular components of ECM were enriched in Kashmiri and Jersey cattle, respectively. The enriched genes for these terms (MEGF9, HAPLN3, ELN, LAMB3, GAS6, VCAN, TIMP1) have been associated with the onset of mammary gland involution and mammary gland morphogenesis [84,85]. Our results did not show a direct relationship with prolactin signalling, which is important for the process of lactogenesis. However, a significant GO term, response to stimulus (GO:0050896), is a parent term to insulin and growth factor stimulus. It is remarkable that both insulin and growth hormone are known to increase prolactin lactogenic effect [85]. Additionally, it is important to highlight that the significant GO term related to organelle (GO:0043226) (includes endoplasmic reticulum lumen) was significantly enriched at D250 (D15 vs D250) of lactation in Jersey. This organelle is linked to the lipid secretor mechanism of mammary epithelial cells [86]. The GO term, endopeptidase activity (GO: 0010950) was highly enriched in Kashmiri cattle (D15 vs D90) and may have special effects on the physico-chemical characteristics of milk [87] and flavour of dairy products [88] in this breed. Moreover, the immune system process (GO:0002376) GO term was significantly enriched by DEG in Jersey (D15 vs D250). Subclinical infections elicit elevated somatic cell counts (SCC) but other variables like breed have been shown to influence milk SCC levels [89].
The pathway enrichment analysis indicated that a total of 16 pathways were enriched (FDR<0.05) for DEG in Jersey and only two pathways (purine metabolism and p38 MAPK pathway) tended (FDR<0.1) to be significant in Kashmiri cattle. Among enriched pathways, the bZIP transcription factor regulates the transcriptional activity of various protein coding genes like GTF2B, CREB1, POLR2L which play critical roles in the regulation of mammary epithelial cell proliferation [90]. P13 kinase regulates mammary epithelial cell differentiation through prolactin action. The mammary differentiation due to P13K-AKT activation results in autocrine prolactin secretion which in turn activates JAK-STAT pathway [91]. Lemay et al.
[92] observed that P13K-AKT pathway was highly significantly enriched during lactation in mouse mammary gland. JAK-STAT pathway plays important roles in the regulation of milk protein synthesis in non-ruminants [93]. Besides protein synthesis, STAT3, JAK2 and JAK3 are important for mammary gland development [94]. In bovine, STAT3 responds to lactogenic factors and its activity increases during lactation [95]. The p38-MAPK pathway is another signalling pathway which plays a critical role in mammary epithelial cell development and enhances milk production by modulating alveolar cell proliferation and branching [96]. A number of immune related pathways (B cell activation, Toll receptor signalling pathway, T cell activation) were significantly enriched in Jersey cattle and may play an important role in milk production by protecting offspring and cell secreting organs [97].

Conclusion
This study represents a cohesive comparison of the milk epithelial cell transcriptome profiles at different stages of lactation between Kashmiri and Jersey cattle. The results revealed higher gene expression profiles of candidate genes for milk synthesis and yield traits in Jersey compared to Kashmiri cattle. More genes were differentially expressed between lactation days in Jersey cattle as compared to Kashmiri cattle. Sixteen pathways were significantly enriched by DEG in Jersey cattle while no pathway was found to be significantly enriched in Kashmiri cattle. On the other hand, varied numbers of GO terms were enriched between lactation stages in both Kashmiri and Jersey cattle. The presence of enriched GO terms like endopeptidase and antioxidant activity in Kashmiri cattle suggests special effects on the physico-chemical characteristics of milk from Kashmiri cattle. Such properties may lead to the development of certain niche products and thereby help in the conservation of this unique germplasm which has been diluted through extensive cross breeding programmes. The results provide a significant advance in our knowledge of Kashmiri cow lactating mammary gland gene expression and valuable information for future studies and breed improvement.