In-Depth Characterization of Sheep (Ovis aries) Milk Whey Proteome and Comparison with Cow (Bos taurus)

An in-depth proteomic study of sheep milk whey is reported and compared to the data available in the literature for the cow whey proteome. A combinatorial peptide ligand library kit (ProteoMiner) was used to normalize protein abundance in the sheep whey proteome, followed by an in-gel digest of a 1D-PAGE display and an in-solution digestion followed by OFFGEL isoelectric focusing fractionation. The peptide fractions obtained were then analyzed by LC-MS/MS. This enabled identification of 669 proteins in sheep whey which, to our knowledge, is the largest inventory of sheep whey proteins identified to date. A comprehensive list of cow whey proteins currently available in the literature (783 proteins from unique genes) was assembled and compared to the sheep whey proteome data obtained in this study (606 proteins from unique genes). This comparison revealed that the 233 proteins shared by the two species were significantly enriched for immune and inflammatory responses in gene ontology analysis. Proteins found only in sheep whey take part in both cellular development and immune responses, whereas proteins found only in cow whey are associated with metabolism and cellular growth.


Introduction
Milk proteins are widely regarded as a functional food with many nutritional and health promoting benefits [1]. Worldwide, milk production from small ruminants such as sheep and goats is marginal in comparison with that of cow milk. However, in several countries, such as those in the Mediterranean, where the climate and environment are more suitable for small ruminants than for cows [2], small ruminant milk production is significant. In addition, small ruminant milks appear to be potentially less allergenic and have become important as a substitute
The whole gel lane containing the ProteoMiner enriched sheep acid whey fraction was excised from the gel and then divided into 6 segments prior to in-gel tryptic digestion. Each gel lane segment was subjected to in-gel digestion with trypsin using a robotic workstation for automated protein digestion (DigestPro Msi, Intavis AG, Cologne, Germany). The protocol for automated in-gel digestion was based on the method of Shevchenko, Jensen [24]. Eluted peptides were dried using a Savant Speed Vac (Savant, France) centrifugal concentrator, prior to LCMS/MS.
OFFGEL Isoelectric Focusing. One half (50 μL) of the ProteoMiner enriched sheep acid whey protein fraction obtained after 2D Clean Up Kit desalting was diluted with 300 μL freshly prepared 100 mM ammonium bicarbonate and subjected to in-solution digestion with trypsin at 37°C for 16 h. The tryptic hydrolysate was desalted using a C18 SepPak cartridge (Waters, MA, USA), dried using a Savant Speed Vac (Savant, France) centrifugal concentrator, and then dissolved in 3.7 mL MQ-water containing 4.8% (v/v) glycerol ready for OFFGEL IEF. OFFGEL IEF was conducted on a 3100 OFFGEL Fractionator (Agilent Technologies, CA, USA) according to the manufacturer's recommendations with slight modification. A 24 cm pH 3-10L Immobiline Drystrip IEF strip (GE Healthcare, Auckland, NZ) was assembled with a 24 well hopper unit in the OFFGEL apparatus.
The IEF strip was rehydrated by addition of 40 μL per well of MQ-water containing 4.8% (v/v) glycerol and 5% (v/v) pH 3-10L ampholine buffer concentrate (GE Healthcare) for 15 min. Paper wicks wetted in the same rehydration solution were applied to each end of the IEF strip. After IEF strip rehydration, aliquots (150 μL) of the above processed whey hydrolysate sample were loaded in each hopper well of the OFFGEL system. After addition of mineral oil to the end compartments of the OFFGEL system, the electrodes were attached and electrophoresis conducted until 100 kVh had been accumulated over 24 h. After electrophoresis, the solution in each well was harvested, retained as separate fractions, and then dried using a Savant Speed Vac (Savant, France) centrifugal concentrator, prior to LCMS/MS.
LCMS/MS. Samples (6 from in-gel digests of 1D-PAGE gel segments, and 24 from OFFGEL IEF fractions) were re-solubilized in 5% (v/v) acetonitrile, 0.2% (v/v) formic acid in water and aliquots were injected onto an Ultimate 3000 nano-flow uHPLC-System (Dionex, CA, USA) that was in-line coupled to the nanospray source of a LTQ-Orbitrap XL hybrid mass spectrometer (Thermo Scientific, San Jose, CA, USA). Peptides were separated on an in-house packed emitter-tip column (75 μm ID PicoTip fused silica tubing (New Objective, Woburn, MA) packed with C-18 material on a length of 8-9 cm), using 0.2% formic acid in MQ-water (buffer A) and 0.2% formic acid in acetonitrile (buffer B). Each of the peptide fractions was initially analyzed with a gradient developed from 5% (v/v) acetonitrile, 0.2% (v/v) formic acid to 99% (v/v) acetonitrile, 0.2% (v/v) formic acid in water at a flow rate of 200-500 nL min⁻¹.
A typical instrument setting for the LTQ-Orbitrap was full MS in a mass range between m/z 300-2000, performed in the Orbitrap mass analyzer with a resolution of 60,000 at m/z 400 and an AGC target of 5e5. Preview mode for FTMS master scan was enabled to generate precursor mass lists. The strongest 5 signals were selected for CID (collision induced dissociation)-MS/MS in the LTQ ion trap at a normalized collision energy of 35% using an AGC target of 2e4 and one microscan. Dynamic exclusion was enabled with 2 repeat counts during 30 sec and an exclusion period of 180 sec. Exclusion mass width was set to 0.01. After the initial LCMS/MS analysis of each of the 6 samples from 1D-PAGE and the 24 fractions from OFFGEL IEF, the LCMS/MS was repeated twice with different LC gradients (5% B hold 0-6 min, to 10% B over 6-9 min, to 27% B over 9-36 min, to 40% B over 36-41 min, to 99% B over 41-44 min) and (5% B hold 0-11 min, to 25% B over 11-65 min, to 40% B over 65-75 min, to 99% B over 75-80 min), to optimize peptide fractionation and maximize depth of peptide MS analysis by allowing for bias in hydrophobicity and complexity of the peptides in multiple fractions.
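As an illustration only, the two repeat gradients described above can be written as time/%B breakpoints and interpolated to give the buffer B composition at any point in a run. The sketch below assumes linear ramps between the stated breakpoints, which the text does not state explicitly; the names GRADIENT_SHORT, GRADIENT_LONG and percent_b are hypothetical and are not part of any instrument software.

```python
from bisect import bisect_right

# (time in min, % buffer B) breakpoints transcribed from the text.
GRADIENT_SHORT = [(0, 5), (6, 5), (9, 10), (36, 27), (41, 40), (44, 99)]
GRADIENT_LONG = [(0, 5), (11, 5), (65, 25), (75, 40), (80, 99)]


def percent_b(gradient, t_min):
    """Return % buffer B at time t_min, assuming linear ramps (an assumption)."""
    times = [t for t, _ in gradient]
    if t_min < times[0] or t_min > times[-1]:
        raise ValueError("time is outside the programmed gradient")
    # Index of the last breakpoint at or before t_min.
    i = bisect_right(times, t_min) - 1
    if i == len(gradient) - 1:
        return float(gradient[-1][1])
    (t0, b0), (t1, b1) = gradient[i], gradient[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (3, 7.5, 20, 44):
        print(f"short gradient, {t} min: {percent_b(GRADIENT_SHORT, t):.1f}% B")
```

Written this way, the shallow 9-36 min and 11-65 min segments, where most peptides would be expected to elute, are easy to compare between the two programs.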
Data Analysis. Tandem mass spectral data were generated with Proteome Discoverer 1.3 software (Thermo Scientific) using default settings with the exception of a maximum peptide mass of 8000 Da. Peak lists were searched against the SheepProt1 amino acid sequence database (a subset of the NCBI amino acid sequence database matching the genus Ovis, which contained 31,239 sequences and was downloaded on July 5th, 2013) using the three search engines Sequest HT, Mascot and MS Amanda. The search was set up for full tryptic peptides with a maximum of 2 missed cleavage sites. The following dynamic modifications were included: carbamidomethyl C, oxidized M, and deamidated N and Q. Search parameters specified an initial MS precursor mass tolerance of 10 ppm and an MS/MS fragment tolerance of 0.8 Da. The Percolator algorithm [25] was used for FDR calculation using a cutoff of q = 0.01. In addition, the following search engine threshold scores were used: Mascot ion score 20, MS Amanda score 100 and Sequest HT charge state dependent score of 2.4 (2+), 2.8 (3+), 3.2 (4+ and 5+), 3.6 (all others).
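The acceptance criteria above can be summarized as a simple filter on peptide-spectrum matches. The following is a minimal sketch, not the Proteome Discoverer implementation: the Psm record, the engine labels and the function names are hypothetical, and it assumes that a match is kept when its score is at or above the threshold and its q value is at or below 0.01.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
Q_VALUE_CUTOFF = 0.01
MASCOT_MIN_ION_SCORE = 20.0
MS_AMANDA_MIN_SCORE = 100.0
SEQUEST_MIN_SCORE_BY_CHARGE = {2: 2.4, 3: 2.8, 4: 3.2, 5: 3.2}
SEQUEST_MIN_SCORE_OTHER = 3.6  # "all others" in the text


@dataclass
class Psm:
    """Hypothetical peptide-spectrum match record used for illustration."""
    engine: str      # "mascot", "ms_amanda" or "sequest_ht" (labels are assumptions)
    charge: int      # precursor charge state
    score: float     # engine-specific score
    q_value: float   # Percolator q value


def min_score(psm):
    """Return the score threshold that applies to this match."""
    if psm.engine == "mascot":
        return MASCOT_MIN_ION_SCORE
    if psm.engine == "ms_amanda":
        return MS_AMANDA_MIN_SCORE
    if psm.engine == "sequest_ht":
        return SEQUEST_MIN_SCORE_BY_CHARGE.get(psm.charge, SEQUEST_MIN_SCORE_OTHER)
    raise ValueError(f"unknown search engine: {psm.engine}")


def passes(psm):
    """Keep a match only if it meets both the q value and the score threshold."""
    return psm.q_value <= Q_VALUE_CUTOFF and psm.score >= min_score(psm)
```

For example, under these assumptions a doubly charged Sequest HT match scoring 2.5 would be retained, whereas a triply charged match with the same score would not.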
The full sequences of milk whey proteins were retrieved from the NCBI database using the available accession numbers. In parallel, the bovine and human RefSeq data in GenBank and FastA format were downloaded from the NCBI database (ftp.ncbi.nlm.nih.gov/refseq/). A table mapping bovine and human protein accession numbers to gene symbols was generated. Subsequently, BlastP was used to match the full sequence of each identified protein (query argument) to either the human or bovine RefSeq homologue (subject argument). The BlastP searches were conducted on the cluster at the Department of Biochemistry, University of Otago, using binaries from the NCBI website [26]. The BlastP output in tabular format (outfmt argument set to 6) was used to find the best match and to retrieve the official gene symbol. Duplicates were then removed from the lists of gene symbols (sort | uniq Bash commands) and the lists were cross-tabulated twice in an Excel spreadsheet package (Microsoft Corporation, Redmond, WA). The list of genes was converted into a presence/absence table of each gene (Countif function), and summarized as a tally of the co-presence of the genes (concatenate function, pivot table tool).
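The workflow described above (best BlastP match, gene symbol lookup, duplicate removal, presence/absence tabulation) can be sketched in a few lines of Python as an alternative to the Bash and spreadsheet steps. This is an illustration under stated assumptions, not the procedure used in the study: the best match is taken here to be the hit with the highest bit score, which the text does not specify, and the accession-to-symbol dictionary and all function names are hypothetical. The column order is the default for BLAST+ tabular output (outfmt 6).

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast6_text):
    """Map each query to its highest bit score subject (assumed 'best match')."""
    best = {}
    for row in csv.reader(io.StringIO(blast6_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def gene_symbols(hits, accession_to_symbol):
    """Translate best-hit accessions to a duplicate-free set of gene symbols."""
    return {accession_to_symbol[s] for s in hits.values() if s in accession_to_symbol}


def presence_absence(sheep_genes, cow_genes):
    """Return {gene: (in_sheep, in_cow)} over the union of both gene sets."""
    return {g: (g in sheep_genes, g in cow_genes) for g in sorted(sheep_genes | cow_genes)}
```

Using sets removes duplicate gene symbols implicitly, which corresponds to the sort | uniq step, and the presence/absence dictionary plays the role of the Countif table.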
Identified sheep whey proteins were analyzed with gene ontology (GO) through the Database for Annotation, Visualization and Integrated Discovery (http://david.abcc.ncifcrf.gov/) [27,28]. GO term enrichment was considered significant at a Benjamini-corrected enrichment score (P value) of less than 0.05.
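For readers unfamiliar with the correction, the sketch below shows how adjusted P values can be computed from a list of raw enrichment P values. It assumes that the Benjamini correction refers to the Benjamini-Hochberg step-up procedure; it is a generic illustration with hypothetical function names, not DAVID's own code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(term_p_values, alpha=0.05):
    """Return the names of terms whose adjusted P value is below alpha."""
    names = list(term_p_values)
    adjusted = benjamini_hochberg([term_p_values[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < alpha]
```

With raw P values of 0.01, 0.04, 0.03 and 0.005, the adjusted values are 0.02, 0.04, 0.04 and 0.02, so all four terms would pass a 0.05 threshold.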

Results and Discussion
ProteoMiner treatment of sheep whey achieved significant enrichment of minor proteins
Milk contains a large number of proteins with a significant dynamic range of abundance. The casein fraction (including αs1-, αs2-, β- and κ-caseins) constitutes about 80% of the total protein in sheep milk. The major whey proteins, including β-lactoglobulin, α-lactalbumin, serum albumin, lactoferrin, lactoperoxidase and immunoglobulins (indicated in Fig 1 as reported previously [29]), account for the majority of the rest of the total milk protein content [2]. Previous proteomic studies employing different method strategies with cow milk have collectively identified, apart from the major proteins, over 700 low abundance proteins in cow whey [6,7,14,30]. The large dynamic range of protein abundance has hindered conventional protein identification approaches, such as ion exchange chromatography and 2D-PAGE coupled with mass spectrometry. A recent proteomic study of whey from sheep colostrum (milk produced during late pregnancy) using 1D-PAGE coupled with LCMS/MS identified 343 proteins [31]. In a previous proteomic study comparing human colostrum with late lactation human milk, considerable differences in protein complement were found [18], indicating the important role played by colostrum in early infant development. More recently, the use of the ProteoMiner enrichment procedure prior to conventional fractionation methods has been reported to increase the depth of protein identification in cow and human milk [17,19]. However, it has also been reported that some proteins may have a relatively low binding affinity for the ProteoMiner combinatorial peptide ligand library and may not be represented in the enrichment [32]. In our study it is apparent that some proteins, possibly many, bind to more than one hexapeptide ligand, as it appears some proteins are still relatively over-represented after performing the ProteoMiner enrichment (Fig 1).
However, it is clear that the ProteoMiner enrichment generates a significant compression of the dynamic range, enabling a much greater in-depth proteomic analysis. This is consistent with previous studies in which minor proteins in defatted milk such as galectin-3 binding protein, fatty acid synthase, and actin were significantly enriched, and an even greater enrichment of these minor proteins was achieved when skim milk was depleted of caseins [17,19].
Two electrophoresis fractionation methods following protein abundance normalization increased depth of identification of sheep whey proteins
A 1D-PAGE separation of ProteoMiner enriched sheep whey in six segments, subjected to an in-gel digest/LCMS/MS workflow, generated 483 protein identities (S1A Table) after the LCMS/MS data were searched against the NCBInr sheep protein database. The 1D-PAGE gel lane was cut into segments in an attempt to maximise the fractionation by segregating apparent higher protein abundance regions of the gel away from those of lower abundance. It was appreciated that the limited resolution of the 1D-PAGE and also the processing of the gel lane would likely result in loss of some of the lower mass proteins, and would limit the extent of in-depth analysis. This was then followed by OFFGEL IEF separation of in-solution digested ProteoMiner enriched sheep whey into 24 fractions. After LCMS/MS of each of the 24 OFFGEL fractions the number of proteins identified was 654 (S1B Table). Combining results from 1D-PAGE and OFFGEL electrophoresis resulted in 669 identified proteins (S1C Table).
The number of proteins identified in sheep whey here (669) is considerably higher than in previous individual reports of 149 proteins in cow milk whey [17], 415 in human milk [19], and 115 in human milk whey [18], in which protein abundance was normalized with a ProteoMiner kit but subsequent fractionation did not include OFFGEL fractionation. A study on sheep subclinical mastitis, employing 2D-PAGE followed by LCMS/MS of differentially expressed proteins, resulted in identification of 39 proteins in sheep milk whey and 140 in a milk fat globule membrane fraction [8,33].

Sheep whey proteins are involved extensively in immunological and inflammatory responses
Of the 669 proteins identified in sheep whey reported here, 606 were found to be from unique genes (S2 Table) and were subjected to gene ontology (GO) analysis through the Database for Annotation, Visualization and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/home.jsp), using the Homo sapiens genome annotation as background, to obtain statistically enriched biological processes and molecular functions in which the proteins are involved (S3A and S3B Table). Fig 2 summarizes proportions of proteins involved in the top 15 enriched biological process and molecular function terms.
Fig 1. 1D-PAGE of sheep acid whey before and after ProteoMiner treatment. Sheep acid whey was subjected to ammonium sulfate precipitation, desalting by 2 kDa cut-off dialysis and concentration to 60 mg mL⁻¹ prior to treatment with a ProteoMiner kit according to the manufacturer's instructions. The ProteoMiner enriched protein fraction was desalted using a 2D Clean Up Kit and one quarter of the material was separated by 1D-PAGE using a 12 well Novex BOLT 4-12% bis-Tris electrophoresis gel. Novex Sharp Pre-Stained Protein Standards (Life Technologies, Auckland, NZ) were run in one lane for calibration. The gel was stained with Simply Blue SafeStain (Invitrogen, Auckland, NZ). After staining, the ProteoMiner enriched sample whole gel lane was excised, cut into six segments as indicated and subjected to an in-gel digest/LCMS/MS workflow. Lf, lactoferrin; Sa, serum albumin; Cn, casein; β-Lg, beta-lactoglobulin; α-La, alpha-lactalbumin; and Lz, lysozyme.
Proteins in sheep whey appear to be overwhelmingly involved in responses to and regulation of immunity and inflammation, for example acute inflammatory response, complement activation and innate immune response, making up 12 of the top 15 most enriched GO biological process terms with P values (Benjamini corrected enrichment score) from 1.68×10⁻²⁰ to 3.63×10⁻⁹. These proteins are involved in a wide range of molecular functions including molecule binding, proteolysis, oxidation and antioxidant activities.
Although the protein content of milk has historically been viewed as primarily a nutritional source for infant growth and development, there has been an increasing awareness of additional health benefits. While the highly abundant casein protein group provides a principal nutritional role, the presence of antibodies, cytokines, lysozyme, lactoferrin, antimicrobial proteins and peptides, xanthine oxidase, and many other proteins in the whey fraction confers functions beyond nourishment alone. Indeed, various proteins and peptides in milk have been shown to enhance solubility and absorbability of iron, zinc and calcium, benefiting bone health [34]. Furthermore, it is reported that milk provides subtle benefits throughout life, including reducing the risk of gastrointestinal infection [16] and of lipid and lipoprotein metabolism complications [35].
It is also well established that milk proteins are precursors of bioactive peptides generated during digestion in the gastrointestinal tract [36]. This has resulted in growing momentum in the field of predictive bioactive peptide discovery [37,38], which can be enhanced by in-depth proteome analyses.
Another approach in studying putative protein functions in milk is to analyze protein-protein interactions in the complex protein mixture [39]. D'Alessandro and colleagues, by compiling an exhaustive list of 573 proteins in bovine milk and 285 proteins in human milk from previous proteomic and functional studies, were able to illustrate the overall inter-relationship, as well as potential individual interactive pathways between milk proteins [6,40]. Extensive proteomic data of sheep milk whey provides a means for further in-depth protein-protein interaction analysis of sheep milk in comparison to that of other ruminants including cow.
Due to an incomplete functional annotation of sheep proteins in the literature, functional annotation of homologous proteins in humans was used in this study with the aim of obtaining an overview of the biology of the whey proteome. However, it is appreciated that GO analysis excludes consideration of the biological significance of protein isoforms, protein complexes, or differences in post-translational modifications which have been observed in milk proteins of various species [2]. Furthermore, the use of homologous proteins from other species may lead to inaccurate reflection of the function of some proteins. Similar challenges have been acknowledged for research on other non-model organisms [39,41].
It has been pointed out that, although substantial effort has been spent on characterizing the genes in the human genome, functional annotations of human milk proteins or their homologous counterparts in other species are likely to represent the biological activities these proteins perform in other cells or locations [42]. For example, xanthine dehydrogenase/oxidase is known to convert hypoxanthine to xanthine in most cell types. However, investigation of expression of xanthine dehydrogenase/oxidase in the mammary gland suggested that the function of this enzyme in milk is likely to reverse the endocytotic process of milk fat globule secretion [43]. The challenge in understanding gene sharing between tissues is likely applicable to sheep. Hence, further functional investigation of sheep proteins is required to obtain a better understanding of the biology of the sheep milk whey proteome. Nonetheless, GO is still considered at the current time to be the best and most widely used tool to analyze proteome inter-relationships, such as between a substantial collection of proteins in milk and milk whey, to obtain an overview of the biological system and further understanding of potential benefits for human consumption [39].

Comparison of the biology of the sheep and cow whey proteomes
Bovine milk is a major human food and agro-economical product that has attracted extensive research in the field of functional food and human nutrition. Cow milk protein-derived peptides and their precursors have been shown to provide numerous health-promoting benefits, such as regulation of the immune, cardiovascular, nervous, and gastrointestinal systems [36]. However, several cow milk proteins, e.g. β-lactoglobulin and α-casein, are considered to be more allergenic than the corresponding proteins in other species such as small ruminants. Milk protein composition is reported to vary with species, arising from differences in physiology and the varying nutritional requirements of offspring. Major proteins in sheep milk have been shown to possess differences in amino acid composition in comparison to those of cow milk [44]. Thus, in-depth comparison of the whey proteome of other species to that of cow has the potential to provide valuable information on the nutritive and health-promoting characteristics of milk proteins for development of functional foods.
In order to compare sheep and cow whey proteomes at an in-depth level, 606 sheep whey proteins identified in this study were compared to those of a list of 783 cow whey proteins from unique genes (S2 Table) that we compiled from the literature in this study [6,7,13,14,30]. This provided an opportunity to compare a significant number of proteins from each species that likely represented similar depth of protein identity. While the two whey proteomes share 233 proteins in common, a greater number (373 for sheep and 550 for cow) appeared to be unique to each species in the current data sets. GO analysis of proteins identified exclusively in sheep and cow proteomes respectively (Fig 3) revealed that while proteins unique to cow are significantly involved in cellular growth and metabolism, those in sheep are distributed in cellular establishment, signaling, protein maturation, and inflammatory and other immune responses. GO analysis of proteins commonly identified in both species (S1 Fig) showed these proteins are mostly associated with immunity.
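The counts quoted above follow directly from the sizes of the two gene lists and their overlap. As a quick consistency check, the sketch below recomputes them; the function name is hypothetical and the size of the combined list is a derived figure that is not reported in the text.

```python
def partition_counts(n_sheep, n_cow, n_shared):
    """Split two overlapping gene lists into unique and shared counts."""
    if n_shared > min(n_sheep, n_cow):
        raise ValueError("shared count cannot exceed either list size")
    return {
        "sheep_only": n_sheep - n_shared,
        "cow_only": n_cow - n_shared,
        "shared": n_shared,
        "combined": n_sheep + n_cow - n_shared,  # size of the union
    }


if __name__ == "__main__":
    # 606 sheep genes, 783 cow genes and 233 shared genes, as given in the text.
    print(partition_counts(606, 783, 233))
```

With the values from the text this gives 373 genes found only in sheep and 550 found only in cow, in agreement with the figures reported, and a combined list of 1156 genes.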
The high number of unmatched proteins between the two data sets was unexpected. Consequently, possible sources of error were tested, including using a different reference set of gene names for the comparisons. Specifically, in order to coherently compare the list of peptides between different species and to utilize the superior annotations present for human genes, the gene symbols encoding the human homologues (RefSeq database) were identified for both sets of peptide hits. In light of possible copy-number changes and possible gene losses in the human genome compared to the genomes of the animals analyzed here, the analysis was repeated with matching done to the bovine RefSeq peptide database instead of the human one. The resulting comparison did not differ from that performed with the human gene symbols, and no errors arising from this approach were identified. Therefore, although surprising, the high degree of mismatch appears not to be an artifact but a biological characteristic.
Investigation of the distribution of proteins in the top 15 enriched biological process and molecular function GO terms shared by the two species (Fig 4) shows that a comparable number of sheep milk proteins are involved in most of the top 15 enriched biological process and molecular function GO terms. While sheep and cow milk whey exhibit many similar biological process and molecular function characteristics in which the same proteins can be found (S1 Fig), uniquely identified proteins associated with each of the top 15 common biological process and molecular function GO terms in cow and sheep milk whey respectively are listed in Table 1. A comparison of the major proteins in milk of five different ruminants revealed that while β-lactoglobulin is the major whey protein in some species, the absence of this allergenic protein is observed in others such as human and camel [45]. In addition, the authors detected the presence of different isoforms of kappa-casein, which has been shown to be a potential allergen in different species [46]. Thus, in-depth comparison of milk between species may provide insights to sources of allergenic components.
The current data set of milk whey proteins from the two ruminant species suggests that while a comparable number of milk proteins in sheep and cow are involved with inflammatory, immune and defense responses, the composition of the proteins involved in these biological processes differs between the two species (Table 1). Of the top 15 biological process GO terms, 51 and 82 milk proteins unique to sheep and cow were identified, respectively. The identity and associated GO terms of proteins unique to sheep and cow whey are summarized in S4 and S5 Tables. The identity and associated GO terms of proteins common in whey of the two species are summarized in S6 Table. Of these proteins, those in sheep milk whey such as proteins of the complement system C1QA, C1QB, C1QC, and C8A, together with CRP, KLKB1, KRT1, MASP2, TLR4, and YWHAZ are associated with many of the top 15 biological process GO terms including acute inflammation and defense response. These proteins appear to be absent in cow milk whey. Findings in this study are consistent with previous studies in the literature in which cow milk was suggested to be almost devoid of complement in the absence of inflammation [47]. The presence of proteins of the complement system in sheep is therefore of interest for further investigation with regard to advantages of sheep milk in terms of reduced allergenicity. Sheep milk protein fractions, and particularly the milk fat globule membrane (MFGM) fraction, have been investigated for protein markers of presentations such as mastitis [33]. However, as far as we are aware, a definitive list of proteins associated with such presentations has not been reported. It is therefore not possible at this stage to determine whether or not the sheep milk used in this proteomics study displays markers of presentations such as mastitis.
The milk samples used in this study were obtained from three individual sheep, as mentioned earlier in the Experimental Procedures section, from a local small farm where all of the sheep have been closely monitored throughout their lives, and hence, to the best of our knowledge, we consider that the sheep were healthy.
Evolution of milk is increasingly a subject of investigation as part of an effort to understand the function of milk constituents and their potential health benefits [48,49]. Milk protein composition diversity illustrates that milk is species specific. Lemay and colleagues, using 197 cow milk protein genes and the mammary genomes of various species, concluded that apart from differences in protein composition that are partly due to copy number and sequence variation, most divergent proteins in milk are associated with nutritional and immunological processes [50]. Furthermore, a study of the whey proteomes of five different ruminant species reported a differential expression pattern of 211 proteins [20]. Results from the data set of the milk whey proteomes of cow and sheep in the present study confirm that, while a common core protein set is present in the two species to ensure optimal immunity for the newborn, an exclusive list of proteins appears to be present in each of the two species, which likely reflects differential development of offspring physiological requirements of the two species during the course of evolution.

Conclusions
This study contains, to our knowledge, the largest inventory of sheep whey proteins analyzed to date. Comparison of this sheep whey protein inventory with an inventory of cow whey proteins obtained from the literature, using proteomic and bioinformatic approaches, enabled a more in-depth analysis of similarities and differences in the composition of the sheep and cow whey proteomes. The significant increase in the number of proteins and protein complexes identified in sheep whey enabled a more extensive analysis of the biological significance of these proteins. Gene ontology analysis of the sheep and cow whey proteomes not only indicated that a comparable number of proteins in milk whey of the two species is involved in inflammatory, immune and defense responses, but also enabled identification of unique proteins in sheep milk whey associated with various biological processes, substantiating a call for further investigation into the benefits of utilizing sheep milk whey. This study therefore provides extended insight into the sheep whey proteome, in comparison to cow, with the potential to facilitate research in the design and manufacture of functional foods using sheep whey proteins, particularly in relation to utilization of whey obtained as a co-product of cheese manufacture.

Table. Proteins common in milk whey of the two species associated with the top 15 biological process GO terms that are shared by both species. (XLSX)