Characterization of an Equine α-S2-Casein Variant Due to a 1.3 kb Deletion Spanning Two Coding Exons

The production and consumption of mare's milk in Europe has gained importance, mainly because of its positive health effects and its lower allergenic potential compared with cows' milk. The allergenicity of milk is to a certain extent affected by different genetic variants. In classical dairy species, much research has been conducted into the genetic variability of milk proteins, but knowledge in horses is scarce. Here, we characterize two major forms of equine αS2-casein arising from a genomic 1.3 kb in-frame deletion involving two coding exons, one of which represents an equid-specific duplication. Findings at the DNA level were verified by cDNA sequencing from milk of mares with different genotypes. At the protein level, we showed by SDS-PAGE and in-gel digestion with subsequent LC-MS analysis that both proteins are actually expressed. Comparison with published sequences of other equids revealed that the deletion probably occurred before the ancestor of present-day asses and zebras diverged from the horse lineage.


Introduction
Horses are of minor importance in global dairy production, but mare's milk has traditionally been consumed in Mongolia, Kazakhstan, Kyrgyzstan and Tajikistan [1]. The global production volume is not exactly known, but it has been estimated that approximately 30 million people worldwide regularly consume mare's milk [2]. In Europe as well, especially in Italy, Hungary, The Netherlands and Germany, the production and consumption of mare's milk have gained increasing importance; roughly 1 million kg of mare's milk are produced in Europe [3]. This increased interest is mainly based on positive health effects. For example, the milk of horses and donkeys is tolerated by the majority of children suffering from cow's milk protein allergy, a condition that affects approximately 2% of infants fed with cow's milk-based milk replacements [4,5]. Moreover, positive effects of mare's milk consumption on diseases such as atopic dermatitis [6], Crohn's disease [7] and cardiovascular diseases [8] have been reported.
The composition of equine milk, and especially of the milk protein fraction, is very different from that of cows' milk. It is lower in fat and protein, but has a high lactose content similar to that of human milk [9,1]. While in cattle the casein fraction accounts for the major part of the total milk protein, the casein to whey ratio in horses is around 1.1:1, which more closely resembles human milk [9,1]. In fact, it has been reported that the balance between caseins and whey proteins is a major determinant of cow's milk allergenicity [10], possibly explaining the low allergenic potential of horse milk. However, there is also strong evidence that genetic milk protein variants affect the allergenicity of milk protein through the presence or absence of particular epitopes [11,12]. While there has been intense research into the genetic variability of milk proteins in ruminants, and especially in dairy cows [13], knowledge about equine milk protein variation is scarce, especially for the caseins. In the donkey, however, different variants of αS2-casein have been described, including one involving a large deletion of exons 4-6 [14].
In the present study, we characterized a major protein variant arising from a 1.3 kb in-frame deletion covering two exons and confirmed its expression at the protein level by means of LC-MS-based analytics.

Animals and Samples
Genomic DNA was extracted from hair samples of 193 domestic horses from 8 different breeds that are actually used for mare's milk production in Germany, applying a modified Miller protocol [15]. The animals were selected to be as unrelated as possible. Hair samples were obtained from 14 different private studs with the permission of and in cooperation with the owners by pulling out several hairs from the mane or the tail. Furthermore, individual milk samples were collected from four Haflinger mares with known genotype. These samples were taken by the owners during routine milking of the mares. In accordance with German animal welfare legislation, these sampling procedures do not require permission or approval.

DNA Sequencing
Primer pairs were designed to amplify the coding exons of equine CSN1S2 and adjacent intronic regions using the Primer 3 software [16], based on the genomic reference sequence of the casein gene CSN1S2 (Acc. No. NC_009146.2). A further primer pair (Forward: 5'-GGAAAAGATTTGTGAGCCATTTG-3', Reverse: 5'-GCTGGATAATTGCTCAACACTCA-3') was designed to specifically amplify the entire region of CSN1S2 encompassing the deletion. PCR amplification and DNA sequencing were done as previously described [17]. The obtained sequences were analyzed and compared with the genomic reference sequence (Acc. No. NC_009146.2) using the software Sequencher 4.9 (Gene Codes Corp., Ann Arbor, MI).
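The deletion-spanning primer pair given above can be sanity-checked with a few lines of code. The following is a minimal sketch only, not part of the published workflow: it reports length, GC content and reverse complement for each primer. The helper names (`clean`, `gc_fraction`, `reverse_complement`) are illustrative, and whitespace inside a printed sequence is assumed to be a typesetting artifact.

```python
# Deletion-spanning primers as printed in the text (5'->3').
PRIMERS = {
    "forward": "GGAAAAGATTTGTGAGCCATTTG",
    "reverse": "GCTGGATAATTGCTCAACACTCA",
}

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove whitespace (assumed typesetting artifact) and upper-case."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the strand the primer anneals to."""
    return clean(seq).translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        s = clean(seq)
        print(f"{name}: {len(s)} nt, GC = {gc_fraction(s):.1%}, "
              f"binding site = {reverse_complement(s)}")
```

Searching the reference sequence for the forward primer and for the reverse complement of the reverse primer would give the expected amplicon boundaries; this is not shown here because the genomic sequence is not reproduced in the text.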

RNA Isolation from Milk Samples and cDNA Synthesis
Individual milk samples were obtained from four mares with known deletion genotype. An aliquot of 40 ml was centrifuged at 6,000 g for 10 minutes. The supernatant including the milk fat layer was discarded and remaining milk fat was thoroughly removed with alcohol wipes. The cell pellet was washed three times with 1x phosphate buffered saline. Cells were homogenized using QIAShredder columns (Qiagen, Hilden, Germany) and total RNA was isolated using the Qiagen RNeasy Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The isolated RNA was transcribed into cDNA using the SuperScript III First-Strand Synthesis SuperMix kit (Invitrogen) with oligo-dT primers. PCR amplification and sequencing were done with primers located in the untranslated regions (Forward: 5'-TGCCTGCACTTTCTTGTCTTCCA-3', Reverse: 5'-TGCACAGTCTTCATTTGGCTTGA-3').

Protein and Peptide Analysis
Individual milk samples of two mares with known genotype were used for protein analysis. These samples were dialyzed to remove lactose, subsequently freeze-dried and stored at -18°C for 4 months. Lyophilized milk powder was dissolved in Laemmli buffer (1x) at a concentration of 2 mg/mL, and 10 and 20 μg of crude protein were loaded onto a 12% SDS-PAGE gel (150 V for 85 min). Gel bands were destained, reduced and alkylated, and then in-gel digested overnight with trypsin (60 ng) using standard protocols. Peptides were extracted from the gel, dried down using vacuum centrifugation and resuspended in 3% acetonitrile (ACN) and 0.1% trifluoroacetic acid (TFA) before being analyzed by LC-MS.
MS scans were acquired in the mass range of 300 to 2,000 m/z at a resolution of 70,000. The ten most intense signals were subjected to HCD fragmentation using a dynamic exclusion of 15 s. The MS/MS parameters were: minimum signal intensity, 1,000; isolation width, 3.0 Da; charge state, ≥2; HCD resolution, 15,000; normalized collision energy, 25. Lock mass (445.120025) was used for data acquired in MS mode.
HCD spectra were searched using Proteome Discoverer 1.4 (1.4.0.288, Thermo Fisher Scientific) with the Sequest-HT search algorithm against the complete reviewed and unreviewed Equus caballus database (28,188 sequences, downloaded 2015.07.16) with common contaminants (ftp://ftp.thegpm.org/fasta/cRAP/) appended. The following database search settings were used: MS tolerance, ±10 ppm; MS2 tolerance, 0.02 Da; enzyme specificity, trypsin with up to three missed cleavages allowed. Carbamidomethylation on cysteine residues was set as a fixed modification, while oxidation on methionine and phosphorylation on serine and threonine residues were set as variable modifications. Only peptides identified with medium confidence (FDR <5%) were included. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://www.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD002834.
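To make the search settings above more concrete, the following sketch computes the monoisotopic mass of a peptide, its m/z at a given charge state, the shift caused by phosphorylation, and the ±10 ppm precursor window. It is an illustration under stated assumptions, not a reimplementation of Sequest-HT: the function names are hypothetical, the residue masses are standard monoisotopic values, and the example peptide is one of the sequences reported in this study.

```python
# Standard monoisotopic residue masses in Da (unmodified residues;
# carbamidomethylated cysteine is not handled in this sketch).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565    # H2O added for the free N- and C-terminus
PROTON = 1.007276    # mass of a proton
PHOSPHO = 79.966331  # HPO3 added per phosphorylated Ser/Thr


def peptide_mass(seq: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass with n_phospho phosphate groups."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + n_phospho * PHOSPHO


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def ppm_window(mz_value: float, ppm: float = 10.0) -> tuple:
    """Lower and upper m/z bound for a symmetric ppm tolerance."""
    delta = mz_value * ppm / 1e6
    return mz_value - delta, mz_value + delta


if __name__ == "__main__":
    peptide = "FPTEVYSSSSSSEESAK"  # reported for the long variant
    for n in range(3):             # 0, 1 or 2 phosphate groups
        m = peptide_mass(peptide, n)
        low, high = ppm_window(mz(m, 2))
        print(f"{n} phospho: M = {m:.4f} Da, "
              f"[M+2H]2+ window = {low:.4f} to {high:.4f}")
```

At a charge state of 2, the unmodified and phosphorylated forms of this peptide all fall inside the acquired mass range of 300 to 2,000 m/z.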

DNA Sequencing and Mutation Screening
The current annotation of the equine CSN1S2 gene (GeneID 100327035) is based on the mRNA reference sequence NM_001170767.2, which contains 15 coding exons with an open reading frame of 645 bp. In an attempt to resequence the open reading frame using exon-flanking primer pairs, we recognized that the PCR reactions for exons 8 and 9 consistently failed in particular horses. To unravel the possible cause of this phenomenon, we amplified a 2.6 kb fragment spanning the entire region. While the expected product was obtained from samples that had been successfully amplified before, the product obtained from initially unsuccessful samples was found to be approximately 1.3 kb shorter (Fig 1). Subsequent Sanger sequencing of the products revealed the presence of a 1,339 bp deletion in the short variant (Fig 2A), while the long product was found to correspond completely to the genomic reference sequence (NC_009146.2). Analysis of this sequence revealed the presence of a 309 bp duplication of the region encompassing exon 8 of the gene (Fig 2A). Because this duplication is located exactly at the boundary of the deletion, the exact position of the deletion cannot be determined, i.e. it cannot be decided whether the upstream or the downstream duplicate is involved in the deletion.
A total of 193 horses belonging to 8 breeds (Table 1) were tested for the presence of the deletion by PCR and subsequent agarose gel electrophoresis (Fig 1). The deletion was found to be present in all analyzed breeds; the highest frequencies of 0.36 and 0.25 were observed in Haflinger and Icelandic horses, respectively. Notably, these breeds are common in mare's milk production, and the Haflinger breed in particular is widely used. This might indicate an effect of the mutation or of a certain casein haplotype on milk yield, as is the case in cattle, for example [18,19].
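The screening logic described above (a long or a short PCR product on the gel, followed by allele counting) can be sketched as follows. This is an illustration under stated assumptions: the long amplicon is taken as 2,600 bp because the text gives only "2.6 kb", the size tolerance and the heterozygote label `+/Δ` are choices made for this sketch, and the band sizes in the example are hypothetical rather than measured data.

```python
DELETION_BP = 1339        # deletion size from Sanger sequencing (see text)
LONG_AMPLICON_BP = 2600   # approximate; the text states "2.6 kb"
SHORT_AMPLICON_BP = LONG_AMPLICON_BP - DELETION_BP


def call_genotype(band_sizes_bp, tolerance_bp=150):
    """Call +/+, +/Δ or Δ/Δ from estimated band sizes; None if no call."""
    has_long = any(abs(b - LONG_AMPLICON_BP) <= tolerance_bp
                   for b in band_sizes_bp)
    has_short = any(abs(b - SHORT_AMPLICON_BP) <= tolerance_bp
                    for b in band_sizes_bp)
    if has_long and has_short:
        return "+/Δ"
    if has_long:
        return "+/+"
    if has_short:
        return "Δ/Δ"
    return None


def deletion_allele_frequency(genotypes):
    """Frequency of the deletion allele among successfully called animals."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no genotypes could be called")
    n_deletion_alleles = sum(g.count("Δ") for g in called)
    return n_deletion_alleles / (2 * len(called))


if __name__ == "__main__":
    # Hypothetical band sizes (bp) for four animals, for illustration only.
    gels = [[2620], [2580, 1270], [1250], [2610]]
    genotypes = [call_genotype(bands) for bands in gels]
    print(genotypes, deletion_allele_frequency(genotypes))
```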

Analysis of Transcripts
The duplicated region within the 1.3 kb deletion contains a coding exon with a length of 24 bp. The two copies were found to be completely identical, including intact splice sites. However, only one of the identical exons is present in the current RefSeq transcript NM_001170767.2. Thus, it was unclear which exons are transcribed and whether both variants are transcribed at all. Therefore, we purified total RNA from the skimmed milk of four mares, three of them homozygous for the long variant and one homozygous for the short variant. After reverse transcription, the CSN1S2 transcripts were amplified using primers located in the untranslated regions. Agarose gel electrophoresis of the PCR products revealed a difference of approximately 50 bp between the alternatively homozygous animals (Fig 3), showing that both a long and a short transcript were actually expressed. Subsequently, the open reading frames of both transcripts were sequenced. The difference was found to be due to a 51 bp in-frame insertion/deletion after exon 8, encompassing the duplicated exon as well as a previously unannotated exon that aligns perfectly within the 1.3 kb deletion (Fig 2A and 2B). Exon numbering was consequently adapted, counting the newly annotated exon as exon 9 and the duplicate of exon 8 as exon 10.
Previously published sequence data [21,22] also revealed the presence of the long variant including the duplicated exon. Furthermore, the Sequence Read Archive data sets of a Middle Pleistocene horse, the "Thistle Creek horse" sequenced by Orlando and colleagues [22] (BioProject Accession PRJNA205517), as well as of several asses and zebras sequenced by Jónsson and colleagues ([23], BioProject Accession PRJEB7446), were checked for reads either falling into the deleted region (indicating the long variant) or being split at the boundaries (indicating the short variant).
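The read-based check described above can be expressed as a simple classification rule. The sketch below is a simplified illustration, not the procedure actually used: the `Alignment` type, the breakpoint coordinates and the `slack` parameter are hypothetical, and it ignores the complication that reads from the 309 bp duplicated segment may align ambiguously.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical breakpoint coordinates on an arbitrary reference scale;
# only the deletion length of 1,339 bp is taken from the text.
DEL_START = 1000
DEL_END = DEL_START + 1339


@dataclass(frozen=True)
class Alignment:
    start: int                              # 0-based reference start, inclusive
    end: int                                # reference end, exclusive
    gap: Optional[Tuple[int, int]] = None   # interval skipped by a split read


def classify(aln: Alignment, del_start: int = DEL_START,
             del_end: int = DEL_END, slack: int = 5) -> str:
    """Return 'long', 'short' or 'uninformative' for one aligned read."""
    if aln.gap is not None:
        gap_start, gap_end = aln.gap
        # A split read supports the deletion only if its gap matches
        # both breakpoints within the allowed slack.
        if (abs(gap_start - del_start) <= slack
                and abs(gap_end - del_end) <= slack):
            return "short"
        return "uninformative"
    # A contiguous read that overlaps the deleted interval can only
    # originate from the long variant.
    overlap = min(aln.end, del_end) - max(aln.start, del_start)
    return "long" if overlap > 0 else "uninformative"


if __name__ == "__main__":
    reads = [
        Alignment(1100, 1200),                          # inside the deletion
        Alignment(950, 2400, gap=(1000, 2339)),         # split at breakpoints
        Alignment(100, 200),                            # outside the region
    ]
    print(Counter(classify(r) for r in reads))
```

An individual with only "long" reads would be scored as carrying the long variant, one with only "short" reads as homozygous for the deletion, and one with both as heterozygous, subject to sufficient read depth.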
Only a single read aligning almost perfectly within the deletion was found in the Middle Pleistocene horse, which might point to the presence of the long variant. In the Somali wild ass (E. asinus somalicus), we identified only the long variant, while both alleles were found in the Tibetan kiang (E. kiang). The sequenced onager (E. hemionus onager) was found to be homozygous for the deletion. These findings indicate that the duplication event giving rise to an additional coding exon, as well as the deletion, might be specific to equids and that both must have occurred before the ancestor of present-day asses and zebras dispersed into the Old World 2.1-3.4 Mya [23]. However, all analyzed zebras [23] (Hartmann's mountain zebra, E. zebra hartmannae; Grevy's zebra, E. grevyi; Böhm's plains zebra, E. quagga boehmi; as well as the extinct quagga, E. q. quagga) were found to be homozygous for the complete deletion. It also seems possible that the deletion initially occurred in horses and is the result of gene flow between horses and ass species, which has been shown to have played a significant role in equid evolution [23].

Protein and Peptide Analysis
Lyophilized milk powder from two mares, homozygous for the long and the short variant, respectively, was analyzed using SDS-PAGE, resulting in different patterns of distinct bands (Fig 4). In-gel trypsin digestion and analysis by LC-MS revealed the presence of unique peptides only for the long form of αS2-casein (A0A0C5DH76) in milk of the animal homozygous for the insertion, i.e., the peptides FPTEVYSSSSSSEESAK, FPTEVYSSSSSSEESAKFPTER, FPTEVYSSSSSSEESAKFPTEREEK and NINEMESAKFPTEVYSSSSSSEESAK. Interestingly, these peptides were identified in both phosphorylated (singly phosphorylated at different residues) and non-phosphorylated forms. Evidence for multiple phosphorylations on these peptides was also observed. Unique peptides for the short form of αS2-casein (D2KAS0) were identified only in milk from the mare homozygous for the deletion, i.e., NINEMESAKFPTER, NINEMESAKFPTEREEK and NINEMESAKFPTEREEKEVEEK (Fig 5, Table 2). As commonly observed in MS-based protein analytics, 100% sequence coverage was not reached; however, the proteotypic peptides identified allowed the two equine αS2-casein variants to be clearly distinguished. Therefore, it can be concluded that both protein variants, differing in length by 17 aa, are expressed. The comparative analysis has shown that the long variant is probably the equid-specific ancestral variant, but the deletion also seems to have been present before zebras and asses diverged from horses. Thus, we propose to term the long variant CSN1S2*A and the short variant CSN1S2*B. Generally, the milk proteome is very complex due to the presence of both genetic variants and posttranslational modifications [24]. Recent proteomic studies [25,26] have demonstrated considerable microheterogeneity for equine caseins, especially κ-casein [25]. However, these studies did not report any findings regarding αS2-casein, probably due to its very low concentration in horse milk [1]. Ochirkhuyag et al. [27], however, reported the presence of two distinct bands for this protein. In our study, both genetic variants as well as differently phosphorylated peptides have been detected for αS2-casein.
Fig 5. Combined sequence coverage (all bands excised) of αS2-casein in the mare with +/+ genotype (A) and the mare with Δ/Δ genotype (B). For the +/+ genotype, unique peptides were identified only for the longer αS2-casein form (Accession no. A0A0C5DH76, 231 aa). Sequence that is unique to A0A0C5DH76 is underlined. For the Δ/Δ genotype, unique peptides were identified only for the shorter αS2-casein form (Accession no. D2KAS0, 214 aa). Sections in green represent parts of the protein that were identified by the Sequest-HT algorithm. doi:10.1371/journal.pone.0139700.g005
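Why the reported peptides discriminate between the two variants can be illustrated with an in-silico tryptic digest of the region around the deletion. The sketch below rests on explicit assumptions: the local sequences are assembled from the overlapping peptides reported above rather than taken from the database entries, the 17-residue difference is written in one of several equivalent ways (the repeated FPTE motif makes the exact boundary ambiguous), and the cleavage rule is the common simplification of cutting after K or R unless followed by P.

```python
# Local sequence around the deletion, assembled from the reported peptides.
FLANK_N = "NINEMESAK"
INSERT = "FPTEVYSSSSSSEESAK"   # 17 residues present only in the long form
FLANK_C = "FPTEREEK"
LONG_LOCAL = FLANK_N + INSERT + FLANK_C
SHORT_LOCAL = FLANK_N + FLANK_C


def cleave(seq):
    """Fully cleaved tryptic pieces: cut after K/R, but not before P."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces


def tryptic_peptides(seq, max_missed=3):
    """All peptides with up to max_missed missed cleavages (search setting)."""
    pieces = cleave(seq)
    n = len(pieces)
    return {
        "".join(pieces[i:j])
        for i in range(n)
        for j in range(i + 1, min(i + max_missed + 1, n) + 1)
    }


# Peptides that can arise from one local sequence but not from the other.
long_only = tryptic_peptides(LONG_LOCAL) - tryptic_peptides(SHORT_LOCAL)
short_only = tryptic_peptides(SHORT_LOCAL) - tryptic_peptides(LONG_LOCAL)

if __name__ == "__main__":
    print("long only :", sorted(long_only))
    print("short only:", sorted(short_only))
    # 51 bp at the transcript level correspond to 17 codons.
    print("insert length:", len(INSERT), "aa =", len(INSERT) * 3, "bp")
```

Every peptide that spans the junction between NINEMESAK and FPTER without the inserted residues can only come from the short form, whereas every peptide containing the inserted residues can only come from the long form, which is the pattern seen in the reported identifications.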

Conclusion
Within the current study, we have characterized two major variants of equine αS2-casein, which we named CSN1S2*A and CSN1S2*B. The variation is due to a 1.3 kb in-frame deletion involving two coding exons corresponding to 17 amino acid residues. One of those exons has arisen from a duplication that is probably specific to the equid lineage. We verified both genomic variants at the transcript as well as the protein level and were able to demonstrate that these variants are also segregating in asses, meaning that they are likely to have occurred before the first ancestor of present-day asses and zebras dispersed into the Old World 2.1-3.4 Mya.
Table 2. Bands excised from a mare with +/+ genotype (upper part) and a mare with Δ/Δ genotype (lower part), respectively, and major milk proteins identified (only those with > 20 PSMs). αS2-casein with accession no. A0A0C5DH76 is the long form (231 aa), while αS2-casein with accession no. D2KAS0 is the shorter form (214 aa). Unique peptides for the long form of αS2-casein (A0A0C5DH76) were identified only in the +/+ mare, while unique peptides for the short form of αS2-casein (D2KAS0) were identified only in milk from the Δ/Δ mare.