Cryptosporidium parvum vaccine candidates are incompletely modified with O-linked-N-acetylgalactosamine or contain N-terminal N-myristate and S-palmitate

Cryptosporidium parvum (studied here) and Cryptosporidium hominis are important causes of diarrhea in infants and immunosuppressed persons. C. parvum vaccine candidates, which are on the surface of sporozoites, include glycoproteins with Ser- and Thr-rich domains (Gp15, Gp40, and Gp900) and a low complexity, acidic protein (Cp23). Here we used mass spectrometry to determine that O-linked GalNAc is present in dense arrays on a glycopeptide with consecutive Ser derived from Gp40 and on glycopeptides with consecutive Thr derived from Gp20, a novel C. parvum glycoprotein with a formula weight of ~20 kDa. In contrast, the occupied Ser or Thr residues in glycopeptides from Gp15 and Gp900 are isolated from one another. Gly at the N-terminus of Cp23 is N-myristoylated, while Cys, the second amino acid, is S-palmitoylated. In summary, C. parvum O-GalNAc transferases, which are homologs of host enzymes, densely modify arrays of Ser or Thr, as well as isolated Ser and Thr residues on C. parvum vaccine candidates. The N-terminus of an immunodominant antigen has lipid modifications similar to those of host cells and other apicomplexan parasites. Mass spectrometric demonstration here of glycopeptides with O-glycans complements previous identification of C. parvum O-GalNAc transferases, lectin binding to vaccine candidates, and human and mouse antibodies binding to glycopeptides. The significance of these post-translational modifications is discussed with regard to the function of these proteins and the design of serological tests and vaccines.


Introduction
C. parvum infects humans and cows, while C. hominis only infects humans [1][2][3]. C. parvum was first identified as an opportunistic pathogen and cause of severe diarrhea in AIDS patients [4,5]. In 1993, C. parvum contaminated the municipal water supply in Milwaukee, Wisconsin, U.S.A., and caused a massive outbreak of diarrhea among immunocompetent persons [6,7]. More recently C. parvum has been shown to be the second most important cause (after [...]
PLOS ONE | https://doi.org/10.1371/journal.pone.0182395 August 8, 2017
[...] glycosyltransferases [52][53][54]. The resulting N-glycans, which are likely GlcMan5GlcNAc2 and Man5GlcNAc2, are remarkable for their simplicity, as compared to the complicated N-glycans identified in other protists [55]. In this report, we used mass spectrometry to characterize tryptic glycopeptides of lysates of C. parvum oocysts and thereby directly determine the number and some of the positions of O-GalNAc residues on Gp40, Gp15, Gp900, and a previously uncharacterized glycoprotein with a predicted weight of 20 kDa (named here Gp20). Mass spectrometry of hydrophobic peptides also detected the addition of myristoyl and palmitoyl groups to the first and second residues, respectively, at the N-terminus of Cp23 [56,57].

Reagents and parasites
Freshly passaged C. parvum oocysts were purchased from Bunch Grass Farm (Deary, ID) and handled under BSL-2 protocols, with the approval of the Boston University Institutional Biosafety Committee. All reagents and chemicals were purchased from Sigma-Aldrich (St. Louis, MO), unless noted otherwise. Solvents used for LC-MS were Optima™ grade, procured from Fisher Scientific (Thermo-Fisher Scientific, Waltham, MA).

Protein extraction and trypsin digestion
Procedures for extracting proteins from C. parvum oocysts and digesting them with trypsin have recently been described in detail [54], and so only a summary of the methods is presented here. Briefly, 10^9 C. parvum oocysts were concentrated by centrifugation, washed 3X with PBS, and resuspended with PBS containing EDTA-free cOmplete™ protease inhibitor (Roche, Basel, Switzerland). Oocyst walls were disrupted in a bead beater with 0.5-mm glass beads and centrifuged. The PBS supernatant was removed and saved, while the remaining insoluble materials and beads were extracted with a solution composed of 10 mM HEPES, 25 mM KCl, 1 mM CaCl2, 10 mM MgCl2, 2% CHAPS, 6 M guanidine HCl, 50 mM dithiothreitol, 1X protease inhibitor, pH 7.4. The resulting guanidine-DTT supernatant was combined with the PBS supernatant, and the insoluble material was discarded. The proteins were then precipitated, and the pellet was washed with methanol and vacuum dried. Alternatively, oocyst proteins were extracted with hot phenol, and phenol and interphase layers were kept, while the aqueous layer was discarded. Proteins were precipitated with methanol containing 100 mM NH4OAc and dried. The pelleted proteins were resuspended in 50 mM NH4HCO3, pH 8.0, reduced with 50 mM DTT, alkylated with iodoacetamide, and then digested with proteomics grade trypsin (Sigma-Aldrich, St. Louis, MO). Tryptic peptides were dried and desalted using C18 ZipTip concentrators following the manufacturer's protocol (EMD Millipore, Danvers, MA).

Mass spectrometry
The LC-MS/MS methodologies and the manual interpretation of MS/MS spectra of C. parvum glycopeptides containing O-glycans were performed using the methods described for C. parvum N-glycosylated peptides [54]. Desalted and dried peptides from three biological replicates were dissolved in 2% ACN, 0.1% formic acid (FA) and separated using a NanoAcquity Ultra Performance Liquid Chromatography (UPLC) system (Waters, Milford, MA), fitted with a nanoAcquity Symmetry C18 trap column and a BEH130C18 analytical column. Solvent mixtures for the mobile phase gradient were 99:1:0.1 HPLC grade water/ACN/FA and 99:1:0.1 ACN/HPLC grade water/FA. The UPLC was coupled to a TriVersa NanoMate ion source (Advion, Ithaca, NY), operated at 1.5 kV to introduce ions into either an LTQ-Orbitrap-XL or a QE Plus mass spectrometer (Thermo-Fisher Scientific, San Jose, CA). Both mass spectrometers were operated in the positive-ion mode. MS1 spectra were recorded over the range m/z 350-2000. MS2 HCD spectra were acquired by isolating the top 5 (LTQ-Orbitrap) or top 20 (QE+) precursor ions with a 2-m/z window and fragmenting the selected precursor ions with 15-45 V HCD energy. The lower energy MS2 HCD spectra were scanned from m/z 100 to an upper m/z value, which was dependent upon the parent ion m/z. For the 45-V HCD spectra, ions below m/z 210 were excluded to avoid trapping the very abundant HexNAc oxonium ion.
Manual interpretation of mass spectra. Data obtained from LC-MS/MS experiments were first examined using Qual Browser in the Xcalibur 2.2 software suite (Thermo-Fisher Scientific). Extracted ion chromatograms were generated from MS/MS spectra for oxonium ions of interest (HexNAc, m/z 204.0866; Hex-HexNAc, m/z 366.1395; HexNAc2, m/z 407.1670). Spectra containing one or more of these ions were then manually interpreted [54]. Once a sequence was obtained, it was searched against the 3,803 entries within the C. parvum Iowa-II predicted proteome and cross-searched within the entire NCBI nr database, using the online NCBI BLASTP algorithm (https://blast.ncbi.nlm.nih.gov/Blast.cgi) [58][59][60]. The software Glycoworkbench v2.1, release 146, was used to help calculate glycan compositions [61]. O-glycosylated peptides utilized HexNAc almost exclusively. Due to the labile nature of O-linked glycans, b and y ions containing one or more HexNAc residues typically had very low abundances. The charge-reduced molecular ion that had undergone the loss of one or more HexNAc residues was often observed. The information obtained from manual interpretations was then used for database searches, allowing for deeper sequencing of the data and higher throughput processing of samples. The peak list, the assigned ions, and their mass errors for manually annotated spectra shown in the figures are listed in S2 Excel File.
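The oxonium-ion screen described above can be sketched in a few lines of Python. This is only an illustration, not the Qual Browser workflow used in the study: the 10-ppm tolerance, the function names, and the example peak list are assumptions, while the three target m/z values are those quoted in the text.

```python
# Oxonium ion m/z values as quoted in the text above.
OXONIUM_IONS = {
    "HexNAc": 204.0866,
    "Hex-HexNAc": 366.1395,
    "HexNAc2": 407.1670,
}


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_oxonium_ions(peaks, tol_ppm=10.0):
    """Map each oxonium ion found to its most intense matching peak.

    `peaks` is a list of (m/z, intensity) pairs. `tol_ppm` is an assumed
    tolerance chosen for illustration, not a value from the paper.
    """
    hits = {}
    for name, target in OXONIUM_IONS.items():
        matches = [p for p in peaks if abs(ppm_error(p[0], target)) <= tol_ppm]
        if matches:
            hits[name] = max(matches, key=lambda p: p[1])
    return hits


# Hypothetical MS/MS peak list, for illustration only.
spectrum = [(138.0550, 2.0e4), (204.0868, 9.5e5), (407.1668, 1.1e3), (512.2500, 4.0e4)]
hits = find_oxonium_ions(spectrum)
for name, (mz, intensity) in hits.items():
    print(f"{name}: m/z {mz:.4f}, intensity {intensity:.2e}")
```

A spectrum for which `hits` is non-empty would be a candidate for manual interpretation.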
Database searches for glycopeptides. Automated database searches were performed using the PEAKS software suite version 8.0 (Bioinformatics Solutions Inc., Waterloo, ON, Canada), using recently described methods for N-glycans with modifications [54]. The search criteria were set as follows: trypsin as the enzyme with two missed cleavages and one nonspecific cleavage; error tolerances of 6 ppm for the precursor and 0.02 Da for fragment ions; carbamidomethyl cysteine as a fixed modification; and dynamic modifications on Ser/Thr (HexNAc to HexNAc4), with six per peptide. The peptide match threshold (-10 logP) was set to 15 and, with estimation of the false discovery rate (FDR), an FDR of 5.6 was calculated. A multi-round search was performed using the de novo only results from the first PEAKSDB search to find peptides with attached lipids. The parameters of the second search were identical to those of the prior PEAKSDB search, with the exception that myristate (N-term, Ser, Thr) and palmitate (Cys, Ser, Thr, Lys) were specified as dynamic modifications, and HexNAc modifications were removed. The results from the searches were exported into Excel and collated.
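To make the search thresholds concrete, the following Python sketch converts a -10 logP score to a probability and applies the two mass tolerances. It assumes the logarithm is base 10; the function names are hypothetical and do not correspond to any PEAKS interface.

```python
def p_value_from_score(score):
    """Convert a -10*log10(P) score back to P (base-10 logarithm assumed)."""
    return 10 ** (-score / 10.0)


def within_ppm(observed_mz, theoretical_mz, tol_ppm=6.0):
    """Precursor check: relative error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm


def within_da(observed_mz, theoretical_mz, tol_da=0.02):
    """Fragment check: absolute error in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# A threshold score of 15 corresponds to P = 10^-1.5.
p_threshold = p_value_from_score(15)
print(f"P at threshold: {p_threshold:.4f}")

# At m/z 1757.2272, a 6-ppm window is about 0.0105 m/z units wide on each side.
half_window = 1757.2272 * 6.0 / 1e6
print(f"6 ppm at m/z 1757.2272: +/- {half_window:.4f}")
```

The ppm tolerance scales with m/z, whereas the 0.02-Da fragment tolerance is constant across the spectrum.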
Re-annotation of glycopeptides from automated database searches. For multiple reasons, the PEAKSDB search algorithm failed to annotate the product ions appropriately. Therefore, the glycopeptide results from the PEAKSDB search were exported in mzIdentML 1.1 format [62], manually verified, then provided to GlycReSoft (a software package developed in-house for glycopeptide discovery and annotation). The code for GlycReSoft, which is currently in active development with periodic updates and improvements, is open source and freely available from the online repository: https://github.com/BostonUniversityCBMS/glycresoft. All peptides listed in the mzIdentML document and all non-redundant theoretical tryptic digest peptides for each included protein were used as templates, upon which a database of theoretical glycopeptides was constructed. Glycosylation was permitted at up to 20 putative sites. All distinct combinations-with-replacement with the putative glycan compositions were generated. For each template peptide, theoretical glycopeptides were produced by assigning glycosylation events for combinations of between 1 and k glycosylation sites, where k is the total number of potential glycosylation sites. The combinatorial complexity was reduced by limiting the number of possibilities to the first 100 combinations, for glycopeptides having more than 100 possible placements.
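The enumeration of theoretical glycopeptides can be illustrated with a short Python sketch. This is not the GlycReSoft implementation: the function name, the way glycan compositions are paired with sites, and the ordering of the combinations are assumptions made for illustration. Only the limits of 20 sites and 100 combinations come from the text.

```python
from itertools import combinations, combinations_with_replacement, islice


def theoretical_glycoforms(sites, glycans, max_sites=20, limit=100):
    """Enumerate site/glycan assignments for one template peptide.

    For n = 1..k occupied sites, every subset of n sites is paired with every
    combination-with-replacement of n glycan compositions. Enumeration is
    lazy, so only the first `limit` assignments are ever generated.
    """
    sites = list(sites)[:max_sites]

    def generate():
        for n in range(1, len(sites) + 1):
            for chosen in combinations(sites, n):
                for assigned in combinations_with_replacement(glycans, n):
                    yield tuple(zip(chosen, assigned))

    return list(islice(generate(), limit))


# Gp15 glycopeptide from the Results, numbered from residue 221.
peptide = "ETSEAAATVDLFAFTLDGGK"
start = 221
sites = [start + i for i, aa in enumerate(peptide) if aa in "ST"]

glycoforms = theoretical_glycoforms(sites, ["HexNAc"])
print(sites)
print(len(glycoforms))
```

With four Ser/Thr sites and a single glycan composition, the number of glycoforms is the number of non-empty site subsets; for a peptide with 20 sites the same count exceeds one million, which is why a cap on the number of combinations is needed.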
Each dataset was deisotoped, charge state deconvolved, and searched independently against the database described above. Individual datasets of MS/MS scans in the range m/z 100-240 were filtered, and only tandem mass spectra for which the average ratio of oxonium ion signal to maximum signal exceeded 5% were considered. In addition to including normal peptide backbone fragments, the search considered spectra containing peaks that indicated either the presence of a HexNAc residue or its loss. The software also searched for the intact peptide backbone with zero or more partial losses of each potential glycan. Glycopeptide-spectrum matches were evaluated based upon joint binomial intensity-backbone coverage criteria, which are included in a novel algorithm that is based in part on a binomial scoring function described previously [63]. The lists of ions assigned for each of these spectra are located in S1 Excel File. S4 Fig shows a representative spectrum annotated by GlycReSoft (one of 345) submitted to the ProteomeXchange Consortium [64].
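The 5% oxonium-ion filter can be sketched as follows. The exact definition used by GlycReSoft is not given in the text, so this sketch assumes that the ratio is the intensity of each oxonium ion divided by the base-peak intensity, averaged over the ions considered. The default ion list (HexNAc only), the 0.02-Da matching tolerance, and the function names are likewise assumptions.

```python
def oxonium_ratio(peaks, oxonium_mzs=(204.0866,), tol_da=0.02):
    """Average ratio of oxonium ion intensity to base-peak intensity.

    `peaks` is a list of (m/z, intensity) pairs. An ion that is not found
    contributes a ratio of zero.
    """
    if not peaks:
        return 0.0
    base_peak = max(intensity for _, intensity in peaks)
    ratios = []
    for target in oxonium_mzs:
        best = max(
            (intensity for mz, intensity in peaks if abs(mz - target) <= tol_da),
            default=0.0,
        )
        ratios.append(best / base_peak)
    return sum(ratios) / len(ratios)


def passes_oxonium_filter(peaks, threshold=0.05):
    """Keep only spectra whose average oxonium ratio exceeds the threshold."""
    return oxonium_ratio(peaks) > threshold


# Hypothetical spectra, for illustration only.
glyco_like = [(204.0868, 500.0), (650.3000, 1000.0)]
non_glyco = [(204.0868, 10.0), (650.3000, 1000.0)]
print(passes_oxonium_filter(glyco_like), passes_oxonium_filter(non_glyco))
```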
Other bioinformatic methods. The furin-like protease site that separates Gp40/Gp15 was predicted by the online tool "ProP 1.0 Server", made available by the Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark (www.cbs.dtu.dk/services/ProP/) [65]. Signal peptides and transmembrane helices were predicted using the online tool Phobius (http://phobius.sbc.su.se/index.html) [66]. The GPI-anchor site of Gp15 was predicted using the BIG-PI prediction server (http://mendel.imp.ac.at/gpi/gpi_server.html) [67]. Cartoon representations of proteins and protein features, which were mapped with all the peptides across all MS/MS experiments, were generated using the online software tool Protter v.1.0 (http://wlab.ethz.ch/protter/start/) [68]. The assigned peptides from the PEAKSDB search results were used to map to the proteins of interest. The protein features were mapped using the results from the bioinformatics searches.

O-linked glycan release and characterization
Ser-linked or Thr-linked glycans were released from the proteins via reductive alkaline β-elimination. Briefly, purified proteins from a total oocyst lysate were first lyophilized in glass conical vials. To the dried protein extract, an aqueous solution of 0.1 M NaOH + 1 M NaBD4 was added. The loosely capped vials were placed into an oven and kept at 45˚C for 18 h. After the incubation period, the borate was removed by extensive washes with 10% acetic acid in methanol and then neat methanol. The released glycans were subsequently separated from the proteins by solid phase extraction columns. To the dried sample, LC-MS grade water containing 0.1% trifluoroacetic acid (TFA) was added; the tube was vigorously vortexed, and the contents were then passed through a C-18 Sep-Pak cartridge (Waters Corporation, Milford, MA). Three bed volumes of 0.1% TFA/water were subsequently passed through the column, and the eluent fractions were pooled and lyophilized. The released O-glycans were permethylated using previously described methods [69,70]. A slurry of powdered NaOH in DMSO was added to the dried, released O-glycans. An equal volume of methyl iodide was added, and the reaction mixture was agitated gently while protected from light. The process was repeated three times to ensure complete permethylation. The product was extracted with chloroform/water, and the aqueous layer was removed and discarded. Washes with water were repeated until the pH of the solution was that of the LC-MS grade water being used for the washes. The chloroform layer was removed, placed into a new clean vial, dried in a speed vacuum, and stored in a desiccator at -20˚C until it was analyzed.
Monosaccharide composition determination by GC-MS. The permethylated sugars were identified using GC-MS with a Bruker Scion SQ interfaced to a 436-GC (Bruker Daltonics, Billerica, MA). Separation was performed using a (30 m x 0.25 mm x 0.25 μm) Restek™ Rxi™-5ms capillary column (Restek Corporation, Bellefonte, PA), using helium as the carrier gas. Samples were dissolved in hexane, and then 1 μl of the solution was introduced via an auto-injector, using a split/split-less injection program, maintaining a constant column flow rate of 1 ml/min. The injector temperature was set to 220˚C. The split-less injection sampling was set for 1 min before the split flow was started at 100 ml/min for 1 min. Then a split flow of 50 ml/min was used for the remainder of the program. The initial oven temperature of 60˚C was maintained for 1 min, then ramped at 4˚C/min to 250˚C, with a final ramp to 300˚C at 20˚C/min, and held there for 10 min. Ions generated by an electron impact (EI) ionization source (70 eV) were introduced into the mass analyzer after a 5-min solvent delay. Centroid mass spectra were acquired in the positive mode, scanning the range m/z 50-500, taking 500 ms/scan. An internal standard of permethylated myo-inositol was added to all samples to verify retention time repeatability. Four spectra were averaged, and background was subtracted. The retention times and EI spectra of the released and permethylated glycans were compared to those of deutero-reduced, permethylated monosaccharide standards. Data analysis was performed using the software MS Data Review 8.0 (Bruker). Retention times were compared using extracted ion chromatograms (XIC), for the ion signal at m/z 101, an ion common to GalNAc and GlcNAc. The EI spectra recorded for the standards and β-elimination products were compared at the same time points.
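As a check on the GC oven program described above, the total run time can be computed from the hold times and ramp rates given in the text. The helper function below is a hypothetical convenience, not part of any instrument software.

```python
def oven_program_minutes(initial_hold, ramps, final_hold):
    """Total oven program time in minutes.

    `ramps` is a list of (start_C, end_C, rate_C_per_min) tuples.
    """
    ramp_time = sum((end - start) / rate for start, end, rate in ramps)
    return initial_hold + ramp_time + final_hold


# 60 C for 1 min, 4 C/min to 250 C, 20 C/min to 300 C, then a 10-min hold.
total = oven_program_minutes(1.0, [(60, 250, 4), (250, 300, 20)], 10.0)
print(f"Total oven program: {total:.1f} min")
```

The first ramp dominates the run: raising the oven from 60˚C to 250˚C at 4˚C/min takes 47.5 min of the total.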

Results
The vast majority of peptides with O-HexNAc derive from Gp40, Gp15, and Gp900, which are vaccine candidates
Peptides obtained from trypsin digestion of total proteins of C. parvum oocysts were separated using a UPLC reversed phase C18 column that was online with a mass spectrometer. Peptides were subjected to Higher-energy C-trap Dissociation (HCD), and O-glycosylated peptides were recognized by the observation of a sugar-oxonium ion signal at m/z 204.0866 in the MS/MS spectra, corresponding to the fragmentation of a precursor containing a HexNAc residue. We reported previously the detection of larger sugar-oxonium ions (corresponding to Hex-HexNAc, HexNAc-HexNAc, and Hex-HexNAc-HexNAc) that all derive from N-glycans [54]. Since there was no enrichment for glycoproteins in the protein preparations (e.g. lectin chromatography), we identified the most abundant glycopeptides without selection bias. Glycopeptides with O-linked glycans originated from three C. parvum vaccine candidates (Gp15, Gp40, and Gp900), as well as Gp20, the immunogenicity of which is unknown (Fig 1, Table 1, and S1 Excel File). We also detected the presence of myristate and palmitate on an N-terminal peptide of the immunodominant antigen Cp23. In addition, we identified at least two peptides without O-HexNAc from each of 811 other C. parvum proteins. Information about these peptides and the glycopeptides described below has been deposited in the ProteomeXchange Consortium.

Dense arrays of O-GalNAc are present on the Ser-rich domain of Gp40
Fig 1. (A) Mass spectrometry showed Gp40 has a Ser-rich domain (AA-43 to 60) with numerous O-linked HexNAc modifications (marked in green, with Ser and Thr residues marked in red). Gp15 contains a single domain (AA-221 to 240) that is glycosylated. Other peptides identified with mass spectrometry are marked in grey. Predicted N-terminal signal peptide is marked in orange, while GPI-anchor signal is marked in olive. (B) A 20-kDa glycoprotein (Gp20) contains two Thr-rich domains (AA-87 to 110 and AA-135 to 160), which contain numerous HexNAc modifications. (C) Gp900 contains two very large Thr-rich domains (red brackets), one of which contains a peptide with three HexNAc residues (AA-609 to 623). The transmembrane helix near the C-terminus is encompassed by two horizontal lines, representing a membrane. (D) The N-terminus of Cp23 is modified with N-myristate (C14) and S-palmitate (C16). The start Met is absent (diamond).
The Gp40/Gp15 precursor (cgd6_1080) has an N-terminal signal peptide, a furin cleavage site that separates Gp40 (AA-22 to 220) from Gp15 (AA-221 to 324), and a C-terminal site for the addition of a GPI-anchor (Fig 1) [18-21]. A tryptic glycopeptide (AA-43 to 60) of Gp40, which contains 17 consecutive Ser residues followed by Thr-Ser-Thr, was found to be modified with 15 to 20 HexNAc residues (Table 1 and S1 Excel File). For example, the monoisotopic mass of the precursor ion m/z 1757.2272 [M + 4H]4+ corresponds to the value calculated for the peptide (43) DVPVEGSSSSSSSSSSSSSSSSSTSTVAPANK (60) with the addition of 20 HexNAc residues (Fig 2 and S2 Excel File). The very abundant HexNAc oxonium ion (m/z 204.0866) and a very low abundance peak that fits the value for HexNAc2 (m/z 407.1670) are present in the 30-V HCD MS/MS spectrum. The observed dimer could be an artifact generated in the gas phase from the high population of HexNAc monomers. To a very large extent, glycan loss occurs prior to fragmentation of the peptide, with the result that the observed b and y ions contain zero to four HexNAc residues. The only product ion that can be used to assign the HexNAc modification to a specific amino acid is the y7* ion, indicating the presence of HexNAc on the Thr closest to the C-terminus. In a second experiment, to avoid overpopulating the orbitrap analyzer with the less informative HexNAc oxonium ion, the start of the selection window was raised from m/z 100 to m/z 210, and, to ensure the generation of more peptide backbone fragments, the HCD energy was increased to 45 V (S1 Fig and S2 Excel File). The 45-V HCD MS/MS spectrum exhibited extensive fragmentation of the aglycon peptide, which resulted in product ions that composed a nearly complete b and y series (y2-y25, y30) and [...] Because we saw little evidence for the presence of HexNAc-HexNAc, we assume that each of the 20 potential O-glycan sites is occupied with a single HexNAc residue. We were unable to localize site occupancy in glycopeptides with 15-19 HexNAc residues, due to the labile nature of the O-glycans. Quite likely, the peptides modified with 15 to 19 HexNAc residues are a mixture of components having different occupancies.
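The match between the observed Gp40 precursor and the peptide carrying 20 HexNAc residues can be reproduced with a short calculation. The Python sketch below is an illustration rather than the Glycoworkbench or PEAKS procedure used in the study; it relies on standard monoisotopic residue masses, and the function name is hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the amino acids needed here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
WATER = 18.01056    # H2O added for the free peptide termini
PROTON = 1.007276   # mass of a proton
HEXNAC = 203.07937  # HexNAc residue, C8H13NO5


def glycopeptide_mz(sequence, n_hexnac, charge):
    """Monoisotopic m/z of [M + zH]z+ for a peptide bearing n HexNAc residues."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_hexnac * HEXNAC
    return (neutral + charge * PROTON) / charge


# Gp40 glycopeptide: 17 consecutive Ser followed by Thr-Ser-Thr.
gp40 = "DVPVEG" + "S" * 17 + "TSTVAPANK"
gp40_mz = glycopeptide_mz(gp40, n_hexnac=20, charge=4)
print(f"Gp40 + 20 HexNAc, 4+: m/z {gp40_mz:.4f} (reported 1757.2272)")

# Gp15 glycopeptide with three HexNAc residues, reported later in the Results.
gp15_mz = glycopeptide_mz("ETSEAAATVDLFAFTLDGGK", n_hexnac=3, charge=2)
print(f"Gp15 + 3 HexNAc, 2+: m/z {gp15_mz:.4f} (reported 1326.6164)")
```

Both calculated values agree with the reported precursor m/z values to within 0.01 m/z units. Removing one HexNAc from the Gp40 glycopeptide shifts the 4+ ion by about 50.77 m/z units, so glycoforms with 15 to 20 HexNAc residues are readily distinguished at the MS1 level even when their sites cannot be localized.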
Release of O-glycans from C. parvum sporulated oocyst proteins by reductive β-elimination, followed by GC/MS monosaccharide analysis versus sugar standards, showed that the HexNAc residues in the Gp40 glycopeptide and in glycopeptides of the other vaccine candidates are likely GalNAc (S2 Fig). However, we cannot rule out a small amount of GlcNAc, which was suggested by Western blots of Gp15 [71]. In support of this assignment are the previous reports that C. parvum has four O-GalNAcTs, and patient sera recognize synthetic glycopeptides derived from Gp40 and Gp15 with O-GalNAc [32,33]. In summary, the Gp40 spectra

Isolated O-GalNAc residues decorate a glycopeptide of Gp15
A non-tryptic glycopeptide of Gp15 (AA-221 to 240), which results from cleavage of the Gp40/Gp15 precursor by the furin-like protease, contained one to four HexNAc modifications (Fig 1, Table 1, and S1 Excel File) [18][19][20][21]. For example, the precursor ion m/z 1326.6164 [M + 2H]2+ of the most abundant Gp15 glycopeptide has a monoisotopic mass equal to that of the peptide (221) ETSEAAATVDLFAFTLDGGK (240) with the addition of three HexNAc residues (Fig 3 and S2 Excel File). Fragmentation with 30-V HCD yielded a prominent HexNAc oxonium ion (m/z 204.0868) and a full series of b and y ions, some of which retained a single HexNAc modification (marked with an asterisk). The product ion series y6* to y12* indicates Thr-235 is modified, and the series y13* to y15* suggests that either Thr-228 or Thr-235 is modified. The b3* ion indicates that either Thr-222 or Ser-223 is modified. Thus there is evidence for distribution of the three HexNAc residues over the four available sites in this peptide. In the glycopeptide with four HexNAc modifications, all possible O-glycan sites are occupied. Analyses of the fragmentation patterns of numerous other peptides (S1 Excel File), both tryptic and non-tryptic, suggest that Thr-222 is preferentially modified over Ser-223, while Thr-228 and Thr-235 are nearly always modified.
Post-translational modifications of Cryptosporidium vaccine candidates
Dense arrays of O-GalNAc are present on Thr-rich glycopeptides of Gp20
Gp20 (cgd7_1280) is a small, acidic, secreted protein with four domains with consecutive Thr residues, two of which are described here (Fig 1). The first Gp20 glycopeptide (87) EGEETDENTDETTTTTTTASPKPK (110) has 10 potential O-glycan sites and was found to be decorated with six to eight HexNAc residues (Table 1 and S1 Excel File).
For example, the peak corresponding to the precursor ion of the most abundant Gp20 glycopeptide has a monoisotopic [M + 4H]4+ m/z 1001.9305 equal to the value calculated for the peptide modified by seven HexNAc residues (Fig 4 and S2 Excel File). The 30-V HCD MS/MS spectrum includes a HexNAc oxonium ion (m/z 204.0868) and numerous b and y ions retaining zero to two HexNAc residues (marked with asterisks). Because the vast majority of HexNAc residues were lost prior to peptide fragmentation, it was not possible to define the seven occupied sites or to determine whether the occupancy was heterogeneous. A second Gp20 glycopeptide (135) SSTTTTTTTAPVSSEDNKPEDSEDEK (160) with 12 potential O-glycan sites has a monoisotopic mass equal to that of the peptide with the addition of eight HexNAc residues (Table 1 and S1 Excel File). Again, glycan loss prior to peptide backbone fragmentation made it impossible to localize the occupied O-glycan sites. Two other Thr-rich domains of Gp20 are present in a 55-amino acid tryptic peptide that was not identified. Regardless, the two Gp20 spectra show that the C. parvum O-GalNAcTs are capable of nearly saturating arrays of Thr residues.
A glycopeptide of Gp900 with consecutive Thr residues is lightly modified by O-GalNAc, while numerous Gp900 glycopeptides contain a single O-HexNAc residue
Gp900 (cgd7_4020), which has an N-terminal signal peptide and a transmembrane domain near its C-terminus, is by far the largest of the C. parvum vaccine candidates (1912 amino acids minus the signal peptide) (Fig 1) [16, 17]. One reason for the large size of Gp900 is the presence of a vast array of consecutive Thr residues, which extends from AA-304 to 640. A second Thr-rich region extends from AA-797 to 908. Because of the paucity of tryptic sites in the Thr-rich arrays of Gp900, and the likelihood that the Thr stretches are also heavily O-glycosylated, these regions were not observed by mass spectrometry, with one exception (Table 1, S1 Excel File, and S3 Fig). The precursor ion m/z 732.0284 [M + 3H]3+ has a monoisotopic mass equal to that calculated for the peptide (609) KPTTTTTTTTTTTTK (623) with the addition of only three HexNAc residues, despite the presence of 12 available sites (S3 Fig and S2 Excel File). The 30-V HCD MS/MS spectrum includes a HexNAc oxonium ion (m/z 204.0868) and numerous b and y ions containing zero to two HexNAc residues (marked with asterisks). Here again, because of the lability of the glycans, it was not possible to precisely define the occupied sites or to determine whether the occupancy was heterogeneous.
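The difficulty of localizing occupied sites in these Thr-rich peptides can be appreciated by counting the possible placements. The Python sketch below assumes, as was assumed above for Gp40, that each occupied site carries a single HexNAc residue; the function names are hypothetical.

```python
from math import comb


def count_sites(peptide):
    """Number of Ser and Thr residues, i.e. potential O-glycan sites."""
    return sum(aa in "ST" for aa in peptide)


def placements(peptide, n_hexnac):
    """Ways to distribute n single HexNAc residues over the Ser/Thr sites."""
    return comb(count_sites(peptide), n_hexnac)


gp20_first = "EGEETDENTDE" + "T" * 7 + "ASPKPK"            # AA-87 to 110
gp20_second = "SS" + "T" * 7 + "APVSSEDNKPEDSEDEK"         # AA-135 to 160
gp900 = "KP" + "T" * 12 + "K"                              # AA-609 to 623

for name, peptide, n in [
    ("Gp20 (87-110)", gp20_first, 7),
    ("Gp20 (135-160)", gp20_second, 8),
    ("Gp900 (609-623)", gp900, 3),
]:
    print(f"{name}: {count_sites(peptide)} sites, {n} HexNAc, "
          f"{placements(peptide, n)} possible placements")
```

Because all placements of a given number of HexNAc residues on one peptide share the same precursor mass, only fragment ions that retain the glycan can distinguish among them, and these are scarce when the glycans are lost before the backbone fragments.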
Many of the most abundant glycopeptides of Gp900 have a single HexNAc modification at an isolated Ser or Thr residue (Table 1 and S1 Excel File). For example, the precursor ion m/z 895.4646 [M + 4H]4+ has a monoisotopic mass corresponding to the value calculated for the peptide (1712)NIVTEAAYGLPVDPK(1726) plus a single HexNAc residue (Fig 5 and S2 Excel File). The b4*, b6*, and b7* ions show that Thr-1715 is modified. The mass spectra of 12 unique peptides from Gp900, each with a single HexNAc modification, together with the spectra from Gp15, suggest that the C. parvum O-GalNAcTs are capable of modifying isolated Ser and Thr residues, in addition to stretches of consecutive Ser residues in Gp40 and Thr in Gp20 and Gp900.
At the N-terminus of Cp23 myristoyl modifies Gly1, while palmitoyl modifies Cys2
The immunodominant antigen Cp23 (cgd4_3620) contains no signal peptide but has an N-terminal sequence (2)GCSSSKPETK(11) similar to those modified by fatty acyl chains in the host and other apicomplexans (Fig 1) [52][72][73][74][75][76][77][78]. Consistent with this resemblance, numerous hydrophobic peptides containing the N-terminus of Cp23 minus Met-1 were identified by mass spectrometry, either with no modification or substituted by myristate, palmitate, or both (S1 Table) [56]. For example, the precursor ion [M + 2H]2+ m/z 736.4573 has a monoisotopic mass equal to that calculated for the peptide GCSSSKPETK with the addition of myristate and palmitate (Fig 6 and S2 [...] m/z 1261.7001). The presence of a complete y-ion series and a partial b-ion series allowed us to assign myristate to the N-terminal Gly and palmitate to the Cys. We believe the example given is what is present on the native protein. The peptides in which palmitate is absent and Cys is carbamidomethylated, or in which palmitate modifies Ser residues, arose during sample processing [56,57].

Discussion
Mass spectrometry here directly demonstrated that addition of O-linked HexNAc (presumably O-GalNAc) is a widespread modification of C. parvum vaccine candidates (Gp15, Gp40, and Gp900) [11,12]. Previous evidence for the addition of O-GalNAc to these proteins has been obtained through the use of synthetic glycopeptides, lectins, patient sera, or a monoclonal anti-carbohydrate antibody to C. parvum [25,33]. O-GalNAc modifications saturate Ser-rich sequences of Gp40, and they nearly saturate Thr-rich sequences of Gp20, a previously uncharacterized protein. Nearly all of the peptides with O-glycans derive from just four proteins (Gp40, Gp15, Gp900, and Gp20), even though >800 proteins were identified by mass spectrometry. Limitations of our observations include 1) failure to observe most of the Thr-rich domains of Gp900 and two of the Thr-rich domains of Gp20, 2) inability to assign O-glycan sites on many of the peptides due to loss of the O-linked glycan residues during HCD fragmentation, and 3) limited sampling of glycoproteins with O-GalNAc. In particular, the GalNAc-binding Maclura pomifera agglutinin previously enriched six mucin-like glycoproteins in addition to Gp40, Gp15, and Gp900 from lysed oocysts [15]. Other glycoproteins with O-GalNAc, beyond those described here and the six mucin-like glycoproteins, are very likely. These results show that the four O-GalNAcTs of C. parvum, each of which has a lectin domain in addition to its glycosyltransferase domain, efficiently continue to glycosylate regions of glycoproteins that are already glycosylated [32,79,80]. Indeed, the four O-GalNAcTs of C. parvum are able to make the same kind of modifications to arrays of Ser and Thr and to isolated Ser and Thr as the 20 O-GalNAcTs of the host.
In studies beyond the scope of those performed here, the activity of each O-GalNAcT might be determined by 1) recombinant expression of enzymes with peptide substrates or 2) knockouts of the genes encoding these enzymes, a technology that is now available in C. parvum grown in mice [81]. O-glycans of C. parvum differ from those of the host in that O-GalNAc is not extended by other sugars [80]. C. parvum then is the equivalent of the "SimpleCell" lines engineered to express truncated O-GalNAc (knockout of the cosmc gene), which have been used to map occupied O-glycan sites [82][83][84]. Similarly, the unextended C. parvum O-glycans are recognized by anti-Tn antibodies, which bind to unextended O-GalNAc [33,85].
Properties that distinguish C. parvum Gp15 and Gp40 include glycosylation (discrete O-GalNAc residues versus densely clustered O-GalNAc residues), localization on sporozoites (apically associated versus diffusely covering surface), localization on oocyst walls (outer surface versus inner surface), and structure (GPI-anchored versus secreted) [15-23, 25]. We infer that the densely clustered O-GalNAc residues make the Ser-rich regions of Gp40 and Thr-rich regions of Gp20 and Gp900 rigid and extended rather than unstructured [86][87][88]. These extended regions of O-glycosylation may contribute to the tethering function of Gp40 and Gp900, which attach sporozoites to the inner layer of the oocyst wall [15]. O-glycosylation on these glycoproteins that coat the sporozoite surface may also affect host cell invasion and/or the innate and acquired immune responses to infecting parasites [25, 26, 28-30]. For example, the C. parvum galactose/GalNAc lectin binds to O-GalNAc on Gp40 and Gp900 [24]. In contrast, addition of myristate to N-terminal Gly and palmitate to Cys likely directs Cp23 from the cytosol to membranes of C. parvum and thus is likely important for its function, as has been extensively studied in Toxoplasma, Plasmodium, and the host [50][72][73][74][75][76][77][78]. Chemical biology experiments or mutation of sites for addition of myristate and palmitate on Cp23 would be useful to test the roles of fatty acyl modifications in C. parvum. It is not clear how vaccination with recombinant Cp23, which is presumably a cytosolic protein, even if membrane bound, produces a protective immune response.
Serological screens for C. parvum use recombinant proteins made in bacterial systems that fail to add O-GalNAc (Gp40 and Gp15) or fatty acyl chains (Cp23) [26, 28-30, 36-38]. Because the host antibody response includes antibodies to glycopeptides with O-glycans including some with O-GalNAc on sites described here [33], these serological screens are likely lacking sensitivity and might be improved by expressing C. parvum proteins in SimpleCells that add unextended O-GalNAc to glycoproteins [82][83][84]. Similarly, vaccination with recombinant proteins produced in bacteria produces an immune response to the unmodified peptides, whereas acquired immunity to C. parvum infections includes responses to the O-glycans on Gp40 and Gp15 [33] and possibly lipid-modifications of Cp23. Again, the cosmc knockout might be used to produce recombinant Gp40 or Gp15 coated with unextended O-GalNAc for vaccination. Production of Cp23 in mammalian cells that add fatty acyl chains might increase the sensitivity of serological screens for this antigen and generate a better vaccine.

S1 Excel File. An Excel spreadsheet summarizing the 358 spectra identified as containing HexNAc(s). The list of unique peptides was used to make Table 1 in the main text, where they are shown as the representative peptides (sheet 1). The Gp40 peptides differ in the number of HexNAc