Degradation graphs reveal hidden proteolytic activity in peptidomes

doi:10.1371/journal.pcbi.1013972

Fig 1.

Overview of proteolysis and peptide-based degradomic analysis.

a Proteins are hydrolyzed into peptides by proteases with distinct cleavage modes. Endoproteases cleave internal peptide bonds, whereas exoproteases trim residues sequentially from peptide termini. Both contribute to continuous peptide turnover. b During host–pathogen interactions, proteolysis shapes immune responses on both sides. Host proteases can release antimicrobial and immunomodulatory peptides, while bacterial proteases degrade host defense and immune proteins, generating immune-inactive fragments. The balance of these processes determines the outcome of infection. c Peptidomics and degradomics use mass spectrometry to characterize proteolysis from complementary angles. Peptidomics quantifies endogenous peptides directly, whereas degradomics, using methods such as TAILS, blocks native termini to enrich for neo-termini generated by protease activity. Peptides are analyzed by LC–MS/MS in DDA or DIA mode, searched against protein databases, and visualized as peptide ladders. d Growth of peptidomics and degradomics research over time, shown as PubMed term counts by year, illustrating the expansion of both fields, especially peptidomics. This figure was made with BioRender.

Fig 2.

Definition and probabilistic formulation of the degradation graph.

a A biological sample is analyzed with the mass spectrometer, followed by peptide identification and quantification, resulting in a peptide distribution. b A degradation graph represents proteolysis as a directed acyclic graph in which each node corresponds to a peptide (including the intact protein, Ω), and each directed edge denotes a proteolytic event in which peptide $v$ is cleaved into a shorter peptide $u$. The graph thereby encodes sequential cleavage relationships that describe how a protein is progressively degraded into smaller fragments. b Each edge is associated with a transition probability $p_{v \to u}$ describing the likelihood that $v$ degrades into $u$. The probability that $v$ remains intact, its absorption probability, is modeled as $p_{v \to v} = 1 - \sum_{u \in \mathcal{C}(v)} p_{v \to u}$, where $\mathcal{C}(v)$ is the set of child nodes of $v$. This defines a Markov chain in which transitions correspond to cleavage events and self-loops to peptide stability. c The overall peptide distribution, $P$, can be obtained by propagating probability mass from the protein node Ω through the graph in a forward pass. At each node, a fraction of the mass is absorbed, and the remainder is distributed according to the outgoing transition probabilities, yielding a marginal distribution that reflects the steady-state abundances of peptides generated by the degradation process.
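
The forward pass in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: peptides are given as hypothetical 0-based, end-exclusive spans; an edge v → u is drawn whenever u lies strictly inside v (the published edge rule may be stricter); and the names `build_graph`, `forward_pass` and `ROOT` are illustrative only.

```python
from collections import defaultdict

ROOT = "OMEGA"  # stands in for the intact protein Ω


def build_graph(spans, protein_len):
    """Connect v -> u whenever peptide u lies strictly inside peptide v.

    spans: {name: (start, end)} as 0-based, end-exclusive positions.
    Assumption: every contained peptide is a candidate child; the
    published edge rule may be more restrictive.
    """
    nodes = {ROOT: (0, protein_len), **spans}
    children = defaultdict(list)
    for v, (vs, ve) in nodes.items():
        for u, (us, ue) in nodes.items():
            if u != v and vs <= us and ue <= ve and ue - us < ve - vs:
                children[v].append(u)
    return nodes, children


def forward_pass(nodes, children, trans):
    """Propagate unit mass from ROOT; return (absorbed, inflow) per node.

    trans: {(v, u): transition probability}; missing edges count as 0.
    The absorption (self-loop) probability is 1 - sum of outgoing ones.
    """
    inflow = defaultdict(float)
    inflow[ROOT] = 1.0
    absorbed = {}
    # Edges always point to strictly shorter peptides, so sorting by
    # length (longest first) gives a valid topological order.
    order = sorted(nodes, key=lambda n: nodes[n][1] - nodes[n][0], reverse=True)
    for v in order:
        out = {u: trans.get((v, u), 0.0) for u in children[v]}
        stay = 1.0 - sum(out.values())
        if stay < -1e-12:
            raise ValueError(f"outgoing probabilities of {v} exceed 1")
        absorbed[v] = inflow[v] * stay
        for u, p in out.items():
            inflow[u] += inflow[v] * p
    return absorbed, dict(inflow)


# Toy example with hypothetical peptides on a 10-residue protein.
nodes, children = build_graph(
    {"A": (0, 6), "B": (6, 10), "C": (0, 3), "D": (3, 6)}, protein_len=10
)
trans = {(ROOT, "A"): 0.6, (ROOT, "B"): 0.3, ("A", "C"): 0.5, ("A", "D"): 0.25}
absorbed, inflow = forward_pass(nodes, children, trans)
print({k: round(v, 3) for k, v in absorbed.items()})
```

With these toy transition probabilities the absorbed masses are 0.1, 0.15, 0.3, 0.3 and 0.15 for Ω, A, B, C and D, summing to 1 as mass conservation requires. Sorting by decreasing length is a convenient topological order because every cleavage yields a strictly shorter peptide.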

Fig 3.

Inferring degradation graphs from observed peptidomes.

a The degradation graph is a latent structure that describes the proteolytic relationships giving rise to the observed peptidome. Observed peptide abundances are used to infer the most plausible graph structure and transition probabilities. b Inference is formulated as an optimization problem in which the modeled marginal peptide distribution is fitted to the measured distribution. Edge transition probabilities are iteratively updated by gradient descent to minimize the loss $\mathcal{L}$, or, equivalently, inferred as a linear-flow system solved by linear programming under mass-conservation constraints. c Example of graph optimization by gradient descent. The mean-squared-error loss (blue) decreases over iterations as the inferred graph converges toward the true degradation graph, as measured by the total deviation of edge weights (orange). d Coefficient of variation of edge weights when gradient descent is applied to graphs of increasing size. The shaded region shows ± 1 standard deviation. e Experimental validation using in vitro trypsin digestion of human pharyngeal epithelial cells (Detroit 562; dataset PXD037803 [14]). Peptides were identified and quantified by LC–MS/MS and analyzed to reconstruct the underlying degradation graph. f Peptides mapped onto the β-actin backbone and colored by their number of descendants in the inferred graph, illustrating hierarchical fragmentation patterns. g Relationship between measured peptide abundance and modeled total inflow for each peptide. The solid line marks where modeled inflow equals measured abundance, demonstrating quantitative agreement between the inferred degradation flow and experimental intensities.
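
The gradient-descent variant of panel b can be illustrated on a toy graph. The sketch below is a hedged example, not the published method: it assumes a softmax parametrization of each node's outgoing and self-loop probabilities (which keeps them non-negative and summing to 1), a mean-squared-error loss over absorbed masses, and central finite differences in place of automatic differentiation. The graph, the observed values, the learning rate and the epoch count are all hypothetical.

```python
import math

# Hypothetical toy DAG; node 0 is the intact protein Ω and node indices
# are already in topological order (parents before children).
CHILDREN = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
N = len(CHILDREN)
N_PARAMS = sum(len(c) + 1 for c in CHILDREN.values())


def probs_from_logits(theta):
    """Per node, softmax over (children..., self-loop); last entry = absorption."""
    probs, k = {}, 0
    for v in range(N):
        m = len(CHILDREN[v]) + 1
        z = theta[k:k + m]
        k += m
        top = max(z)
        e = [math.exp(x - top) for x in z]
        s = sum(e)
        probs[v] = [x / s for x in e]
    return probs


def marginal(theta):
    """Forward pass: mass absorbed at each node is the modeled peptide distribution."""
    probs = probs_from_logits(theta)
    inflow = [1.0] + [0.0] * (N - 1)
    absorbed = [0.0] * N
    for v in range(N):
        for u, p_vu in zip(CHILDREN[v], probs[v][:-1]):
            inflow[u] += inflow[v] * p_vu
        absorbed[v] = inflow[v] * probs[v][-1]
    return absorbed


def mse(theta, observed):
    model = marginal(theta)
    return sum((m - o) ** 2 for m, o in zip(model, observed)) / N


def fit(observed, lr=5.0, epochs=2000, h=1e-6):
    """Plain gradient descent; gradients come from central finite differences."""
    theta = [0.0] * N_PARAMS  # uniform transition probabilities at the start
    for _ in range(epochs):
        grad = []
        for i in range(N_PARAMS):
            up, down = theta.copy(), theta.copy()
            up[i] += h
            down[i] -= h
            grad.append((mse(up, observed) - mse(down, observed)) / (2 * h))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Observed distribution produced by a hypothetical "true" graph with
# p(0->1)=0.5, p(0->2)=0.3, p(1->3)=p(1->4)=0.4 and the rest absorbed.
observed = [0.2, 0.1, 0.3, 0.2, 0.2]
initial_loss = mse([0.0] * N_PARAMS, observed)
theta_hat = fit(observed)
final_loss = mse(theta_hat, observed)
print(f"MSE: {initial_loss:.2e} -> {final_loss:.2e}")
```

Finite differences keep the example dependency-free; for realistic graph sizes an automatic-differentiation framework, or the linear-programming formulation described in the caption, would be the practical choice.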

Table 1.

Runtime of gradient descent and linear programming solvers across increasing graph sizes.

Fig 4.

Why degradation graphs improve protease quantification and enable predictive modeling.

a Conventional peptidomic analyses assume that each peptide arises directly from the parent protein, neglecting that peptides can themselves undergo further degradation. This omission leads to systematic underestimation of upstream proteolytic activity. In degradation graphs, downstream cleavages are explicitly modeled, revealing the true effective transition weight and correcting activity quantification. b By representing peptide abundances as probabilistic flow through the graph, degradation graphs enable flow quantification, in which the total upstream activity can be mapped to specific proteases using databases such as MEROPS or TopFIND. c Branch quantification summarizes local subgraphs (sets of related peptides sharing a degradation ancestry) into stable and interpretable biological units. Bottlenecks and branch points identify sites of proteolytic control or preferential cleavage. d The explicit graph topology also permits machine learning applications. Degradation graphs generated in silico for four proteins (β-actin, hemoglobin subunit beta, thrombin, and apolipoprotein A1) digested by trypsin or elastase were encoded as graph structures and classified using a GraphConv neural network trained on node features (position, abundance, length) and edge features (transition weight). e Left: receiver operating characteristic (ROC) curves showing high classification performance across proteins (overall ROC–AUC = 0.915). Right: kernel density estimates of predicted probabilities for trypsin- and elastase-derived graphs in the validation set, illustrating accurate separation of protease-specific degradation patterns. f Peptidome plots of β-actin in cell lysates digested with different enzymes. g ROC curve of the GraphConv network trained to identify the enzyme.
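
The correction in panel a can be made concrete with a small calculation. Assuming, as in the Markov formulation of Fig 2, that a peptide's observed abundance equals its total inflow times its absorption probability, the inflow follows by division, and the flow along an edge is the parent's inflow times the transition probability. The ratio of summed inflow to summed observed abundance is used here as a stand-in for the underestimation ratio Δ; the paper's exact definition may differ. All peptide names and values are hypothetical.

```python
def inflow_from_abundance(abundance, trans):
    """Recover each peptide's total inflow from its observed abundance.

    Assumes abundance(v) = inflow(v) * p_vv, with the absorption probability
    p_vv = 1 - sum of v's outgoing transition probabilities.
    """
    out_sum = {}
    for (v, _u), p in trans.items():
        out_sum[v] = out_sum.get(v, 0.0) + p
    inflow = {}
    for v, a in abundance.items():
        stay = 1.0 - out_sum.get(v, 0.0)
        if stay <= 0.0:
            raise ValueError(f"{v} is never absorbed; inflow is not identifiable")
        inflow[v] = a / stay
    return inflow


def edge_flows(inflow, trans):
    """Flow along each cleavage event v -> u: inflow(v) * p_vu."""
    return {(v, u): inflow[v] * p for (v, u), p in trans.items()}


def underestimation_ratio(inflow, abundance):
    """Total generated amount (inflow) divided by total observed abundance."""
    return sum(inflow.values()) / sum(abundance.values())


# Hypothetical inferred graph: P1 is half stable, half cleaved into P2 and P3.
abundance = {"P1": 2.0, "P2": 1.0, "P3": 1.0}
trans = {("P1", "P2"): 0.25, ("P1", "P3"): 0.25}
inflow = inflow_from_abundance(abundance, trans)
flows = edge_flows(inflow, trans)
delta = underestimation_ratio(inflow, abundance)
print(inflow, flows, delta)
```

Conventional quantification would credit P1 with 2.0 units, whereas the graph attributes 4.0 units of generation to it, because half of what was produced was cleaved further into P2 and P3; summed over all peptides this gives Δ = 1.5. Mapping the resulting edge flows to specific proteases through MEROPS or TopFIND is not shown.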

Fig 5.

Degradation graph analysis of urinary peptidomes from diabetic and healthy individuals.

a Peptidomic data from urine samples of individuals with diabetes (n = 15) and healthy controls (n = 15) [51] were analyzed without enzymatic digestion using LC–MS/MS. b For each sample, degradation graphs were reconstructed by gradient descent to fit the modeled peptide distribution to observed abundances. c Comparison of total generated versus observed peptide abundance revealed that conventional quantification underestimated proteolytic activity by approximately 3.7-fold in both groups. d Mapping total inflow along the uromodulin sequence showed a localized increase in degradation flow within the biomarker-associated region, highlighting differential proteolysis between healthy and diabetic samples. This figure was made with BioRender.
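
A per-residue view like panel d can be sketched by spreading each peptide's inflow over the residues it covers and comparing group means. This is an illustrative assumption about how such a profile could be computed, not the authors' definition; spans are 0-based and end-exclusive, and the sample data and group labels are hypothetical.

```python
def residue_profile(peptides, protein_len):
    """Per-residue inflow: sum of inflow over every peptide covering the residue.

    peptides: list of (start, end, inflow), 0-based and end-exclusive.
    """
    profile = [0.0] * protein_len
    for start, end, flow in peptides:
        for i in range(start, end):
            profile[i] += flow
    return profile


def mean_profile(samples, protein_len):
    """Average the per-residue profiles of all samples in one group."""
    profiles = [residue_profile(s, protein_len) for s in samples]
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def differential_profile(group_a, group_b, protein_len):
    """Mean profile of group_a minus mean profile of group_b, residue by residue."""
    a = mean_profile(group_a, protein_len)
    b = mean_profile(group_b, protein_len)
    return [x - y for x, y in zip(a, b)]


# Two hypothetical samples per group on an 8-residue sequence.
diabetic = [[(0, 4, 2.0), (2, 6, 1.0)], [(0, 4, 4.0)]]
healthy = [[(0, 4, 1.0)], [(0, 4, 1.0), (4, 8, 0.5)]]
diff = differential_profile(diabetic, healthy, protein_len=8)
print(diff)
```

For the toy data the difference is positive over residues 0 to 5 and slightly negative over residues 6 and 7, so the first group carries more inflow toward the N-terminal part of the sequence.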

Fig 6.

Degradation graph analysis of porcine wound fluid peptidomes from bacterial infections.

a Peptidomic data from porcine wound fluids infected with Staphylococcus aureus (n = 38) or Pseudomonas aeruginosa (n = 33) [11] were analyzed without enzymatic digestion using LC–MS/MS. b For each sample, edge transition probabilities were optimized by gradient descent to reproduce measured peptide distributions. c The ratio between generated and observed peptide abundance showed that neglecting sequential degradation led to an average 3.5-fold underestimation of total proteolytic activity. d Visualization of total inflow along the hemoglobin subunit alpha backbone revealed differential degradation flow between infection types, with pronounced variation in the N-terminal biomarker region and an additional differential site around residues 60–80. This figure was made with BioRender.

Fig 7.

Robustness of graph identification.

Parameter sweeps across learning rates and training epochs were used to assess the stability of degradation graph inference for the UMOD dataset (a) and the HBA infection dataset (b). Heatmaps show the underestimation ratio (Δ), the final mean squared error, and the stability of the top-ranked peptides (ranked by inflow) and edges (ranked by flow), measured as the percentage overlap across replicate runs.
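
The stability metric can be sketched as the overlap between top-k sets from replicate runs. This assumes stability is the mean pairwise percentage overlap across runs (the caption does not say whether overlap is pairwise or relative to a reference run); run scores stand for peptide inflows, and the same functions accept edge keys such as `("P1", "P2")` for flow rankings. All values are hypothetical.

```python
from itertools import combinations


def top_k(scores, k):
    """Keys of the k highest-scoring items; ties are broken by key for determinism."""
    ranked = sorted(scores, key=lambda key: (-scores[key], key))
    return set(ranked[:k])


def mean_topk_overlap(runs, k):
    """Mean pairwise overlap (in percent) of the top-k sets from replicate runs."""
    sets = [top_k(run, k) for run in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two replicate runs")
    return 100.0 * sum(len(a & b) / k for a, b in pairs) / len(pairs)


# Hypothetical peptide inflows from three replicate inference runs.
runs = [
    {"P1": 5.0, "P2": 4.0, "P3": 1.0, "P4": 0.5},
    {"P1": 4.8, "P2": 3.9, "P3": 1.2, "P4": 0.4},
    {"P1": 5.1, "P3": 4.2, "P2": 1.0, "P4": 0.6},
]
overlap = mean_topk_overlap(runs, k=2)
print(f"top-2 overlap: {overlap:.1f}%")
```

For these three runs with k = 2, one pair shares both top peptides and two pairs share one, giving a mean overlap of about 66.7%.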

Table 2.

Variable definitions.
