Fig 1.

a) Sequence-based approaches aim to identify linear amino acid motifs that are phosphorylated by certain kinases. This is done based on the known motif preferences of kinases, their groups or families. Each site and substrate is examined in isolation. Typically, only a limited number of well-studied kinases can be associated with substrates this way, and network context is largely ignored in such predictions. b) The LinkPhinder approach aims to learn regular patterns in a knowledge graph that represents the known kinase-substrate links as motif-based abstractions of the associated consensus sites. Based on the global, latent properties of the knowledge graph, the system can predict unknown, site-specific interactions between any kinase and substrate present in the input data.

Fig 2.

The model is first trained on phosphorylation network data that has been converted to a knowledge graph representation.

Such a representation can be readily processed by link prediction algorithms (unlike the original phosphorylation data). In the training stage, an optimal combination of model parameters is found and computationally validated. The optimal model is then trained on the full phosphorylation network data and used to provide probabilistic ranking scores for all possible predictions that can be made from the input. Finally, a reverse conversion technique is applied to the computed predictions to present them to users as residue-specific kinase-substrate relationships.

Fig 3.

The average precision-recall and ROC curves as per the experimental results reported in Table 1 (left and right part of the figure, respectively).

Table 1.

Comparative validation results.

AU-PR and AU-ROC refer to the area under the precision-recall and ROC curves, respectively. These metrics are widely used for validating ranking-based predictive models across their whole operating range [18]. P@K refers to the precision at K metric, which gives the ratio of true positive statements ranked among the top K results (e.g., P@10 refers to precision at 10; a precision at 10 of 0.9 would mean that the corresponding tool typically returns 9 true positives among the top 10 results).
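
The P@K definition above can be illustrated with a minimal Python sketch. The function name `precision_at_k` and the toy scores and labels are assumptions made for illustration only; they are not taken from the LinkPhinder code base or from Table 1.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Return the fraction of true positives among the k highest-scored items.

    scores: ranking scores produced by a predictive tool (higher = more likely).
    labels: 1 for a true positive statement, 0 for a negative one.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one scored statement is required")
    # Rank statements from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    # If fewer than k statements exist, the ratio is taken over what is available.
    return sum(label for _, label in top) / len(top)


# Toy data (hypothetical): four scored statements, three of them true positives.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 1]
p_at_2 = precision_at_k(toy_scores, toy_labels, k=2)
```

With these toy values, one of the two top-ranked statements is a true positive, so `p_at_2` is 0.5.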

Table 2.

Coverage of the tools in percent.

Total, positive and negative coverage are given in the first three data columns, respectively. The last column gives the percentage of missed negatives (i.e., negatives that are assigned the default zero score).

Fig 4.

Coverage of the human kinome and kinase families as per PhosphoSitePlus.

The “not_processed” category reflects the number of kinases for which a tool cannot produce any predictions. Note that NetPhorest and NetworKin differ only in the scores assigned to predictions, while the set of phosphorylations they can produce scores for is identical. Therefore, they are grouped under the common label KinomeExplorer [10] in the plot.

Fig 5.

Complementary statistics of the coverage of different systems in terms of number of kinases, substrates, sites per substrate, etc.

Table 3.

Complementary computational validation of LinkPhinder using the recent dataset published in [19] as a benchmark independent of the primary training dataset (i.e. PhosphoSitePlus [14]).

Fig 6.

Experimental validation of model predictions.

A) HEK293 cells were transfected with non-targeting siRNA (Scr) or the indicated siRNA against LATS1. Phosphorylation of CREB or p53 was measured using specific antibodies and normalised to the level of expression of the corresponding proteins. The graph shows the fold change of the phosphorylation of the specific residues with respect to the Scr control. B) HEK293 cells were transfected with empty vector (EV) or GAG-AKT, or treated with AKTi IV (10 μM) for 1 hour. Phosphorylated proteins were immunoprecipitated using an anti-AKT antibody and the immunoprecipitates were blotted with anti-MST2. The bars show the fold change with respect to the control. The experiments were repeated at least 2 times. Error bars represent standard variations.

Fig 7.

Mass-spectrometry validation of a subset of LinkPhinder predicted phosphorylations.

A) Overview of the experimental design. B) Mass-spectrometry result: specific LATS1 interactors and their phosphorylations. Bold rows indicate phosphorylations that were predicted by LinkPhinder. (*There is a risk that ZMYM2 binding might be unspecific. Some samples show high intensities in the GFP1 control, see panel D.) C) LinkPhinder predictions for the results in panel B. D) Mass-spec raw intensity values (dots) of the detected phosphorylation sites in GFP-LATS1 associated proteins under the indicated conditions (n = 6 replicates), and corresponding box plots indicating the median (red line), upper and lower quartiles (grey box), whiskers (most extreme values not defined as outliers), and outliers (plus marks) defined as values outside 1.5 times the interquartile range.
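
The box-plot convention described for panel D can be sketched in Python as follows. This is a minimal illustration, assuming the common reading of the rule (outliers lie below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR) and the "inclusive" quartile method of Python's `statistics` module; the plotting software used for the figure may compute quartiles slightly differently. The function name `box_plot_summary` and the intensity values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Summarise values with the 1.5 x IQR outlier rule used in box plots."""
    # Quartiles; the interpolation method is an assumption (see lead-in).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inliers = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "quartiles": (q1, q3),
        # Whiskers reach the most extreme values not defined as outliers.
        "whiskers": (min(inliers), max(inliers)),
        "outliers": outliers,
    }


# Hypothetical intensities for n = 6 replicates (not data from the figure).
summary = box_plot_summary([1, 2, 3, 4, 5, 100])
```

For these hypothetical values the fences are -1.5 and 8.5, so 100 is reported as an outlier and the whiskers span 1 to 5.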

Table 4.

Sensitivity (S) of LinkPhinder substrate predictions for each of the kinase assays.

Fig 8.

The LinkPhinder web interface.

Shown is a typical search and browse interaction.

Table 5.

Phosphorylation network components statistics.

Fig 9.

High-level workflow of generating predicate labels for the phosphorylation knowledge graph based on motifs extracted from the context sequences of phosphorylation sites by means of the MEME tool.

Table 6.

Knowledge graph components statistics.

Table 7.

Statistics of the coverage of the different predictive systems and their overlap with the gold standard from [19].

The letters S and K in the column headers denote substrates and kinases respectively.

Table 8.

Hyperparameter space used by grid search to identify the best model (L1 and L2 stand for the Manhattan and Euclidean distance norms, respectively).
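
As a brief illustration of the two norms and of how a grid search enumerates a hyperparameter space, consider the following Python sketch. The grid below (`HYPOTHETICAL_GRID`, with its parameter names and values) is invented for illustration and does not reproduce the contents of Table 8.

```python
import itertools
import math


def l1_distance(u, v):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """Euclidean (L2) distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical search space; names and values are assumptions, not Table 8.
HYPOTHETICAL_GRID = {
    "norm": ["L1", "L2"],
    "embedding_dim": [50, 100],
    "learning_rate": [0.01, 0.1],
}


def grid_configurations(grid):
    """Yield every combination of hyperparameter values as a dictionary."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


configurations = list(grid_configurations(HYPOTHETICAL_GRID))
```

A grid search would train and validate one model per entry in `configurations` (eight in this toy grid) and keep the best-scoring one.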

Table 9.

LinkPhinder performance compared to other systems on our benchmark with a 1:10 positive-to-negative ratio in the testing split, where the training and testing splits are 90% and 10%, respectively.
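
One way to build a testing split with the stated proportions is sketched below in Python. This is only an assumed procedure for illustration: the caption does not say how negatives were generated or sampled, and the function name `split_with_negative_ratio` and the toy inputs are hypothetical.

```python
import random


def split_with_negative_ratio(positives, negatives, test_fraction=0.1,
                              negatives_per_positive=10, seed=0):
    """Split positives 90/10 and draw 10 negatives per test positive.

    Assumption: negatives are sampled uniformly at random without
    replacement from a pre-built pool of candidate negatives.
    """
    rng = random.Random(seed)
    shuffled = list(positives)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test_pos, train_pos = shuffled[:n_test], shuffled[n_test:]
    n_neg = n_test * negatives_per_positive
    pool = list(negatives)
    if n_neg > len(pool):
        raise ValueError("not enough candidate negatives for the requested ratio")
    test_neg = rng.sample(pool, n_neg)
    return train_pos, test_pos, test_neg


# Toy identifiers (hypothetical): 100 positives, 5000 candidate negatives.
toy_positives = [f"pos_{i}" for i in range(100)]
toy_negatives = [f"neg_{i}" for i in range(5000)]
train_pos, test_pos, test_neg = split_with_negative_ratio(toy_positives, toy_negatives)
```

With the toy inputs, 90 positives go to training, 10 to testing, and 100 negatives are drawn for the testing split, which gives the 1:10 ratio.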

Table 10.

Relative LinkPhinder performance across different training-testing splits where the positive to negative ratio of the testing set is 1:10 (the relative performance results were substantially less variable for the 1:1 ratio, therefore we do not report them here).
