Zero-shot prediction of drug responses using biologically informed neural networks trained on phosphoproteomic timeseries

doi:10.1371/journal.pcbi.1014100

Fig 1.

Extension of the LEMBAS framework to time-resolved phosphoproteomics.

A. Illustration of a minimal signaling network including ligand (L), receptor (R), kinases (K1–K4), non-kinase signaling proteins (P1, P2) and kinase inhibitor (D). B. The model architecture mirrors this structure: ligand and drug inputs drive an RNN whose signal propagation is constrained by the prior-knowledge network (encoded as an adjacency matrix with trainable weights); the hidden state is fed into a layer that maps signal states to phosphosites, which in turn are mapped to specific time points. C. The phosphosite mapping layer connects each signaling node only to its own phosphosites, which are identity-coded with a trainable low-dimensional embedding (five dimensions); a multilayer perceptron then non-linearly transforms the signal back to a single phosphosite output. D. The time-point mapping layer: a set of learnable anchors is constrained to be positive and monotonic, with the first anchor fixed at the first RNN step. Soft indexing of the two nearest integer RNN steps interpolates the value at each non-integer anchor. E. Generated phosphosite intensities across time points for EGF-stimulation data, displaying only variable sites (standard deviation > 0.001). F. Ablation results across model configurations. Circles are the five cross-validation folds; X’s correspond to zero-shot evaluation. Average training times are also displayed.
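
The soft indexing described in panel D can be illustrated with a minimal sketch. It is not the authors' implementation: the softplus-of-increments parameterization, the zero-based step index, and the names `anchors_from_raw` and `soft_index` are assumptions chosen only to satisfy the stated constraints (positive, monotonic anchors with the first one fixed at the first RNN step).

```python
import math

def softplus(x):
    # Numerically stable softplus; always strictly positive.
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def anchors_from_raw(raw_increments, first_step=0.0):
    # Assumed parameterization: the first anchor is pinned to the first
    # RNN step, and each later anchor adds a positive increment, which
    # makes the sequence monotonically increasing.
    anchors = [first_step]
    for r in raw_increments:
        anchors.append(anchors[-1] + softplus(r))
    return anchors

def soft_index(states, anchor):
    # Linear interpolation between the two nearest integer RNN steps.
    last = len(states) - 1
    a = min(max(anchor, 0.0), float(last))  # clamp to the valid step range
    lo = int(math.floor(a))
    hi = min(lo + 1, last)
    w = a - lo
    return (1.0 - w) * states[lo] + w * states[hi]

# Toy hidden-state trajectory for one phosphosite (hypothetical values).
states = [0.0, 1.0, 4.0, 9.0]
anchors = anchors_from_raw([0.0, 0.0])
values = [soft_index(states, a) for a in anchors]
```

Because the interpolation weight depends continuously on the anchor position, a gradient-based optimizer can in principle adjust where each measured time point falls along the RNN trajectory.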

Fig 2.

Model performance on interpolation tasks for time-resolved phosphoproteomics data.

A. EGF-stimulation phosphoproteomics data and model fits across time points, values clipped between -3 and 3. B. Distribution of absolute distances between consecutive time points for the experimental data (median = 0.27, mean = 0.38) and model fits (median = 0.05, mean = 0.17). Dashed lines indicate the medians. The difference between the two distributions is statistically significant (permutation test, 10,000 resamples, p = 0.0001). C. Comparison of interpolation performance on held-out time points between linear interpolation, anchor estimation using data from a single phosphosite (GAB1:S266s), and one-to-one mapping. Black boxes on the X-axis indicate the time points held out during training. D. Selected examples of time series data compared to model fits and predictions at the 4 min time point. Lines indicate means; ribbons indicate standard deviations.
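
The comparison in panel B can be sketched as a two-sample permutation test. The caption does not state the test statistic or the p-value convention, so the absolute difference in medians and the add-one correction used here are assumptions, as are the function names.

```python
import random
import statistics

def consecutive_abs_diffs(series):
    # Absolute distance between each pair of consecutive time points.
    return [abs(b - a) for a, b in zip(series, series[1:])]

def permutation_test(x, y, n_resamples=10_000, seed=0):
    # Two-sided permutation test on the difference in medians
    # (assumed statistic; the caption does not specify one).
    rng = random.Random(seed)
    observed = abs(statistics.median(x) - statistics.median(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling of the pooled samples
        stat = abs(statistics.median(pooled[:n_x])
                   - statistics.median(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

Under this convention the smallest achievable p-value with 10,000 resamples is 1/10,001, roughly 0.0001.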

Fig 3.

Zero-shot prediction of drug-specific phosphoproteomics responses.

A. Model predictions versus experimental differential phosphorylation in a zero-shot setting after EGF control subtraction. Top row: biologically informed model using the OmniPath prior-knowledge network; middle row: model without prior-knowledge network (fully connected signaling layer); bottom row: naïve baseline estimated as the mean response of the remaining drugs. B. ROC curves for identifying up- and downregulated phosphosites using the biologically informed model (dark lines) and the model without prior knowledge (light lines). AUC values are reported for each drug.
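
As a rough illustration of how the AUC values in panel B could be computed, the sketch below uses the pairwise (Mann-Whitney) formulation of ROC AUC. The labeling threshold of 0.5, the toy values, and the sign flip used to score downregulation are hypothetical; the caption does not say how up- and downregulated sites were defined.

```python
def roc_auc(labels, scores):
    # AUC as the probability that a randomly chosen positive is scored
    # higher than a randomly chosen negative; ties count as one half.
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical differential phosphorylation values (drug minus EGF control).
measured = [1.2, 0.8, 0.1, -0.9, -1.5, 0.0]
predicted = [0.9, 0.2, 0.3, -0.4, -1.1, 0.1]
threshold = 0.5  # assumed cutoff for calling a site regulated

up_labels = [m > threshold for m in measured]
down_labels = [m < -threshold for m in measured]
auc_up = roc_auc(up_labels, predicted)
# For downregulation, a more negative prediction should rank higher.
auc_down = roc_auc(down_labels, [-p for p in predicted])
```

The quadratic pairwise loop is chosen for readability; a rank-based computation gives the same value more efficiently on large site lists.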

Fig 4.

Network analysis of non-canonical signaling.

A. Volcano plot of differentially expressed phosphosites between EGF-stimulation and SHP099 inhibition. Relevant phosphosites (FOXO3:S7, RASH:S35, MAPK14:T180, and PIK3CA:T508) are highlighted. B. Phosphorylation levels of FOXO3:S7 across two replicates. The full time series is shown for EGF-stimulation; static data at 12 min are shown for DMSO and SHP099 inhibition. Lines indicate means; ribbons show standard deviations; the dashed line indicates the DMSO baseline; the x marks the model prediction for the perturbed sample. C. Network visualization of the canonical RAS/MAPK pathway containing PTPN11 (SHP099 target), the PI3K/AKT pathway with FOXO3, and the potential crosstalk via p38 MAPK.

Fig 5.

Kinase–substrate inference.

A. Heatmap showing the overlap between predicted kinase–substrate pairs and validated pairs from OmniPath. Red indicates validated overlaps, while grey marks pairs predicted by the model but not present in OmniPath. Predictions are defined as phosphosites differentially expressed in in silico perturbations that are directly connected to the drug target in the OmniPath network. B. Overlap between differentially expressed phosphosites in the experimental data and validated kinase–substrate pairs for two drugs with available measurements.
