Fig 1.
A schematic visualisation of the TPP-TR assay and the GPMelt framework.
(A) Temperature-Range Thermal Proteome Profiling (TPP-TR) protocol: 10 aliquots of cells or cell lysates are heated to a range of temperatures for three minutes. The aggregates formed are then removed by filtration or centrifugation. The soluble fraction is digested with trypsin, labeled with tandem mass tags and analyzed by mass spectrometry (MS). The acquired data, which are at the peptide level, can be combined into protein-level melting curves by averaging the measurements of all tryptic peptides corresponding to a single entry in the protein database. Permission to use and modify the tubes and MS spectrum icons was kindly granted by Isabelle Becher. (B) The GPMelt framework consists of fitting all replicates of all conditions simultaneously, independently for each protein, using a hierarchical Gaussian process (HGP) model (1). This model is called the full model. The full model is fitted via type II maximum likelihood estimation (type II MLE), and the estimated parameters are plugged into the so-called joint models (3). A joint model is a model in which at least two conditions are jointly modeled as one (the jointly modeled conditions are represented in grey in each of the two joint models shown). The joint models are used to generate an approximation of the null distribution of the GPMelt statistic Λ (4). The observed statistics ΛCtrl vs Cd1 and ΛCtrl vs Cd2 (2) are compared to this null distribution approximation to compute empirical p-values (5). Additionally, an effect size is computed from the predicted fits of the model (6).
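Step (5) can be illustrated with a minimal sketch of an empirical p-value computed from samples of a null distribution approximation. The function name `empirical_p_value`, the "greater than or equal" convention and the +1 correction (which prevents a p-value of exactly zero) are assumptions made for illustration; the estimator actually used by GPMelt may differ.

```python
import numpy as np

def empirical_p_value(observed, null_samples):
    """Empirical p-value of an observed statistic against null samples.

    Assumption: large values of the statistic are extreme, and a +1
    correction is applied to numerator and denominator.
    """
    null_samples = np.asarray(null_samples, dtype=float)
    # Count null statistics at least as large as the observed one.
    n_exceed = np.count_nonzero(null_samples >= observed)
    return (n_exceed + 1) / (null_samples.size + 1)
```

With this convention, an observed statistic larger than all S null samples receives the smallest attainable p-value, 1 / (S + 1).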
Table 1.
Notation.
Fig 2.
Implementation of the hierarchical GP model using the multi-task GP regression framework.
The three-level hierarchical model (Eq (8)) is illustrated on a hypothetical protein presenting different numbers of replicates in the control and treatment conditions, under the simplifying assumption of synchronous observations for all replicates of all conditions. (A) A visualization of the model. Melting curves are fitted to the observations of each replicate (bottom level). The condition-wise melting curves (second level) capture the underlying melting behaviours common to the replicates of a condition. These condition-wise melting curves can be seen as deviations from the protein-wise melting curve depicted at the top of the hierarchy. (B) Schematic visualisation of the resulting covariance matrix Σ, expressed as a special matrix product between Ky, the sum of the index kernels, and the correlation matrix Kt,λ(T, T), evaluated at the set of temperatures T = (T1, …, T10). This decomposition of the matrix links the hierarchical GP model to the multi-task GP regression framework. Under the simplifying assumption of synchronous observations, the matrix product is a Kronecker product. This product is easier to visualize than the Hadamard product obtained in the case of asynchronous observations (see Appendix B in S1 File for details).
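The decomposition in panel (B) can be sketched numerically as follows. This is an illustrative construction only: the squared-exponential correlation function, the parameter names (`s_protein`, `s_condition`, `s_replicate`, `lengthscale`) and the helper functions are assumptions, not the kernel or notation of the GPMelt implementation.

```python
import numpy as np

def index_kernel(condition_of_replicate, s_protein, s_condition, s_replicate):
    """Sum of the three index kernels (one row/column per replicate)."""
    cond = np.asarray(condition_of_replicate)
    n = cond.size
    top = np.ones((n, n))  # protein level: shared by all replicates
    mid = (cond[:, None] == cond[None, :]).astype(float)  # condition blocks
    s_rep = np.broadcast_to(np.asarray(s_replicate, dtype=float), (n,))
    return s_protein**2 * top + s_condition**2 * mid + np.diag(s_rep**2)

def correlation(t1, t2, lengthscale):
    """Assumed squared-exponential correlation between temperatures."""
    d = np.subtract.outer(np.asarray(t1, float), np.asarray(t2, float))
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def covariance_synchronous(K_y, temps, lengthscale):
    """Synchronous observations: Kronecker product of K_y and K_t."""
    return np.kron(K_y, correlation(temps, temps, lengthscale))

def covariance_general(K_y, task, temps, lengthscale):
    """General case: Hadamard product, one (task, temperature) pair per
    observation, so replicates may be observed at different temperatures."""
    task = np.asarray(task)
    return K_y[np.ix_(task, task)] * correlation(temps, temps, lengthscale)
```

When every replicate is observed at the same temperatures, `covariance_general` called with `task = np.repeat(np.arange(n), m)` and the temperatures tiled `n` times returns the same matrix as `covariance_synchronous`.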
Fig 3.
Principle of GPMelt in presence of multiple conditions.
(SI: Scaled Intensity) Fitting the full model (A, B) is enough to access all the information required to test all possible null hypotheses. The illustration is based on protein SFRS9 of the Dasatinib dataset [1]. The aim of this experiment is to determine changes in melting behaviours upon treatment with dasatinib, a BCR-ABL inhibitor. In the experimental setup, the control condition (no treatment) is compared to two treatment concentrations, 0.5μM and 5μM. For clarity in the figure, the treatment concentration of 0.5μM is referred to as condition “C1” and the treatment concentration of 5μM as condition “C2”. The control condition is abbreviated as “Ctrl”. (A, B and D): Full model. (A) Hierarchical model corresponding to Eq (8), in which each condition (middle row) is assumed to present a distinct melting behaviour, this behaviour being a deviation from the main protein-wise melting behaviour (top row, blue curve). Fitting the observations (last row) under this model provides estimated values for the output-scales σh and σg and for the replicate-level output-scales. As in Fig 2, the estimated output-scales can be represented using the index kernels of the multi-task regression framework, as depicted in panel (B). The likelihood of the observations under this model is given in panel (D) and further detailed in Appendix C in S1 File. (A.C1 to E.C1): Comparing treatment C1 with control. We aim to compare the melting behaviour of this protein in the control condition (green curve) vs the treatment condition C1 (orange curve). A visualisation of this comparison is provided in panel (C.C1). Under the proposed testing framework, the joint model assumes that the treatment C1 and control conditions have the same melting dynamic, and groups them into a “joint” condition (grey curves in (A.C1, C.C1)). (B.C1) Mathematically, this joint model is obtained by changing the structure of the index kernel corresponding to the condition level. More precisely, the “joint” condition, grouping “C1” and “Ctrl”, is represented by the upper block in the matrix. Importantly, the values of the output-scales remain unchanged: there is no need to re-estimate the parameters of this model. Moreover, the modelling of condition “C2” is not affected by this joint model, as can be seen in (A.C1) and (B.C1). (D.C1) The likelihood of the observations under this joint model. (E.C1) The statistic Λ used to assess the statistical significance of melting behaviour changes is given by ΛCtrl vs C1. (A.C2 to E.C2): Comparing treatment C2 with control. Similarly, we illustrate the procedure to compare the protein’s melting behaviours between the treatment C2 (pink curve) and control (green curve) conditions. Under this model, conditions “C2” and “Ctrl” are grouped together in the “joint” condition, while condition “C1” is unaffected (A.C2, B.C2). Melting behaviour changes are depicted in panel (C.C2). The likelihood of the observations under this model (D.C2) and the associated statistic ΛCtrl vs C2 (E.C2) are given.
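The change of index kernel described in panels (B.C1) and (B.C2) can be sketched as a relabelling of conditions, with the output-scale left untouched. The helper names `condition_kernel` and `merge_conditions`, the label "joint" and the numeric value used below are hypothetical and serve only to show the block structure.

```python
import numpy as np

def condition_kernel(labels, s_condition):
    """Condition-level index kernel: s_condition**2 where two replicates
    share a condition label, 0 elsewhere."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return s_condition**2 * same.astype(float)

def merge_conditions(labels, a, b, joint="joint"):
    """Relabel conditions a and b as one joint condition."""
    return [joint if lab in (a, b) else lab for lab in labels]

# Two replicates per condition; the output-scale is NOT re-estimated.
labels = ["Ctrl", "Ctrl", "C1", "C1", "C2", "C2"]
K_full = condition_kernel(labels, 0.5)
K_joint_c1 = condition_kernel(merge_conditions(labels, "Ctrl", "C1"), 0.5)
```

In `K_joint_c1`, the upper 4 × 4 block is filled because "Ctrl" and "C1" now share a label, while the block for "C2" is identical to the one in `K_full`.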
Table 2.
Description of the benchmarking datasets.
Fig 4.
Including non-sigmoidal melting curves with GPMelt improves the quality of the discoveries for protein-level TPP-TR datasets.
(A) Approximate receiver operating characteristic (ROC) curves comparing the results of NPARC [6], the Bayesian sigmoid and Bayesian semi-parametric models [17] and GPMelt with a three-level HGP model on the Staurosporine 2014 [1] and ATP 2019 [19] datasets. The sets of proteins expected to be targeted by the treatments are defined using the Gene Ontology (GO) Consortium annotations curated in Uniprot [37]. For the Staurosporine 2014 dataset, 176 out of 4505 proteins present a kinase activity (annotations downloaded in March 2023). For the ATP 2019 dataset, 573 out of 4772 proteins are annotated as ATP-binding proteins (using annotations provided as supplementary data in [19]). The points on the curves correspond to the sensitivity and specificity at thresholds α ∈ {0.001, 0.005, 0.01, 0.05} on the BH-adjusted p-values for NPARC and GPMelt, and at thresholds of 1 − α on the posterior probabilities of the alternative model for the Bayesian sigmoid and Bayesian semi-parametric models. Panels B to D discuss results on the ATP 2019 dataset. (B, left) Overlap of the hits obtained with an α-threshold of 0.05 on the adjusted p-values of NPARC and an α-threshold of 0.001 on the adjusted p-values of GPMelt. (B, right) Among the 55 hits uniquely selected by GPMelt, eight are annotated as part of membrane-less organelles. The GO cellular compartment terms are provided as supplementary data in [19]. The enrichment analysis is performed with the R package clusterProfiler [38] (v4.8.3), with the background defined by the set of proteins identified in the experiment. (C) Comparison of protein rankings from the NPARC (x-axis) vs GPMelt (y-axis) analyses of the ATP 2019 dataset (for the top 200 proteins of each method). Points are colored according to the Residual Sum of Squares of the NPARC fits for the alternative model, denoted by RSS1. Crosses represent proteins for which RSS1 is above the 95th percentile (computed across proteins).
(D) Examples of proteins ranked low by NPARC due to non-conventional melting behaviours (see panel C). The melting curves of these proteins are misfitted by NPARC because of its inherent sigmoidal assumption. Fig P in S1 File presents additional examples.
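The points on the ROC curves in panel (A) can be reproduced in principle with the following sketch, which applies the Benjamini-Hochberg (BH) adjustment and then computes sensitivity and specificity at a threshold α. The function names and the "less than or equal" hit rule are assumptions for illustration; the annotated target set plays the role of the positives.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def sensitivity_specificity(adjusted_p, is_target, alpha):
    """Sensitivity and specificity of the hit list at threshold alpha."""
    hit = np.asarray(adjusted_p) <= alpha  # assumed hit rule
    is_target = np.asarray(is_target, dtype=bool)
    true_pos = np.count_nonzero(hit & is_target)
    true_neg = np.count_nonzero(~hit & ~is_target)
    return (true_pos / np.count_nonzero(is_target),
            true_neg / np.count_nonzero(~is_target))
```

Evaluating `sensitivity_specificity` at each α in {0.001, 0.005, 0.01, 0.05} yields one point per threshold.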
Fig 5.
Including non-sigmoidal melting curves in peptide-level TPP-TR datasets largely increases the number of discoveries.
Functionally relevant phosphosites are expected to induce a change in melting behaviour by influencing, among others, protein conformations and protein-protein interactions. The functionality of mono-phosphorylated peptides can be predicted using the functional score, a machine-learning based score [36] ranging from 0 to 1, with larger values indicating more functionally relevant phosphosites. To detect functionally relevant phosphosites, the melting behaviour of phosphorylated peptides is compared to the melting behaviour of the non-phosphorylated peptides associated with the same entry in the protein database. GPMelt with a three-level HGP model is used to reanalyse the phospho-TPP dataset [11]. (A) Considering non-conventional melting curves in the analysis makes it possible to include almost twice (1.78 times) as many phospho-peptides as the published melting point (Tm) analysis. (B) By increasing the phospho-peptide coverage, GPMelt captures about five times (4.9 times) more mono-phosphorylated peptides than the published Tm analysis, and captures phosphosites associated with significantly higher functional scores than non-captured phosphosites (one-sided Wilcoxon signed-rank test). GPMelt hit selection: any phospho-peptide whose associated Λ value is strictly above all values of the null distribution approximation (S = 1e4 samples per protein). GPMelt selects 443 mono-phosphorylated peptides, of which 388 have an associated functional score. The Tm analysis selects 90 mono-phosphorylated peptides, of which 85 have a known functional score.
Fig 6.
Panels A to C present results from the Staurosporine 2014 dataset [1], panels D to F from the ATP 2019 dataset [19]. (A-B) Detection of outliers. (A) Subset of proteins presenting a BH-adjusted p-value greater than or equal to 0.8 according to the NPARC and GPMelt methods, but an alternative model posterior probability larger than 0.99 according to the Bayesian semi-parametric model. For each of these proteins, the plot represents on a log scale the estimated values of the output-scale parameters obtained from the three-level HGP model, with the shape corresponding to the replicate (r) and the color to the condition (c). The shaded area represents output-scale values larger than the 95th percentile, and the dotted line is the 99th percentile. (B) Replicates with an associated output-scale above the 95th percentile are likely to present one or more outlier observations. (C-D) Area Between Curves as a new metric. To replace the previously used ΔTm as a measure of the discrepancy between the fitted curves, we propose a new metric, denoted the Area Between the Curves (ABC). The ABC can be computed by considering the median of the observations in each condition, and computing the area between these medians. As a complement to this ABCmedian metric, we propose a refined computation of the ABC using the output of the HGP model, denoted by ABCGPMelt (see Appendix D in S1 File). (C) A protein with at least one output-scale value above q75 + 1.5 × IQR is defined as presenting at least one outlier replicate (q75 being the 75th percentile, and IQR the interquartile range). Comparing the ABC estimated using either ABCmedian or ABCGPMelt shows that ABCmedian likely overestimates the ABC for proteins presenting at least one outlier replicate (Wilcoxon signed-rank test). (D) ABCGPMelt as a valid metric to replace ΔTm: considering the solubility effects reported by Sridharan et al. [19], a positive ABCGPMelt is correctly computed for desolubilized proteins and a negative ABCGPMelt for solubilized proteins (Wilcoxon signed-rank test). (E-F) Introduction of a new scaling factor. The broadly used Fold Change, in which intensities in a replicate are scaled to the intensity at the lowest temperature, is compared to a newly proposed scaling, named the mean scaling. This scaling consists of scaling the intensities in a replicate to the mean intensity of this replicate. (E) Scaling comparison. Considering the differences in scaled abundances at each temperature between replicates of a condition, the x-axis represents the median difference and the y-axis the variance of the differences. Results are divided by condition (control and treatment) and by scaling (fold change vs mean scaling). Each panel is split into four quadrants, with the bottom-left quadrant corresponding to reproducible observations between replicates of a condition. The three other quadrants reveal a lack of reproducibility between replicates. (F) Examples of proteins falling outside of the bottom-left quadrants in panel (E). For these proteins, the results of the mean scaling and the fold change are compared.
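The two scalings of panels (E-F) and the median-based ABC of panels (C-D) can be sketched as below. The array layout (replicates in rows, temperatures in ascending order in columns), the function names, the signed treatment-minus-control convention and the trapezoidal rule are all assumptions made for illustration; the refined ABCGPMelt is not reproduced here.

```python
import numpy as np

def fold_change(intensities):
    """Scale each replicate to its intensity at the lowest temperature
    (assumed to be the first column)."""
    x = np.asarray(intensities, dtype=float)
    return x / x[..., :1]

def mean_scaling(intensities):
    """Scale each replicate to its own mean intensity."""
    x = np.asarray(intensities, dtype=float)
    return x / x.mean(axis=-1, keepdims=True)

def abc_median(temps, control, treatment):
    """Signed area between the per-temperature medians of two conditions,
    integrated over temperature with the trapezoidal rule.

    Assumption: the sign is treatment minus control.
    """
    t = np.asarray(temps, dtype=float)
    diff = (np.median(np.asarray(treatment, float), axis=0)
            - np.median(np.asarray(control, float), axis=0))
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(t)))
```

Because the fold change divides by a single measurement, noise at the lowest temperature propagates to the whole scaled curve, whereas the mean scaling divides by an average over all temperatures of the replicate.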