GPMelt: A hierarchical Gaussian process framework to explore the dark meltome of thermal proteome profiling experiments
Fig 4
Including non-sigmoidal melting curves with GPMelt improves the quality of the discoveries for protein-level TPP-TR datasets.
(A) Approximate receiver operating characteristic (ROC) curves comparing the results of NPARC [6], the Bayesian sigmoid and Bayesian semi-parametric models [17], and GPMelt with a three-level HGP model on the Staurosporine 2014 [1] and ATP 2019 [19] datasets. The set of proteins expected to be targeted by each treatment is defined using the Gene Ontology (GO) Consortium annotations curated in UniProt [37]. For the Staurosporine 2014 dataset, 176 of 4505 proteins are annotated with kinase activity (annotations downloaded in March 2023). For the ATP 2019 dataset, 573 of 4772 proteins are annotated as ATP-binding proteins (using annotations provided as supplementary data in [19]). The points on the curves correspond to the sensitivity and specificity of NPARC and GPMelt at an α-threshold of α ∈ {0.001, 0.005, 0.01, 0.05} on the Benjamini-Hochberg (BH) adjusted p-values and, for the Bayesian sigmoid and Bayesian semi-parametric models, at a threshold of 1 − α on the posterior probabilities of the alternative model. Panels B to D show results on the ATP 2019 dataset. (B, left) Overlap of the hits obtained with an α-threshold of 0.05 on the adjusted p-values of NPARC and an α-threshold of 0.001 on the adjusted p-values of GPMelt. (B, right) Among the 55 hits uniquely selected by GPMelt, eight are annotated as part of membrane-less organelles. The GO cellular component terms are provided as supplementary data in [19]. The enrichment analysis is performed with the R package clusterProfiler [38] (v4.8.3), using the set of proteins identified in the experiment as background. (C) Comparison of protein rankings from the NPARC (x-axis) and GPMelt (y-axis) analyses of the ATP 2019 dataset (top 200 proteins of each method). Points are colored by the residual sum of squares of the NPARC fits for the alternative model, denoted RSS1. Crosses mark proteins whose RSS1 is above the 95th percentile (computed across proteins).
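Each ROC point in panel A comes from thresholding a method's scores and comparing the resulting hits with the GO-derived set of expected targets. A minimal Python sketch, assuming a hit is called when a BH-adjusted p-value is at most α (or, for the Bayesian models, when the posterior probability of the alternative model is at least 1 − α). The function names and toy data are hypothetical and are not part of GPMelt or NPARC; the BH step is a textbook re-implementation, not the code used for the figure.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: cumulative minimum from the largest p-value down.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_hits(scores, alpha, posterior=False):
    """Assumed hit rule: adjusted p <= alpha, or posterior >= 1 - alpha."""
    scores = np.asarray(scores, dtype=float)
    if posterior:
        return scores >= 1.0 - alpha
    return scores <= alpha


def roc_points(scores, is_expected_target,
               alphas=(0.001, 0.005, 0.01, 0.05), posterior=False):
    """Return (alpha, sensitivity, specificity) for each threshold."""
    truth = np.asarray(is_expected_target, dtype=bool)
    points = []
    for alpha in alphas:
        hit = call_hits(scores, alpha, posterior)
        tp = np.sum(hit & truth)    # expected targets called as hits
        fn = np.sum(~hit & truth)   # expected targets missed
        tn = np.sum(~hit & ~truth)  # non-targets correctly not called
        fp = np.sum(hit & ~truth)   # non-targets called as hits
        points.append((alpha, tp / (tp + fn), tn / (tn + fp)))
    return points


# Toy example: six proteins, the first three are "expected targets".
raw_p = [0.0001, 0.0002, 0.004, 0.006, 0.5, 0.9]
truth = [True, True, True, False, False, False]
adj = bh_adjust(raw_p)
for alpha, sens, spec in roc_points(adj, truth):
    print(f"alpha={alpha}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because only four thresholds are evaluated, each method contributes a handful of (sensitivity, specificity) pairs rather than a full curve, which is why the caption describes the ROC curves as approximate.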
(D) Examples of proteins ranked low by NPARC because of non-conventional melting behaviors (see panel C). The melting curves of these proteins are poorly fitted by NPARC because of its inherent sigmoidal assumption. Fig P in S1 File presents additional examples.
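The crosses in panel C mark proteins whose RSS1 lies above the 95th percentile across proteins, a simple indicator of the poor sigmoidal fits illustrated in panel D. A hypothetical Python sketch of that flagging rule, assuming the NPARC alternative-model fitted values (and hence RSS1) are already available per protein, and assuming NumPy's default linear interpolation for the percentile (the caption does not specify the interpolation method). The RSS values below are made up for illustration.

```python
import numpy as np


def rss(observed, fitted):
    """Residual sum of squares between observed and fitted relative abundances."""
    obs = np.asarray(observed, dtype=float)
    fit = np.asarray(fitted, dtype=float)
    return float(np.sum((obs - fit) ** 2))


def flag_high_rss(rss_by_protein, percentile=95.0):
    """Return the cutoff and the proteins whose RSS exceeds that percentile."""
    names = list(rss_by_protein)
    values = np.array([rss_by_protein[n] for n in names], dtype=float)
    cutoff = float(np.percentile(values, percentile))
    flagged = [n for n, v in zip(names, values) if v > cutoff]
    return cutoff, flagged


# Hypothetical RSS1 values for 20 proteins (0.01, 0.02, ..., 0.20).
rss1 = {f"P{i}": 0.01 * i for i in range(1, 21)}
cutoff, flagged = flag_high_rss(rss1)
print(round(cutoff, 4), flagged)
```

A percentile cutoff makes the flag relative to the dataset at hand; it signals that a sigmoid fits a curve poorly but not why, which is what inspecting the curves themselves, as in panel D, is for.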