Fig 1.
The phosphorylation level of each phosphorylation site in the proteome is quantified over a time course by mass spectrometry. First, the time-course profiles of phosphorylation sites are partitioned into clusters using a k-means clustering-based algorithm for a range of values of k. Next, the clustering result for each k is evaluated on how well it groups together the known substrates of kinases, as annotated in the PhosphoSitePlus database [53], and an enrichment score is computed. The clustering with the highest enrichment score is reported as the optimal clustering, along with the kinases whose substrates are enriched within each cluster.
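The evaluation step described above can be sketched as follows. This is a minimal illustration, not CLUE's actual implementation: the caption does not give the exact scoring formula, so the sketch assumes a one-sided Fisher's exact test (hypergeometric upper tail) per kinase and cluster, and sums -log10 of each kinase's best p-value. The function names `hypergeom_tail`, `enrichment_score`, and `pick_best_k` are hypothetical, and the clusterings for each k are assumed to have been computed already.

```python
from math import comb, log10


def hypergeom_tail(overlap, cluster_size, n_substrates, n_total):
    """P(X >= overlap) when cluster_size sites are drawn from n_total sites,
    n_substrates of which are known substrates (one-sided Fisher's exact test)."""
    upper = min(cluster_size, n_substrates)
    favourable = sum(
        comb(n_substrates, x) * comb(n_total - n_substrates, cluster_size - x)
        for x in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, cluster_size)


def enrichment_score(labels, kinase_substrates):
    """labels: dict site -> cluster id.
    kinase_substrates: dict kinase -> set of annotated substrate sites.
    Assumed score: sum over kinases of -log10(best p-value across clusters)."""
    n_total = len(labels)
    clusters = {}
    for site, cluster_id in labels.items():
        clusters.setdefault(cluster_id, set()).add(site)

    score = 0.0
    for substrates in kinase_substrates.values():
        observed = substrates.intersection(labels)  # keep only quantified sites
        if not observed:
            continue
        best_p = min(
            hypergeom_tail(len(members & observed), len(members),
                           len(observed), n_total)
            for members in clusters.values()
        )
        score += -log10(best_p)
    return score


def pick_best_k(clusterings, kinase_substrates):
    """clusterings: dict k -> labels. Returns (best k, score for every k)."""
    scores = {k: enrichment_score(lbl, kinase_substrates)
              for k, lbl in clusterings.items()}
    return max(scores, key=scores.get), scores


# Toy example with eight hypothetical sites and two hypothetical kinases.
sites = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
annotations = {"kinaseA": {"s1", "s2", "s3"}, "kinaseB": {"s5", "s6", "s7"}}
clusterings = {
    2: dict(zip(sites, [0, 0, 0, 0, 1, 1, 1, 1])),  # substrates kept together
    4: dict(zip(sites, [0, 1, 2, 3, 0, 1, 2, 3])),  # substrates split apart
}
best_k, scores = pick_best_k(clusterings, annotations)
```

In this toy case the two-cluster partition keeps each kinase's substrates together and therefore scores higher than the four-cluster partition that splits them.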
Fig 2.
Temporal profile templates used in simulation studies.
Fourteen temporal profile templates, each with seven time points and a unique time-course pattern, were defined for generating simulation datasets. At each time point, a random variable with a defined Gaussian distribution is used to generate the temporal profiles for the simulation datasets.
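A minimal sketch of how profiles could be drawn from such templates is shown below. The template means, the standard deviation, and the number of profiles per template are all hypothetical values chosen for illustration; the caption does not state the parameters actually used, and only two of the fourteen templates are mocked up here.

```python
import random


def simulate_profiles(templates, n_per_template, sd, seed=0):
    """Draw n_per_template noisy profiles around each template.
    Each time point is sampled from a Gaussian centred on the template value."""
    rng = random.Random(seed)
    profiles, labels = [], []
    for template_id, means in enumerate(templates):
        for _ in range(n_per_template):
            profiles.append([rng.gauss(mu, sd) for mu in means])
            labels.append(template_id)  # ground-truth cluster of the profile
    return profiles, labels


# Two hypothetical seven-point templates (illustrative values only).
templates = [
    [0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0],  # sustained increase
    [0.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0],  # transient peak
]
profiles, labels = simulate_profiles(templates, n_per_template=50, sd=0.5)
```

Because the template of origin is recorded for every profile, the true number of clusters in the simulated dataset is known and can be compared against the predicted one.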
Fig 3.
Comparison of CLUE with alternative approaches.
Raw scores for each method, representing the quality of the clustering result at each k, were normalized to lie between 0 and 1 (y-axis). The higher the score, the more informative the resulting clustering. The methods were evaluated on how accurately they recover the true number of clusters in a simulated dataset. The yellow line marks the true number of clusters in the simulated dataset, and the red dot marks the number of clusters predicted by each method.
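The normalization to the range 0 to 1 can be illustrated with a simple min-max rescaling. This is an assumption: the caption does not say which normalization was applied, and the raw scores below are made up for the example.

```python
def min_max_normalize(scores_by_k):
    """Rescale raw scores (dict k -> score) so the smallest maps to 0
    and the largest to 1. A constant score series maps to all zeros."""
    lo, hi = min(scores_by_k.values()), max(scores_by_k.values())
    if hi == lo:
        return {k: 0.0 for k in scores_by_k}
    return {k: (s - lo) / (hi - lo) for k, s in scores_by_k.items()}


# Hypothetical raw scores for k = 2..6.
raw = {2: 10.0, 3: 14.0, 4: 30.0, 5: 22.0, 6: 18.0}
normalized = min_max_normalize(raw)
predicted_k = max(normalized, key=normalized.get)  # k with the highest score
```

Rescaling does not change which k scores highest; it only puts methods with different raw score ranges on a common y-axis.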
Fig 4.
The effects of completeness/accuracy of known kinase-substrate annotations on CLUE's performance.
CLUE's performance as a function of the number of kinases annotated to have substrates in g out of the k clusters. The panels (from left to right) show six scenarios, with the true number of simulated clusters highlighted in yellow. The scenario g = 0 corresponds to the situation in which no existing knowledge is available to CLUE. CLUE's ability to predict the true number of clusters improves markedly as g increases. CLUE's performance as a function of the percentage of incorrect kinase-substrate annotations (noise). We set g = 5 when testing different levels of noise (denoted s). The panels (from left to right) show six scenarios, with the true number of simulated clusters highlighted in yellow.
Fig 5.
Optimal clustering and analysis of hES cell phosphoproteomics data.
CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 20, and the optimal number of clusters, as estimated by CLUE, is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of all phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: the number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (p-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 3, 6, and 7. Known substrates of the p70S6K and ERK kinases are marked with x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the PSSMs of p70S6K and ERK1, respectively. The cluster with the highest median motif enrichment score is highlighted in yellow.
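The motif enrichment step can be sketched as scoring the sequence window around each site against a position-specific scoring matrix (PSSM) and comparing cluster medians. The PSSM values, window length, sequences, and the rule of scoring unlisted residues as 0 are all hypothetical; the caption does not describe how the PSSMs were built or applied.

```python
from statistics import median


def pssm_score(window, pssm):
    """Sum the per-position scores of a sequence window.
    pssm[i] maps residue -> score at position i; residues absent from a
    column score 0.0 (an assumption made for this sketch)."""
    return sum(column.get(residue, 0.0) for residue, column in zip(window, pssm))


def best_cluster_by_motif(cluster_windows, pssm):
    """cluster_windows: dict cluster id -> list of sequence windows.
    Returns the cluster with the highest median score, plus all medians."""
    medians = {
        cluster: median(pssm_score(w, pssm) for w in windows)
        for cluster, windows in cluster_windows.items()
    }
    return max(medians, key=medians.get), medians


# Toy three-position PSSM and made-up windows (illustrative only).
toy_pssm = [{"R": 1.0}, {"S": 0.5}, {"P": 1.0}]
cluster_windows = {
    1: ["RSP", "RSA", "KSP"],
    2: ["ASA", "GSG", "RSG"],
}
top_cluster, medians = best_cluster_by_motif(cluster_windows, toy_pssm)
```

Using the median rather than the mean keeps a few very high-scoring sites from dominating a cluster's motif score.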
Fig 6.
Optimal clustering and analysis of adipocyte phosphoproteomics data.
CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 36, and the optimal number of clusters, as estimated by CLUE, is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of all phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: the number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (p-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 2, 7, 9, and 17. Known substrates of the Akt1 and mTOR kinases are marked with x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the PSSMs of Akt1 and mTOR, respectively. The cluster with the highest median motif enrichment score is highlighted in yellow.
Table 1.
Comparison of CLUE with alternative approaches on the two phosphoproteomics datasets.