The authors have declared that no competing interests exist.
Wrote the paper: YZ PCA AK. Conceived and initiated the project: PCA YZ AK HKK. Designed and developed the post-identification pipeline, and performed the analysis: YZ. Performed the mass spectrometry experiments: HKK. Contributed biological samples and performed experimental validation: CS. Discussed interpretation of biological results: YZ HKK CS AK PCA.
Reversible phosphorylation is one of the major mechanisms of signal transduction, and signaling networks are critical regulators of cell growth and development. However, few of these networks have been delineated completely. Towards this end, quantitative phosphoproteomics is emerging as a useful tool enabling large-scale determination of relative phosphorylation levels. However, phosphoproteomics differs from classical proteomics by a more extensive sampling limitation due to the limited number of detectable sites per protein. Here, we propose a comprehensive quantitative analysis pipeline customized for phosphoproteome data from interventional experiments for identifying key proteins in specific pathways, discovering the protein-protein interactions and inferring the signaling network. We also made an effort to partially compensate for the missing value problem, a chronic issue for proteomics studies. The dataset used for this study was generated using SILAC (Stable Isotope Labeling with Amino acids in Cell culture) technique with interventional experiments (kinase-dead mutations). The major components of the pipeline include phosphopeptide meta-analysis, correlation network analysis and causal relationship discovery. We have successfully applied our pipeline to interventional experiments identifying phosphorylation events underlying the transition to a filamentous growth form in
Signal transduction is a ubiquitous and essential mechanism regulating cellular functions, including responses to environmental stress. Dysfunction of signaling pathways results in a variety of diseases, including cancer, diabetes, and cardiovascular disease. Phosphorylation regulates the activity of signaling and target proteins at different cellular locations and controls activation and inactivation of signaling pathways. Here, we provide an analysis of phosphoproteome datasets from yeast, comparing kinase mutants with wild-type strains. To provide an objective approach for identifying candidate proteins involved in the transition to a filamentous growth form, we propose and apply a comprehensive pipeline incorporating statistical and mathematical methods to investigate the phosphoproteome data from multiple perspectives. These include phosphorylation variation in response to a single mutant, phosphorylation variation patterns over multiple mutants, and the relationships represented by these patterns. We aim to discover the components and targets of the signaling network, infer the network structure, and relate changes in protein phosphorylation to cellular functions, specifically in response to stress in the context of filamentous growth.
Cells exchange and receive information from the environment through signaling pathways, which are crucial for cells to maintain normal functions and properly respond to stress and stimuli. Dysregulation of these processes is a major factor in the emergence of many diseases, including cancer, diabetes, and cardiovascular disease. Reversible phosphorylation is one of the major forms of signal transduction and can affect protein function and gene expression
Large-scale phosphoproteomics studies on a number of organisms have been carried out using mass spectrometry (MS)-based approaches (reviewed in
The general scope of this manuscript encompasses a comprehensive pipeline, incorporating statistical and mathematical methods for investigating and evaluating quantitative phosphoproteomic data, the elucidation of candidate proteins, and the identification of processes to be pursued in subsequent molecular biology and genetic studies. The phosphoproteome data used in this analysis were obtained from interventional experiments on a subset of yeast kinases involved in filamentous growth. Filamentous growth is a developmental transition observed in
The molecular basis of filamentous growth in
The ellipses are proteins; the rectangles are genes; and the triangles are metabolites. The linkage between shapes: sharp-end arrows indicate stimulation, T-end arrows indicate inhibition, and wavy lines indicate association. The information was extracted from
We have generated phosphoproteomic datasets indicating kinase-dependent phosphorylation events underlying the filamentous growth transition. Specifically, we generated kinase-dead mutations (also called kinase-inactivating mutations) for a set of eight kinases that we have identified as components of the yeast filamentous growth response: Ksp1p, Kss1p, Sks1p, Ste20p, Snf1p, Tpk2p, Elm1p and Fus3p
The broad focus of the filamentous growth kinase networks in particular has made it difficult to tease out important kinase targets (direct or indirect). Bioinformatics methods provide a promising avenue by which local kinase signaling relationships can be identified. While traditional cluster analyses coupled with functional enrichment analysis are useful tools, their performance can be degraded by missing values, which must be addressed in order to obtain reliable clusters and enriched functions. Furthermore, a more integrative and extensive analysis is necessary to find new components of the pathways, uncover relationships between the pathway components, and elaborate the signaling network structure. We therefore propose a comprehensive quantitative analysis pipeline customized for SILAC data that partially compensates for the missing value issue. The major building blocks include phosphopeptide meta-analysis, correlation network analysis, causal relationship discovery, and validation by literature mining. We have successfully applied the pipeline to our current yeast data. Candidate proteins predicted to contribute to the filamentous growth response were selected by phosphopeptide meta-analysis and correlation network analysis. Causal relationship discovery was performed on candidate proteins identified from our analysis and validated proteins from the literature. The inferred causal relationships, along with the interactions inferred from phosphorylation changes in response to individual mutants, suggest proteins that can be targeted by further interventions in future studies.
An overview of the analytical workflow is shown in
Summary | Number of phosphopeptides | Number of proteins |
Identifications in the whole dataset | 3,312 | 1,063 |
Identifications common among all 8 kinase-dead mutants (KDs) | 73 | 66 |
Identifications common among 4–8 KDs | 882 | 486 |
Identifications that are significant in at least 1 KD | 863 | 452 |
Identifications with globally significant phosphorylation changes | 28 (5 from complete measurements; high-confidence) | 26 (5 from complete measurements; high-confidence; 17 have inner connections supported by STRING) |
High-confidence hub proteins identified from the stringent correlation network | - | 19 |
Proteins known to be involved in filamentous growth from literature mining, and detected in our dataset | - | 20(15 of them are significant in at least 1 KD) |
The relationships of the eight kinase mutants and their effects on global phosphorylation patterns were subjected to correlation analysis (see
The hierarchical clustering tree using Spearman correlation as the similarity metric is drawn along the left side of the heatmap.
Correlations must be interpreted with caution for partially multiplexed data, such as triplex SILAC experiments. Because a peptide quantified for one sample is highly likely to be quantifiable for the other two samples in the same triplex, the identification and quantification of phosphopeptides in a triplex experiment tend to be linked. In other words, the overlap within a triplex run should be near 100%, but the overlap between different triplex runs will be lower due to instrument sampling limitations. A larger number of replicates may help minimize missing data and compensate for the possible bias introduced by tied identification and quantification, but this is rarely done because of the high cost of these analyses.
A total of 882 phosphopeptides representing 486 proteins were commonly identified in 4–8 kinase-dead (KD) mutants. After the missing values were imputed, the tight clustering method
Cluster | Proteins (traced back from phosphopeptides) | Enriched terms |
1 | YRO2, BUG1, VPS74, HXK1, PIL1, FBP26, PTK2, NPA3, BIR1, MYO3, UTP14, ARE2, DBP5, RUD3 | Nucleotide phosphate-binding region:ATP (P-value = 6.54E-04, Benjamini = 3.4E-2); Nucleotide-binding (P-value = 1.8E-3, Benjamini = 4.2E-2); ATP-binding (P-value = 6.0E-3, Benjamini = 9.3E-2) |
2 | VMA2, SEC31, GLY1, PEA2, VTC2, KEM1, UFD1, TIF4631, BCY1, SPA2, MFT1, NEW1, KRE6 | - |
3 | NUP60, SLA1, STU1, YCL020W, VBA4, HOM2, YDR365W-B, VPS74, PSP1, CHD1, NUP145, SPT6, HSE1, ABF1, MEH1, CKI1, YLR413W, SPT5, HRB1, LCB4, CAF20, MRL1 | Endosome (P-value = 1.6E-3, Benjamini = 6.6E-2); RNA polymerase II transcription elongation factor activity (P-value = 1.4E-3, Benjamini = 9.6E-2); Transcription elongation regulator activity (P-value = 2.8E-3, Benjamini = 9.9E-2) |
4 | FAP7, ITR1, LSB3, LEU1, FLC3, SPT6, YGR125W, CRP1, KEL1, LCB3, YBT1, BDF1, YMR031C, DDR48, YMR295C, GPD2, ZEO1, CAF20, SNF2 | - |
5 | PIN4, CYC8, BUD3, LYS20, CDC34, MAK21, BFR2, SUM1, GLY1, NUP145, PRP43, SPT6, ENP2, YOR1, SSZ1, NUP2, YLR345W, SUB1, ESC1, BDP1, DCP2, RPC31, SLA2, NOP8, ALE1, MSB1, SNU66 | Nucleus (P-value = 1.0E-4, Benjamini = 3.4E-3); Nuclear lumen (P-value = 3.4E-4, Benjamini = 2.7E-2) |
6 | SIF2, PPH22, VAC8, HSP12, RTF1, RSC30, TRA1, LCB3, NAP1, SIC1, RPN13, YMR196W, MRE11, MCK1, LEM3, FPK1, LSP1 | - |
7 | IST2, AIM3, RPC53, YDR186C, ECM32, MIG1, HXK2, VHS2, RNR2, UTR1, FBA1, EAP1, YLR257W, PFK2, PFK2, ACC1, YOR052C | Fructose and mannose metabolism (P-value = 3.0E-3, Benjamini = 3.9E-2); Glycolysis (P-value = 1.6E-3, Benjamini = 4.3E-2); Glycolysis/gluconeogenesis (P-value = 9.8E-3, Benjamini = 6.2E-2) |
8 | AKL1, IST2, MAK5, FEN1, LHP1, RPC53, SAS10, SHS1, MAK21, DOP1, GCD6, GUK1, CHO1, PDA1, LEU1, NOP7, SPT6, TFG1, HXT1, AIM21, URA2, CDC11, MAK11, VPS13, CBF5, VTA1, CRN1, YMR031C, EFR3, ADE4, NOP12, MAM3, CAF20, PEX25, TIF5 | Ribosome biogenesis (P-value = 1.0E-4, Benjamini = 5.0E-3) |
Functional enrichment P-value and Benjamini-Hochberg corrected p-value (Benjamini) were calculated with DAVID Functional Annotation Tool
Benjamini <0.1,
Benjamini <0.05,
Benjamini <0.01.
All the clusters are highly enriched in the term “phosphoprotein” (not listed above).
We observed examples of multiple phosphorylation domains on the same protein that share similar phosphorylation change patterns and thus end up in the same cluster. One example is “_KGS(ph)FTTELSCR_” (position of the phosphorylated serine: 520) and “_RSS(ph)YISDTLINHQMPDAR_” (position of the phosphorylated serine: 238 or 239) on Psp1p in Cluster 3. It is possible that these phosphorylation sites are co-regulated by the same biological process. They might be closely located in the protein tertiary structure or share sequence similarities that allow them to be phosphorylated by the same kinase. In another example, two phosphorylation sites lie in the same domain and are thus physically close in the protein sequence: “_DQDQSSPKVEVTS(ph)EDEK_” (position of the phosphorylated serine: 495) and “_VEVT(ph)SEDEKELESAAYDHAEPVQPEDAPQDIANDELK_” (position of the phosphorylated threonine: 494) on Leu1p in Cluster 4. Both of these phosphorylation sites were identified in a WT/SNF1/TPK2 experiment, where the serine (S) at position 495 in the former has phosphorylation probability 0.999 (reported by MaxQuant), while the threonine (T) at position 494 in the latter has phosphorylation probability 0.96. These two sites might be alternative phosphorylation sites having similar effects, or the dominance of either site might be affected by protein cellular localization or kinase activity patterns.
On the other hand, we also found examples of the same protein (e.g., Spt6p) appearing in multiple clusters. Different sites on one protein do not necessarily change phosphorylation in a similar pattern, since they might have different functions and be regulated by different biological processes. All of the above observations are worth further investigation.
A total of 863 unique phosphopeptides representing 452 proteins were identified to have significant phosphorylation changes in at least one kinase-dead mutant. From these changes we can infer the downstream proteins regulated by the kinases; the inferred regulation might be direct or indirect. A total of 1,299 significant kinase-phosphopeptide regulation pairs were identified (Dataset S2). We incorporated the corresponding proteins and generated an extended pathway map (
The extended filamentous growth pathway map integrating the known knowledge (
A total of 28 phosphopeptides representing 26 proteins from the entire dataset were found to have globally significant phosphorylation changes (Dataset S3). These candidates were picked out without using prior knowledge. The Fisher's probability test
ENSEMBL ID | Standard name | Name description | Modified sequence | Stress response |
YDR001C | NTH1 | Neutral trehalase;Alpha,alpha-trehalase;Alpha,alpha-trehalose glucohydrolase | _RGS(ph)EDDTYSSSQGNR_ | Nth1p is a multiple stress responsive protein |
YNL015W | PBI2 | Protease B inhibitors 2 and 1;Proteinase inhibitor I(B)2 | _HNDVIENVEEDKEVHT(ph)N_ | Pbi2 gene deletion leads to decreased resistance to hyperosmotic stress |
YOR220W | RCN2 | Regulator of calcineurin 2;Weak suppressor of PAT1 ts protein 1 | _NKPLLSINT(ph)DPGVTGVDSSSLNK_ | Rcn2p is induced in response to the DNA-damaging agent methyl methanesulphonate |
YPL058C | PDR12 | ATP-dependent permease PDR12 | _HLSNILS(ph)NEEGIER_ | Pdr12 is strongly induced by weak acid stress |
YDR171W | HSP42 | Heat shock protein 42 | _KS(ph)S(ph)SFAHLQAPSPIPDPLQVSKPETR_ | Protein expression is induced by stresses such as heat shock, salt shock and starvation |
Annotated with MaxQuant.
Nth1p is a key enzyme in the trehalose pathway which plays a crucial role in glucose homeostasis and stress responses
We also searched the STRING database
All possible pairs among the 73 common phosphopeptides with complete measurement were tested using the Pearson correlation. A total of 45 strongly correlated phosphopeptide pairs were identified, each satisfying the following criteria: the correlation test p-value < 0.05, and the stringent requirement of |Pearson correlation coefficient| ≥ 0.9. Detailed information on the 45 pairs of phosphopeptides is provided in Dataset S4. Twenty-seven of the pairs have positive correlations, while 18 pairs have negative correlations. A stringent protein correlation network containing 35 proteins (
Red lines indicate positive correlations, while black lines indicate negative correlations. The larger the node size, the greater the degree of connectivity.
In the protein correlation network, proteins with the highest degrees of connectivity are considered core components in the network. The 19 proteins having degrees greater than 1 (protein self-connection ignored) in the stringent protein correlation network were predicted to be core components of the network. Detailed descriptions and evidence of the proteins are summarized in Table S2 in
Gpd2p and Lys21p are two self-connected proteins. In each case the self-connection arises from two distinct phosphorylation sites on the same protein. Gpd2p has not been related to filamentous growth in
In addition to the candidate proteins predicted from our dataset, we retrieved from the literature and authoritative databases
The interactions retrieved from the differentially phosphorylated proteins in individual kinase-dead mutants (the dashed edges in
Mutated kinases | Globally significant (high-confidence) | Hub proteins (high-confidence) | From literature mining and detected in our dataset |
(also see |
(also see Table S2 in |
(also see Table S1 in |
|
KSP1 | NTH1 | SEC21 | BCY1 |
KSS1 | PBI2 | ABF1 | BMH1 |
SKS1 | RCN2 | ARE2 | BUD2 |
STE20 | PDR12 | DCP2 | CDC28 |
SNF1 | HSP42 | KEM1 | CYR1 |
TPK2 | - | NUP145 | DIG1 |
ELM1 | - | SPA2 | DIG2 |
FUS3 | - | CHO1 | FLO8 |
- | - | GLY1 | GPR1 |
- | - | HSP42 | KEM1 |
- | - | PWP1 | NRG1 |
- | - | PUF6 | PEA2 |
- | - | SPT6 | RAS2 |
- | - | SSD1 | SFL1 |
- | - | SUM1 | SNF1 |
- | - | NUP2 | SPA2 |
- | - | PBI2 | STE20 |
- | - | PBP1 | STE50 |
- | - | UFD1 | TPK3 |
- | - | - | TPM1 |
Bayesian network modeling identified causal influences for 22 protein pairs (44 phosphopeptide pairs) (Table S3 in
Each edge indicates a potential causal influence between proteins, which might be a direct or indirect influence. It does not distinguish activation and inhibition. The thicker the edge, the higher the posterior probability.
On further inspection of the phosphorylation change patterns of the peptide pairs detected with relatively strong causal influences (posterior probability higher than 0.7), we observed that Ste20p has opposing phosphorylation changes compared to Are2p, Pdr12p and Sec21p; two phosphopeptides (the same amino acid sequence but different phosphorylation sites) on Hsp42p show opposing phosphorylation changes compared to Ste20p; and Pbp1p shows a consistent phosphorylation change compared to Ste20p. With caution, we predict that the opposing pattern implies an inhibitory influence of Are2p, Pdr12p and Sec21p on Ste20p and, similarly, an inhibitory influence of Hsp42p on Ste20p, while Pbp1p exerts an activating influence on Ste20p. Again, we emphasize that the influence might be quite indirect and may even be affected by multiple pathways.
Our computational analyses highlight the proteins Nth1p, Pbi2p, Pdr12p, and Rcn2p as undergoing globally significant phosphorylation changes. To determine if these proteins do in fact impact filamentous growth in
The genes
In summary, all four deletions result in differential invasive growth compared to the wild type control, providing prospective validation for our approach to identification of candidate proteins in this biological system from phosphoproteomics data alone.
In this study, we demonstrate that interventional phosphoproteome studies can provide new insight into signaling pathways involved in biological processes such as yeast filamentous growth. In order to increase sensitivity to smaller changes in phosphorylation relative to previous yeast global phosphoproteome studies
Each of the kinases mutated in this study had previously been implicated in filamentous growth. Many of these kinases are known to also affect pathways that are not involved directly in filamentous growth. However, the proteins which change phosphorylation level in response to multiple mutants are reasonable candidates involved in filamentous growth. The sensitivity of such detection is constrained by the degree of overlap between pathways, the coverage of pathways by the mutants, and the extent of missing data. Upstream components of isolated pathways may be missed, while downstream core components are more likely to be identified.
A remaining challenge for quantitative phosphoproteome analysis arises from the sampling limitations and resolution of current mass spectrometers
This analysis pipeline has been developed to study yeast filamentous growth pathways; however, the methodology is not limited to yeast or this biological process. It can be applied to other complex organisms to facilitate investigation into various biological processes. We anticipate the methodology to be applicable as well to other interventional studies via different experiment platforms.
Tandem mass spectrometry data were generated from a series of triplex SILAC
All strains were auxotrophic for Lys and Arg, and were grown on defined medium supplemented with the appropriate isotopic forms of Lys and Arg. The cultures were grown to log phase, and treated with 1% (vol/vol) butanol to induce filamentous growth
Protein levels were determined by the Bradford protein assay and the proteins from the triplex labeling were then pooled, and were digested by trypsin. The digest was separated into fractions using strong cation-exchange (SCX) fractionation, followed by selective enrichment of phosphorylated peptides using titanium dioxide
In the meta-analysis, we contrast and combine the results from different KD-versus-WT experiments in order to find the correlations between kinase-dead mutants, categorize peptide phosphorylation patterns across experiments, and identify differentially phosphorylated peptides.
The relative phosphorylation level obtained for each phosphopeptide is represented as a ratio for each of the 8 kinase-dead mutants (KD) versus wild type (WT) under filamentous growth conditions. Two examples of phosphopeptides identified in all 8 kinase-dead mutants are shown in
Phosphorylation fold-changes in the following KD-vs-WT conditions:
Phosphopeptide | Sks1-KD vs. WT | Ste20-KD vs. WT | Snf1-KD vs. WT | Tpk2-KD vs. WT | Elm1-KD vs. WT | Fus3-KD vs. WT | Kss1-KD vs. WT | Ksp1-KD vs. WT |
ADDEEDLS(ph)DENIQPELR | 0.72 | 0.71 | 0.70 | 0.52 | 1.0 | 0.88 | 0.83 | 0.86 |
ADGTGEAQVDNS(ph)PTTESNSR | 2.3 | 3.7 | 2.1 | 2.2 | 0.33 | 0.58 | 0.75 | 0.69 |
Phosphorylation level of each phosphopeptide is represented in a list of ratios. We used the peptide ratios provided by the MaxQuant output, which have been normalized for each LC-MS/MS run
For the purpose of evaluating similar or reciprocal effects on phosphorylation changes in response to different kinase mutations, we generated a correlation heatmap of the kinase-dead mutants (see
The goal of this cluster analysis is to find groups of phosphopeptides sharing similar phosphorylation change patterns, which are likely to be involved in similar functional pathways. The phosphopeptides commonly identified in 4–8 KD-versus-WT conditions were selected, and the missing values were imputed (on log2 scale) using 5-nearest neighbor averaging
We also attempted several traditional clustering methods, including hierarchical clustering methods
Note that the cluster analysis was performed at the peptide level rather than the protein level, because many proteins contain multiple phosphorylation domains whose responses may correlate or not, depending on the function of phosphorylation at those sites and the physiological conditions examined. Protein identities were traced back from the peptide identifications while accounting for protein isoforms.
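The imputation step described above (5-nearest neighbor averaging on the log2 scale) can be illustrated with a minimal sketch. This is not the implementation used in the study; the function name `knn_impute`, the use of `None` for missing values, and the choice of a root-mean-square distance over the conditions observed in both peptides are assumptions made for illustration only. Inputs are assumed to be log2-transformed KD/WT ratios, one row per phosphopeptide.

```python
import math

def knn_impute(rows, k=5):
    """Fill None entries with the mean of that column over the k nearest rows.

    rows: list of lists of log2 ratios, with None marking a missing value.
    Distances are computed from the originally observed values only.
    """
    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors must have an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

Restricting the analysis to phosphopeptides observed in at least 4 of the 8 conditions, as described above, keeps the number of shared columns available for each distance reasonably large.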
The functional terms were annotated for the proteins in top tight clusters to survey functional enrichment. The Functional Annotation Tool on DAVID v6.7
The phosphopeptides whose phosphorylation level changes significantly in each individual KD-versus-WT experiment were selected using the criterion significance B < 0.05.
The kinases selected for kinase-dead mutation are all known to be involved in filamentous growth. Proteins with globally significant responses in the mutants versus WT controls are potential components involved in filamentous growth or expression products of the gene targets. Detecting globally differentially phosphorylated peptides by combining the results from all the KD-versus-WT experiments is a multiple testing problem
In the framework of Fisher's method, the two-tailed p-value
We also adapted an adaptively weighted statistic-based method (missing values not allowed)
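As an illustration of the Fisher's method framework mentioned above, the following minimal sketch combines per-experiment p-values for one phosphopeptide using the standard statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function names, the representation of a missing measurement as `None`, and the choice to combine only the available p-values are assumptions for illustration; the per-experiment p-values are taken as already computed and strictly greater than zero.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function for even degrees of freedom (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_combine(p_values):
    """Combine p-values with Fisher's method; None entries (missing) are skipped."""
    observed = [p for p in p_values if p is not None]
    if not observed:
        return None
    statistic = -2.0 * sum(math.log(p) for p in observed)
    return chi2_sf_even_df(statistic, 2 * len(observed))
```

Because the degrees of freedom are always even (2k), the chi-square tail probability has a closed form and no numerical library is needed. With a single available p-value the combined result equals that p-value, which is a convenient sanity check.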
A correlation network of all the 73 common phosphopeptides with complete measurements was generated based on their phosphorylation changes under all 8 KD-versus-WT conditions. The Pearson correlation coefficient between each pair of distinct phosphopeptides was calculated. Strong correlations meet the following criterion: p-value of the Pearson's correlation test < 0.05, and a stringent requirement of |Pearson correlation coefficient| ≥ 0.9. The protein identifications can be traced back from the phosphopeptides.
The correlation network among proteins is an undirected network. The degree of connectivity of each protein in the network provides an assessment of the importance of that protein: the higher the degree, the more frequently the protein is involved in interactions with other proteins in the network. From this measurement, we predict core components of the correlation network.
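The construction of the correlation network and the degree calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the p-value is obtained by numerically integrating the exact null density of the Pearson coefficient (proportional to (1 - r^2)^((n-4)/2), which assumes bivariate normality), and the degree of a protein is taken to be its number of distinct neighboring proteins with self-connections ignored.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value of r for n samples, via Simpson's rule on the null density."""
    if n < 4:
        raise ValueError("this sketch requires n >= 4")
    exponent = (n - 4) / 2.0

    def integrate(upper):
        h = upper / steps
        total = 0.0
        for i in range(steps + 1):
            t = i * h
            weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
            total += weight * max(0.0, 1.0 - t * t) ** exponent
        return total * h / 3.0

    return max(0.0, 1.0 - integrate(min(abs(r), 1.0)) / integrate(1.0))

def correlation_network(profiles, r_min=0.9, alpha=0.05):
    """Return (peptide_a, peptide_b, r) for every strongly correlated pair."""
    peptides = sorted(profiles)
    edges = []
    for i, a in enumerate(peptides):
        for b in peptides[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            p = pearson_p_value(r, len(profiles[a]))
            if abs(r) >= r_min and p < alpha:
                edges.append((a, b, r))
    return edges

def protein_degrees(edges, peptide_to_protein):
    """Degree of each protein, counting distinct neighbors and ignoring self-connections."""
    neighbors = defaultdict(set)
    for a, b, _ in edges:
        pa, pb = peptide_to_protein[a], peptide_to_protein[b]
        if pa == pb:
            continue
        neighbors[pa].add(pb)
        neighbors[pb].add(pa)
    return {protein: len(found) for protein, found in neighbors.items()}
```

Here `profiles` maps each phosphopeptide to its fold changes under the 8 KD-versus-WT conditions, and `peptide_to_protein` maps each phosphopeptide to its parent protein; the sign of `r` in each edge distinguishes positive from negative correlations.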
In addition to the candidate proteins predicted by global differential phosphorylation and the core-components identified from the correlation network, we also retrieved a list of proteins reported as known or potential components involved in filamentous growth from literature as well as authoritative databases, such as SGD
The correlation network is intuitive; however, it is not directed, and direction information for networks is quite useful for interpretation. For this reason we went beyond correlation analysis to causal Bayesian network modeling. Because different phosphopeptides from the same protein do not necessarily change phosphorylation level in the same direction, the network modeling must be performed at the peptide level, and the results then traced back to the parent proteins.
If a phosphopeptide was detected more than once in a specific mutant, the median of the fold-changes was taken as a representative of the response in this mutant. The phosphorylation fold-changes of peptides were discretized into three states based on the 2-fold change criterion
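The preprocessing described above can be sketched in a few lines. The function names and the state encoding (-1, 0, +1) are hypothetical, and the sketch assumes that the 2-fold criterion means a KD/WT ratio of at least 2 for increased phosphorylation and at most 0.5 for decreased phosphorylation, with both boundaries inclusive.

```python
from statistics import median

def representative_fold_change(replicates):
    """Median KD/WT ratio when a phosphopeptide is detected more than once in a mutant."""
    return median(replicates)

def discretize(fold_change, threshold=2.0):
    """Map a KD/WT ratio to -1 (decreased), 0 (unchanged) or +1 (increased)."""
    if fold_change >= threshold:
        return 1
    if fold_change <= 1.0 / threshold:
        return -1
    return 0
```

Applied to the fold changes of the second example phosphopeptide listed in the table above (2.3, 3.7, 2.1, 2.2, 0.33, 0.58, 0.75, 0.69), this encoding gives the states 1, 1, 1, 1, -1, 0, 0, 0.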
A causal Bayesian network is a Bayesian network in which a directed edge is interpreted as a causal influence from the parent node to the child node
Non-informative prior distribution of the model structures is used. For given data,
The analyses were implemented in R v2.15.1 and MATLAB R2012a. The causal Bayesian network structure learning was performed in MATLAB using BNT (Bayes Net Toolbox for MATLAB) v1.0.7
The authors acknowledge useful discussions with Santiago Schnell, Alexey Nesvizhskii, George Michailidis, Gilbert S. Omenn and Rajasree Menon.