Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species

Analysis of the phosphoproteomes and the gene interaction networks of divergent yeast species defines the relative contribution of changes in protein phosphorylation pathways to the generation of phenotypic diversity.


Dataset quality assessment

Coverage
Coverage was estimated by leaving out, one at a time, previously published in vivo phosphorylation datasets for S. cerevisiae [1][2][3][4] and S. pombe [5] (see Supplementary Table 1). It is important to note that three of the S. cerevisiae studies are expected to contain many condition-specific phosphorylation sites, as the yeast cells were grown in the presence of MMS [1] or of the mating pheromone [3,4]. The phosphoprotein coverage of our study ranges from 51% to 71% for growth in rich media. The coverage for detection of phosphorylated peptides is only slightly lower, from 43% to 62%. Identifying the exact phosphorylation site within a phosphorylated peptide by mass-spec is a harder challenge, since the computational analysis of the peptide might assign similar likelihood scores to different candidate residues. This is the main reason why the coverage values for phosphosite detection are lower than for phosphopeptides, ranging from 20% to 30%.
Supplementary Table 1 -Coverage estimates for phosphoproteins and phosphosites in S. cerevisiae and S. pombe. The estimated coverage of our phosphorylation sets ranges from 51% to 71% for detection of phosphoproteins, 43% to 62% for detection of phosphorylated peptides (10 amino-acid peptide) and 20% to 31% for correct detection of previously known phosphosites.
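As an illustration of the coverage estimate, the sketch below takes one possible reading of the procedure: a published dataset is held out as the reference, and coverage is the fraction of its entries that are also present in our set. The identifiers and positions are hypothetical and serve only to show the calculation at the phosphoprotein and phosphosite level.

```python
def coverage(reference, ours):
    """Fraction of entries in a held-out reference set that our set also contains."""
    reference = set(reference)
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & set(ours)) / len(reference)


# Hypothetical identifiers, for illustration only.
published_proteins = {"YAL001C", "YBR002W", "YCL003C", "YDR004W"}
our_proteins = {"YAL001C", "YBR002W", "YEL005C"}

# Phosphosites are (protein, position) pairs, so a site is counted only
# when the exact residue matches. This is the strictest comparison.
published_sites = {("YAL001C", 10), ("YAL001C", 42), ("YBR002W", 7), ("YCL003C", 3)}
our_sites = {("YAL001C", 10), ("YBR002W", 8)}

protein_cov = coverage(published_proteins, our_proteins)
site_cov = coverage(published_sites, our_sites)
print(f"phosphoprotein coverage: {protein_cov:.0%}, phosphosite coverage: {site_cov:.0%}")
```

In this toy case the site-level coverage is lower than the protein-level coverage because one site was assigned to a neighbouring residue, which mirrors the effect described above.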

Abundance bias
We tested the potential effect of an abundance bias in the determination of phosphoproteins by taking advantage of experimentally determined concentration values for S. cerevisiae proteins [6]. We binned S. cerevisiae proteins according to their experimentally determined abundance and calculated, for each abundance bin, the fraction of phosphoproteins and of proteins not yet detected as phosphorylated (see Supplementary Figure 1). Although phosphoproteins are on average 3 times more abundant than non-phosphorylated proteins (p-value = 6.3×10^-13 with a t-test), this difference is small compared to the 8 orders of magnitude spanned by abundance values. It is therefore unlikely that changes in protein abundance across species determine the changes in phosphorylation enrichment detected by mass-spec. To address this further, we analyzed the correlation between phosphorylation enrichment and the average protein abundance of different gene ontology groups. If protein abundance were a strong factor determining the observed phosphorylation enrichment of a group of proteins, we would expect a high correlation between the two. On the contrary, we found a weak and non-significant negative correlation between the average protein abundance and the fraction of phosphoproteins (R=-0.21, p>0.1) or the fraction of phosphosites (R=-0.17, p>0.1).
Supplementary Figure 1 -Abundance bias in detection of phosphorylated proteins by mass-spec
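The binning and correlation analysis can be sketched as follows. This is a minimal illustration and not the code used for the study: the choice of one bin per order of magnitude, the function names and the toy values are all assumptions.

```python
import math
from collections import defaultdict


def phospho_fraction_by_bin(abundance, phosphoproteins):
    """Map each log10 abundance bin to the fraction of its proteins that are phosphoproteins."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for protein, value in abundance.items():
        if value <= 0:
            continue  # log10 is undefined; skip proteins without a positive measurement
        bin_index = math.floor(math.log10(value))  # assumed: one bin per order of magnitude
        totals[bin_index] += 1
        if protein in phosphoproteins:
            hits[bin_index] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundance values (arbitrary units) and phosphoprotein set.
abundance = {"A": 50, "B": 80, "C": 500, "D": 900, "E": 2000}
phosphoproteins = {"B", "C", "D"}
fractions = phospho_fraction_by_bin(abundance, phosphoproteins)
print(fractions)
```

The same `pearson_r` helper could be applied to per-group values (average abundance against fraction of phosphoproteins) to obtain a correlation of the kind reported above.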

Gene duplication
In order to calculate the phosphorylation enrichment for functional groups across the three species studied here (S. cerevisiae, C. albicans and S. pombe) we propagated Gene Ontology annotations from S. cerevisiae to the other two species using one-to-many orthology assignments as determined by the Synergy algorithm [7]. It is therefore possible that differences in phosphorylation enrichment reflect changes in the number of proteins assigned to a functional group instead of differences in the number of phosphosites or phosphoproteins. For this reason, among the GO groups that show a significant change in phosphorylation enrichment, we determined those that also show a significant change in size (number of proteins). We determined the number of proteins assigned, in each species, to each functional group (gene ontology or complexes from MIPS). A very high correlation was observed across species for the number of proteins assigned to each functional group (R>0.99). We defined, for each functional group, the relative number of proteins in each species as its contribution to the sum across the 3 species. As expected from the high cross-species correlation, most functional groups show very similar numbers of proteins across species, with an average contribution to the sum near 0.3 for the three species. We then defined a significant change in total size as a functional group having, for at least one species, a z-score greater than 1.6 or smaller than -1.6, corresponding to p-value<0.05. In total, six functional groups and one complex were observed to have significant differences in both phosphorylation and number of proteins (see Supplementary Figure 2). The highest significant difference was detected for the complex "Other Respiratory Chain", where the C. albicans complex is predicted to have 69 subunits while the S. pombe complex has only 59. This could explain why the observed phosphorylation enrichment of this complex in C. albicans is lower than in the other two species, but it cannot account, for example, for the higher phosphorylation of the S. cerevisiae complex relative to the S. pombe complex (see Figure 1 in the main article). Also, the differences in number of proteins (up to a maximum of 1.7 times relative change) are much smaller than the observed changes in phosphorylation sites (up to 7 times relative change).
Supplementary Figure 2 -Significant changes in total number of proteins assigned to function or complex. For each function or complex with a significant change in phosphorylation enrichment we determined those that also had a significant change in total number of proteins across the three species studied.

Predictors for phosphorylation propensity and kinase-substrate interactions
Two different approaches were used to predict phosphorylation propensity from sequence as described in the main article: 1) Likelihood ratios for kinase motif enrichment and spatial clustering; 2) Phosphosite propensity predictions from GPS 2.0 [8].

Comparative phosphorylation enrichment
In order to identify functions and complexes with a significant change in the enrichment of phosphorylated residues we used the functional annotations of S. cerevisiae to define orthologous groups of genes in S. cerevisiae, C. albicans and S. pombe according to Gene Ontology groups or protein complex membership. For each functional group (GO function or complex) we determined the number of phosphosites per group in the three species. We then normalized by the average number of phosphosites per protein in the proteome to take into account potential differences in coverage between species. For each group we calculated the contribution of each species to the sum of the normalized phosphosites per protein across the three species. Most groups showed very similar levels of phosphorylation across species, with an average contribution near 0.3 for the three species. Finally, we searched for groups with significant changes by calculating the Z-score of each group within each species. We show below the relative change in phosphosite enrichment and the Z-score for each Gene Ontology group (supplementary figure 4) and protein complex (supplementary figure 5).
Supplementary Figure 5 -Relative phosphorylation of protein complexes across S. cerevisiae, C. albicans and S. pombe. For each complex we determined the relative change in phosphosite enrichment across the three yeast species studied. These changes were converted to Z-scores to highlight the significant changes. Z-scores above 1.6 or below -1.6 are shown in bold and correspond to the upper and lower tail p-value<0.05.
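The normalization and Z-score steps described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the group names and counts are invented, the Z-score is taken across groups within each species using the population standard deviation, and the exact normalization used for the study may differ in detail.

```python
from statistics import mean, pstdev

SPECIES = ("S. cerevisiae", "C. albicans", "S. pombe")


def relative_contributions(sites_per_group, proteome_mean_sites):
    """Normalize phosphosite counts by the proteome average, then express each
    species as its share of the sum across the three species."""
    contributions = {}
    for group, counts in sites_per_group.items():
        normalized = {s: counts[s] / proteome_mean_sites[s] for s in SPECIES}
        total = sum(normalized.values())
        if total == 0:
            continue  # no phosphosites detected for this group in any species
        contributions[group] = {s: normalized[s] / total for s in SPECIES}
    return contributions


def z_scores(contributions):
    """Z-score of each group's contribution within each species (across groups)."""
    z = {group: {} for group in contributions}
    for s in SPECIES:
        values = [contributions[g][s] for g in contributions]
        mu, sd = mean(values), pstdev(values)
        for g in contributions:
            z[g][s] = (contributions[g][s] - mu) / sd if sd > 0 else 0.0
    return z


# Hypothetical counts: phosphosites per group and proteome-wide average per protein.
proteome_mean_sites = {"S. cerevisiae": 2.0, "C. albicans": 1.0, "S. pombe": 1.0}
sites_per_group = {
    "group_1": {"S. cerevisiae": 20, "C. albicans": 10, "S. pombe": 10},
    "group_2": {"S. cerevisiae": 60, "C. albicans": 5, "S. pombe": 5},
    "group_3": {"S. cerevisiae": 10, "C. albicans": 12, "S. pombe": 8},
}
contrib = relative_contributions(sites_per_group, proteome_mean_sites)
z = z_scores(contrib)
```

A group with equal normalized phosphorylation in all three species, such as the hypothetical `group_1`, has a contribution of one third per species, which is the baseline against which diverged groups stand out.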

Rates of change of transcription factor-promoter interactions
In order to calculate the rate of change of interactions between transcription factors (TF) and promoters we collected recent data on TF-promoter interactions obtained with ChIP-chip methods for different species [5,9,10]. We calculated the TF-promoter interaction turnover comparing S. cerevisiae to C. albicans, K. lactis, S. bayanus and S. mikatae (see supplementary table 2) and human to mouse (see supplementary table 3). A TF-promoter interaction observed in C. albicans, K. lactis, S. bayanus or S. mikatae but not in S. cerevisiae was considered an interaction gained or lost after the divergence of these species from S. cerevisiae. Similarly, TF interactions observed in mouse but not in human were considered a gain or loss of interaction after the mouse-human split. Coverage estimates were only available for the study by Tuch and colleagues [9], and these were used to correct the observed changes for the changes expected from incomplete coverage. The estimate for the turnover of TF-promoter interactions ranges from 2×10^-4 to 4×10^-4. The studies mentioned above claim very low false discovery rates (on the order of a few percent). Still, if the technical difficulties associated with the ChIP-chip assay produce a high number of false-positive interactions, we may be over-estimating the true rate of change. If this is the case, the difference between the kinase-substrate interaction turnover and the TF-promoter interaction turnover should be even smaller than that reported in the main article.

Analysis of divergent protein complexes
For each protein complex with diverged enrichment of phosphorylation we predicted the kinases most likely to be responsible for the observed changes. In this section we list, for each complex studied, the top 5 kinases predicted to be associated with the complex in S. cerevisiae, ranked by how well the kinase specificity predicts the phosphorylation pattern detected for the three species, as described in the main methods section. We also provide, for each complex, the prediction of phosphorylation propensity across 10 Ascomycota species (S. cerevisiae, S. bayanus, S. paradoxus, S. castellii, K. lactis, K. waltii, D. hansenii, C. albicans, Y. lipolytica and S. pombe). This phosphorylation propensity was obtained using either the GPS method or the likelihood ratio (LR) method, as described in the methods section. For each complex we selected the method that best predicted the phosphorylation pattern in S. cerevisiae, C. albicans and S. pombe, as measured by the area under the ROC curve. The experimentally derived phosphorylation data, in combination with the computational methods used, provide many novel testable hypotheses on kinase-substrate interactions across different yeast species.
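The selection of the better predictor by area under the ROC curve can be sketched as follows. This is an illustrative sketch only: the labels and scores are invented, and the area is computed with the rank-sum (Mann-Whitney) identity, which is one standard way to obtain it and not necessarily the procedure used for the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a phosphorylated protein
    scores higher than a non-phosphorylated one, with ties counted as one half."""
    positives = [s for lab, s in zip(labels, scores) if lab]
    negatives = [s for lab, s in zip(labels, scores) if not lab]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_method(labels, scores_by_method):
    """Return the method with the highest AUC together with all AUC values."""
    aucs = {method: roc_auc(labels, s) for method, s in scores_by_method.items()}
    return max(aucs, key=aucs.get), aucs


# Hypothetical data: 1 = protein observed as phosphorylated ("P"), 0 = not observed.
labels = [1, 1, 0, 0, 1, 0]
scores_by_method = {
    "GPS": [0.9, 0.8, 0.3, 0.4, 0.2, 0.1],
    "LR": [0.9, 0.7, 0.2, 0.1, 0.6, 0.3],
}
chosen, aucs = best_method(labels, scores_by_method)
print(chosen, aucs)
```

An area of 0.5 corresponds to a predictor that ranks phosphorylated and non-phosphorylated proteins no better than chance, which helps in reading the AROC values quoted in the figure legends of this section.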

Pre-replication complex
Supplementary Figure 6 -Evolution of phospho-regulation of the pre-Replication complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.67 using the LR method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data. C) The same as in B) but restricted to the MCM and ORC subcomplexes.

Clathrin-associated protein (AP) complex
Supplementary Figure 7 -Evolution of phospho-regulation of the Clathrin-associated protein (AP) complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.76 using the GPS method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data. C) The same as in B) but restricted to the AP-1, AP-2 and AP-3 subcomplexes.

Outer kinetochore complex
Supplementary Figure 8 -Evolution of phospho-regulation of the Outer Kinetochore complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.85 using the LR method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data. C) The same as in B) but restricted to the DASH subcomplex.

TFIID complex
Supplementary Figure 9 -Evolution of phospho-regulation of the TFIID complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.78 using the LR method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data.

v-ATPase complex
Supplementary Figure 10 -Evolution of phospho-regulation of the v-ATPase complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.78 using the GPS method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data.

RNA polymerase I complex
Supplementary Figure 11 -Evolution of phospho-regulation of the RNA polymerase I complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.78 using the GPS method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data.

rRNA processing complex
Supplementary Figure 12 -Evolution of phospho-regulation of the rRNA processing complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.72 using the GPS method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data.

Other respiratory complex
Supplementary Figure 13 -Evolution of phospho-regulation of the other respiratory complex. A) For S. cerevisiae, C. albicans and S. pombe, proteins found to be phosphorylated experimentally are marked with "P". For each protein in the species studied, phosphorylation propensity was predicted from sequence and is represented as a color intensity gradient where darker colors indicate higher predicted phosphorylation likelihood. The AROC value for the prediction of the phosphorylation pattern in the 3 species is 0.53 using the GPS method. B) The top 5 kinases predicted to be associated with this complex in S. cerevisiae were ranked according to how well their binding specificity predicts the phosphorylation pattern in the three species with available data.