Fig 1.
A: sequence collection and redundancy reduction. B: feature representation by physicochemical and embedding methods. C: feature selection based on a genetic algorithm with offspring competition, and model construction.
Table 1.
Numbers of proteins and phosphorylation S/T/Y sites for different organisms.
Fig 2.
Comparison of four feature selection strategies for S. cerevisiae S phosphorylation sites.
OriDi: original feature dimension.
Fig 3.
Accuracy of the models constructed with embedding features for different k-mer lengths and window sizes.
Fig 4.
Performance comparison of feature selection strategies for S. cerevisiae S phosphorylation sites.
Table 2.
Prediction performance for fungal phosphorylation S/T/Y sites in seven organisms.
Fig 5.
Feature intersection and sequence patterns for S. cerevisiae S site of five fungal species.
The enrichment and depletion bias of amino acids was calculated with Two-Sample-Logos (http://www.twosamplelogo.org/).
Fig 6.
Feature importance, contribution and dependency analysis.
A: the 20 most important features. B: summary plot of feature value contributions. The x-axis shows the SHAP values, which represent the impact each feature had on the model’s output. C–H: SHAP dependence plots. These plots show the effect that a single feature has on the model’s predictions and the interaction effects between features.
Table 3.
Performance comparison of MFPSP with existing predictors on independent test data.