GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features

doi:10.1371/journal.pone.0200283

Fig 1.

The computational framework of GPSuc.

Fig 2.

Sequence logos illustrating amino acid occurrence in the sequences surrounding the succinylation sites (http://www.twosamplelogo.org/).

Nine species were used: H. sapiens, H. capsulatum, M. musculus, E. coli, M. tuberculosis, T. gondii, S. cerevisiae, S. lycopersicum, and T. aestivum.

Fig 3.

Distribution of AAF in the sequences surrounding succinylation sites (gray) and non-succinylation sites (red) for nine species.

The columns represent AAF, while the rows correspond to the individual amino acid residues.

Table 1.

AUC values of different combinations of feature scores for the training and test datasets in a generic predictor.

Table 2.

Performance of generic and species-specific succinylation site prediction on the training dataset.

Table 3.

Performance of existing generic tools on the test dataset.

Fig 4.

Performance evaluation using each of the five single features and the ‘combined model’ for predicting succinylation sites in nine species.

Gray indicates the AUC value on the training dataset, while red indicates the AUC value on the test dataset. ‘Combined’ indicates the performance of the five encoding features combined. The final H. sapiens model was given as a linear combination of the five features AAC, AAindex, binary, PSSM, and pCKSAAP, with LR coefficient values of 0.142, 1.566, 0.665, 0.342 and 0.667, respectively. In the same way, the combined models for H. capsulatum, M. musculus, E. coli, M. tuberculosis, S. cerevisiae, T. gondii, S. lycopersicum and T. aestivum were given with the coefficients (0.102, 0.466, 0.462, 0.242 and 1.367), (0.155, 1.077, 0.575 and 0.761), (0.121, 0.473, 0.763, 0.230 and 1.214), (0.127, 0.358, 0.404, 0.109 and 1.066), (0.320, 0.391, 0.553, 0.182 and 1.122), (0.117, 0.331, 0.734, 0.139 and 1.014), (0.113, 0.417, 0.818, 0.103 and 1.172), and (0.112, 0.462, 0.723, 0.164 and 1.299), respectively. The LR constant terms for each species were set to zero.
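
The linear combination described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `combined_score`, the dictionary layout, and the example per-feature scores are assumptions made purely for illustration. Only the five H. sapiens coefficients and the zero constant term come from the caption. Whether a logistic transform is applied to the combined value afterwards is not stated here, so the sketch stops at the weighted sum.

```python
# LR coefficients for the H. sapiens model, as listed in the caption
# (order: AAC, AAindex, binary, PSSM, pCKSAAP).
H_SAPIENS_WEIGHTS = {
    "AAC": 0.142,
    "AAindex": 1.566,
    "binary": 0.665,
    "PSSM": 0.342,
    "pCKSAAP": 0.667,
}


def combined_score(feature_scores, weights, intercept=0.0):
    """Weighted sum of per-feature prediction scores.

    `feature_scores` maps each feature name to the score produced by the
    single-feature predictor (hypothetical inputs). The intercept defaults
    to zero, matching the caption's statement about the LR constant term.
    """
    missing = set(weights) - set(feature_scores)
    if missing:
        raise KeyError(f"missing feature scores: {sorted(missing)}")
    return intercept + sum(weights[name] * feature_scores[name] for name in weights)


# Hypothetical per-feature scores for one candidate lysine site.
example_scores = {name: 0.5 for name in H_SAPIENS_WEIGHTS}
score = combined_score(example_scores, H_SAPIENS_WEIGHTS)
print(round(score, 3))
```

A species-specific model would swap in that species' coefficient tuple; a candidate site would then be called positive when the combined score exceeds a chosen threshold, which is not given in this caption.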

Fig 5.

ROC curves of the nine species-specific predictors of GPSuc.

(A) Training data performance under 10-fold cross-validation. (B) Test dataset performance.

Table 4.

Performance comparison of a species-specific predictor using the test dataset.
