Figure 1.
The data processing pipeline of LAceP.
The dataset was derived from SysPTM 2.0 (http://lifecenter.sgst.cn/SysPTM/) and PhosphoSitePlus (http://www.phosphosite.org/). Redundant entries were eliminated to obtain the non-redundant sites. An independent dataset was first selected at random from the positive and negative datasets. The remaining positive items were then combined with an equal number of negative items, selected at random from the whole negative dataset, to construct a training dataset. This selection process was repeated 10 times. After three types of features were encoded, the logistic regression algorithm was used to build the classifier. The best model was obtained after parameter optimization and performance evaluation. Finally, a LAceP web server was established so that biologists can use the prediction model.
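The sampling step in this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_datasets`, the parameter `n_independent`, and the fixed seed are hypothetical, and the sketch assumes that negatives for training are drawn only from those not already held out for the independent set.

```python
import random


def build_datasets(positives, negatives, n_independent, n_rounds=10, seed=0):
    """Hold out an independent set, then build balanced training sets.

    Assumption: training negatives are sampled from the negatives that
    were not placed in the independent set, so the two never overlap.
    """
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Independent (held-out) set: the first n_independent shuffled items
    # of each class.
    independent = {"pos": pos[:n_independent], "neg": neg[:n_independent]}
    train_pos = pos[n_independent:]
    neg_pool = neg[n_independent:]

    if len(neg_pool) < len(train_pos):
        raise ValueError("not enough negatives to balance the positives")

    # Each round pairs all remaining positives with an equally sized
    # random sample of negatives.
    training_sets = []
    for _ in range(n_rounds):
        sampled_neg = rng.sample(neg_pool, len(train_pos))
        training_sets.append((train_pos, sampled_neg))
    return independent, training_sets
```

Each of the resulting training sets would then be encoded and passed to a logistic regression classifier; that step is omitted here because the caption does not specify the feature encoding.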
Figure 2.
Compositional distribution of amino acids between acetylated and non-acetylated peptides.
The amino acid composition of acetylated and non-acetylated peptides was visualized with the Two Sample Logo software. At a given position, the amino acid composition differed markedly between acetylated and non-acetylated peptides, especially at positions −7 to −1 and 1 to 7.
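The kind of position-wise comparison shown in this figure can be approximated with a simple frequency difference. The sketch below is an illustrative simplification, not the statistical test used by the logo software: the function names `position_frequencies` and `frequency_difference` are hypothetical, and it assumes every peptide is a fixed-length window with the candidate lysine at the center (position 0).

```python
from collections import Counter


def position_frequencies(peptides, flank=7):
    """Return {offset: {residue: frequency}} for offsets -flank..+flank."""
    if not peptides:
        raise ValueError("peptides must not be empty")
    width = 2 * flank + 1
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for pep in peptides:
        if len(pep) != width:
            raise ValueError(f"expected peptides of length {width}")
        for i, residue in enumerate(pep):
            # Index 0 of the string corresponds to offset -flank.
            counts[i - flank][residue] += 1
    n = len(peptides)
    return {
        offset: {aa: c / n for aa, c in counter.items()}
        for offset, counter in counts.items()
    }


def frequency_difference(pos_peptides, neg_peptides, flank=7):
    """Per-position frequency in positives minus frequency in negatives."""
    fp = position_frequencies(pos_peptides, flank)
    fn = position_frequencies(neg_peptides, flank)
    diff = {}
    for offset in fp:
        residues = set(fp[offset]) | set(fn[offset])
        diff[offset] = {
            aa: fp[offset].get(aa, 0.0) - fn[offset].get(aa, 0.0)
            for aa in residues
        }
    return diff
```

A positive value at a given offset means the residue is more frequent around acetylated sites than around non-acetylated ones; a negative value means the opposite.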
Table 1.
The impact of window sizes on the performance of LAceP.
Table 2.
The performance of models trained with different types of features.
Table 3.
The comparison of performance between LAceP and existing methods.
Figure 3.
The web interface of LAceP.