Fig 1.
Workflow of data collection.
Fig 2.
Illustration of the experiment-split method.
En denotes the data from the nth experimental source. In Test n, the En data serve as the independent test set and the data from all other experimental sources are used for training.
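The split described in this caption can be sketched as a leave-one-source-out loop. This is a minimal illustration, not the authors' implementation: the function name `experiment_split`, the `(source_id, sample)` record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict


def experiment_split(records):
    """Yield (held_out_source, train, test) once per experimental source.

    `records` is a list of (source_id, sample) pairs. This layout is an
    assumption made for illustration only.
    """
    by_source = defaultdict(list)
    for source_id, sample in records:
        by_source[source_id].append(sample)

    for held_out in sorted(by_source):
        # Test n: the data from source En form the independent test set.
        test = list(by_source[held_out])
        # All remaining experimental sources are pooled for training.
        train = [
            sample
            for source in sorted(by_source)
            if source != held_out
            for sample in by_source[source]
        ]
        yield held_out, train, test


# Toy data: three sources (E1..E3) with made-up sample labels.
records = [("E1", "siteA"), ("E1", "siteB"), ("E2", "siteC"), ("E3", "siteD")]
splits = list(experiment_split(records))
```

Each element of `splits` pairs one held-out source with disjoint training and test sets, so no sample from the test source is seen during training.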
Fig 3.
The graph representation of the CNNOH model.
(A) The input sequence consists of 61 amino acids. (B) In the input layer, the input sequence is represented by a binary matrix using one-hot encoding. (C) The convolution layer contains two convolution sublayers and two max-pooling sublayers. (D) Fully connected layer. The output matrix from the convolution layer is nonlinearly transformed into 128 representative features. (E) Output layer. The modification score is calculated from the 128 features. Details are given in the Methods section.
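The one-hot step in panel (B) can be illustrated with a short sketch. The 61×21 shape is the one stated in the LSTMOH caption. The 21-symbol alphabet used here (20 standard amino acids plus `-` for padding or unknown residues), the function name `one_hot`, and the example window with a lysine at the centre are assumptions, not details given in the captions.

```python
# 20 standard amino acids plus one extra symbol. Using "-" for padding or
# unknown residues is an assumption; the caption does not name the 21st symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot(sequence, length=61):
    """Encode a peptide window as a length x 21 binary matrix (list of rows)."""
    if len(sequence) != length:
        raise ValueError(f"expected {length} residues, got {len(sequence)}")
    matrix = []
    for aa in sequence:
        row = [0] * len(ALPHABET)
        # Residues outside the alphabet fall back to the "-" column.
        row[INDEX.get(aa, INDEX["-"])] = 1
        matrix.append(row)
    return matrix


# Hypothetical 61-residue window with a lysine (K) at the central position.
window = "A" * 30 + "K" + "G" * 30
encoded = one_hot(window)
```

Every row of `encoded` contains exactly one 1, which is what makes the matrix binary and suitable as input to the convolution sublayers.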
Fig 4.
Graph representation of the LSTMOH model.
(A) The input sequence consists of 61 amino acids. (B) In the input layer, the sequence is represented by a 61×21 matrix using one-hot encoding. (C) The LSTM layer includes seven LSTM sublayers. Each sublayer contains 61 sequentially connected LSTM cells, each with 32 hidden units. The output of each LSTM sublayer is fed to the next sublayer. (D) Output layer. The output of the LSTM layer is used to calculate the modification score.
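To make the stacking in panel (C) concrete, the sketch below traces the per-position tensor shapes through seven LSTM sublayers and counts weights per sublayer. It assumes a standard LSTM cell with four gates and a single bias vector per gate, and that every sublayer passes its full 61-step output to the next one. Neither detail is stated in the caption, and the function names are hypothetical.

```python
def lstm_param_count(input_dim, hidden):
    """Weights in one standard LSTM sublayer.

    Four gates, each with input weights, recurrent weights and one bias
    vector. The single-bias convention is an assumption.
    """
    return 4 * (hidden * (input_dim + hidden) + hidden)


def trace_stack(seq_len=61, input_dim=21, hidden=32, n_layers=7):
    """Return the shape after each sublayer and the weight count of each."""
    shapes = [(seq_len, input_dim)]  # one-hot input matrix
    params = []
    dim = input_dim
    for _ in range(n_layers):
        params.append(lstm_param_count(dim, hidden))
        # Each of the 61 positions emits a 32-dimensional hidden state.
        shapes.append((seq_len, hidden))
        dim = hidden  # the next sublayer consumes these hidden states
    return shapes, params


shapes, params = trace_stack()
```

Only the first sublayer sees the 21-dimensional one-hot rows; sublayers two to seven all take 32-dimensional inputs, so under these assumptions they have identical weight counts.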
Table 1.
Comparison of the performance of GPS-MSP and MusiteDeep evaluated in this study with their self-reported performance.
Fig 5.
Performance of GPS-MSP and MusiteDeep assessed using different experimental sources.
The panels show the GPS-MSP prediction performance for Kme1 (A), Kme2 (B), Kme3 (C) and Kme (D), and the MusiteDeep performance for Kme (E).
Fig 6.
Performance of different DL models for the prediction of Kme1 sites using ten-fold cross-validation.
Table 2.
Performance comparison of CNNOH models between cross-validation and experiment-split test.
Table 3.
Comparison of experiment-split performances for the models.
Fig 7.
CNNOH performance assessed by the experiment-split method.
The performance of the CNNOH model for Kme1 (A), Kme2 (B), Kme3 (C) and Kme (D) was evaluated using different independent experimental sources.