Development of an experiment-split method for benchmarking the generalization of a PTM site predictor: Lysine methylome as an example

doi:10.1371/journal.pcbi.1009682

Fig 1.

The workflow of data collection.

Fig 2.

Illustration of the experiment-split method.

En denotes the data from the nth experimental source. In Test n, the En data are held out as the independent test set, and the data from the remaining experimental sources are used for training.
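
The split illustrated in this figure can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function name `experiment_split` and the `(source_id, sample)` record layout are hypothetical, and the sketch does not address sites reported by more than one experimental source.

```python
from collections import defaultdict


def experiment_split(records):
    """Yield (held_out_source, train, test), holding out each source once.

    records: iterable of (source_id, sample) pairs (hypothetical layout).
    """
    by_source = defaultdict(list)
    for source, sample in records:
        by_source[source].append(sample)

    sources = sorted(by_source)
    for held_out in sources:
        # Test n: the data from source En form the independent test set.
        test = list(by_source[held_out])
        # All remaining experimental sources are pooled for training.
        train = [s for src in sources if src != held_out for s in by_source[src]]
        yield held_out, train, test


if __name__ == "__main__":
    toy = [("E1", "site_a"), ("E1", "site_b"), ("E2", "site_c"), ("E3", "site_d")]
    for source, train, test in experiment_split(toy):
        print(source, "train:", train, "test:", test)
```

Unlike random cross-validation folds, each test set here comes from a single experimental source that contributes nothing to training.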

Fig 3.

The graph representation of the CNN_OH model.

(A) The input sequence consists of 61 amino acids. (B) In the input layer, the input sequence is represented by a binary matrix using One-Hot encoding. (C) The convolution layer contains two convolution sublayers and two max-pooling sublayers. (D) Fully connected layer. The output matrix from the convolution layer is nonlinearly transformed to 128 representative features. (E) Output layer. The modification score is calculated based on the 128 features. The details are described in the Methods section.

Fig 4.

Graph representation of the LSTM_OH.

(A) The input sequence consists of 61 amino acids. (B) In the input layer, the sequence is represented by a 61×21 matrix through One-Hot encoding. (C) The LSTM layer includes seven LSTM sublayers. Every sublayer contains 61 sequentially connected LSTM cells, each of which contains 32 hidden neuron units. The output of each LSTM sublayer is fed to the next LSTM sublayer. (D) Output layer. The output from the LSTM layer is used to calculate the modification score.
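
The 61×21 One-Hot input described in this caption can be illustrated with a short Python sketch. The caption gives only the matrix shape, so the alphabet here is an assumption: the 20 standard amino acids plus one extra symbol (`-`) used for padding and for any unrecognized residue. The names `one_hot`, `ALPHABET` and `PAD` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
PAD = "-"                             # assumed 21st symbol (padding/unknown)
ALPHABET = AMINO_ACIDS + PAD
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}
WINDOW = 61                           # sequence length stated in the caption


def one_hot(window):
    """Encode a 61-residue string as a 61 x 21 binary matrix (list of rows)."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} residues, got {len(window)}")
    matrix = []
    for residue in window:
        row = [0] * len(ALPHABET)
        # Unrecognized residues fall back to the padding column (assumption).
        row[INDEX.get(residue, INDEX[PAD])] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    peptide = "A" * 30 + "K" + "-" * 30  # toy window, not real data
    encoded = one_hot(peptide)
    print(len(encoded), len(encoded[0]))
```

Each row contains exactly one 1, so the matrix is binary, as the CNN_OH caption also describes.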

Table 1.

Comparison between the evaluated and self-reported performances of GPS-MSP and MusiteDeep.

Fig 5.

Performance of GPS-MSP and MusiteDeep assessed using different experimental sources.

GPS-MSP prediction performance is shown for Kme1 (A), Kme2 (B), Kme3 (C) and Kme (D), and MusiteDeep performance for Kme (E).

Fig 6.

The performances of different DL models for the prediction of Kme1 sites using ten-fold cross-validation.

Table 2.

Performance comparison of CNN_OH models between cross-validation and experiment-split test.

Table 3.

Comparison of experiment-split performances for the models.

Fig 7.

CNN_OH performance assessed by the experiment-split method.

The performance of the CNN_OH model for Kme1 (A), Kme2 (B), Kme3 (C) and Kme (D) was evaluated on each of the independent experimental sources.
