Prediction of DNA Methylation based on Multi-dimensional feature encoding and double convolutional fully connected convolutional neural network

doi:10.1371/journal.pcbi.1011370

Fig 1.

Chemical structures of 5mC, 6mA and 4mC.

Table 1.

Number of gene sequences in the training and test sets.

Fig 2.

The workflow of the proposed MEDCNN model for the prediction of DNA methylation sites.

(a) Dataset collection, (b) feature encoding, (c) predictive model construction and (d) model performance evaluation.

Fig 3.

Visualization of the multidimensional matrix of MEDCNN inputs and the attention graph of multidimensional features extracted before and after applying the attention module.

Fig 4.

MEDCNN prediction results on the independent datasets.

(a), (b), (c), (d) and (e) show the predicted SN, SP, ACC, MCC and AUC values, respectively, for the three methylation types, together with the distribution of the results. The size of each contour reflects how concentrated or clustered the results are. (f), (g) and (h) show the prediction metrics for identifying the 5hmC, 4mC and 6mA methylation types, respectively, on the independent datasets.
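The five metrics named in this caption can be illustrated with a minimal sketch. It assumes the standard definitions of sensitivity (SN), specificity (SP), accuracy (ACC), Matthews correlation coefficient (MCC) and area under the ROC curve (AUC); the caption does not state how they were computed, so the function names and toy numbers below are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return SN, SP, ACC and MCC from confusion-matrix counts.

    Assumes the standard textbook definitions; not taken from the article.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fp + tn + fn)       # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for an illustration but slow for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

MCC is useful next to ACC because it stays informative when the positive and negative classes are unbalanced.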

Table 2.

Mean prediction values for the different DNA methylation types.

Fig 5.

Results of different feature encodings for identifying 5hmC/4mC/6mA methylation types in 17 datasets.

(a), (b) and (c) compare the ACC values obtained with different encoding methods when predicting the 5hmC, 4mC and 6mA methylation types, respectively. (d), (e), (f) and (g) show the SN, SP, ACC and AUC values, respectively, obtained on the 17 datasets when different encoding methods are combined with a CNN to identify 5hmC/6mA/4mC sites.

Table 3.

The results of the Wilcoxon test for each feature encoding.

Fig 6.

Cross-species validation of 6mA prediction across five species.

The heatmaps (a), (b), (c) and (d) show the cross-species SN, SP, ACC and AUC values, respectively, for the five species for which the 6mA methylation type was determined. A model was built on the training dataset of each species and then tested on data from the other species. The horizontal axis gives the species used for training and the vertical axis gives the species used for testing.
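The train-on-one, test-on-others procedure behind these heatmaps can be sketched as follows. This is a minimal illustration and not the article's pipeline: the species names, the threshold "model" and the toy data are all hypothetical stand-ins, and including the same-species diagonal is an assumption.

```python
def cross_species_matrix(datasets, train_fn, score_fn):
    """Build result[test_species][train_species] for every species pair.

    datasets maps a species name to (train_set, test_set). Rows are testing
    species and columns are training species, matching the heatmap layout
    described in the caption. The diagonal (same species) is included here
    as an assumption.
    """
    species = list(datasets)
    models = {s: train_fn(datasets[s][0]) for s in species}
    return {
        test: {train: score_fn(models[train], datasets[test][1]) for train in species}
        for test in species
    }


def train_threshold(samples):
    """Hypothetical stand-in model: midpoint between the class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, samples):
    """Fraction of samples whose thresholded score matches the label."""
    return sum((x > threshold) == (y == 1) for x, y in samples) / len(samples)


# Toy (score, label) pairs for two placeholder species.
toy = {
    "species_A": ([(0.9, 1), (0.1, 0)], [(0.8, 1), (0.2, 0)]),
    "species_B": ([(0.7, 1), (0.5, 0)], [(0.65, 1), (0.55, 0)]),
}
matrix = cross_species_matrix(toy, train_threshold, accuracy)
```

Swapping `accuracy` for another scoring function would give one matrix per metric, in the way the four heatmap panels report SN, SP, ACC and AUC.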

Table 4.

Information on each comparison model.

Fig 7.

Performance comparison between MEDCNN and other existing methods on 17 benchmark independent datasets.

(a) and (b) represent the comparison results of SN values for each model. (c) and (d) represent the comparison results of SP values for each model.

Table 5.

Statistics on the distribution of results for each comparison model in 17 datasets.

Fig 8.

Comparison of ACC and MCC values of the proposed MEDCNN and other existing models on 17 independent datasets.

Table 6.

The results of the Wilcoxon test for each model.
