Fig 1.
Chemical structures of 5mC, 6mA and 4mC.
Table 1.
Number of gene sequences in the training and test sets.
Fig 2.
The workflow of the proposed MEDCNN model for the prediction of DNA methylation sites.
(a) Dataset collection, (b) feature encoding, (c) predictive model construction and (d) model performance evaluation.
Fig 3.
Visualization of the multidimensional matrix of MEDCNN inputs and the attention graph of multidimensional features extracted before and after applying the attention module.
Fig 4.
MEDCNN prediction results on the independent datasets.
(a), (b), (c), (d) and (e) show the SN, SP, ACC, MCC and AUC values, respectively, for the three methylation types, together with the distribution of the results. The size of each contour indicates how concentrated or clustered the results are. (f), (g) and (h) show the prediction metrics for identifying the 5hmC, 4mC and 6mA methylation types, respectively, on the independent datasets.
Table 2.
Mean values of the prediction results for the different DNA methylation types.
Fig 5.
Results of different feature encodings for identifying the 5hmC/4mC/6mA methylation types in 17 datasets.
(a), (b) and (c) compare the ACC values obtained with different encoding methods when predicting the 5hmC, 4mC and 6mA methylation types, respectively. (d), (e), (f) and (g) show the SN, SP, ACC and AUC values, respectively, obtained on the 17 datasets when different encoding methods are combined with a CNN to identify 5hmC/6mA/4mC sites.
Table 3.
Results of the Wilcoxon test for each feature encoding.
Fig 6.
Cross-species validation of the 6mA methylation type across five species.
Heatmaps (a), (b), (c) and (d) show the cross-species SN, SP, ACC and AUC values, respectively, for the five species with 6mA methylation data. A model was trained on each species' training dataset and then tested on data from the other species. The horizontal axis shows the species used for training and the vertical axis shows the species used for testing.
Table 4.
Information on each comparison model.
Fig 7.
Performance comparison between MEDCNN and other existing methods on 17 benchmark independent datasets.
(a) and (b) compare the SN values of each model. (c) and (d) compare the SP values of each model.
Table 5.
Statistics on the distribution of results for each comparison model on the 17 datasets.
Fig 8.
Comparison of ACC and MCC values of the proposed MEDCNN and other existing models on 17 independent datasets.
Table 6.
Results of the Wilcoxon test for each model.