CpG Island Mapping by Epigenome Prediction
Figure 3
ROC Curves Comparing the Performance of Four Prediction Scores and Three Sequence Criteria against DNA Methylation and Promoter Activity
This figure compares the prediction performance of four CpG island scores that are based on epigenome prediction (upper legend box) and of three simple sequence criteria (lower legend box). In (A), (C), and (E), overlap with unmethylated regions is used for evaluation, and in (B), (D), and (F), overlap with experimentally determined transcription start sites (as an indicator of promoter activity) is used instead. All graphs plot the true positive rate against the false positive rate in the form of ROC curves [27]. The scale on top of each plot displays the threshold values of the combined epigenetic score that produce the tradeoff between false positive rate and true positive rate at each position along the curve. Three thresholds for the combined epigenetic score are highlighted by triangles: 0.5 (balance between sensitivity and specificity), 0.33 (high sensitivity), and 0.67 (high specificity). Averaged across all six graphs, the ROC area under the curve performance measure (i.e., the percentage of the unit square that lies below the ROC curve [27]) amounts to the following values: predicted unmethylated score, 65.4%; predicted promoter activity score, 74.8%; open chromatin score, 72.2%; combined epigenetic score, 75.8%; GC content, 67.1%; CpG observed-to-expected score, 70.6%; and CpG island length, 75.5%.
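As an illustration of how an ROC curve and its area under the curve relate to score thresholds such as 0.33, 0.5, and 0.67, the following is a minimal sketch in Python. It is not the pipeline used to produce this figure: the function names (`roc_curve`, `auc`, `rates_at`) are hypothetical, and the toy scores and labels are invented purely for demonstration. It assumes that a region is called positive when its score is at or above the threshold.

```python
def roc_curve(scores, labels):
    """Return (fpr, tpr, thresholds), sweeping the threshold from high to low.

    scores: one prediction score per region (higher = more likely positive).
    labels: 1 if the region is a true positive case, 0 otherwise.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    tp = fp = 0
    i = 0
    while i < len(order):
        t = scores[order[i]]
        # Consume all regions that share this score, so ties yield one point.
        while i < len(order) and scores[order[i]] == t:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
        thresholds.append(t)
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule (fraction of the unit square)."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2.0
        for k in range(1, len(fpr))
    )


def rates_at(scores, labels, threshold):
    """(fpr, tpr) when every region with score >= threshold is called positive."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    return fp / neg, tp / pos


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

toy_fpr, toy_tpr, _ = roc_curve(toy_scores, toy_labels)
toy_auc = auc(toy_fpr, toy_tpr)

for thr in (0.33, 0.5, 0.67):
    f, t = rates_at(toy_scores, toy_labels, thr)
    print(f"threshold {thr}: FPR={f:.2f}, TPR={t:.2f}")
print(f"AUC = {100 * toy_auc:.1f}% of the unit square")
```

Lowering the threshold moves the operating point up and to the right along the curve (higher sensitivity, lower specificity), while raising it moves the point down and to the left, which is the relationship that the threshold scale on top of each plot encodes.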