Fig 1.
Actual and predicted labels generate four outcomes of the confusion matrix.
(A) The left oval shows two actual labels: positives (P; blue; top half) and negatives (N; red; bottom half). The right oval shows two predicted labels: “predicted as positive” (light green; top left half) and “predicted as negative” (orange; bottom right half). A black line represents a classifier that separates the data into “predicted as positive” indicated by the upward arrow “P” and “predicted as negative” indicated by the downward arrow “N”. (B) Combining two actual and two predicted labels produces four outcomes: True positive (TP; green), False negative (FN; purple), False positive (FP; yellow), and True negative (TN; red). (C) Two ovals show examples of TPs, FPs, TNs, and FNs for balanced (left) and imbalanced (right) data. Both examples use 20 data instances: 10 positives and 10 negatives for the balanced example, and 5 positives and 15 negatives for the imbalanced example.
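The combination of actual and predicted labels described in (B) can be sketched in a few lines of Python. This is only an illustration: the function name `confusion_counts` and the predicted labels are hypothetical. Only the class sizes (5 positives, 15 negatives) come from the imbalanced example in (C).

```python
from collections import Counter

# Map each (actual, predicted) pair to one of the four outcomes.
OUTCOME = {
    ("P", "P"): "TP",
    ("P", "N"): "FN",
    ("N", "P"): "FP",
    ("N", "N"): "TN",
}

def confusion_counts(actual, predicted):
    """Count TP, FN, FP, and TN from two parallel lists of 'P'/'N' labels."""
    counts = Counter(OUTCOME[(a, p)] for a, p in zip(actual, predicted))
    return {name: counts.get(name, 0) for name in ("TP", "FN", "FP", "TN")}

# Class sizes follow the imbalanced example; the predictions are made up.
actual = ["P"] * 5 + ["N"] * 15
predicted = ["P"] * 3 + ["N"] * 2 + ["P"] * 4 + ["N"] * 11

counts = confusion_counts(actual, predicted)
```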
Table 1.
Basic evaluation measures from the confusion matrix.
Fig 2.
PRC curves have one-to-one relationships with ROC curves.
(A) The ROC space contains one basic ROC curve and points (black) as well as four alternative curves and points; tied lower bound (green), tied upper bound (dark yellow), convex hull (light blue), and default values for missing prediction data (magenta). The numbers next to the ROC points indicate the ranks of the scores to calculate FPRs and TPRs from 10 positives and 10 negatives (See Table A in S1 File for the actual scores). (B) The PRC space contains the PR points corresponding to those in the ROC space.
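A minimal sketch of the correspondence between ROC and PR points, assuming a ranked list of labels with no tied scores. The ranking below is hypothetical and is not the one in Table A in S1 File; the function name `roc_and_pr_points` is likewise an assumption. Each rank threshold yields one (FPR, TPR) pair and one (recall, precision) pair from the same TP and FP counts.

```python
def roc_and_pr_points(ranked_labels):
    """Return ROC and PR points for each rank threshold.

    ranked_labels: 'P'/'N' labels ordered from best to worst score.
    """
    n_pos = ranked_labels.count("P")
    n_neg = ranked_labels.count("N")
    tp = fp = 0
    roc, pr = [], []
    for label in ranked_labels:
        if label == "P":
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))      # (FPR, TPR)
        pr.append((tp / n_pos, tp / (tp + fp)))   # (recall, precision)
    return roc, pr

# Hypothetical ranking of 10 positives and 10 negatives.
ranking = list("PPPNPPNPNPPNNPNPNNNN")
roc_points, pr_points = roc_and_pr_points(ranking)
```

Because recall equals TPR and precision is computed from the same TP and FP counts, the two lists are aligned index by index for a fixed number of positives and negatives.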
Table 2.
Example of basic evaluation measures on a balanced and on an imbalanced dataset.
Table 3.
Score distributions of positives and negatives for the performance simulations.
Fig 3.
Combinations of positive and negative score distributions generate five different levels for the simulation analysis.
For each of Rand, ER-, ER+, Excel, and Perf, we randomly sampled 250 negatives and 250 positives and then converted the scores to ranks from 1 to 500. Red circles represent 250 negatives, whereas green triangles represent 250 positives.
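The sampling and rank conversion could look roughly like the following sketch. The two normal distributions are placeholders chosen for illustration, not the distributions in Table 3. Assigning rank 1 to the highest score is also an assumption.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Placeholder score distributions (assumed); substitute those in Table 3.
negatives = [("N", rng.gauss(0.0, 1.0)) for _ in range(250)]
positives = [("P", rng.gauss(1.0, 1.0)) for _ in range(250)]

# Sort by score, highest first (assumed direction), then number 1..500.
ordered = sorted(negatives + positives, key=lambda item: item[1], reverse=True)
ranked = [(rank, label) for rank, (label, _score) in enumerate(ordered, start=1)]
```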
Fig 4.
Schematic diagrams of the generation of datasets T1 and T2.
T1 contains miRNA genes from miRBase as positives. Negatives were generated by randomly shuffling the nucleotides of the positives. For T2, the RNAz tool was used to generate miRNA gene candidates. Positives are candidate genes that overlap with the actual miRNA genes from miRBase.
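As a rough illustration of how a T1 negative can be derived from a positive, the sketch below shuffles the nucleotides of a sequence uniformly at random. The exact shuffling procedure is not specified in this caption, so a plain mononucleotide shuffle is assumed here. The example sequence and the function name `shuffle_nucleotides` are hypothetical.

```python
import random

def shuffle_nucleotides(sequence, rng):
    """Return a random permutation of the nucleotides in `sequence`.

    Nucleotide composition and length are preserved; the order is not.
    """
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
positive = "UGAGGUAGUAGGUUGUAUAGUU"  # made-up RNA sequence for illustration
negative = shuffle_nucleotides(positive, rng)
```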
Fig 5.
PRC changes between balanced and imbalanced data, whereas the other plots remain unchanged.
Each panel contains two plots with balanced (left) and imbalanced (right) for (A) ROC, (B) CROC with exponential function: f(x) = (1 - exp(-αx))/(1 - exp(-α)) where α = 7, (C) CC, and (D) PRC. Five curves represent five different performance levels: Random (Rand; red), Poor early retrieval (ER-; blue), Good early retrieval (ER+; green), Excellent (Excel; purple), and Perfect (Perf; orange).
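The exponential transform in (B) is easy to evaluate directly. The sketch below implements the stated formula with α = 7; the function name `croc_exp` is an assumption. The transform maps 0 to 0 and 1 to 1 while stretching the low-FPR region.

```python
import math

def croc_exp(x, alpha=7.0):
    """Exponential transform: f(x) = (1 - exp(-alpha*x)) / (1 - exp(-alpha))."""
    return (1.0 - math.exp(-alpha * x)) / (1.0 - math.exp(-alpha))

# Transform a few FPR values; small FPRs are magnified on the new axis.
fpr_values = [0.0, 0.01, 0.1, 0.5, 1.0]
transformed = [croc_exp(x) for x in fpr_values]
```

With α = 7, an FPR of 0.1 lands at roughly the middle of the transformed axis, which is why early-retrieval differences become more visible in a CROC plot.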
Fig 6.
Two PubMed search results show the annual number of papers found between 2002 and 2012.
The upper barplot shows the number of papers found by the term “ROC”, whereas the lower plot shows the number found by the term “((Support Vector Machine) AND Genome-wide) NOT Association”.
Table 4.
Literature analysis summarized by three main categories and six subcategories.
Fig 7.
A re-analysis of the MiRFinder study reveals that PRC is stronger than ROC on imbalanced data.
ROC and PRC plots show the performances of five different tools: MiRFinder (red), miPred (blue), RNAmicro (green), ProMiR (purple), and RNAfold (orange). A gray solid line represents a baseline. The re-analysis used two independent test sets, T1 and T2. The four plots are for (A) ROC on T1, (B) PRC on T1, (C) ROC on T2, and (D) PRC on T2.
Table 5.
AUC scores of ROC and PRC for T1 and T2.