Fig 1.
Schematic example of Precision and Recall calculation.
Real source locations are represented by “X” marks. Red shading represents a hypothetical source reconstruction. (a) If an algorithm designates a large area as active, it is likely to find all sources, but the Precision of the reconstruction will be low. (b) If an algorithm is more conservative, Precision is high, but Recall is low as not all sources are found.
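The trade-off in panels (a) and (b) can be illustrated with a minimal sketch. It assumes that a source counts as found only when its exact location lies inside the active region, and that Precision is 0 when nothing is active. The function name `precision_recall`, the integer location indices, and both conventions are hypothetical; the study's own matching rule may differ.

```python
def precision_recall(active, true_sources):
    """Return (precision, recall) for a set of active locations.

    Assumption: exact-location matching between active locations and
    true sources; an empty active set gives a precision of 0.
    """
    active = set(active)
    true_sources = set(true_sources)
    hits = len(active & true_sources)
    precision = hits / len(active) if active else 0.0
    recall = hits / len(true_sources) if true_sources else 0.0
    return precision, recall


# Panel (a): a large active area finds both sources, mostly with false positives.
broad = precision_recall(range(100), {10, 60})

# Panel (b): a conservative reconstruction finds only one of the two sources.
narrow = precision_recall({10}, {10, 60})
```

With these made-up locations, the broad reconstruction has full Recall but very low Precision, and the conservative one has full Precision but half the Recall.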
Fig 2.
An example of Precision (blue) and Recall (red) plotted against strength threshold, for the sLORETA algorithm and 2 simulated sources.
The cortex on the left shows the CSD result after a low (5%) strength threshold is imposed, leaving widespread activity. The cortex on the right shows the result after a high (95%) strength threshold is imposed, leaving sparse activity.
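A minimal sketch of how a strength threshold might select active locations is given here. It assumes the threshold is a fraction of the maximum absolute CSD strength, which is only one possible reading of the percentages in this caption. The function `threshold_active` and the `csd` values are hypothetical.

```python
def threshold_active(strengths, fraction):
    """Indices whose absolute strength is at least `fraction` of the maximum.

    Assumption: the threshold is relative to the peak absolute strength.
    """
    peak = max(abs(s) for s in strengths)
    cutoff = fraction * peak
    return {i for i, s in enumerate(strengths) if abs(s) >= cutoff}


# Hypothetical CSD magnitudes at eight cortical locations.
csd = [0.9, 0.1, 0.5, 0.06, 1.0, 0.3, 0.02, 0.7]

low = threshold_active(csd, 0.05)   # low threshold: widespread activity
high = threshold_active(csd, 0.95)  # high threshold: sparse activity
```

Sweeping `fraction` from low to high and computing Precision and Recall at each step would trace out curves of the kind plotted in this figure.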
Fig 3.
Precision vs. Recall of each of the source localization algorithms (MNLS, sLORETA, LORETA 1, 1.5 and 2) for (a) one, (b) two, (c) three and (d) four simulated sources.
Each point indicates the Precision and Recall for a different strength threshold. For one source (a), sLORETA simultaneously reaches Precision and Recall values close to 1, indicating near-perfect performance, while the other algorithms have lower peak Precision. For two sources (b), all algorithms reach peak Precision only when Recall has been reduced to about 0.5, and peak Recall is reached only when Precision has dropped below 0.3. For three sources (c), LORETA 1.5 has higher Recall than the other algorithms for many Precision values and LORETA 1 has lower peak Precision, while all other algorithms follow a similar pattern, with high Precision that drops off steeply with increasing Recall. For four sources (d), peak Precision is still high for most algorithms, but Recall drops slightly.
Fig 4.
Area under the Precision vs. Recall (PR) curve for each of the algorithms tested vs. the number of sources in the simulation.
The area under the PR curve serves as a summary of the overall performance of the algorithm. Note that while sLORETA is clearly the best algorithm when there is only one source present, its performance drops off as the number of sources increases. The performance of all the other algorithms appears steady for 1–4 sources, as Precision increases slightly for these algorithms while Recall drops.
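As an illustration of this summary measure, the area under a PR curve can be approximated with the trapezoidal rule over the points obtained at each threshold. This is a sketch under the assumption of trapezoidal integration. The function `pr_auc` and the example points are hypothetical, and the integration scheme used in the study is not stated here.

```python
def pr_auc(points):
    """Trapezoidal area under a Precision vs. Recall curve.

    `points` is an iterable of (recall, precision) pairs, one per threshold.
    """
    pts = sorted(points)  # order by increasing recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Hypothetical curve: Precision stays at 1 until Recall 0.5, then falls to 0.5.
curve = [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)]
auc = pr_auc(curve)
```

An area close to 1 corresponds to a curve that keeps Precision high across the whole Recall range.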
Table 1.
Results of Friedman test of statistical significance of differences in AUC.
Fig 5.
Precision vs. Recall curves for two sources with varying strength ratios: (a) two sources with the same strength, (b) one source with twice the strength of the other, (c) one source with three times the strength of the other, and (d) one source with four times the strength of the other.
Note that while sLORETA has the highest peak Precision for all strength ratios, its Precision also drops off more steeply with increasing Recall than that of the other algorithms. This pattern becomes more pronounced as the strength ratio increases. All other algorithms perform similarly to each other, with slightly decreasing Precision and Recall as the strength ratio of the sources increases.
Fig 6.
Area under the Precision vs. Recall curve for each of the algorithms tested vs. strength ratio.
The area under the curve, which summarizes each algorithm's overall performance, decreases slightly for all algorithms as the strength ratio of the sources increases.