Fig 1.
An example of how AdDReSS improves embedding by incorporating AL.
(a) The original embedding representation generated by SSDR. (b) A support vector machine classifier is used as an active learner. (c) Samples within the low-dimensional embedding that are found to be difficult to classify are selected as candidates for training. (d) SSDR trained on the labels queried by AL provides greater separation of object classes in the low-dimensional embedding.
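The selection step in panel (c) can be illustrated with a minimal sketch. It assumes the active learner exposes a signed distance to the decision boundary for each unlabeled sample (as a margin classifier such as an SVM does) and treats the samples closest to the boundary as the most ambiguous. The function name `select_ambiguous` and the example scores are hypothetical, not taken from the paper.

```python
def select_ambiguous(decision_values, n_queries):
    """Return indices of the n_queries samples closest to the decision boundary.

    decision_values: signed distances to the classifier boundary (assumed input).
    """
    # Rank samples by absolute distance: small |value| means hard to classify.
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:n_queries]


# Hypothetical signed distances for six unlabeled samples in the embedding.
scores = [2.1, -0.05, 0.9, -1.7, 0.2, -0.4]
queried = select_ambiguous(scores, 2)
print(queried)  # indices of the two samples nearest the boundary
```

The labels of the returned samples would then be queried and passed back to the semi-supervised embedding step.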
Fig 2.
(a) 3D Swiss Roll with all labels revealed. (b) 3D Swiss Roll with initial labels ℓ(Str) revealed. (c) Initial 2D embedding with labels. (d) Initial 2D embedding with initial labels ℓ(Str). (e) Ambiguous samples (in blue) are determined via active learning. (f) Region of the Swiss Roll at the class boundary (shown as a box in (e)). Note the selection of ambiguous samples (in blue) at the boundary between the two classes (in red and green). (g) Subsequent 2D embedding incorporating newly queried labels from the ambiguous samples. (h) Region near the class boundary (shown as a box in (g)) revealing the increased separation between the two classes (in red and green) following application of the AdDReSS scheme.
Table 1.
Datasets used for evaluation.
Fig 3.
Selection of mitotic and non-mitotic nuclei from the MITOS2012 dataset.
A nuclei candidate detection algorithm is applied, and a patch centered at each candidate centroid is extracted.
Table 2.
Strategies compared in this work.
Table 3.
Summary of Evaluation Measures.
Fig 4.
Evaluation of Classification Accuracy.
Number of instances for which labels were revealed versus mean ϕAcc for AdDReSS, SSAGE, and GE, together with the maximum empirically derived ϕAcc across all runs, is shown for (a), (b), (c), and (d). Standard deviation of ϕAcc is shown as error bounds at each l.
Fig 5.
Evaluation of Silhouette Index.
Number of instances for which labels were revealed versus mean ϕSI for AdDReSS, SSAGE, and GE, together with the maximum empirically derived ϕSI across all runs, is shown for (a), (b), (c), and (d). Standard deviation of ϕSI is shown as error bounds at each l.
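For readers unfamiliar with the Silhouette Index, the sketch below computes the standard mean silhouette width with Euclidean distances. Whether ϕSI in the paper uses exactly this distance and averaging is an assumption; the function name `silhouette_index` and the toy points are hypothetical.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all samples, using Euclidean distance."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two classes")

    total = 0.0
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same class
        # (the distance of p to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other class.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != c)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Two well-separated classes in a hypothetical 2D embedding.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
si_separated = silhouette_index(pts, [0, 0, 1, 1])
si_mixed = silhouette_index(pts, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated classes in the embedding, while values near or below 0 indicate overlapping classes.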
Fig 6.
Evaluation of Variance for Classification Accuracy.
Variance of ϕAcc at selected numbers of instances for which labels were revealed for AdDReSS, SSAGE, and GE is shown for (a), (b), (c), and (d).
Fig 7.
Evaluation of Variance for Silhouette Index.
Variance of ϕSI at selected numbers of instances for which labels were revealed for AdDReSS, SSAGE, and GE is shown for (a), (b), (c), and (d). GE shows zero variance because labeled information does not affect the embedding for GE.
Fig 8.
Evaluation of Raghavan Efficiency.
ϕEff for k ∈ {2, 3} shows the comparative efficiency between AdDReSS and GE, SSAGE and GE, and AdDReSS and SSAGE for (a), (b), (c), and (d).
Table 4.
Percent improvement in Raghavan efficiency via AdDReSS over SSAGE.
Fig 9.
Evaluation of Maximum Information Gain.
ϕMIG shows areas of maximum information gain (shown as a dashed black line) in terms of the difference in ϕAcc between AdDReSS and SSAGE for (a), (b), (c), and (d).
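As a rough illustration of this measure, the sketch below finds where one accuracy curve exceeds another by the largest amount at the same number of revealed labels. It assumes both curves are sampled at the same label counts; the function name `max_information_gain` and the accuracy values are hypothetical, not results from the paper.

```python
def max_information_gain(acc_a, acc_b):
    """Return (index, gain) at which curve A exceeds curve B the most.

    acc_a, acc_b: accuracies sampled at the same numbers of revealed labels.
    """
    gains = [a - b for a, b in zip(acc_a, acc_b)]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


# Hypothetical mean accuracies at 10, 20, 30, and 40 revealed labels.
label_counts = [10, 20, 30, 40]
acc_active = [0.60, 0.75, 0.85, 0.88]
acc_random = [0.58, 0.65, 0.78, 0.86]

idx, gain = max_information_gain(acc_active, acc_random)
print(label_counts[idx], round(gain, 2))
```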
Fig 10.
Evaluation of Maximum Query Efficiency.
ϕMQE describes the maximum efficiency in terms of queried labels given the same ϕAcc (shown as a dashed black line) between AdDReSS and SSAGE for (a), (b), (c), and (d).
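One possible reading of this measure is sketched below: for each accuracy reached by the first strategy, find the smallest number of labels at which the second strategy reaches at least that accuracy, and report the largest saving. This interpretation, the function name `max_query_efficiency`, and the example values are assumptions made for illustration only.

```python
def max_query_efficiency(label_counts, acc_a, acc_b):
    """Return (labels_saved, accuracy) for the largest label saving of A over B.

    Returns None if B never reaches any accuracy level attained by A.
    """
    best = None
    for n_a, target in zip(label_counts, acc_a):
        # Smallest label count at which B matches or exceeds A's accuracy.
        n_b = next((n for n, acc in zip(label_counts, acc_b) if acc >= target),
                   None)
        if n_b is None:
            continue  # B never reaches this accuracy in the sampled range
        saved = n_b - n_a
        if best is None or saved > best[0]:
            best = (saved, target)
    return best


# Hypothetical mean accuracies at 10 to 50 revealed labels.
counts = [10, 20, 30, 40, 50]
acc_active = [0.60, 0.75, 0.85, 0.88, 0.90]
acc_random = [0.55, 0.60, 0.70, 0.80, 0.90]

result = max_query_efficiency(counts, acc_active, acc_random)
```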
Fig 11.
Illustration describing Raghavan efficiency.
A refers to the area between the Active Learning curve and the empirically-derived maximum accuracy, and B refers to the area between the Random Sampling curve and the Active Learning curve.
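The two areas can be approximated numerically from sampled accuracy curves, as in the sketch below, which uses the trapezoidal rule. The caption defines only the areas A and B; the ratio B / (A + B), equivalent to 1 - A / (A + B), is an assumed way of combining them into a single efficiency value and may differ from the formula used in the paper. All function names and values are hypothetical.

```python
def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
               for i in range(len(x) - 1))


def raghavan_areas(label_counts, acc_active, acc_random, acc_max):
    """Return (A, B) as described in the caption.

    A: area between the active learning curve and the maximum accuracy.
    B: area between the random sampling curve and the active learning curve.
    """
    gap_to_max = [acc_max - a for a in acc_active]
    gap_to_random = [a - r for a, r in zip(acc_active, acc_random)]
    return (trapezoid_area(label_counts, gap_to_max),
            trapezoid_area(label_counts, gap_to_random))


def efficiency(area_a, area_b):
    # Assumption: efficiency = 1 - A / (A + B) = B / (A + B).
    return area_b / (area_a + area_b)


# Hypothetical accuracy curves at 0, 10, and 20 revealed labels.
counts = [0, 10, 20]
area_a, area_b = raghavan_areas(counts, [0.5, 0.8, 0.9], [0.5, 0.6, 0.8], 0.9)
eff = efficiency(area_a, area_b)
```

Under this assumed ratio, a value near 1 means active learning closes most of the gap that random sampling leaves to the maximum accuracy, and a value near 0 means it offers little benefit.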