Deep multiple instance learning versus conventional deep single instance learning for interpretable oral cancer detection

doi:10.1371/journal.pone.0302169

Fig 1.

Overview of the methodology.

(a) Workflow for acquisition of cytology images and (b) schematic diagrams of the MIL and SIL approaches. Both approaches are trained end-to-end and provide a bag-level (patient) category together with interpretability at the instance (cell) level. In the MIL approach, an attention mechanism yields instance weights that are used to interpret the bag-level decision. The SIL approach yields instance scores, from which the bag category is derived.
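
As a rough illustration of how attention weights can serve both for pooling and for interpretation, the following minimal sketch computes attention-weighted pooling over one bag in plain Python. It follows the general form of attention-based MIL pooling (a tanh projection, a scoring vector, and a softmax over the instances of the bag); the function name `attention_pool`, the parameters `V` and `w`, and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def attention_pool(embeddings, V, w):
    """Attention-weighted pooling over a bag of instance embeddings.

    embeddings: list of instance vectors (lists of floats)
    V: hypothetical projection matrix, given as rows (hidden_dim x embed_dim)
    w: hypothetical scoring vector of length hidden_dim
    Returns (attention_weights, bag_embedding).
    """
    scores = []
    for h in embeddings:
        # Project the instance and squash with tanh, then score it with w.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Softmax over the instances in the bag (shifted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The bag embedding is the attention-weighted sum of instance embeddings.
    dim = len(embeddings[0])
    bag = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return weights, bag

# Toy bag of three 2-dimensional instance embeddings (made-up values).
toy_bag = [[0.1, 0.2], [2.0, 1.5], [0.0, -0.1]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
weights, bag_embedding = attention_pool(toy_bag, V, w)
```

In this sketch the weights sum to one within a bag, so ranking instances by weight gives the kind of per-cell ordering that can be inspected to interpret a bag-level decision.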


Fig 2.

Examples of images from datasets.

Top: OC; Bottom: PAP-QMNIST. Images of the digit “4” constitute key instances in the PAP-QMNIST dataset and appear only in positive bags (an example is the third image in (c)), whereas images of other digits appear in both positive and negative bags.


Fig 3.

Beta distributions used to define the fraction of key instances in positive PAP-QMNIST bags.
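
One way such a fraction could be drawn per bag is sketched below. The shape parameters, bag size, number of bags, and the rule of keeping at least one key instance per positive bag are all illustrative assumptions; the caption does not state the values or the exact sampling procedure used.

```python
import random

def sample_key_instance_counts(n_bags, bag_size, alpha, beta, seed=0):
    """Draw a key-instance count for each positive bag.

    The fraction of key instances is sampled from Beta(alpha, beta).
    alpha, beta, and bag_size are hypothetical values for illustration.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_bags):
        fraction = rng.betavariate(alpha, beta)
        # Assumption: a positive bag keeps at least one key instance.
        counts.append(max(1, round(fraction * bag_size)))
    return counts

counts = sample_key_instance_counts(n_bags=12, bag_size=500, alpha=2.0, beta=5.0)
```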


Fig 4.

Test set performance for ABMIL with sampling and SIL on PAP-QMNIST.

Top: Accuracy at the bag level. Bottom: Precision at the instance level, Precision@K_i, for instances with the K_i top attention weights/prediction scores, where K_i is the number of key instances in bag i. Three mini-bag sizes, 500, 1200, and 2500, used for ABMIL with sampling are indicated on the x-axis. Larger values of accuracy and Precision@K_i are better. The box plots display minimum, first quartile, median, third quartile, and maximum (the five-number summary) over 9 folds.
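
The instance-level metric described above can be sketched as follows: for one bag, take the K_i highest-scoring instances, where K_i is the number of true key instances in that bag, and report the fraction of them that are key instances. The function name `precision_at_k` and the toy scores are hypothetical; returning `None` for a bag without key instances is an assumption of this sketch.

```python
def precision_at_k(scores, is_key):
    """Precision@K_i for a single bag.

    scores: attention weights or prediction scores, one per instance
    is_key: 1 if the instance is a true key instance, else 0
    K_i is taken to be the number of key instances in the bag.
    """
    k = sum(is_key)
    if k == 0:
        return None  # Assumption: the metric is undefined without key instances.
    # Rank instance indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    hits = sum(is_key[j] for j in ranked[:k])
    return hits / k

# Toy bag: two key instances (indices 0 and 4), one ranked in the top two.
toy_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
toy_is_key = [1, 0, 0, 0, 1]
result = precision_at_k(toy_scores, toy_is_key)
```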


Table 1.

Accuracy at the bag level of the ABMIL with sampling and SIL methods on PAP-QMNIST test data.

Three mini-bag sizes, 500, 1200, and 2500, used for ABMIL with sampling are indicated in the column headings. The mean and standard deviation of the metric are computed over 9 folds. Bold face indicates the best performance for each dataset.


Table 2.

Precision@K_i at the instance level of the ABMIL with sampling and SIL methods on PAP-QMNIST test data.

Three mini-bag sizes, 500, 1200, and 2500, used for ABMIL with sampling are indicated in the column headings. The mean and standard deviation of the metric are computed over 9 folds. Bold face indicates the best performance for each dataset.


Fig 5.

Percentage of instances identified as positive by the SIL approach on the PAP-QMNIST dataset.

Instances identified as positive by the SIL approach have a softmax score > 0.5. The percentage is shown for each test bag, for versions of the PAP-QMNIST dataset with different key instance ratios. Error bars indicate the standard deviation over the 3 different folds in which each bag appears. Bags 0–11 are negative, and bags 12–23 are positive.
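
A minimal sketch of this per-bag statistic is given below: instances with a score above 0.5 are counted as positive, and the percentage is then summarized over the folds in which the bag appears. The scores are made up, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the caption does not say which estimator was used.

```python
import statistics

def percent_positive(scores, threshold=0.5):
    """Percentage of instances in a bag whose score exceeds the threshold."""
    return 100.0 * sum(s > threshold for s in scores) / len(scores)

# Made-up softmax scores for the same test bag in three different folds.
fold_scores = [
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.4, 0.7, 0.3],
    [0.55, 0.45, 0.2, 0.1],
]
per_fold = [percent_positive(s) for s in fold_scores]
mean_percent = statistics.mean(per_fold)
# Assumption: population standard deviation over the folds.
std_percent = statistics.pstdev(per_fold)
```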


Fig 6.

Examples of instances detected by ABMIL with sampling and SIL approaches in the PAP-QMNIST dataset.

For each of the four test bags with positive labels, detected as positive in one of the folds of the PAP-QMNIST dataset, the top 25 instances with the highest attention weights/prediction scores are shown. The percentage of key instances per bag in the datasets is indicated under each subfigure. Red polygons delineate wrongly identified key instances (digits other than “4”). (a)-(e) ABMIL with sampling based on ResNet18 (similar results were observed for other architectures) with a mini-bag size of 500 instances. (f)-(k) SIL based on ResNet18.


Fig 7.

Accuracy at the bag level of the ABMIL with sampling and SIL methods on the test set of the OC Dataset.

Three mini-bag sizes, 500, 1200, and 2500, used for ABMIL with sampling are indicated on the x-axis. The box plots display minimum, first quartile, median, third quartile, and maximum (the five-number summary) over 9 folds.


Fig 8.

Percentage of instances (cells) identified as positive by the SIL approach on the OC Dataset.

Instances identified as positive by the SIL approach have a softmax score > 0.5. The percentage is shown for each test bag (patient). Error bars indicate the standard deviation over the 3 different folds in which each bag appears. Bags 0–11 (blue) are negative, and the rest (yellow) are positive.


Table 3.

Accuracy at the bag level of the ABMIL with sampling and SIL methods on the test set of the OC Dataset.

The mean and standard deviation of the metric are computed over 9 folds. Bold face indicates the best performance.


Fig 9.

Examples of instances detected by ABMIL with sampling and SIL approaches in the OC Dataset.

The top 36 instances with the highest attention weights/prediction scores on the test set of the OC Dataset are shown, together with the corresponding annotations by a cytotechnologist.


Table 4.

Qualitative evaluation of the ABMIL with sampling and SIL methods on the OC Dataset.

Agreement of the methods’ predictions with judgments made by a cytotechnologist on the top-ranked instances. See text for details.
