Fig 1.
WSI from TCGA (left) and a modified version (right), contrast-enhanced with histogram equalization and stripped of color. This reveals scanner stripes and air bubbles that are not usually visible to humans.
Fig 2.
(a) Training a model with a contrastive loss: the model learns by minimizing the distance between samples that share similar features while maximizing the distance to other samples. (b) Extracting embeddings from a trained model (inference). The output is an array of floating-point numbers that represents the underlying features captured by the model. (c) Feature-inspection pipeline. For a set of embeddings, the framework can be inspected using UMAPs, linear probing (LP), or both, with LP serving as a scoring function for the UMAPs.
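The contrastive objective in Fig 2(a) can be sketched as an InfoNCE-style loss. This is a minimal numpy illustration, not the paper's training code; the embedding dimension, temperature, and number of negatives are arbitrary toy values.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor embedding.

    Pulls the anchor toward its positive (similar) sample and pushes it
    away from the negatives.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Similarity logits: positive at index 0, negatives after it.
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # cross-entropy vs. index 0

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.05 * rng.normal(size=128)   # e.g. an augmented view
negatives = [rng.normal(size=128) for _ in range(8)]
loss = info_nce_loss(anchor, positive, negatives)
```

Because the positive is a near-copy of the anchor while the negatives are random, the loss is close to zero; for unrelated pairs it would be much larger.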
Fig 3.
UMAP web user interface: The top-left drop-down menu allows the user to select a variable to cluster by.
Below it are two common UMAP parameters that the user can adjust to render different UMAPs. The plots offer common interaction tools such as zooming and region selection. To the right is a list for hiding or showing specific classes. The left plot shows a UMAP computed from embeddings; the right plot shows the same tiles, but clustered using only the raw pixels.
Fig 4.
Linear probing: Embeddings from inference are labeled and used to train a linear layer.
The linear layer maps n input features to m output neurons, where n is the number of features from the model and m is the number of classes to predict. The predicted class is the output neuron with the highest score, which can be used to measure prediction accuracy.
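The n-to-m mapping described above can be sketched in numpy. The sizes, the random weights, and the data here are all hypothetical stand-ins; in practice the weights would be fitted on the labeled embeddings.

```python
import numpy as np

# Hypothetical sizes: n = embedding dimension, m = number of classes.
n, m = 384, 5

rng = np.random.default_rng(42)
W = rng.normal(scale=0.01, size=(n, m))   # linear layer weights (n -> m)
b = np.zeros(m)                           # biases

def predict(embeddings):
    """Linear probe: score each class, then take the highest-scoring neuron."""
    logits = embeddings @ W + b           # shape (batch, m)
    return logits.argmax(axis=1)          # predicted class per embedding

embeddings = rng.normal(size=(10, n))     # frozen embeddings from inference
labels = rng.integers(0, m, size=10)      # ground-truth labels
accuracy = (predict(embeddings) == labels).mean()
```

Accuracy is then simply the fraction of embeddings whose highest-scoring neuron matches the label.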
Table 1.
Data splits used for training on TCGA.
Slides from one patient can occur in only one split. This can result in a different number of tiles for each train/validation/test stratification; the numbers are therefore averaged over three runs. n is the number of tiles. n5 is the number of tiles after balancing (taking an equal number of samples) across the top 5 TSSs. A further count gives the number of tiles from n5 after taking an equal number of tiles per tumor stage (I, II, and III).
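The patient-level constraint in Table 1 (slides from one patient appear in only one split) can be sketched by assigning whole patients to splits before counting tiles. The patient IDs, tile counts, and split fractions below are hypothetical.

```python
import random

# Hypothetical mapping: patient ID -> number of tiles from that patient's slides.
tiles_per_patient = {f"TCGA-{i:02d}": random.Random(i).randint(50, 500)
                     for i in range(30)}

def patient_level_split(patients, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients to train/val/test so no patient crosses splits."""
    ids = sorted(patients)
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = patient_level_split(tiles_per_patient)
# Tile counts differ per stratification because patients carry unequal tile counts.
n_tiles = {name: sum(tiles_per_patient[p] for p in split)
           for name, split in [("train", train), ("val", val), ("test", test)]}
```

Because patients contribute unequal numbers of tiles, repeating the split with different seeds yields different tile counts per split, which is why Table 1 averages over three runs.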
Fig 5.
Sequential checkpointing: In a regular computation (above), information from the forward pass of the model is stored in memory.
To do a backward pass, this information is passed on for the gradient calculations. With sequential checkpointing (lower part), not all information is kept in memory. Therefore, to obtain the information needed for the gray node, the most recent checkpoint (red) is located, and the information from that checkpoint onward is recalculated (blue nodes).
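The checkpoint-and-recompute scheme in Fig 5 can be illustrated with a toy forward pass. This is a conceptual sketch only (the layer function and sizes are placeholders), not the framework's actual checkpointing implementation.

```python
# Toy illustration of sequential checkpointing: instead of caching every
# forward activation, keep only every k-th one and recompute the rest
# when the backward pass needs them.

def forward(x, n_layers, keep_every):
    """Run n_layers of a toy layer f, storing only checkpointed activations."""
    f = lambda v: v + 1                   # stand-in for a real layer
    checkpoints = {0: x}
    for i in range(1, n_layers + 1):
        x = f(x)
        if i % keep_every == 0:
            checkpoints[i] = x            # keep only every k-th activation
    return x, checkpoints

def activation_at(layer, checkpoints):
    """Recompute the activation at `layer` from the latest earlier checkpoint."""
    f = lambda v: v + 1
    start = max(i for i in checkpoints if i <= layer)  # the "red" checkpoint
    x = checkpoints[start]
    for _ in range(layer - start):                     # the "blue" recomputation
        x = f(x)
    return x

out, cps = forward(0, n_layers=8, keep_every=4)
act5 = activation_at(5, cps)   # recomputed from checkpoint 4 in one step
```

Memory drops from one stored activation per layer to one per checkpoint, at the cost of recomputing at most k-1 layers per lookup; this is the time/VRAM trade-off measured in Tables 3 and 4.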
Table 2.
Linear probe accuracy scores with different numbers of samples from the dataset.
Fig 6.
UMAP embeddings from Inception-V4 (left) and Phikon-v2 (right).
The top row uses 10 000 randomly sampled tiles from the top five TSSs. The bottom row uses 70 000 points. The left and right sides use the same points for UMAP clustering. Each colored dot represents one of the following TSSs: • 22, • 39, • 60, • 66, and • 85.
Table 3.
VRAM usage during training of MoCo v1, with sequential checkpointing (sc) and without.
For a batch size of 128 without sequential checkpointing, only the L40S had enough memory to run. At a batch size of 256, both setups ran out of memory (OOM).
Table 4.
Time usage (seconds) for 10 epochs of MoCo v1 training.
‘sc’ = sequential checkpointing; ‘no-sc’ = standard run. Percent differences are computed as (sc − no-sc)/no-sc; negative values indicate that checkpointing reduced runtime. Blank/N/A entries indicate runs that did not complete under the given configuration.
Fig 7.
Speed and VRAM comparison: UMAP on CPU vs. GPU, with VRAM usage, for 1000, 10 000, and 100 000 points.
While the GPU offers a greater speedup at 100 000 points, it also requires more VRAM.