
Fig 1.

Cell-DINO: self-distillation with no labels using vision transformers.

A) Illustration of the Cell-DINO algorithm, which trains two neural networks: a teacher and a student network. The self-supervised objective aims to match the features produced by the teacher (which observes random global views) with those produced by the student (which observes random global and local views). B) Illustration of the processing pipeline using vision transformer networks trained with DINO. The image of a cell is first tokenized as a sequence of local patches, which the transformer processes using self-attention layers. The outputs are image embeddings used for downstream tasks.
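
The teacher-student matching objective in panel A can be sketched in a few lines. The sketch below is illustrative only: the temperatures, centering, and toy inputs are assumptions rather than the paper's settings, and it omits the exponential-moving-average teacher update and multi-crop batching of the full algorithm.

```python
import numpy as np

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_out, teacher_out, center, temp_s=0.1, temp_t=0.04):
    """Cross-entropy between sharpened, centered teacher targets and
    student predictions; only the student receives gradients, and the
    teacher is an EMA copy of the student in the full algorithm."""
    p_teacher = softmax(teacher_out - center, temp_t)    # targets (no gradient)
    log_p_student = np.log(softmax(student_out, temp_s))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

# Toy example: two global views with an 8-dimensional projection head.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2, 8))
student = teacher + 0.1 * rng.normal(size=(2, 8))
loss = dino_loss(student, teacher, center=teacher.mean(axis=0))
```

The centering term subtracted from the teacher output is what discourages collapse to a single trivial feature in DINO-style training.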


Fig 2.

Datasets used in this study.

Example images of the datasets used in our evaluation: the Human Protein Atlas Field-of-View (FoV) and Single-Cell (SC) datasets, and a collection of Cell Painting datasets.


Fig 3.

UMAP visualizations of image-based embeddings obtained with Cell-DINO.

Left column (A,C): Image-based embeddings of the HPA-FoV dataset. Right column (B,D): Image-based embeddings of the Cell Painting datasets. Top row (A,B): unprocessed embeddings. Bottom row (C,D): transformed embeddings for downstream analysis. A) Points are fields of view, and the colors reveal the cell line of the sample. B) Points are single cells, and the colors reveal the source study where the cells come from: LINCS [48] and LUAD [9] are A549 cells, while CDRP [49], TAORF [50], and BBBC022 [51] are U2OS cells. Samples from the five studies were used for training. C) Points are embeddings of fields of view after harmonization [52]; colors are protein localization labels. D) Points are well-level aggregated features after batch correction with sphering [73] from the LINCS dataset. Colors are mechanisms of action (names with "a" are antagonists and names with "i" are inhibitors).
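
Panel D relies on sphering for batch correction [73]. Below is a minimal sketch of a sphering (ZCA-whitening) transform fit on control profiles; the regularization value and synthetic data are assumptions, not the settings used in the study.

```python
import numpy as np

def fit_sphering(controls, reg=1e-6):
    """Fit a ZCA-whitening ('sphering') transform on control profiles and
    return a function that applies it to any profile matrix
    (rows = wells, columns = features)."""
    mu = controls.mean(axis=0)
    cov = np.cov(controls - mu, rowvar=False)
    # regularize before inverting the covariance to avoid blow-up
    vals, vecs = np.linalg.eigh(cov + reg * np.eye(cov.shape[0]))
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return lambda X: (X - mu) @ W

rng = np.random.default_rng(1)
# correlated synthetic control profiles (100 wells, 5 features)
controls = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
sphere = fit_sphering(controls)
Z = sphere(controls)   # sphered controls have approximately identity covariance
```

Fitting the transform on negative controls and applying it to all wells is a common design choice for this kind of batch correction, since it removes correlations that are present even in untreated samples.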


Fig 4.

Performance evaluation of Cell-DINO on the HPA dataset.

All reported results are the F1-scores of the corresponding classification task. There are two classification tasks: protein localization (PL/green shades) and cell-line (CL/blue shades) prediction. Each task is evaluated on two versions of the HPA dataset: field-of-view images (dark colors) and single-cell images (light colors). See the legend in the bottom-right. Results are presented in a grid of bar plots colored according to the task and dataset, and organized by pretraining data (columns) and model initialization (rows). Pretraining data ranges from no supervision with any labels (left column), to medium supervision with cell-line labels (center column), to strong supervision with protein localization labels (right column). The rows consider random initialization (top row), pre-trained DINOv2 models without additional self-supervision on cell images (center row), and initialization with DINOv2 pre-trained weights plus additional self-supervision on cellular images (bottom row). The average score in the top-right corner of the figure identifies each method with a colored icon, ranked from best to worst performance. The results of fine-tuned Cell-DINO are reported in Table C in S1 Text.


Fig 5.

Evaluation of Cell-DINO in the HPA Kaggle competitions.

Left: whole-image protein classification competition [24]. Right: weakly supervised, single-cell protein classification competition [25]. Both plots: distribution of performance scores in the second-stage evaluation of valid competition submissions. The horizontal axis is the official competition score (F1-score on the test set), and the vertical axis is the frequency of submissions obtaining the corresponding score. The dotted line is the cumulative distribution. Vertical lines represent models evaluated in this work: ImageNet pre-trained DINO (red), a single supervised CNN from the top competitors (green), and Cell-DINO (blue).


Table 1.

Comparison to other self-supervised algorithms.

Top and middle: F1-scores obtained in the HPA-FoV and HPA-SC datasets, respectively, on the protein localization (PL) classification and cell line (CL) prediction tasks. Bottom: Results using Cell Painting images on the LINCS mechanism-of-action (MoA) prediction task using the Area Under the Precision-Recall Curve (AUPRC) metric [62], and on treatment classification using a subset of the JUMP-CP dataset (accuracy) [63].


Table 2.

Comparison to other state-of-the-art pre-trained models.

Top and middle: F1-scores obtained in the HPA-FoV and HPA-SC datasets, respectively, on the protein localization (PL) classification and cell line (CL) prediction tasks. Bottom: Results using Cell Painting images on the LINCS mechanism-of-action prediction task (MoA AUPRC) [62].


Fig 6.

Low-shot performance on HPA datasets.

Top row: results on the HPA-FoV dataset. Bottom row: results on the HPA-SC dataset. Left column: results of the protein localization classification task. Right column: results of the cell-line prediction task. All plots: the horizontal axis is the percentage of labeled images used for training, on a logarithmic scale. The vertical axis is the F1-score of the corresponding classification task.


Fig 7.

Performance evaluation of Cell-DINO on the CPG datasets.

A) The evaluation task is mechanism of action (MoA) prediction, where chemical compounds are used to treat cells and Cell Painting images are collected to assess their effect and infer their MoA with computational models. B) Illustration of the steps of the computational workflow to predict MoA from Cell Painting images. The only step that changes in these experiments is feature extraction (highlighted in bold). C) Box plots comparing the distributions of performance for the evaluated feature extraction methods. Horizontal axis: evaluated methods organized by feature type. Vertical axis: performance score according to the area under the precision-recall curve of the multi-class MoA prediction problem [62]. Highlighted numbers in the boxes are the median values of the distribution.
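
The AUPRC metric used in panel C [62] can be computed per MoA class and macro-averaged. Below is a self-contained sketch using average precision as the area under the precision-recall curve; the class names and scores are hypothetical, and the study's exact evaluation protocol may differ.

```python
import numpy as np

def average_precision(y_true, scores):
    """Area under the precision-recall curve for one class, computed as
    average precision over the ranking induced by the scores."""
    order = np.argsort(-scores)
    y = np.asarray(y_true)[order].astype(float)
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float((precision_at_k * y).sum() / y.sum())

def macro_auprc(labels, score_matrix, classes):
    """One-vs-rest AUPRC averaged over MoA classes (macro average)."""
    labels = np.asarray(labels)
    aps = [average_precision(labels == c, score_matrix[:, j])
           for j, c in enumerate(classes)]
    return float(np.mean(aps))

# Toy example: 3 hypothetical MoA classes, 6 treatments, perfect ranking.
classes = ["hdac-i", "mtor-i", "dopamine-a"]
labels = ["hdac-i", "mtor-i", "dopamine-a", "hdac-i", "mtor-i", "dopamine-a"]
scores = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.80, 0.10],
    [0.1, 0.10, 0.80],
    [0.7, 0.20, 0.10],
    [0.2, 0.60, 0.20],
    [0.2, 0.20, 0.60],
])
print(macro_auprc(labels, scores, classes))  # → 1.0
```

AUPRC is preferred over accuracy here because MoA classes are typically small and imbalanced, so ranking quality per class matters more than raw hit rate.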


Table 3.

Nearest neighbors search evaluation.

F1-scores obtained with the HPA-FoV and HPA-SC datasets on the protein localization (PL) classification and cell-line (CL) prediction tasks. The SSL Type column reports the self-supervised principle used by the method, and the Supervision column indicates the type of labels used by the model during training.
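
A nearest-neighbors evaluation of this kind typically transfers labels from the most similar training embedding. A minimal sketch follows; cosine similarity and k = 1 are assumptions here, as are the toy embeddings and labels.

```python
import numpy as np

def nn_predict(train_emb, train_labels, query_emb):
    """Nearest-neighbors search in embedding space: each query is assigned
    the label of its most similar training embedding (cosine similarity)."""
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = q @ a.T                          # (n_query, n_train)
    return np.asarray(train_labels)[np.argmax(sims, axis=1)]

# Toy example with two well-separated localization classes.
train_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
train_labels = ["nucleus", "nucleus", "cytosol", "cytosol"]
queries = np.array([[0.8, 0.2], [0.2, 0.8]])
print(nn_predict(train_emb, train_labels, queries))  # → ['nucleus' 'cytosol']
```

Because no classifier is trained, this evaluation probes how well the embedding geometry itself separates the classes.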


Fig 8.

Image-based profiling of cellular morphology.

A) Single-cell heterogeneity analysis on the HPA-SC dataset. Only U2OS cells were selected for this analysis to measure morphological differences for three proteins (EFHC2, PSTPIP2, and NUSAP), which have been annotated as exhibiting no heterogeneity, spatial heterogeneity, and intensity heterogeneity, respectively. The matrices display the similarity of single cells from images labeled for the corresponding proteins using Cell-DINO embeddings. Color images at the bottom illustrate the composite multi-channel visualization, and green images represent the corresponding protein channel. The arrows point to cells that differ in their protein localization activity, explaining the heterogeneity patterns observed. A larger version is available in Fig D in S1 Text. B) Similarity matrix of treatment-level profiles of aggregated Cell-DINO embeddings for 1,571 compounds in the LINCS dataset. Two compound clusters are highlighted with colored mechanism-of-action annotations on the right-hand side. In all matrices, dark blue indicates high similarity, and light blue or white indicates low similarity.
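
The treatment-level similarity matrix in panel B is built from aggregated embeddings. Below is a minimal sketch of mean aggregation followed by pairwise correlation; mean pooling and Pearson-style similarity are assumptions about the aggregation, and the compound names and data are synthetic.

```python
import numpy as np

def treatment_similarity(cell_embeddings, treatments):
    """Aggregate single-cell embeddings into treatment-level profiles
    (mean over cells per treatment) and compute their pairwise
    correlation matrix, as displayed in similarity heatmaps."""
    treatments = np.asarray(treatments)
    names = sorted({str(t) for t in treatments})
    profiles = np.stack([cell_embeddings[treatments == t].mean(axis=0)
                         for t in names])
    # center and normalize each profile so the dot product is a correlation
    profiles = profiles - profiles.mean(axis=1, keepdims=True)
    profiles = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return names, profiles @ profiles.T     # values in [-1, 1]

rng = np.random.default_rng(2)
emb = rng.normal(size=(60, 16))             # 60 synthetic single-cell embeddings
labels = np.repeat(["cmpd-a", "cmpd-b", "cmpd-c"], 20)
names, sim = treatment_similarity(emb, labels)
```

Blocks of high similarity off the diagonal in such a matrix are what suggest compounds sharing a mechanism of action.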


Fig 9.

Patch-level feature maps obtained with Cell-DINO.

A) Example images and feature map visualizations obtained using principal component analysis (PCA) on patch-level features after masking the background. The first three principal components (PCs) are mapped to the RGB color space. B) Matching patch-level features across images using cosine similarity. A query token is selected from the input image and the cosine similarity is calculated against all patches of reference images.
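
The PCA-to-RGB visualization in panel A can be sketched as follows; the grid size, feature dimension, and foreground mask below are hypothetical stand-ins for real patch tokens.

```python
import numpy as np

def patch_pca_rgb(patch_feats, foreground):
    """Map patch-level features to RGB: run PCA on foreground patches,
    keep the first three principal components, and rescale each
    component to [0, 1]; background patches stay black."""
    fg = patch_feats[foreground]
    fg_centered = fg - fg.mean(axis=0)
    # principal components via SVD of the centered feature matrix
    _, _, vt = np.linalg.svd(fg_centered, full_matrices=False)
    pcs = fg_centered @ vt[:3].T            # (n_foreground, 3)
    pcs = pcs - pcs.min(axis=0)
    pcs = pcs / pcs.max(axis=0)
    rgb = np.zeros((patch_feats.shape[0], 3))
    rgb[foreground] = pcs
    return rgb

rng = np.random.default_rng(3)
feats = rng.normal(size=(196, 32))          # e.g., a 14x14 grid of patch tokens
mask = rng.random(196) > 0.5                # hypothetical background mask
colors = patch_pca_rgb(feats, mask)
```

The cross-image matching in panel B follows the same idea in reverse: normalize one query token and all reference tokens, then take dot products to obtain a cosine-similarity map over the reference patches.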


Table 4.

Model performance comparison with and without batch correction (B.C.) on the LINCS Cell Painting dataset.

The downstream task is MoA classification and the performance metric is the area under the precision-recall curve (AUPRC). The reported value is the mean over 10 experiments.


Table 5.

Ablation study on the impact (F1 scores) of augmentations during pretraining on HPA-FoV.

RB: random brightness, RC: random contrast, and RCD: random channel dropping.


Table 6.

HPA-FoV Kaggle public scores for protein localization.
