Fig 1.
(a) Human U2OS cells treated with dimethyl sulfoxide (DMSO) and stained using the Cell Painting assay, which uses six dyes imaged in five channels to label eight cellular compartments. The top row (from left to right) shows mitochondrial staining; actin, Golgi, and plasma membrane staining; and nucleolar and cytoplasmic RNA staining. The bottom row (from left to right) shows endoplasmic reticulum staining, DNA staining, and a montage of all five channels (from Cimini et al. [21]). (b) Thousands of features are extracted from each segmented cell in the microscopy images of a well. A learned function f(x) (CytoSummaryNet) aggregates these single-cell features into a single feature vector: the sample's profile. (c) Detailed view of the model architecture used in this study. The model consists of three elements: a function φ(x), which maps each cell's input features from ℝ^D to ℝ^L; a summation, which collapses the cell dimension; and a function ρ(z), which maps the collapsed representation from ℝ^L to ℝ^N. (d) During training, profiles of replicates of the same compound are pulled toward each other (green arrows) while being pushed away from the profiles of all other compounds (red arrows) in the learned feature space. For clarity, forces are drawn only for a single profile of compound B.
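The φ, summation, ρ structure in panel (c) is the permutation-invariant pattern often called a Deep Sets architecture, and panel (d) describes a contrastive training objective. The sketch below illustrates both in plain Python under stated assumptions: φ is assumed to be one linear layer with ReLU and ρ one linear layer (CytoSummaryNet's actual depths and widths are not given here); because ρ is applied to the L-dimensional sum, it maps ℝ^L to an N-dimensional profile. The loss is assumed to be a supervised contrastive (SupCon-style) loss on cosine similarities with an assumed temperature `tau`, since the caption does not name the loss. All function names are hypothetical.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def deep_sets_profile(cells, W_phi, W_rho):
    """Hypothetical f(X) = rho(sum_i phi(x_i)) for a list of D-dim cell vectors."""
    pooled = [0.0] * len(W_phi)
    for x in cells:
        h = relu(matvec(W_phi, x))                     # phi: R^D -> R^L (assumed 1 layer)
        pooled = [p + hi for p, hi in zip(pooled, h)]  # summation collapses the cell dimension
    return matvec(W_rho, pooled)                       # rho: R^L -> R^N (assumed 1 layer)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def supcon_loss(profiles, labels, tau=0.1):
    """SupCon-style loss: same-compound profiles attract, all other profiles repel."""
    total, n_anchors = 0.0, 0
    for i, zi in enumerate(profiles):
        others = [a for a in range(len(profiles)) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without replicates contributes nothing
        log_denom = math.log(sum(math.exp(cosine(zi, profiles[a]) / tau) for a in others))
        log_probs = [cosine(zi, profiles[p]) / tau - log_denom for p in positives]
        total += -sum(log_probs) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# Permutation invariance: shuffling the cells leaves the profile unchanged.
rng = random.Random(0)
D, L, N = 4, 8, 3
W_phi = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
W_rho = [[rng.gauss(0, 1) for _ in range(L)] for _ in range(N)]
cells = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(50)]
shuffled = cells[:]
rng.shuffle(shuffled)
profile = deep_sets_profile(cells, W_phi, W_rho)
profile_shuffled = deep_sets_profile(shuffled, W_phi, W_rho)
```

Summation is what makes the output independent of cell order and lets wells with different cell counts share one model; the loss is lower when replicate profiles sit close together than when they are scattered among other compounds.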
Fig 2.
Stratification of the cpg0001 dataset.
The dataset is divided into four subsets (Stain2, Stain3, Stain4, and Stain5), each corresponding to a specific set of assay conditions designed to optimize Cell Painting. Stain2, Stain3, and Stain4 contain training, validation, and test plates, whereas Stain5 consists solely of test plates and serves as an out-of-distribution test set whose experimental conditions differ entirely from those of the other subsets. Within Stain2 and Stain3, assay conditions varied slightly from plate to plate, resulting in strong batch effects. Although addressing batch or experiment effects is not the primary focus of this study, the test plates of Stain2, Stain3, and Stain4 were deliberately chosen as the plates most dissimilar to the training and validation data, ensuring that they are out of distribution. Fig E in S1 Text describes how this dissimilarity was measured, Table C in S1 Text lists the plate names for each subset in this stratification, and Fig H in S1 Text describes the split of compounds into training and validation sets for all plates.
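The caption states that test plates were chosen by their dissimilarity to the training and validation data, with the actual measure given in Fig E in S1 Text. As a purely hypothetical illustration of that idea, the sketch below scores each plate by the cosine distance between its mean well profile and the mean of all other plates' mean profiles, then picks the `k` highest-scoring plates. The distance, the plate-level averaging, and the function names are assumptions, not the study's method.

```python
import math


def mean_profile(profiles):
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def most_divergent_plates(plates, k):
    """plates: dict plate_name -> list of well profiles. Returns the k plates whose
    mean profile is farthest (cosine distance) from the mean of all other plates."""
    centroids = {name: mean_profile(wells) for name, wells in plates.items()}
    scores = {}
    for name, c in centroids.items():
        rest = [v for other, v in centroids.items() if other != name]
        scores[name] = cosine_distance(c, mean_profile(rest))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy example: P4 was "stained" under very different conditions.
plates = {
    "P1": [[1.0, 0.1], [0.9, 0.0]],
    "P2": [[1.0, 0.0], [1.1, 0.1]],
    "P3": [[1.0, 0.2], [0.95, 0.1]],
    "P4": [[0.0, 1.0], [0.1, 0.9]],
}
selected = most_divergent_plates(plates, 1)
```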
Fig 3.
CytoSummaryNet profiles generally outperform average-aggregated profiles at identifying replicates of a given sample and partially generalize to unseen experimental protocols (test plates).
The box plots show the mAP of replicate retrieval for all validation compounds of Stain3 (each data point is the average mAP of one plate) for CytoSummaryNet (dark green) and average (light green) profiles. Note that although the panels are labeled “training plates” and “validation plates”, all data shown come from validation compounds, so none of them were seen directly during training (see the description of the stratification in Results for further details). Welch's t-tests were used to compare the mean mAP scores of CytoSummaryNet and average profiles on corresponding data; the resulting p-values are indicated as stars at the top of each plot (ns = not significant).
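Replicate retrieval can be framed as a ranking problem: each profile queries all others, the candidates are ranked by similarity, and replicates of the same compound count as positives. The sketch below shows one common formulation of average precision (AP) and its mean over queries (mAP), assuming cosine similarity as the ranking score; the study's exact implementation and any per-compound grouping may differ, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_precision(ranked_is_positive):
    """AP of a ranked list of booleans: mean of precision@k at each positive rank k."""
    hits, precisions = 0, []
    for k, is_pos in enumerate(ranked_is_positive, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / hits if hits else None


def replicate_map(profiles, compounds):
    """Mean AP over queries; replicates of the query's compound are the positives."""
    aps = []
    for i, query in enumerate(profiles):
        others = [j for j in range(len(profiles)) if j != i]
        others.sort(key=lambda j: cosine(query, profiles[j]), reverse=True)
        ap = average_precision([compounds[j] == compounds[i] for j in others])
        if ap is not None:  # skip queries that have no replicates
            aps.append(ap)
    return sum(aps) / len(aps)
```

For example, a ranking of positive, negative, positive gives AP = (1/1 + 2/3) / 2 = 5/6, and profiles whose replicates always rank first give mAP = 1.0.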
Fig 4.
CytoSummaryNet profiles partially generalize to unseen compounds and do not generalize to out-of-distribution batch data (Stain5).
The box plots show the mAP of replicate retrieval for training and validation compounds (each data point is the average mAP of one plate) from the test plates of Stain2, Stain3, Stain4, and Stain5. CytoSummaryNet profiles are shown in dark blue (training compounds) and dark green (validation compounds), and average profiles in cyan (training compounds) and light green (validation compounds). Note that (i) the box plots for validation compounds in the second panel (“test plates Stain3”) are the same as the box plots in the third panel of Fig 3, and (ii) although the boxes are labeled “training compounds” and “validation compounds”, all data shown come from test plates, so none of them were seen during training (see the description of the stratification in Results for further details). Welch's t-tests were used to compare the mean mAP scores of CytoSummaryNet and average profiles on corresponding data; the resulting p-values are indicated as stars at the top of each plot (ns = not significant). Because mAP scores are averaged per plate, each comparison rests on only a few data points, which may limit the statistical power of these tests.
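Welch's t-test compares two means without assuming equal variances; its degrees of freedom come from the Welch-Satterthwaite approximation and shrink when each group has only a few per-plate mAP values. The pure-Python sketch below computes the t statistic and degrees of freedom; for p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test. The star thresholds in `p_to_stars` are an assumed, commonly used plotting convention, since the captions do not state the cut-offs.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def p_to_stars(p):
    # ASSUMED thresholds (a common plotting convention), not taken from the figures.
    for threshold, stars in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")):
        if p <= threshold:
            return stars
    return "ns"
```

With three plates per group, as in `welch_t([1, 2, 3], [4, 5, 6])`, the test has only about 4 degrees of freedom, which illustrates why per-plate averaging limits statistical power.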
Fig 5.
CytoSummaryNet profiles generally outperform average-aggregated profiles in mechanism of action (MoA) retrieval, although not for out-of-distribution batch data (Stain5).
The box plots show the mAP of mechanism of action retrieval for CytoSummaryNet- (dark purple) and average- (light pink) aggregated profiles (each data point is the average mAP of one plate). Welch's t-tests were used to compare the mean mAP scores of CytoSummaryNet and average profiles; the resulting p-values are indicated as stars inside each plot (ns = not significant). Because mAP scores are averaged per plate, each comparison rests on only a few data points, which may limit the statistical power of these tests.
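Mechanism of action retrieval follows the same ranking idea as replicate retrieval, but positives are profiles annotated with the query's MoA, and the query-level scores are then averaged per plate so that each plate yields one data point. In the sketch below, excluding all profiles of the query's own compound from the candidates is an assumption made so that retrieval is not driven by replicates; the captions do not specify this detail, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def average_precision(ranked_is_positive):
    """Mean of precision@k at each positive rank k; None if there are no positives."""
    hits, precisions = 0, []
    for k, is_pos in enumerate(ranked_is_positive, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / hits if hits else None


def moa_average_precision(i, profiles, moas, compounds):
    """AP for query i, where positives are profiles annotated with the same MoA.
    ASSUMPTION: all profiles of the query's own compound are excluded."""
    candidates = [j for j in range(len(profiles)) if compounds[j] != compounds[i]]
    candidates.sort(key=lambda j: cosine(profiles[i], profiles[j]), reverse=True)
    return average_precision([moas[j] == moas[i] for j in candidates])


def per_plate_map(plates, aps):
    """Average query-level APs within each plate, giving one data point per plate."""
    groups = defaultdict(list)
    for plate, ap in zip(plates, aps):
        if ap is not None:  # queries with no same-MoA candidates are skipped
            groups[plate].append(ap)
    return {plate: sum(v) / len(v) for plate, v in groups.items()}
```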
Table 1.
Top panel: Absolute and relative average improvements in mAP of mechanism of action retrieval between CytoSummaryNet- and average-aggregated profiles for the cpg0001 dataset.
The absolute improvements are calculated as mAP(CytoSummaryNet) - mAP(average profiling), and the relative (percentage) improvements as (mAP(CytoSummaryNet) - mAP(average profiling)) / mAP(average profiling) × 100%. Bottom panel: the same mAP improvements for the cpg0004 dataset, for CytoSummaryNet models that aggregate single-cell information (single-cell) and for models that transform population-average information (population average).
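The two improvement measures in Table 1 can be written as a small helper; the input values in the usage line below are illustrative only and are not taken from the table.

```python
def map_improvement(map_csn, map_avg):
    """Absolute and relative (percent) mAP improvement of CytoSummaryNet over average profiling."""
    absolute = map_csn - map_avg
    relative_pct = 100.0 * absolute / map_avg  # undefined if map_avg == 0
    return absolute, relative_pct


# Illustrative values: CytoSummaryNet mAP 0.30 vs. average-profiling mAP 0.20.
absolute, relative_pct = map_improvement(0.30, 0.20)
```

Here the absolute improvement is about 0.10 mAP, which corresponds to a relative improvement of about 50%, because the baseline mAP is 0.20. A small absolute gain on a low baseline can therefore appear as a large percentage.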
Fig 6.
CytoSummaryNet profiles make mechanisms of action that are already readily identifiable even easier to find, and make retrievable some mechanisms of action that cannot be found with average-aggregated profiles (cpg0004 dataset).
Mean average precision (mAP) of mechanism of action (MoA) retrieval for average and CytoSummaryNet-based profiles of compound perturbations at the 3.33 μM (test) and 10 μM (training/validation) dose points. Selected high-performing MoAs are highlighted. Point size scales with the number of compounds annotated with that MoA.
Fig 7.
CytoSummaryNet profiles show a better ability to distinguish similar samples than average-aggregated profiles (cpg0001 dataset).
UMAP of the average- (left column) and CytoSummaryNet- (right column) aggregated profiles for the top 15 mechanisms of action, ranked by CytoSummaryNet's mAP for mechanism of action retrieval, from all cpg0001 Stain3 plates used (Table C in S1 Text). The UMAP was computed with n_neighbors = 15 and cosine distance as the metric. From top to bottom, the profiles are colored by annotated mechanism of action, compound, plate, and well position. Clusters of profiles sharing a mechanism of action that remained visible across multiple n_neighbors values are outlined with green ellipses. Note that Compound2 and Compound5 are intentionally anonymized.
Table 2.
Top five CellProfiler features with the strongest positive and the strongest negative Pearson correlation coefficients with the combined SA and CPA relevance score.
The scores were calculated for a single test plate of cpg0001 Stain3 (200922_015124-V).
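The ranking in Table 2 amounts to correlating each CellProfiler feature with a per-cell relevance score and keeping the extremes at both ends. The sketch below assumes the combined SA and CPA relevance score is already available as one number per cell (how it is combined is not reproduced here) and skips constant features, for which Pearson's r is undefined; the function names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # constant feature or score: correlation undefined
    return cov / (sx * sy)


def top_correlated_features(features, relevance, k=5):
    """features: dict name -> per-cell values; relevance: per-cell relevance scores.
    Returns (k most positively, k most negatively correlated) as lists of (name, r)."""
    r = {name: pearson(vals, relevance) for name, vals in features.items()}
    r = {name: v for name, v in r.items() if not math.isnan(v)}
    ranked = sorted(r.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


relevance = [0.1, 0.4, 0.2, 0.9, 0.5]
features = {
    "f_pos": [1, 4, 2, 9, 5],
    "f_neg": [-1, -4, -2, -9, -5],
    "f_const": [3, 3, 3, 3, 3],
    "f_mixed": [5, 1, 4, 2, 3],
}
top_pos, top_neg = top_correlated_features(features, relevance, k=2)
```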
Fig 8.
Five-channel composite microscope image of one of the four fields of view of a well in cpg0001 Stain2 plate BR00112197binned (human U2OS osteosarcoma cells).
The most relevant cells are marked with green boxes and the least relevant cells with purple boxes. Three cells characteristic of low relevance scores are explicitly labeled with purple arrows, and one cell characteristic of high relevance scores is labeled with a green arrow. The other three fields of view from this well are shown in Figs F and G in S1 Text.