Capturing cell heterogeneity in representations of cell populations for image-based profiling using contrastive learning

Fig 2

Stratification of the cpg0001 dataset.

The dataset is divided into four subsets (Stain2, Stain3, Stain4, and Stain5), each corresponding to a specific set of assay conditions designed to optimize Cell Painting. Stain2, Stain3, and Stain4 contain training, validation, and test plates, while Stain5 consists solely of test plates and serves as an out-of-distribution test set whose experimental conditions differ entirely from those of the other subsets. Within Stain2 and Stain3, each plate had slight variations in assay conditions, resulting in strong batch effects. Although addressing batch or experiment effects is not the primary focus of this study, the test plates for Stain2, Stain3, and Stain4 were deliberately selected to represent the most divergent conditions within those subsets, ensuring their out-of-distribution nature. The selection was based on the dissimilarity between the test plates and the training and validation data. Fig E in S1 Text elaborates on the method used to measure plate similarity, Table C in S1 Text provides the plate names for each dataset in this stratification, and Fig H in S1 Text describes the training and validation compound split for all plates.

doi: https://doi.org/10.1371/journal.pcbi.1012547.g002