Predicting drug polypharmacology from cell morphology readouts using variational autoencoder latent space arithmetic

doi:10.1371/journal.pcbi.1009888

Fig 1.

Our variational autoencoder (VAE) implementation framework, applied to determining the phenotype of cells.

One application is to predict the phenotype of cells treated with compounds that have two mechanisms of action (MOAs), given the phenotypes of cells treated with compounds that have each of those MOAs alone (bottom right). A VAE encodes input data into a lower-dimensional latent space and then decodes the representation back into the original data dimensions. Our data contained measurements for 588 morphology features, each averaged over the population of cells treated with a given chemical compound. Following a sweep to select optimal hyperparameters (see Methods), we set the latent space to 10 dimensions. The vanilla VAE learns by minimizing the sum of a reconstruction loss and a KL-divergence loss. The other VAE variants we tested minimize loss functions that encourage disentangled features, which promote interpretability, support data simulation, and enable meaningful latent space arithmetic (LSA).
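
The vanilla VAE objective described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `vae_loss`, the `beta` weight, and the choice of squared error as the reconstruction term are assumptions made for the example. Only the 588 input features and the 10 latent dimensions come from the caption.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (total, reconstruction, KL) for a batch.

    Assumed form: squared-error reconstruction plus the closed-form KL
    divergence between N(mu, diag(exp(log_var))) and the unit Gaussian.
    """
    # Reconstruction: squared error summed over features, averaged over samples.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL divergence to N(0, I), summed over latent dims, averaged over samples.
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + beta * kl, recon, kl

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 588))   # 4 hypothetical profiles, 588 morphology features
x_hat = x.copy()                # a perfect reconstruction, for illustration
mu = np.zeros((4, 10))          # 10 latent dimensions
log_var = np.zeros((4, 10))     # unit variance

total, recon, kl = vae_loss(x, x_hat, mu, log_var)
```

With a perfect reconstruction and a posterior equal to the unit Gaussian, both terms are zero. Moving `mu` away from zero raises only the KL term.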

Fig 2.

Two-dimensional UMAP embeddings of original, reconstruction, and simulated data for Cell Painting level 5 consensus signatures in the test set.

We fit UMAP using only the original test set data and transformed the reconstructed and simulated data into this space. We simulated data by sampling from a unit Gaussian with the same dimensions as the latent space, using the same number of points as samples in the test set.
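
The procedure in this caption (sample latent vectors from a unit Gaussian, decode them, then project them with an embedding fit only on the original data) can be sketched as follows. Everything here is a stand-in: the random linear `decode` function replaces a trained VAE decoder, and a two-component PCA projection replaces UMAP so that the example needs only NumPy. The fit-then-transform pattern is the point of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_features, latent_dim = 50, 588, 10

# Hypothetical stand-ins for the real test set and a trained decoder.
original = rng.normal(size=(n_test, n_features))
decoder_weights = rng.normal(size=(latent_dim, n_features))

def decode(z):
    """Map latent vectors back to feature space (assumed linear decoder)."""
    return z @ decoder_weights

# Simulate one profile per test sample by sampling the unit-Gaussian prior.
z_sim = rng.standard_normal(size=(n_test, latent_dim))
simulated = decode(z_sim)

# Fit a 2-D projection on the original data only (PCA stands in for UMAP).
center = original.mean(axis=0)
_, _, vt = np.linalg.svd(original - center, full_matrices=False)
components = vt[:2]

def transform(data):
    """Project any data into the space fit on the original test set."""
    return (data - center) @ components.T

embedded_original = transform(original)
embedded_simulated = transform(simulated)
```

Reconstructed profiles would pass through the same `transform` call, so all three data sets share one coordinate system.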

Table 1.

Mean squared error (MSE) and earthmoving distance for VAE’s ability to reconstruct Cell Painting and L1000 profiles.

We compare these values with results derived from shuffled models. Earthmoving distance is computed for each sample and then averaged across samples. We report the 95% percentile range of earthmoving distance in parentheses (0.05 lowest, 0.95 highest). Note that because our models required normalizing Cell Painting and L1000 input data differently (see Methods), the metrics cannot be compared across data modalities.
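
The two reconstruction metrics can be illustrated with a short sketch. It assumes that each sample's earthmoving distance compares the distribution of feature values in the original profile with that in the reconstructed profile; the caption does not spell this out, so treat it as one plausible reading. The function names are hypothetical. For two equal-size, equally weighted one-dimensional samples, the earthmoving distance equals the mean absolute difference of the sorted values, which is what the code uses.

```python
import numpy as np

def mean_squared_error(x, x_hat):
    """MSE over all samples and features."""
    return float(np.mean((x - x_hat) ** 2))

def earth_movers_distance_1d(a, b):
    """EMD between two equal-size, equally weighted 1-D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def summarize_emd(x, x_hat):
    """Mean per-sample EMD plus its 0.05 and 0.95 quantiles."""
    per_sample = np.array(
        [earth_movers_distance_1d(row, row_hat) for row, row_hat in zip(x, x_hat)]
    )
    low, high = np.quantile(per_sample, [0.05, 0.95])
    return float(per_sample.mean()), float(low), float(high)

# Toy data: the first profile is shifted by 1, the second is reconstructed exactly.
x = np.array([[0.0, 1.0, 2.0], [0.0, 0.0, 0.0]])
x_hat = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])

mse = mean_squared_error(x, x_hat)
mean_emd, emd_low, emd_high = summarize_emd(x, x_hat)
```

A shuffled-model baseline would be scored with the same two functions, which keeps the comparison within a data modality like for like.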

Fig 3.

Investigating the contribution of CellProfiler feature groups (by compartment and image channel) to individual MMD-VAE latent space features.

The dendrograms show the result of hierarchical clustering applied to both rows and columns. Color represents the mean contribution of each CellProfiler feature group to the given latent space feature, normalized by column (see Methods for complete details).

Fig 4.

Mean L2 distance (lower is better) between real and predicted profiles annotated with known polypharmacology (“A ∩ B”) mechanisms of action (MOAs) for three different VAE architectures, PCA, and original input space.

We show results for real and shuffled data across the two LINCS datasets. To make the results easier to interpret, we zero-one normalized the L2 distances within each dataset. Each dot represents the normalized mean L2 distance obtained when LSA is performed using a specific model on a specific dataset.
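
The evaluation summarized in this figure combines three steps: predict a combined-MOA profile with latent space arithmetic (LSA), measure its L2 distance to the real profile, and zero-one normalize the distances. The sketch below is illustrative only. In particular, the LSA formula (mean latent of A plus mean latent of B minus mean latent of a control group) is an assumed, commonly used form that this caption does not specify, and all function names are hypothetical.

```python
import numpy as np

def lsa_predict(z_a, z_b, z_control, decode):
    """Assumed LSA: mean(A) + mean(B) - mean(control), then decode."""
    z_pred = z_a.mean(axis=0) + z_b.mean(axis=0) - z_control.mean(axis=0)
    return decode(z_pred)

def l2_distance(a, b):
    """Euclidean distance between two profiles."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def zero_one_normalize(values):
    """Rescale values to [0, 1]; constant input maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

# Toy latent vectors with an identity decoder, for illustration only.
z_a = np.array([[1.0, 0.0], [3.0, 0.0]])
z_b = np.array([[0.0, 2.0]])
z_control = np.array([[1.0, 1.0]])

predicted = lsa_predict(z_a, z_b, z_control, decode=lambda z: z)
distance = l2_distance(predicted, [4.0, 5.0])
normalized = zero_one_normalize([2.0, 4.0, 6.0])
```

Normalizing within each dataset puts distances from different feature scales on a common axis, at the cost of making absolute values incomparable across datasets.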

Fig 5.

Scatterplot of L1000 vs. Cell Painting MOA performance for MMD-VAE, with outliers (>3 standard deviations) excluded.

Performance is determined by the test scores comparing the L2 distance between the predicted and real profiles against the distribution of L2 distances obtained by shuffling MOAs 10 times. Top: all “A ∩ B” combinations; bottom: only the top 5 “A ∩ B” combinations, with labels.

Table 2.

Hyperparameter combination of the top performing models for each dataset.
