Fig 1.
Formal framework adopted in this study, based on rate-distortion theory (RDT) [31].
(A) Trade-off between rate (or available resources) and distortion. (B) The goal of RDT is to find the minimum-distortion function given a constraint on the available resources. The loss of a variational autoencoder has the same shape as the Lagrangian of this minimization problem. (C) Architecture of the network used in the experiments: a classical β-VAE that can optionally be augmented with n classifiers. In most of our experiments n equals 0 or 1. The classifiers can be either linear or non-linear. The loss of the classifiers is added to the loss of the β-VAE as described in panel B. (D) Stimuli with higher utility are encoded more faithfully. The utility of a stimulus is related to its likelihood or to its relevance to a task. This panel shows that, under strong resource constraints (small rate), stimuli with a small probability of occurrence are ignored. (E) This panel shows how faithfully a stimulus is reconstructed under different rate and relevance conditions: at small rates, two stimuli with small relevance are collapsed into the same representation, while details of stimuli with high relevance are still preserved.
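The loss described in panels B and C can be written as distortion + β · rate, optionally augmented with the classifier losses. The sketch below is illustrative, not the paper's implementation: squared error stands in for the distortion term, and the rate is the standard KL divergence of a diagonal-Gaussian posterior from a standard-normal prior.

```python
import numpy as np

def kl_gaussian(mu, logvar):
    """Rate term: KL divergence between the diagonal-Gaussian posterior
    N(mu, exp(logvar)) and the standard-normal prior, in nats."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def hybrid_loss(x, x_hat, mu, logvar, beta, class_losses=()):
    """Lagrangian of the rate-distortion problem (panel B):
    distortion + beta * rate, optionally augmented with the losses of
    n classifiers (panel C). Squared error is an illustrative stand-in
    for the reconstruction term."""
    distortion = np.sum((x - x_hat) ** 2)  # reconstruction loss
    rate = kl_gaussian(mu, logvar)         # capacity used, in nats
    return distortion + beta * rate + sum(class_losses)
```

With β = 0 the loss reduces to pure reconstruction; increasing β tightens the resource constraint, which is what produces the small-rate regimes of panels D and E.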
Fig 2.
The figure shows 21 example images from the “corridors” dataset used for this study. Each image comprises two white corridors, placed in the upper and lower parts of the image, with a white horizontal line that is common to all the images. Each corridor is a noisy vertical line of white pixels, whose true center is in one of 13 x positions, coded from 0 (left) to 12 (right). For example, in the first image, the upper corridor is in position xUC = 0 and the lower corridor is in position xLC = 12. The corridor positions are reported on top of each image.
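A stimulus of this kind can be sketched as follows. The image size and the noise model are assumptions for illustration only (the caption does not specify them); only the 13 possible center positions and the shared horizontal line come from the description above.

```python
import numpy as np

def make_corridor_image(x_uc, x_lc, height=32, width=13, noise=1, rng=None):
    """Hypothetical sketch of a 'corridors' stimulus. height, width and
    the +/-1-pixel jitter are assumed, not taken from the paper."""
    if rng is None:
        rng = np.random.default_rng(0)
    img = np.zeros((height, width))
    mid = height // 2
    img[mid, :] = 1.0  # horizontal white line common to all images
    for rows, x in ((range(0, mid), x_uc),            # upper corridor
                    (range(mid + 1, height), x_lc)):  # lower corridor
        for r in rows:
            # noisy vertical line: jitter the column around the true center x
            c = int(np.clip(x + rng.integers(-noise, noise + 1), 0, width - 1))
            img[r, c] = 1.0
    return img
```

For instance, `make_corridor_image(0, 12)` produces an image like the first one described above, with the upper corridor at the far left and the lower corridor at the far right.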
Fig 3.
Reconstruction loss, all the models.
For ease of reading, a brief description of each model, along with its label, is reported in the legend. (A) Experiment 1. The figure illustrates that increasing capacity reduces the reconstruction loss. The trend is similar for the baseline β-VAE model, trained on a balanced dataset, and for the two models (E1M1–E1M2) trained on unbalanced datasets. (B) Experiment 2. The figure illustrates that, at any given capacity, the baseline model has a smaller reconstruction loss than the hybrid models that are additionally trained to solve classification tasks (E2M1–E2M5). Furthermore, at any given capacity, the reconstruction loss varies across tasks and is worst for the E2M5 model, which addresses four classification tasks simultaneously. See the main text for an explanation.
Fig 4.
Comparison of the latent representations of the baseline model and the model E1M1.
The E1M1 model is trained on an unbalanced dataset, in which images with xLC ≤ 6 (orange) are 10 times more frequent than images with xLC > 6 (green). (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E1M1 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E1M1 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E1M1 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E1M1 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 5.
Comparison of the latent representations of the baseline model and the hybrid model E1M2.
The E1M2 model is trained on an unbalanced dataset in which one group of images (orange) is 10 times more frequent than the other (green). (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E1M2 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E1M2 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E1M2 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E1M2 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 6.
For ease of reading, a brief description of each model, along with its label, is reported in the legend. (A) We first trained the baseline β-VAE model. Its frozen latent embeddings were then used to separately train 4 linear classifiers, one per task. This permits solving all tasks except one. (B) We trained a β-VAE and a classifier jointly, repeating this for each task. In this case the classifier can modify the latent representation of the β-VAE. With respect to case A, the new representation is optimized for the given task; as a result, the network is able to solve task 1 and learns faster and better on the other tasks. (C) We trained a single β-VAE jointly with 4 classifiers (one per task). Such a network is able to solve all the tasks simultaneously. (D) In this plot we assess the overspecialization of the networks trained in B. Consider the network trained on task 0: we use its latent representation (specialized for task 0) to train 3 classifiers on the other tasks. Task 1 can never be solved using a representation learned for another task, while this is not true for tasks 0, 2 and 3 (although performance is worse than in case B).
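Panel A's setup, a linear probe on frozen embeddings, can be sketched in miniature. This is an illustrative stand-in, not the paper's classifiers: logistic regression fit by gradient descent on fixed features z, so only the probe's weights change while the encoder's representation stays frozen (in panels B and C the gradient would also flow back into the encoder).

```python
import numpy as np

def train_linear_probe(z, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe on frozen embeddings z with
    binary labels y. Only (w, b) are updated; z is never modified,
    mirroring the frozen-embedding setup of panel A."""
    w = np.zeros(z.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))  # sigmoid predictions
        grad = p - y                            # dLoss/dlogits (cross-entropy)
        w -= lr * z.T @ grad / len(y)           # update the probe only:
        b -= lr * grad.mean()                   # the embeddings stay frozen
    return w, b
```

If a task is linearly decodable from the frozen embeddings, such a probe suffices; when it is not (as for the one unsolvable task in panel A), joint training is needed so the representation itself can adapt.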
Fig 7.
Comparison of the latent representations of the baseline model and the hybrid model E2M1.
The E2M1 model is assigned a binary classification task, in which images satisfying a condition on the corridor positions are labeled as 1, and as 0 otherwise. (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E2M1 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E2M1 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E2M1 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E2M1 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 8.
Comparison of the latent representations of the baseline model and the hybrid model E2M2.
The E2M2 model is assigned a binary classification task, in which images with xUC < 6 (regardless of the value of xLC), or satisfying a second condition on the corridor positions, are labeled as 1, and as 0 otherwise. (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E2M2 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E2M2 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E2M2 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E2M2 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 9.
Comparison of the latent representations of the baseline model and the hybrid model E2M3.
The E2M3 model is assigned a binary classification task, in which images satisfying a condition on the corridor positions are labeled as 1, and as 0 otherwise. (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E2M3 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E2M3 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E2M3 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E2M3 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 10.
Comparison of the latent representations of the baseline model and the hybrid model E2M4.
The E2M4 model is assigned a multiclass classification task. The possible positions of each corridor are grouped into 5 bins (as explained in Section 2.4), resulting in a partition of the input images into 25 possible classes. (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E2M4 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E2M4 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E2M4 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E2M4 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.
Fig 11.
Comparison of the latent representations of the baseline model and the hybrid model E2M5.
The E2M5 model is required to solve all the previous tasks simultaneously. (A, F) 2D projections of the 5D embeddings learned by the baseline model at high and low capacity, respectively. (C, H) 2D projections of the 5D embeddings learned by the hybrid E2M5 model trained at high and low capacity, respectively. (B, G) Activation patterns of the 5 latent channels of the baseline model at high and low capacity, respectively. Each of the five heat-maps is computed as in Section 4.4. (D, I) Activation patterns of the 5 latent channels of the E2M5 model at high and low capacity, respectively. Each heat-map is computed as in Section 4.4. (E, J) Measure of the distortions in the latent representations of the hybrid E2M5 model relative to the baseline model at high and low capacity, respectively. (K) Measure of the distortions in the latent representation of the baseline model induced by reducing the encoding capacity from high to low. (L) Measure of the distortions in the latent representation of the E2M5 model induced by reducing the encoding capacity from high to low. Distortion matrices are computed as described in Section 4.3.