The geometry of efficient codes: How rate-distortion trade-offs distort the latent representations of generative models
Fig 6
For ease of reading, a brief description of each model, along with its label, is reported in the legend. (A) We first trained the baseline β-VAE model. Its frozen latent embeddings were then used to train four linear classifiers separately, one per task. This setup solves all tasks except one. (B) We trained a β-VAE and a classifier jointly, repeating the procedure for each task. In this case, the classifier can modify the latent representation of the β-VAE. Compared with case A, the new representation is optimized for the given task; as a result, the network is able to solve task 1 and learns faster and better on the other tasks. (C) We trained a single β-VAE jointly with four classifiers (one per task). This network is able to solve all tasks simultaneously. (D) In this plot we assess the overspecialization of the networks trained in B. Consider, for example, the network trained on task 0: we use its latent representation (specialized for task 0) to train three classifiers on the other tasks. Task 1 can never be solved using the representation learned from another task, whereas tasks 0, 2 and 3 can (although performance is worse than in case B).
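The caption does not state the training objective, so the following is only a minimal sketch of how the setups in panels A to C might differ. It assumes a standard β-VAE loss (reconstruction error plus β times the KL divergence to a unit Gaussian prior) and an unweighted sum of cross-entropy terms, one per classifier head. The function names, the `clf_weight` parameter and the default values are hypothetical and are not taken from the paper.

```python
import math


def kl_to_unit_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_z - logits[label]


def joint_loss(recon_err, mu, logvar, task_logits, task_labels,
               beta=1.0, clf_weight=1.0):
    """Assumed objective: beta-VAE loss plus one cross-entropy term per head.

    task_logits / task_labels hold one entry per classifier trained jointly:
      - empty lists  -> plain beta-VAE (first stage of panel A)
      - one entry    -> one beta-VAE per task (panel B)
      - four entries -> a single beta-VAE with four heads (panel C)
    In panels A and D the classifiers are trained afterwards on frozen
    latents, so their cross-entropy never reaches the encoder.
    """
    vae_term = recon_err + beta * kl_to_unit_gaussian(mu, logvar)
    clf_term = sum(cross_entropy(l, y) for l, y in zip(task_logits, task_labels))
    return vae_term + clf_weight * clf_term


# Illustrative values only (not data from the paper).
mu, logvar = [0.3, -0.1], [0.0, -0.2]
logits = [[2.0, 0.5], [0.1, 0.2], [1.5, -1.0], [0.0, 0.0]]
labels = [0, 1, 0, 1]

loss_a = joint_loss(0.8, mu, logvar, [], [], beta=4.0)
loss_b = joint_loss(0.8, mu, logvar, logits[:1], labels[:1], beta=4.0)
loss_c = joint_loss(0.8, mu, logvar, logits, labels, beta=4.0)
print(loss_a, loss_b, loss_c)
```

Under these assumptions, the only difference between panels B and C is how many classification terms are added to the shared β-VAE loss, which is one way to read why the jointly trained latent space in C has to accommodate all four tasks at once.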