Integrating unsupervised and reinforcement learning in human categorical perception: A computational model

Categorical perception refers to a tuning of human perceptual systems that can occur during the execution of a categorisation task. Although experimental studies and computational models suggest that this tuning is influenced by task-independent effects (e.g., based on Hebbian and unsupervised learning, UL) and task-dependent effects (e.g., based on reward signals and reinforcement learning, RL), no model has studied the UL/RL interaction during the emergence of categorical perception. Here we investigated the effects of this interaction by proposing a system-level neuro-inspired computational architecture in which a perceptual component integrates UL and RL processes. The model was tested with a categorisation task, and the results show that a balanced mix of unsupervised and reinforcement learning leads to the emergence of a suitable categorical perception and the best performance in the task. Indeed, an excessive unsupervised learning contribution tends to fail to identify task-relevant features, while an excessive reinforcement learning contribution tends to learn slowly at first and then to reach sub-optimal performance. These results are consistent with the experimental evidence regarding categorical activations of extrastriate cortices in healthy conditions. Finally, the results produced by the two extreme cases of our model can explain the existence of several factors that may lead to sensory alterations in autistic people.

1 Methods: further details of the model architecture, learning and simulations

Figure S1 shows a detailed computational schema of the architecture components.
Figure S2 shows a graphical representation of the restricted Boltzmann Machine training executed with the Contrastive Divergence algorithm [1].

March 4, 2022 2/11

[…] neurons of the top hidden layer and the reward contribution to the learning process). We randomly changed these parameters, keeping the others fixed.

[…] learning is initially facilitated with respect to models with a higher RL contribution, as the reward is initially erratic. Moreover, for all three category tasks the reward achieves a maximum final performance for the L2 models, which have a balanced level of UL and RL. Indeed, these models outperform the models with absent or very low RL (L0 and L1) because the latter employ many computational resources for non-task-specific features; moreover, they outperform the models with very high or extreme RL (L3 and L4) because these tend to get stuck in local minima.
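As a concrete illustration of the Contrastive Divergence training mentioned above, a single CD-1 update for a Bernoulli RBM can be sketched as follows. This is a minimal numpy sketch with toy dimensions and learning rate; it is not the actual configuration of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b_vis, b_hid, v0, lr=0.01):
    """One Contrastive Divergence (CD-1) update for a Bernoulli RBM."""
    # Positive phase: hidden probabilities given the data
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one step of Gibbs sampling (reconstruction)
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    # Parameter updates from the difference of data and model correlations
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
    # Reconstruction error, a rough monitor of learning progress
    return float(np.mean((v0 - v1_prob) ** 2))

# Toy usage: 6 visible units, 4 hidden units, random binary data
W = 0.01 * rng.standard_normal((6, 4))
b_vis = np.zeros(6)
b_hid = np.zeros(4)
data = (rng.random((20, 6)) < 0.5).astype(float)
errors = [cd1_update(W, b_vis, b_hid, data) for _ in range(200)]
```

The reconstruction error returned at each step is the quantity tracked later in this section as a proxy for the information the network retains about its inputs.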

Learning curves of models: colour category, low computational resources
Figure S3. Reward per epoch in the task involving the colour category and low computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.

Learning curves of models: shape category, low computational resources
Figure S4. Reward per epoch in the task involving the shape category and low computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.

Learning curves of models: size category, low computational resources
Figure S5. Reward per epoch in the task involving the size category and low computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.

Learning curves of models: colour category, high computational resources
Figure S6. Reward per epoch in the task involving the colour category and high computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.

Learning curves of models: shape category, high computational resources
Figure S7. Reward per epoch in the task involving the shape category and high computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.

Learning curves of models: size category, high computational resources
Figure S8. Reward per epoch in the task involving the size category and high computational resources, for the five models involving different UL/RL levels, averaged over the models using a given level. Shaded areas represent the curves' standard deviations.
Sorting rule: Colour, Computational Resources: Low

Reconstruction error and information stored
In this section we explain why the reconstruction errors of the DBN reported in the main text can be considered a measure of the information on the input patterns […] input and the consequent hidden layer activation [2,3]. This property makes these […] the main text had an n = 2 corresponding to the first two principal components. This value is acceptable because it is almost always higher than the median cumulative explained variance and at the same time allowed us to plot the components of the reconstructed images. An interesting feature that emerges from the values is that with a higher value of RL the 'elbow' of the curves represented by the numbers reported in the tables becomes sharper. This is in line with the fact that with a higher RL contribution the images tend to be increasingly clustered into groups corresponding to the actions to be returned, while the task-irrelevant features are discarded, thus needing fewer components to be represented.
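The link between clustering and a sharper 'elbow' in the cumulative explained variance can be illustrated with a small numpy sketch. The data here are synthetic stand-ins, not the model's reconstructed images: points concentrated into well-separated groups need far fewer principal components to account for most of their variance than diffuse points do:

```python
import numpy as np

rng = np.random.default_rng(1)

def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first k principal components."""
    Xc = X - X.mean(axis=0)
    # Singular values relate to component variances: var_i = s_i^2 / (n - 1)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return np.cumsum(var) / var.sum()

# Diffuse data: variance spread over many directions (gentle elbow)
diffuse = rng.standard_normal((200, 10))
# Clustered data: two well-separated groups (sharp elbow)
clustered = np.vstack([rng.standard_normal((100, 10)) * 0.1 + 5,
                       rng.standard_normal((100, 10)) * 0.1 - 5])

cev_diffuse = cumulative_explained_variance(diffuse)
cev_clustered = cumulative_explained_variance(clustered)
# For the clustered data the first component already captures almost all variance
```

In the clustered case a single component (the direction separating the two groups) accounts for nearly all the variance, mirroring how a stronger RL contribution lets fewer components represent the images.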

Tables S11, S12, S13 show the silhouette values of the k-means algorithm.
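For reference, the silhouette value of a k-means partition measures, for each point, how much closer it is to its own cluster than to the nearest other cluster, with values near 1 indicating well-separated clusters. A minimal self-contained sketch, using toy 2-D data and a plain Lloyd's k-means rather than the model's internal representations:

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(X, k, iters=50):
    """Plain Lloyd's k-means with random initial centroids."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        new = []
        for j in range(k):
            pts = X[labels == j]
            # Keep the old centroid if a cluster happens to be empty
            new.append(pts.mean(axis=0) if len(pts) else centroids[j])
        centroids = np.array(new)
    return labels

def silhouette(X, labels):
    """Mean silhouette value: (b - a) / max(a, b) averaged over points."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        same[i] = False
        a = d[i, same].mean()  # mean intra-cluster distance
        b = min(d[i, labels == lj].mean()  # nearest other cluster
                for lj in set(labels.tolist()) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two well-separated blobs: silhouette close to 1
X = np.vstack([rng.standard_normal((50, 2)) + 6,
               rng.standard_normal((50, 2)) - 6])
labels = kmeans(X, 2)
score = silhouette(X, labels)
```

Higher silhouette values in the tables thus indicate that the learned representations fall into more clearly separated clusters.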