Fig 1.
Stylized image of the four-layered VisNet neural network architecture. The architecture of the network shows a competitive, hierarchical organization broadly reflecting that observed in the primate dorsal visual system. Convergence through the network is designed to provide fourth-layer neurons with information from across the entire input retina. Layer 4 corresponds to the output layer, in which hand-centred visual neurons develop.
Table 1.
Number of neurons per layer, number of synapses to the preceding layer, and size of the receptive field from which the connections are received.
Table 2.
Lateral inhibition parameters.
Table 3.
The sigmoid parameters used to control the global inhibition within each layer of the model.
Fig 2.
Basic principle of CT learning, illustrated in a simplified network with a single layer of bottom-up synaptic connections between two layers. This idealized example shows how presenting the same hand-object configuration at spatially overlapping retinal locations drives neurons in the output layer to respond to this configuration at two similar retinal locations (leftmost and middle illustrations). This process can be continued for subsequent shifts, provided that a sufficient proportion of input cells stays active between individual shifts (rightmost illustration). Figure adapted from Stringer et al. [37].
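The overlap argument in this caption can be made concrete with a minimal sketch. It is not the VisNet implementation: it assumes a single output neuron with a fixed firing threshold standing in for competition, a plain Hebbian update followed by weight normalisation, and binary input patterns. The function names, the threshold and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np

def shifted_patterns(n_inputs, width, step, n_shifts):
    """One block of `width` active input cells, moved by `step` cells per shift."""
    patterns = np.zeros((n_shifts, n_inputs))
    for k in range(n_shifts):
        patterns[k, k * step : k * step + width] = 1.0
    # Unit-length rows, so a dot product equals the fractional overlap.
    return patterns / np.linalg.norm(patterns, axis=1, keepdims=True)

def ct_train(patterns, threshold=0.5, lr=0.5):
    """Present the shifts in order; the neuron learns only when it fires."""
    w = patterns[0].copy()  # assumption: neuron already tuned to the first location
    for x in patterns:
        y = 1.0 if w @ x > threshold else 0.0  # thresholded output (assumed)
        w = w + lr * y * x                     # Hebbian update: post * pre
        w = w / np.linalg.norm(w)              # keep the weight vector at unit length
    return w

def responds(w, patterns, threshold=0.5):
    """Which of the shifted patterns drive the neuron above threshold."""
    return [bool(w @ x > threshold) for x in patterns]

overlapping = shifted_patterns(n_inputs=50, width=10, step=2, n_shifts=5)
disjoint = shifted_patterns(n_inputs=50, width=10, step=10, n_shifts=5)

print(responds(overlapping[0], overlapping))         # before training
print(responds(ct_train(overlapping), overlapping))  # after training, overlapping shifts
print(responds(ct_train(disjoint), disjoint))        # after training, no overlap
```

With overlapping shifts, each new location shares enough active inputs with the previous one to fire the neuron, so the response is carried along the whole sequence. With non-overlapping shifts the neuron never fires beyond the first location and nothing is learned.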
Fig 3.
The top row displays the relative arrangements of the target object and the hand (for all simulations) and shows how neighbouring target locations begin to overlap as the density of training locations increases, thus essentially forming a continuum of hand-centred locations. The brightness of the circle varies for visualization purposes. The bottom rows show how a particular hand-object configuration was seen in different transforms during training by shifting it across the retina.
Fig 4.
Response profiles of the same 4 output neurons before and after training in a simulation with 7 hand-object configurations.
The top two rows show the binarised activation of 4 example output neurons before training. A red field indicates a response to a particular stimulus, while a blue field indicates no response. The abscissa displays the 7 different relative configurations between hand and object, while the ordinate indicates all 10 retinal locations of each configuration. The bottom two rows show the firing rates of the same 4 neurons after training. Visual inspection reveals that before training the 4 neurons fired randomly when presented with all 7 hand-object configurations across different retinal positions. After training, however, each of the 4 neurons responded to only one particular hand-object configuration, and responded to that configuration across all 10 retinal locations. These neurons had thus learned to display perfect hand-centred response characteristics.
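The criterion described here (a response to exactly one configuration, at every retinal location) can be checked mechanically. The sketch below assumes a binarised response matrix with retinal locations as rows and hand-object configurations as columns, matching the axes of the figure; the function name is hypothetical.

```python
import numpy as np

def preferred_configuration(responses):
    """Return the index of the single preferred configuration, or None.

    `responses` is a binary matrix of shape (n_retinal_locations, n_configurations).
    A neuron counts as perfectly hand-centred if it responds to one configuration
    at every retinal location and to no other configuration anywhere.
    """
    r = np.asarray(responses, dtype=bool)
    full = r.all(axis=0)       # columns with a response at every location
    silent = ~r.any(axis=0)    # columns with no response at all
    if full.sum() == 1 and silent.sum() == r.shape[1] - 1:
        return int(np.argmax(full))
    return None

# Example: 10 retinal locations x 7 configurations, tuned to configuration 3.
trained = np.zeros((10, 7), dtype=int)
trained[:, 3] = 1
print(preferred_configuration(trained))
```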
Fig 5.
Single and multiple cell information measurements.
The illustration shows the single and multiple cell information carried by neurons in the output layer for 8 separate simulations, each of which used a different number of hand-centred target locations ranging from 3 to 10. The upper plot compares, in ranked order, the amount of single cell information carried by neurons in the output layer before training (dashed lines) and after training (solid lines). In all simulations, training tuned at least 150 neurons to carry maximal single cell information, whereas no neurons reached this level in the untrained condition. The lower plot, which shows the multiple cell information for the same set of simulations, reveals that training led to a large increase in the multiple cell information in all simulations. In particular, for the simulations with up to 8 hand-centred target locations, the multiple cell information approached the maximal value, which indicates that all hand-object configurations were represented by the output neurons.
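To illustrate what maximal single cell information means, the sketch below computes a stimulus-specific information measure for one neuron from binarised responses. The formula, I(s, R) = Σ_r P(r|s) log2[P(r|s) / P(r)], is an assumption made here (this caption does not state the measure used), as are the equiprobable stimuli and the function name. Under these assumptions the maximum for N configurations is log2(N) bits.

```python
import numpy as np

def single_cell_information(responses):
    """Stimulus-specific information (bits) of one neuron, per stimulus.

    `responses` is a binary matrix of shape (n_retinal_locations, n_stimuli);
    the retinal locations act as repeated trials of each stimulus, and all
    stimuli are assumed equally likely.
    """
    r = np.asarray(responses, dtype=float)
    p_fire_given_s = r.mean(axis=0)   # P(r = 1 | s)
    p_fire = p_fire_given_s.mean()    # P(r = 1)
    info = np.zeros(r.shape[1])
    # Sum over the two response values, r = 1 and r = 0.
    for p_rs, p_r in ((p_fire_given_s, p_fire), (1.0 - p_fire_given_s, 1.0 - p_fire)):
        mask = p_rs > 0               # terms with P(r|s) = 0 contribute nothing
        info[mask] += p_rs[mask] * np.log2(p_rs[mask] / p_r)
    return info

# A neuron responding to one of 7 configurations at all 10 retinal locations.
perfect = np.zeros((10, 7))
perfect[:, 2] = 1
print(single_cell_information(perfect).max(), np.log2(7))
```

A neuron that fires for every stimulus, or for none, carries zero information under this measure, since its response does not distinguish between configurations.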
Fig 6.
The ability of output neurons trained on a limited number of discrete hand-centred locations to generalise their responses to a spatial continuum of hand-centred locations.
Results are shown for 6 separate simulations in which the network was trained on N = 3, 5, 7, 8, 9, 10 hand-centred target locations, but tested on a (near) continuum of 20 highly overlapping hand-centred target locations. The figure shows how the receptive fields of the population of hand-centred output neurons are distributed across the 20 hand-centred target locations used to test the network. That is, for each simulation we plot the number of hand-centred neurons assigned to each of the 20 test locations, where, on the abscissa, 0 denotes the target object at the leftmost hand-centred position and 1 at the rightmost position. It can be seen that for all simulations in which the network was trained with 7 or more hand-centred target locations, the network developed hundreds of hand-centred output neurons. Moreover, these hand-centred output neurons were distributed fairly evenly across the 20 hand-centred test locations. These results confirm that, for simulations in which the network is trained with only a limited but sufficient number (i.e. at least 7) of hand-centred target locations, the hand-centred output neurons are able to generalise their responses to a spatial continuum of hand-centred locations.
Fig 7.
Effect of stimulus presentation order.
Comparison of simulation results obtained when the stimuli are presented in two different temporal orders during training. In the first set of simulations (blue) the stimuli are presented in the original order used in section 3.1, while in the second set of simulations (red) the order of the stimulus presentations is reversed. For each of these two training conditions, we compare the average number of output neurons that achieved perfect hand-centred firing characteristics when using a different number of hand-centred target locations ranging from 3 to 10. It can be seen that there is no substantial difference between the two training conditions in the number of neurons developing hand-centred responses. This confirms that the CT learning mechanism is robust to changes in the presentation order of stimuli during training.
Fig 8.
Response profiles of 4 output neurons when trained on a continuum of hand-centred target locations.
In this simulation, the network was trained on 30 highly overlapping hand-object configurations in order to simulate a (near) continuum of target locations in a circular arc around the hand. The two left columns of this figure show the responses of 4 example neurons in the output layer before training. The two upper rows show the binary response matrices of the 4 neurons to the complete set of 30 hand-object configurations presented across all 10 retinal positions. The two lower rows present the same data, where the responses for each neuron have been averaged over the 10 retinal positions and then plotted with respect to the hand. The two right columns display the firing behaviour of the same neurons after training, following the same conventions. It can be seen that before training the neurons responded randomly to different hand-object configurations presented in different retinal positions. However, after training, each neuron had learned to respond to a localised cluster of hand-centred target object locations, and responded to this localised region of hand-centred space across different retinal locations.
Fig 9.
Frequency of hand-centred neurons with receptive fields localised at different positions across the hand-centred space.
The illustration shows results for three separate simulations in which the network was trained with N = 15, 20 or 30 hand-centred target locations. The conventions used in this figure are the same as in Fig 6. It can be seen that the distribution of hand-centred neurons across the hand-centred space is approximately uniform for simulations with 15, 20 or 30 hand-centred target locations. These results confirm that CT learning produced an even visual representation of the space around the hand in all three simulations.
Table 4.
Parameters of the training image sets used for the simulations represented in Fig 10.
The stimulus sets used for the PCA were parameterized according to the rows (a) to (f). Each set consisted of images of hand-object configurations shown across different retinal locations.
Fig 10.
Eigenimage representations of stimulus sets used for six selected simulations.
The results of a PCA of the stimulus sets used for six different simulations are shown in rows (a) to (f) respectively. The stimulus set for each simulation consists of images of a number of alternative hand-object configurations presented across all retinal locations. These sets were parameterized according to Table 4. Columns 1 to 5 show the first five eigenimages of the stimulus set used for the simulation represented in each row. The sixth column shows the cumulative explained variance of the first 5 principal components. The simulations represented in rows (a), (b), (c) and (e) developed hand-centred output neurons, while the simulations shown in rows (d) and (f) did not. It is evident from the eigenimages shown in the figure that, for those simulations that produced hand-centred output neurons, the leading eigenimages (those of the highest-variance principal components) represent the positions of targets with respect to the hand rather than the retinal location of hand-object configurations. Thus, for those simulations in which hand-centred output neurons developed, the greatest source of variance arises from changes in the hand-centred locations of target objects rather than changes in retinal locations of the hand-object configurations.
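For readers who want to reproduce this kind of analysis, the following is a minimal sketch of computing eigenimages and cumulative explained variance with a singular value decomposition. It is a generic PCA recipe, not necessarily the procedure used for this figure; the function name and the random stand-in images are hypothetical.

```python
import numpy as np

def eigenimages(images, n_components=5):
    """PCA of an image stack of shape (n_images, height, width).

    Returns the leading eigenimages, reshaped to image size, and the
    cumulative fraction of variance explained by those components.
    """
    x = np.asarray(images, dtype=float)
    n, h, w = x.shape
    flat = x.reshape(n, h * w)
    centred = flat - flat.mean(axis=0)   # remove the mean image
    # Rows of vt are the principal axes in pixel space.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variance = s ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    k = min(n_components, vt.shape[0])
    return vt[:k].reshape(k, h, w), cumulative[:k]

# Stand-in data: 12 random 8 x 8 "stimulus images" (for illustration only).
rng = np.random.default_rng(0)
stack = rng.random((12, 8, 8))
components, explained = eigenimages(stack)
print(components.shape, explained)
```

In practice `stack` would hold the hand-object images of one stimulus set, one image per configuration and retinal location.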