Conceived and designed the study: YR AT. Performed the experiments/simulations/mathematical derivations: YR. Analyzed the data: YR. Wrote the paper: YR AT.

The authors have declared that no competing interests exist.

Behaving in the real world requires flexibly combining and maintaining information about both continuous and discrete variables. In the visual domain, several lines of evidence show that neurons in some cortical networks can simultaneously represent information about the position and identity of objects, and maintain this combined representation when the object is no longer present. The underlying network mechanism for this combined representation is, however, unknown. In this paper, we approach this issue through a theoretical analysis of recurrent networks. We present a model of a cortical network that can retrieve information about the identity of objects from incomplete transient cues, while simultaneously representing their spatial position. Our results show that two factors are important in making this possible: A) a metric organisation of the recurrent connections, and B) a spatially localised change in the linear gain of neurons. Metric connectivity enables a localised retrieval of information about object identity, while gain modulation ensures localisation in the correct position. Importantly, we find that the amount of information that the network can retrieve and retain about identity is strongly affected by the amount of information it maintains about position. This balance can be controlled by global signals that change the neuronal gain. These results show that anatomical and physiological properties, which have long been known to characterise cortical networks, naturally endow them with the ability to maintain a conjunctive representation of the identity and location of objects.

Forming a coherent picture of our surrounding environment requires combining visual information about the position of objects (where information) with information about their identity (what information).

Visual object perception, which is often effortless despite partial occlusion or changes in view, shading, size, etc., has been associated with attractor dynamics in local cortical circuits.

The above-mentioned paradigm is conceptually very successful in explaining how information about the identity of an object can be retrieved from noisy input and maintained in working memory, even when the input is transient. However, in day-to-day life, the identity of an object is hardly the only type of information that one needs to retrieve and maintain about it. If you look at a scene for a short time and then turn your head away, you will still remember details about what objects were present in the scene and where they were located. You can even do this if many of the objects in the scene were occluded. These abilities allow us to maintain a coherent representation of our surrounding environment and are crucial for most real world visually guided behaviours. Visually guided behaviour often requires extracting information about the identity of objects (what) as well as maintaining information about their position (where).

A great deal of experimental work has focused on understanding this issue.

In this paper, we study how a recurrent network can retrieve what information from an incomplete cue while simultaneously representing and maintaining where information.

In our model of a cortical patch, we assume that the network responds to stimuli with attractor dynamics, thus autoassociatively retrieving activity patterns from memory. At any given time, the response of a neuron indexed by i is summarised by its firing rate V_i. Each neuron receives an input current h_i(t), the sum of the firing rates of its presynaptic neurons weighted by the synaptic weights J_ij; the output firing rate at time t+1 is then determined through a threshold-linear gain function: V_i(t+1) = g [h_i(t) − θ] if h_i(t) exceeds the threshold θ, and V_i(t+1) = 0 otherwise, where g is the linear gain of the neuron.

In this paper, we do not explicitly model inhibitory neurons. However, we model their effect on the dynamics of the network by adjusting the threshold of excitatory neurons, such that the mean activity of the network remains constant, i.e. at all times the mean of V_i(t) across neurons is held at a fixed value (Eq. (3)).
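The update rule and the effect of inhibition can be sketched in a few lines of code. This is an illustrative reimplementation, not the authors' code: the function name, the bisection search for the common threshold, and all parameter values are our own assumptions; only the threshold-linear transfer function and the fixed-mean-activity constraint come from the text.

```python
import numpy as np

def update_rates(V, J, gain, target_mean):
    """One synchronous update of a threshold-linear network.

    Inhibition is not modelled explicitly; instead the common threshold
    is found by bisection so that the mean activity stays at
    `target_mean`, mirroring its role in the text. `gain` may be a
    scalar or a per-neuron array (used later for gain modulation).
    """
    h = J @ V                                # input currents h_i
    lo, hi = h.min() - 1.0, h.max()          # bracket for the threshold
    for _ in range(60):                      # bisection on the threshold
        theta = 0.5 * (lo + hi)
        V_new = gain * np.maximum(h - theta, 0.0)
        if V_new.mean() > target_mean:
            lo = theta                       # too active: raise threshold
        else:
            hi = theta
    return V_new
```

Iterating this map from a cue gives the discrete-time dynamics used throughout; a fixed point of the iteration is an attractor.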

We assume that the network has “learnt”, that is, stored, p patterns of activity, with η_i^μ = 1 if neuron i is active in pattern μ and η_i^μ = 0 otherwise. The patterns are stored in the synaptic weights J_ij through a Hebbian covariance rule: J_ij is proportional to c_ij Σ_μ (η_i^μ − a)(η_j^μ − a), where a is the mean activity of the patterns and c_ij indicates whether a connection from neuron j to neuron i exists (Eq. (5)).
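A covariance-rule storage step consistent with this description might be sketched as follows; the normalisation by the number of incoming connections is an assumption (conventions differ), and all names are illustrative.

```python
import numpy as np

def covariance_weights(patterns, c, a):
    """Hebbian covariance weights restricted to an existing connectivity.

    patterns : (p, N) array, patterns[mu, i] = 1 if neuron i is active
               in pattern mu, else 0 (sparseness a).
    c        : (N, N) 0/1 mask, c[i, j] = 1 if a connection j -> i exists.
    """
    dev = patterns - a                    # the (eta_i^mu - a) terms
    J = dev.T @ dev                       # sum over patterns mu
    J *= c                                # keep existing connections only
    C = c.sum(axis=1, keepdims=True)      # connections received per neuron
    return J / np.maximum(C, 1)           # illustrative normalisation
```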

In the model we present here, the way neurons are connected to each other plays a major role in determining whether the network can represent where information. The connectivity is summarised by the matrix c_ij. The simplest possibility is a randomly connected network, in which each neuron receives connections from a fixed number of other neurons chosen uniformly across the network, irrespective of distance.

This is the case which has been considered in most previous analyses of autoassociative networks. The alternative is a network with metric connectivity, in which the probability that two neurons are connected decreases with the distance |x_i − x_j| between them, where x_i and x_j are the position vectors of neurons i and j.

To assess whether the activity of the network at a given time can reflect which object was presented to it in the past, we need to quantify the degree of correlation between the stored patterns and the activity of the network. We do this by measuring the local overlap between the network activity and each stored pattern.

Intuitively, the local overlap x_i^μ measures how closely the activity of the neurons presynaptic to neuron i matches their activity η_j^μ in the stored pattern μ: it is large when the neurons active around neuron i are those that are active in pattern μ.

This implies that if the sum of the local overlaps x_i^μ over all neurons is large, the activity of the network as a whole is correlated with pattern μ, so the local overlap generalises the usual global overlap to spatially structured activity.
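As a concrete sketch, the local overlap can be computed as a connectivity-weighted average of pattern-matched activity. The normalisation below is illustrative; the paper's exact convention may differ.

```python
import numpy as np

def local_overlap(V, pattern, c, a):
    """Local overlap of activity V with one stored pattern.

    For each neuron i, averages (eta_j - a) * V_j over the neurons j
    presynaptic to i (rows of the 0/1 mask c), so it is large where the
    active neurons around i are those active in the pattern.
    """
    C = c.sum(axis=1)                         # in-degree of each neuron
    return (c @ ((pattern - a) * V)) / np.maximum(a * C, 1e-12)
```

Summing (or averaging) this quantity over all neurons gives a global measure of correlation with the pattern, consistent with the remark above.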

The local overlap is also important for a more technical reason: it turns out to be a natural parameter for which one can easily write a self-consistent equation whose solutions describe the attractors of the network, as shown in our previous work. The equation is obtained by averaging the steady-state dynamics over the distribution of the stored patterns η_j and of the positions x_j of the presynaptic neurons.

By solving Eq. (10) for the local overlap x_i, one obtains the firing rate V_i of every neuron, and hence the full activity profile of the network at the steady state.

In what follows, we show an example of a retrieval bump in a 2D recurrent network with metrically organised connectivity. We consider a two-dimensional network in which a neuron in position x_i is connected to a neuron in position x_j with a probability that decays as a Gaussian function of the distance between them.

In the simulations reported here the width of the connectivity, i.e. the standard deviation of this Gaussian, is small compared to the linear size of the network.
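A Gaussian, distance-dependent connectivity of this kind can be sampled as below. The periodic boundary conditions and the rescaling that fixes the mean number of incoming connections are our assumptions, made to keep the sketch self-contained.

```python
import numpy as np

def metric_connectivity(L, sigma, c_mean, rng):
    """Sample distance-dependent 0/1 connectivity for an L x L sheet.

    The probability of a connection from j to i decays as a Gaussian of
    their grid distance (with periodic boundaries, i.e. a torus), scaled
    so each neuron receives about c_mean * L * L connections.
    """
    xs = np.array([(i, j) for i in range(L) for j in range(L)], float)
    d = np.abs(xs[:, None, :] - xs[None, :, :])
    d = np.minimum(d, L - d)                  # wrap around the torus
    P = np.exp(-(d ** 2).sum(-1) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)                  # no self-connections
    P *= c_mean * L * L / P.sum(axis=1, keepdims=True)  # fix mean in-degree
    return (rng.random(P.shape) < np.clip(P, 0.0, 1.0)).astype(float)
```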

At the beginning of the simulation a 15×15 square centred on the neuron in position (58,58) is chosen. The activity of neurons inside this square is initialised to their activity in the first stored pattern, while the activity of other neurons is set to zero; that is, at the beginning of the simulation V_i = η_i^1 if neuron i is in the square and V_i = 0 if it is outside it. In this way, at the beginning of the simulation the dot-product overlap with the first pattern is substantially positive, while the overlaps with the other patterns are close to zero.

Thus by calculating these dot products, or equivalently by summing the local overlaps x_i^μ over all neurons, one can tell which of the stored patterns, if any, the network activity is correlated with.
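The cueing protocol and the dot-product read-out can be sketched as follows; the indexing and the subtraction of the mean activity in the overlap are illustrative conventions.

```python
import numpy as np

def cue_in_square(pattern, L, centre, half):
    """Copy a stored pattern inside a square window, zero elsewhere.

    Neurons inside the (2*half+1)^2 square around `centre` take their
    activity in `pattern`; all others start at zero (no wrap-around).
    """
    V = np.zeros(L * L)
    ci, cj = centre
    for i in range(max(ci - half, 0), min(ci + half + 1, L)):
        for j in range(max(cj - half, 0), min(cj + half + 1, L)):
            V[i * L + j] = pattern[i * L + j]
    return V

def dot_overlap(V, pattern, a):
    """Dot-product overlap with a pattern, mean activity subtracted."""
    return (pattern - a) @ V / len(V)
```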

The network is comprised of N = 70×70 = 4900 neurons, each connected to a fixed number of other neurons. After the cue is presented, the network evolves under the dynamics described above; retrieval is considered successful when the final overlap with the cued pattern, m^1(t), exceeds the largest overlap with any non-cued pattern, m^{μ≠1}(t).

The spatial distribution of activity would have been different had the connectivity been random rather than metrically organised.

In this case, too, by cueing one of the patterns, as we did for the metrically organised network, after 200 time steps we have m^1(t) larger than m^{μ≠1}(t): the pattern is retrieved, but without any spatial structure in the activity.

Even though retrieval succeeds in both networks, only the metrically connected one produces a spatially localised bump of activity.

This figure shows the trajectory that the peak of the local overlap follows during the simulation reported above.

Can we code the position of an object by the position of the peak of the retrieval bump? The answer to this question depends on whether the retrieval process can end with the peak of the bump on the intended position. We first examine whether the position of the cue (which can be thought of as the position of an object on the retina) determines the position of the retrieval bump. If the retrieval bump appears at the same position as (or is uniquely determined by) the centre of the cue, it is possible to read the activity of the network and simply decode both what and where information from it.

To examine the relation between the position of the initial cue and the final position of the retrieval bump, we ran simulations in which the position of the initial cue was systematically changed across the network and the distance between the position of the retrieval bump and the position of the initial cue was measured.
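Reading out the bump location and its distance from the cue, as in these simulations, amounts to locating the peak of the local overlap and measuring distance with periodic boundaries. A minimal sketch, with our own conventions:

```python
import numpy as np

def peak_position(x, L):
    """Grid coordinates (row, col) of the peak of a field on an L x L sheet."""
    k = int(np.argmax(x))
    return np.array([k // L, k % L])

def torus_distance(p, q, L):
    """Euclidean distance between grid points with periodic boundaries."""
    d = np.abs(np.asarray(p) - np.asarray(q))
    d = np.minimum(d, L - d)                 # wrap around the torus
    return float(np.sqrt((d ** 2).sum()))
```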

This graph illustrates how the position of the cue affects the position of the bump after retrieval is completed. Cue position (the centre of a 15×15 square window) was chosen on a 7×7 grid with grid spacing 10, whose vertices are shown in (A) as green squares. The red circles show the position of the peak of the local overlap, as it drifted away from the linked green square, over 200 time iterations of the simulated retrieval process. All runs in this example resulted in successful retrieval, i.e. the overlap between the final activity of the network and the cued pattern was higher than with other patterns. (B) The distribution of the distances between the centre of the cue and the centre of the local overlap after 200 time steps. The legend reports the summary statistics of these distances.

The small number of final stable positions of the bump resembles what has been noticed before in recurrent networks with distance-dependent weights between neurons but without stored memory patterns. In such models the synaptic weight between two neurons is generally taken to be excitatory at short distances and inhibitory at long distances.

Even though the final position of the bump cannot accurately tell where the cue was initiated, it may still be able to code for a large number of positions in a network of realistic size. This happens if the number of final stable positions increases with the size of the network. To examine this relation, we scaled up the simulated network. The result of such a scaling analysis is reported below.

The scaling is done for networks of increasing size, with the other parameters held fixed.

The approximately linear scaling of the number of final positions with network size indicates that a large number of positions can be represented in realistically large networks, but not that the position of the cue is preserved with high accuracy.

In this section we show that the bump of activity can be stabilised on an arbitrary position on the network if neurons around that position have a slightly higher linear gain than the rest of the neurons. This increase in the linear gain applies to all neurons in that area in the same manner, whether they are selective for the cued pattern or not; that is, it is not pattern selective and solely reflects object position. This localised gain modulation thus supplies the network with pure where information.

Suppose that a non-pattern-selective signal changes the gain of those neurons which correspond to the position of the object in the visual scene. The effect of such gain modulation is shown in the simulations described below.
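Such a non-pattern-selective signal can be represented simply as a per-neuron gain map, uniform except inside a square window; in the update rule, V_i = g_i [h_i − θ] then replaces the uniform gain. Names and values here are illustrative.

```python
import numpy as np

def localized_gain(L, centre, half, g_background, g_boost):
    """Per-neuron gain map with a boosted square window.

    The boost applies equally to every neuron in the window, selective
    for the cued pattern or not, so the map carries only positional
    information, as described in the text.
    """
    g = np.full(L * L, float(g_background))
    ci, cj = centre
    for i in range(max(ci - half, 0), min(ci + half + 1, L)):
        for j in range(max(cj - half, 0), min(cj + half + 1, L)):
            g[i * L + j] = g_boost
    return g
```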

The panels summarise the results of simulations conducted as for Fig. 3 except for two factors. First, the pattern-selective cue is not localised, i.e. the local overlap at the beginning of each simulation is a uniform function across the network; hence, the distribution of the peak of the local overlap in the beginning of the simulation is not shown: there are no significant peaks. Second, neurons inside the 15×15 square centred around each green square in (A) have a gain factor g* higher than the background gain of the rest of the network.

In the simulations reported above, the gain was raised inside a single square region of the network; we then changed the spatial distribution of the gain.

The result of such a change in the spatial distribution of the gain is evident: the bump of activity settles on the gain-modulated region.

It should be noted that while without gain modulation the bump could settle only on one of a small number of stable positions, with localised gain modulation it can be stabilised on essentially any desired position.

Even though increasing the gain in a spatially restricted part of the network stabilised the final bump there, there is a disadvantage to this strategy: with such non-uniform gain, the number of successful runs decreases. Remember that the quality of the cue used in these simulations was deliberately limited, so that retrieval was not guaranteed.

In order to quantify the interaction between what and where information, we measured how much information the network retrieves and retains about each.

We denote by I_what the information that the final state of the network carries about the identity of the cued pattern, and by I_where the information it carries about the intended position. As shown below, there is a trade-off between I_what and I_where: conditions that increase I_where tend to decrease I_what, and vice versa.

To start with, we consider a network (with the architecture used before) that has stored a set of patterns, and probe it under three different cueing conditions.

We first calculate I_what and I_where in the simplest cueing condition; the trade-off between I_what and I_where is then compared across conditions.

In this condition, in the beginning of the simulation, the activity of all neurons in the network is set to their activity in the pattern to be retrieved. Spatial information is provided only through localised gain modulation. Even with such a complete cue we observe a trade-off between I_what and I_where: stronger gain modulation raises I_where at the expense of I_what, while weaker modulation does the opposite.

(A) Condition 1: complete cue. (B) Condition 2: 4.6% of the neurons are randomly chosen and their activity is initialised to their activity in the cued pattern, while the rest has zero activity. (C) Condition 3: similar to Condition 2 but the neurons that receive the cue are spatially co-localised with gain modulation. For each cueing condition, the trade-off between I_what and I_where is shown for several degrees of gain modulation.

In this condition, we had provided the network with a perfect cue. It is important to see whether a similar trade-off occurs in the more interesting condition in which the cue is incomplete, so that the network has not merely to maintain but also to actually retrieve identity information.

In the second cueing condition, the cue that is given to the network in the beginning of the simulation is incomplete, and it does not have any spatial information. In this case, at the beginning of the simulation, the activity of 4.6% of randomly chosen neurons is set to their activity in the pattern to be retrieved, and the rest to zero. Using such an imperfect cue serves to assess the degree to which the network is able to actually retrieve what information while maintaining where information; again, a trade-off between I_what and I_where is observed.

In the first two conditions the cue did not carry any spatial information by itself; the spatial information was provided entirely through gain modulation. In the third cueing condition, the quality of the cue is the same as in Condition 2, but the cue is itself localised. On each run, a 15×15 square is considered whose centre is on one of the 49 pre-assigned positions on the network, and the square with higher gain coincides with the location of the cue. The activity of neurons in the square is set to their activity in the pattern to be retrieved, and the rest to zero; the cued neurons are thus not randomly scattered, but localised in a certain region of space.

The analysis of simulations in this Condition 3 shows the same qualitative trade-off between I_what and I_where as in the previous conditions.

The dependence of the trade-off between I_what and I_where on the background gain is examined next.

In the previous section, the background gain was held fixed; here we study how I_what and I_where depend on it. We varied the background gain between a low value g_min and a high value g_max, while keeping the localised gain modulation as before. We find that I_what depends non-monotonically on the background gain: it is reduced both near g_min and near g_max, and is highest at intermediate values, where gain modulation best aids retrieval.

When there is no gain modulation, the network fails to retrieve any of the stored patterns (star). When localised gain modulation is incorporated, both I_what and I_where become positive: the same modulation that provides I_where also enables the retrieval that yields I_what.

When a retrieval bump is localised on a particular position, one can in principle use the rest of the network to retrieve other patterns, in the form of additional bumps of activity. If they can coexist with the first bump, the network would then be able to represent the position and identity of multiple objects simultaneously, without encountering the problem of binding. In random networks with neither metric connectivity nor localised retrieval, retrieving multiple patterns is indeed possible, at very low storage loads.

Assume that a pattern is retrieved and, using localised gain modulation, the bump of activity is stabilised on a desired position. A second cue may then be presented to the network at another position. Even though most of the connections to each neuron in the network come from nearby neurons, the second pattern would still affect the first retrieval bump, because of the global inhibition in the simplest version of our model, as inhibition is taken to regulate a common threshold, such that the mean activity of the network is constant (Eq. (3)). This introduces interactions between distal neurons, which are not directly connected by excitatory synapses, and such interactions are generally disruptive. A simple way to reduce such interaction is to assume that when the local mean activity in part of the network exceeds some limit value, the threshold is raised but only locally, regardless of the activity of neurons outside that region. The local threshold may also be regulated downward, to facilitate the emergence of a retrieved pattern in a region which would otherwise be kept at too low a mean activity level. With such additional provisions, multiple bumps can be formed and stabilised, as shown in the example below.
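The local threshold regulation described above can be caricatured as a region-wise rule; the proportional adjustment used here is our own simplification of the mechanism in the text.

```python
import numpy as np

def regulated_thresholds(V, masks, targets, theta_default=0.0, k=0.5):
    """Region-specific thresholds for multi-bump simulations.

    masks   : list of boolean arrays marking regions around each bump.
    targets : desired mean activity inside each region; the local
              threshold is nudged up when the region is too active and
              down when it is too quiet (crude proportional control).
    """
    theta = np.full(V.shape, float(theta_default))
    for mask, target in zip(masks, targets):
        err = V[mask].mean() - target
        theta[mask] += k * err            # raise locally if too active
    return theta
```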

In the beginning of the simulation, pattern 1 is presented to all units in a 15×15 square whose lower left corner is at node (1,1); that is, for each neuron i inside the square the activity is set to η_i^1, and for those outside to zero. While simulating the network, background threshold values are set to regulate the mean network activity to a fixed level equal to 0.2. If mean activity inside a 30×30 square centred on the cue centre exceeds 1.0, the threshold of neurons inside this square will be regulated to keep its mean activity equal to 1.08, and neurons outside it will be assigned a high threshold. In the second phase of the simulation, in the right column, a second pattern is also presented to all units in a 15×15 square whose lower left corner is at node (36,36), accompanied by a local threshold decrease to facilitate the pattern “holding on”. The threshold is then regulated in the same way as the first pattern. (A) The distribution of activity, (B) the local overlap with the first pattern (cued in the beginning of the first phase) and (C) the local overlap with the second pattern (cued in the beginning of the second phase), all 100 time steps after the presentation of the first pattern. (D), (E), and (F) are the same quantities as (A), (B) and (C) but 100 time steps after the presentation of the second pattern.

Behaviour requires processing and integrating different types of information, from various sources and modalities, into a coherent picture of the world. Within the visual domain, a specific question is how the brain can analyse the identity of objects, which has to be extracted from raw visual input, while maintaining information about their position, directly present in the input.

Previous theoretical work on the representation of objects in neural circuits has mainly focused on two issues.

Our model sheds light on this issue of how to combine the representation of what and where information within a single cortical network.

The localised retrieval process described here also offers the possibility of retrieving multiple objects while maintaining their position, without facing a binding problem.

In our model, independent attractors are set up in a local cortical network only for object identity, as position invariant representations; but they can be accessed in a spatially focused mode, leading to position dependent activity. Associating a single representation to an object, which is then modulated by position, is a particular case of what in cognitive neuroscience parlance is sometimes referred to as a “type” representation, as opposed to a separate “token” for each occurrence of the object at a given position.

The difference between these two mechanisms is directly reflected in the storage capacity required for object-position attractors, in order to represent the same amount of information as the mechanism studied here does through spatial modulation. Representing 6 bits of I_where along with 2 bits of I_what would require 2^(6+2) = 256 distinct object-position attractors. This is ca. 2.5 times beyond the number of attractors that a randomly connected network, with the same number of connections per neuron and the same mean activity level as what we used, could store; the spatial-modulation mechanism, by contrast, needs only one attractor per object, with I_where carried by the position of the bump.

In our model, localised gain modulation is crucial for maintaining where information after the stimulus is removed.

What mechanisms can trigger the increase in neuronal gain?

Another possible source for increasing the gain is attentional signals. In this case the increase in the activity level required for gain modulation is induced by the attentional signal, and the position of the bump corresponds to the position of the attentional spotlight. There are several reasons that make attention a likely source of activity localisation through gain modulation. fMRI studies on human subjects show that the retinotopic representation of the position of an attended object in visual cortices shows increased activity.

Once the increase in the gain of neurons in the right part of the network is triggered, it should be maintained during retrieval. Although the same mechanisms that initiated gain modulation can keep the gain high during retrieval, a promising mechanism for maintaining high level of gain, particularly after the stimulus is removed, is single neuron memory. Several studies show that the recent history of spiking increases the responsiveness of neurons, and that this increase can last for several seconds, thus exhibiting a form of single neuron short-term memory

As discussed above, the attentional signal may trigger the increase in neuronal gain and maintain it elevated for some time. After the attentional signal is removed, the increase in neuronal gain can be maintained by single neuron short term memory mechanisms. Attention can then be directed to another object, while the representation of the first object's position is maintained by the elevated gain.

One of the roles of attention is to bias the competition for limited processing resources in favour of the object that it is acting on.

The cue corresponding to pattern one is given by setting the activity of neurons inside a 15×15 square box whose lower left corner is on neuron (1,1) to their activity in pattern one. The cue corresponding to pattern two is given by setting the activity of neurons inside a 20×20 square box whose lower left corner is on neuron (33,35) to their activity in pattern two. Plotted are the final dot product overlap (minus the mean activity; Eq. (9)) with the first stored pattern (full line) and the second stored pattern (dashed line) for two values of the connectivity width, (A) a smaller and (B) a larger one.

In the model presented here, the units are taken to be arranged on a retinotopic patch of cortex, corresponding to at least a portion of the visual field, but we assumed patterns of activity to be generated from a spatially uniform distribution (see Eq. (4)). A more realistic model, however, should allow for the storage of spatially organised patterns.

To include the coexistence of categorical and retinotopic maps in the model presented here, one might consider two limit cases, which roughly correspond to these two views. In the first case, category specificity and weak retinotopy coexist at the same spatial scale; one should then assume, in a refined model, that patterns are generated from multiple distributions, each of them corresponding to one category of objects, and patterns drawn from each have higher activity at a characteristic location on the cortical sheet.

Further work is required, especially in view of many intermediate possibilities, to assess, for example, how much more gain modulation would be needed in order to stabilise a bump of activity away from its preferred position, and how this would affect retrieval.

The ability to represent what and where information simultaneously thus emerges naturally from anatomical and physiological properties long known to characterise cortical networks.

In discussing the model, several technical points deserve a more detailed treatment; we provide it in the sections that follow.

If synaptic weights are produced by Eq. (5), the weights of the connections that originate from a given neuron can be both negative and positive. This is against Dale's law and against our assertion that all neurons in the model network are excitatory. In this section, we show how the model described in sections “Firing rate description of the network” and “Stored memory patterns and synaptic weights” (see Model) can be conceptually derived from a more realistic formulation, in which all synaptic weights are positive. Let us first consider a network in which the firing rate of neuron i is a threshold-linear function of its input current h_i, as before, but with all-positive weights.

The synaptic weights, J'_ij, can be written as the sum of a uniform background component J_back and a pattern-specific component J_ij, with J_back chosen large enough that J'_ij = J_back + J_ij is always positive (Eqs. (15) and (16)).

Combining Eq. (14) with Eqs. (15) and (16), the firing rate of neuron i obeys the same dynamics as in the original model with weights J_ij: since the mean activity is held constant, the uniform background component only shifts the threshold.

In this section we briefly describe how the self-consistent equation for the local overlap with the retrieved pattern (Eq. (10)) can be derived. We refer the reader to our earlier work for the full derivation.

To start with, we assume, without loss of generality, that the first pattern (μ = 1) is the one being retrieved, and we look for the steady states of the firing rates V_i.

Inserting the steady-state firing rates V_i into the definition of the local overlap yields Eq. (22).

Averaging the right hand side of Eq. (22) over the distribution of the stored patterns η_j and of the positions x_j of the presynaptic neurons j gives Eq. (23).

We now find the distribution of the input currents h_j. Each current is the sum of a large number of terms; the contribution of the retrieved pattern determines its mean, through the local overlap, while the contributions of the other stored patterns η^μ act as noise.

Noting that the contributions of the non-retrieved patterns are approximately uncorrelated across connections c_ij and patterns η, their combined effect on neuron i can be described as a Gaussian noise whose variance is proportional to the storage load and to the mean square activity of the presynaptic neurons j, leading to Eq. (29).

Equations (23) and (29) form a closed set of equations whose solutions determine the steady states of the system. Finding the local overlaps x_i from them gives the firing rates V_i, and hence the dependence of the retrieval of pattern η^1 on the parameters of the network.

In this section we show how we compute I_what and I_where. I_what is defined as the mutual information between the cued pattern P_c and the retrieved pattern P_r, estimated over many runs: on each run a pattern P_c is cued, and P_r is taken to be the pattern with the highest final overlap.

In the simulations all patterns are presented an equal number of times; therefore, the distribution of the cued pattern P_c is uniform.

We denote the fraction of successful runs (those in which P_r = P_c) by f.

Since in unsuccessful runs (when P_r ≠ P_c) the retrieved pattern is approximately equally likely to be any of the patterns other than P_c, the joint distribution of P_c and P_r is determined by f alone.

Using Eqs. (32)–(34) in Eq. (30), we can thus write, for fixed degree of gain modulation, fixed background gain, and fixed number of patterns, an explicit expression for I_what as a function of f.

Note that the above is, strictly speaking, only a measure of the information implicit in the selection among the stored patterns; it does not include the information carried by the detailed firing rates of the neurons.

I_where is computed in an analogous way, from the distance between the intended position and the final position of the bump. Distances are binned into intervals R_1 = [0,5], R_2, …, R_10, which together cover the network, and I_where is the information that the bin index k carries about the intended position. The trade-off is then displayed by plotting I_what against I_where.
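Both measures reduce to the mutual information between two discrete variables: cued vs. retrieved pattern for I_what, intended vs. decoded position bin for I_where. A generic estimator from a table of counts, assuming all alternatives are presented equally often as in the simulations:

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (bits) from a confusion table.

    counts[c, r] is the number of runs in which alternative c was
    presented and alternative r was decoded.
    """
    P = counts / counts.sum()                  # joint distribution
    Pc = P.sum(axis=1, keepdims=True)          # marginal over presented
    Pr = P.sum(axis=0, keepdims=True)          # marginal over decoded
    nz = P > 0
    return float((P[nz] * np.log2(P[nz] / (Pc @ Pr)[nz])).sum())
```

For instance, perfect retrieval of 8 equiprobable patterns gives log2 8 = 3 bits, while chance-level decoding gives 0 bits.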

In this section we discuss why in the low gain regime, gain modulation aids retrieval of the patterns whereas in the high gain regime it has a negative effect. We start from the self-consistent equations, Eqs. (23) and (29).

Assume that the steady state of the network is a bump of activity over a part of the network in which the single-neuron gain is g_i = g_max, while for neurons outside this region g_i = g_min. Writing the self-consistent equations separately for the two regions yields Eqs. (37).

Eqs. (37) are of the form of the mean-field equations of a recurrent network with non-metric connections, in which each neuron i has gain g_i equal to either g_min or g_max. In the low-gain regime, increasing g_max inside the bump moves the effective gain towards the value optimal for retrieval, so gain modulation aids retrieval; in the high-gain regime, increasing g_max pushes the effective gain past this optimum, which hinders retrieval.

In this section, we discuss why it is more efficient to spatially modulate attractor states associated to objects, than to store distinct attractors for different positions of each object.

Under optimal conditions, the number of attractors that an associative memory with C connections per neuron can store grows linearly with C, with a proportionality constant k_1 ≃ 3−4 for sparse patterns. Localised gain modulation reduces this capacity by a further factor k_2, which for the parameters and network size we used turns out to be k_2 ≃ 4. This is actually an overestimation of the decrease in storage capacity due to localised gain modulation, for realistic size networks: when we calculate the mean of the right hand side of Eq. (22) over the distribution of connectivity patterns and stored patterns, finite-size effects inflate the estimate of k_2. For realistically large networks, the process described here results in a moderate reduction in storage capacity.

The spatial modulation described here can represent positional information with a resolution set by the size of the bump; the number of distinguishable positions N_p therefore grows with the size of the network, as indicated by the scaling analysis above.

On the other hand, the naive storage of distinct, unrelated attractors for each object-position pair decreases the number of objects whose identity could be retrieved by a factor N_p, the number of encoded positions.

An alternative arrangement might be to associate attractors to objects, but allow each attractor to be a continuous 2D manifold, different for each object, so that position can be represented by the position of a bump of activity on such attractor manifold, unrelated to the position of the active neurons in the tissue. This arrangement corresponds to the multiple spatial charts model of Samsonovich and McNaughton. The number of such charts, p_b, that can be stored is however limited, and its maximum value is reduced, relative to the maximum number of discrete attractors, by a factor comparable to k_2 in our model.

Discussions with Bharathi Jagadeesh, Peter Latham, and Peter Dayan are gratefully acknowledged, as well as helpful suggestions by the anonymous reviewers.