Modulatory feedback determines attentional object segmentation in a model of the ventral stream

  • Paolo Papale,

    Roles Conceptualization, Funding acquisition, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    p.roelfsema@nin.knaw.nl (PRR); p.papale@nin.knaw.nl (PP)

    ‡ These authors shared first authorship on this work.

    Affiliation Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, Netherlands

  • Jonathan R. Williford,

    Roles Conceptualization, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    ‡ These authors shared first authorship on this work.

    Affiliations Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, Netherlands, STR, Woburn, Massachusetts, United States of America

  • Stijn Balk,

    Roles Software, Writing – review & editing

    Affiliation Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, Netherlands

  • Pieter R. Roelfsema

    Roles Conceptualization, Funding acquisition, Writing – original draft, Writing – review & editing

    p.roelfsema@nin.knaw.nl (PRR); p.papale@nin.knaw.nl (PP)

    Affiliations Department of Vision & Cognition, Netherlands Institute for Neuroscience (KNAW), Amsterdam, Netherlands, Department of Integrative Neurophysiology, VU University, Amsterdam, Netherlands, Department of Neurosurgery, Academic Medical Centre, Amsterdam, Netherlands, Laboratory of Visual Brain Therapy, Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France

Abstract

Studies in neuroscience inspired progress in the design of artificial neural networks (ANNs), and, vice versa, ANNs provide new insights into the functioning of brain circuits. So far, the focus has been on how ANNs can help to explain the tuning of neurons at various stages of the visual cortical hierarchy. However, the role of modulatory feedback connections, which play a role in attention and perceptual organization, has not yet been resolved. The present study presents a biologically plausible neural network that performs scene segmentation and can shift attention using modulatory feedback connections from higher to lower cortical brain areas. The model replicates several neurophysiological signatures of recurrent processing. Specifically, figural regions elicit more activity in model units than background regions. The modulation of activity by figure and ground occurs at a delay after the first feedforward response, because it depends on a loop through the higher model areas. Importantly, the figural response enhancement is amplified by object-based attention, which stays focused on the figural regions and does not spill over to the adjacent background, just as is observed in the visual cortex. Our results indicate how progress in artificial intelligence can be used to garner insight into the recurrent cortical processing for scene segmentation and object-based attention.

Introduction

Objects are the building blocks of our perception. We perceive a visual scene as a layout of objects in the physical space. We tend to focus on one object at a time when scanning a visual scene, either overtly while moving our eyes or covertly by shifting attention [1]. This process, called object-based attention, is essential to group all the visual features that surround us into the objects with which we interact [2–5]. When we select a particular object for action, object-based attention is the mechanism that helps to identify the portions of the scene that belong to that object and to segregate it from other objects and the background. If subjects grasp and manipulate small objects, attentional selection can occur at the required precise spatial scale [6,7].

Although it may appear effortless to us, object identification and object-based attention in the primate visual system rely on elaborate computations in a hierarchy of cortical regions. The input from the retina, relayed by the lateral geniculate nucleus (LGN), reaches the primary visual cortex (V1) where neurons have small receptive fields (RFs) and are tuned to simple features. Information is then propagated through feedforward connections to higher-order regions in the occipital cortex (areas V2, V3 and V4) and onward to inferotemporal (IT) and frontal (FC) cortices [8]. In the higher areas, neurons are selective for more complex features, such as object parts and object identity, and they have larger RFs.

However, if the task is to segment the scene, the computations carried by feedforward connections are only half of the story [9–11]. After reaching the higher visual areas, visual information is carried back to the lower areas through feedback connections [12,13] (Fig 1A). Whereas feedforward connections drive the initial response of neurons and shape neural selectivity, feedback connections modulate the level of neural activity. Feedback connections enhance the neuronal activity elicited by relevant figural regions over the activity elicited by the background, a process called figure-ground modulation [14–17]. As a result, all the features of objects are enhanced in lower visual brain areas, such as V1, at a high spatial resolution. Furthermore, feedback connections mediate the influence of top-down attention to amplify neuronal activity elicited by task-relevant objects [5,7,18,19].

Fig 1. A neural network to model the primate visual cortex.

A) Hierarchically organized visual cortical areas that perform increasingly complex transformations of the visual input. V1 registers the information coming from the retina. Downstream regions (V2, V4) process information further and object-selective neurons in IT categorize the objects and send this information to the frontal cortex (FC) that represents behavioral goals. The frontal cortex sends attentional signals through feedback connections that modulate the activity of earlier brain regions, resulting in the segmentation of task-relevant objects at a high spatial resolution. B) In the model, the stimulus enters in area V1 (V1m) and feedforward connections (solid arrows) activate downstream regions that classify the objects. The network can select one of the object classes and use feedback connections for class-specific object segmentation in V1m. C) The pattern of connections between V1m, V2m and V4m. The network contains units that propagate information (drivers: solid lines) in the feedforward direction (red) and other units (blue) that propagate information in the feedback direction. The signal flowing in the feedback direction is based on a comparison (subtraction) of the activity of feedback and feedforward units (modulatory feedback: dashed lines). D) Schematic of the computations within one of the layers. a is the activity of a feedforward unit, which propagates activity to the next higher layer, gates the information flow of unit b (a multiplicative influence; *) and inhibits unit c (-) of the feedback pathway.

https://doi.org/10.1371/journal.pone.0337087.g001

At first sight, many of these neurophysiological observations may seem disconnected, because previous studies focused on only one or a few of these feedback influences [20], or used models with connectivity that was largely handcrafted [21–24]. We therefore sought to integrate previous findings into a coherent framework, capitalizing on the recent advances in the development of artificial neural networks (ANNs) [25–27].

State-of-the-art ANNs can achieve human-level performance on many complex visual tasks, ranging from object classification to segmentation [25,28]. For example, breakthroughs were achieved in the problem of semantic segmentation, where the goal is to determine the pixels that belong to objects of specific categories [28,29]. ANNs for semantic segmentation have features in common with segmentation processes in the brain, because they rely on two pathways. The first pathway starts with units representing pixels and ends with units coding for object categories, through a hierarchy of layers with increases in RF size and complexity of tuning. The second pathway propagates activity in the opposite direction, from object categories back to pixels, so that tuning becomes simpler and RF size decreases during successive processing steps [30]. The selectivity of units in the first pathway of ANNs resembles the selectivity of neurons in the visual brain [31–33]. However, the resemblance of the second ANN pathway to neurobiology is weaker. Firstly, in most ANNs units of the feedback pathway are independent from those of the feedforward pathway [30], whereas many neurons in the brain are influenced by both feedforward and feedback connections [34]. Secondly, in the visual cortex, neurons that are influenced by feedback and those that are mainly feedforward tend to segregate into different layers of the cortical column. The feedback influences are weak in cortical layer 4, which is the input layer of cortex, and stronger in the superficial and deep layers [35–38]. Interestingly, neurons with different degrees of sensitivity to feedback effects reside in the same column and have a similar tuning. Thirdly, the feedback pathway of ANNs drives the units, whereas feedback connections in the brain are mostly modulatory. This means that feedback connections usually do not directly activate the neurons but amplify or decrease the feedforward response. Some of these feedback influences are multiplicative, with a stronger influence on neurons that are well driven by feedforward input than on neurons that are not [14,39–42].

Hence, the brain processes underlying image segmentation and, in particular, semantic segmentation have remained unclear. How do the feedback pathways of the brain, which are modulatory and intermingled with the feedforward pathways, achieve object segmentation? To answer this question, we built a biologically inspired ANN to examine whether a minimal architecture, based on the current neuroscientific understanding of visual processing, can achieve figure-ground segregation and semantic segmentation based on selective attention, similarly to what has been observed in primates [41]. In our model, feedforward and feedback influences occur within the same cortical column and feedback is modulatory [36–38]. The network comprised a hierarchy of areas modeled after the cortical brain regions involved in object processing – denoted here with an ‘m’ suffix – from V1m to FCm (Fig 1B). We included a feedforward pathway, driving the activity in successively higher areas for object identification, and a feedback pathway that modulated neuronal activity (Fig 1C,D) to highlight the figural image regions, in particular if they were attended.

We demonstrated that the model could produce an accurate segmentation of attended objects by increasing the firing rate elicited by image elements in V1m, representing the relevant figure at a high spatial resolution, whereas the activity elicited by nearby background elements was suppressed, just as observed in the visual cortex of monkeys [41,43]. Our network shows that our current understanding of visual processing in the primate cortex can provide an explanation for object attention and segmentation. When we examined the tuning of model units we observed similarities to neurons in the visual cortex, indicating that the model enables new insights into how feedforward and feedback connections jointly determine neuronal activity.

Materials and methods

Model architecture

Fig 1B provides an overview of the model, which consists of several areas, loosely mirroring the ventral stream of the primate visual cortex. Our architecture was inspired by an ANN [30] that performs semantic segmentation in separate feedforward and feedback pathways, assigning an object class to pixels of the input image (Fig 2A). The architecture introduced by Hong et al. [30] achieved this by first mapping pixels onto object classes in a feedforward network. It then selected one of the classes at a time for a given image and retrieved the pixels belonging to that class in an additional feedforward network that mapped classes back onto pixels. Hence, this ANN carried out an object-based attention task in a purely feedforward network. Instead, our model used feedforward connections that are driving and feedback connections that are modulatory. We trained feedforward and feedback connections separately, during different phases.

Fig 2. Modulatory feedback achieves accurate segmentations of attended objects in cluttered displays.

A) Eight example stimuli and the segmentations achieved by the model. Each row shows an example stimulus and the segmentation generated by the model when instructed to attend to a specific object class (blue, low activity level denoting background; yellow, high activity level labeling the segmented object). The segmentations in V1m are accurate, demonstrating that modulatory feedback connections enable a selective segmentation of the attended object category. B) The internal representation of an example stimulus (left) shows how the activity of V1m feedforward units “a” (middle left) is subtracted from the modulatory feedback “b” (middle right) coming from a downstream region, to generate the final modulated response “c” (right).

https://doi.org/10.1371/journal.pone.0337087.g002

The model comprised two main processing streams, a feedforward pathway that drives the neurons and a feedback pathway that is modulatory. The feedforward pathway took an image as input and produced a classification vector that indicated the object classes that were detected in the image as output. This feedforward pathway was not modulated by feedback, akin to what is observed in the input layer 4 of area V1 [36–38]. The feedback pathway in our model contained units that were influenced by the feedforward units in the same area and that were also modulated by feedback connections, thereby deviating from most previous ANNs. The feedback pathway started with the classification vector at the highest levels of the feedforward pathway, which it combined with a one-hot attention vector that specified a target category (e.g., “attend the texture square” in Fig 1A). The goal of the feedback pathway was to highlight all the low-level features elicited by the target object with enhanced neuronal activity.

Feedforward pathway

The network layers were grouped into areas that roughly correspond to visual cortical areas. All the layers within a given area had the same size. Areas V1m, V2m and V4m had 20 features and 56x56 pixels, 50 features and 28x28 pixels, and 100 features and 14x14 pixels, respectively (Fig 1B), resulting in larger RFs in higher areas (i.e., 2x2 pixels in V1m, 4x4 pixels in V2m and 8x8 pixels in V4m). Each area had three convolutional layers with skip connections bypassing the convolution followed by batch normalization and a ReLU function. The output of each area was down-sampled by a strided convolution (stride of 2 and a 2x2 kernel size). The down-sampled output from V4m (100 features by 7x7 pixels) was passed to a fully connected layer (ITm: 500x1x1), which was then passed to another fully connected layer (FCm: 12x1x1), which had one unit for each of the 12 object classes in our task. This layer used a sigmoid activation function (applied to each unit independently, not a softmax across units) to estimate the likelihood of each class appearing in the stimulus, because multiple object classes could appear at the same time.
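
The layer sizes stated above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' code: it applies the standard convolution output-size formula to the stated 2x2 kernel and stride of 2, and it assumes zero padding, which the text does not specify. The names `conv_out` and `trace_shapes` are hypothetical.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1

# (area name, number of features, spatial size), as stated in the text
AREAS = [("V1m", 20, 56), ("V2m", 50, 28), ("V4m", 100, 14)]

def trace_shapes(areas=AREAS):
    """Return (name, features, size, down-sampled size) for each area."""
    shapes = []
    for name, features, size in areas:
        # 2x2 kernel with stride 2; padding of 0 is an assumption
        down = conv_out(size, kernel=2, stride=2)
        shapes.append((name, features, size, down))
    return shapes

shapes = trace_shapes()
# Number of inputs to the first fully connected layer (ITm): 100 * 7 * 7
flat_v4m = shapes[-1][1] * shapes[-1][3] ** 2
```

Under these assumptions each down-sampling step halves the spatial size (56 to 28 to 14 to 7), which matches the 100x7x7 tensor that the text passes to ITm.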

Feedback pathway

The interactions between the feedforward and feedback pathway ensured that the feedback from the selected shape in FCm modulated the appropriate lower-level units. A 12x1x1 layer encoded the object that should be attended (the “attend” label in Fig 1B). This attention signal was multiplied (elementwise) with the classification vector from the feedforward pathway. A series of fully connected layers was used in the feedback pathway, mirroring the feedforward pathway, followed by a “deconvolution” (transposed convolution) layer that produced output with dimensions of 100x7x7. This output was then concatenated with the corresponding 100x7x7 layer of the feedforward pathway to produce a 200x7x7 tensor, which was passed to a 1x1 convolution reducing the tensor back to 100x7x7.
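
The elementwise combination of the attention signal and the classification vector can be illustrated in a few lines. This is a sketch under stated assumptions: the function name `attend` and the ordering of the 12 classes (digits 0 to 9 at indices 0 to 9, square at 10, rectangle at 11) are hypothetical and are not taken from the paper.

```python
def attend(class_probs, target_index):
    """Multiply the classification vector elementwise with a one-hot attention vector."""
    one_hot = [1.0 if i == target_index else 0.0 for i in range(len(class_probs))]
    return [p * a for p, a in zip(class_probs, one_hot)]

# Toy classification vector: a '5' and a square are detected (invented values)
probs = [0.1] * 12
probs[5] = 0.8
probs[10] = 0.9

SQUARE = 10  # hypothetical index of the square class
feedback_seed = attend(probs, SQUARE)
```

Only the attended class survives the multiplication, so the feedback pathway starts from a vector that carries both what was detected and what is task-relevant.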

Each unit in the feedforward pathway had a corresponding (deconvolutional) unit in the feedback pathway in V1m, V2m and V4m, which can be conceived of as being part of the same cortical column (Fig 1D). The layers had the exact same dimension as the corresponding feedforward layers. The feedback connections did not drive units in these areas (unlike the connections from FCm to ITm), but they modulated the activity, in accordance with the effects of attention in the brain [39]. In our model the modulatory feedback interaction was multiplicative (Fig 1D): the non-negative activity a of a feedforward neuron in a given layer gated the feedback that the layer received from a higher layer via a strided convolution, and the gated signal, which also depended on a learned parameter, gave the activity b of the corresponding unit of the feedback pathway (Fig 1D, Fig 2B). The feedback signal c that is propagated to the next lower layer is the difference between the activity of the feedback and feedforward units, c = b - a (Fig 2B), and it constitutes a relatively pure segmentation signal.
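
The gating scheme of Fig 1D (feedforward activity a, feedback-unit activity b, propagated signal c) can be sketched as follows. The exact functional form is an assumption made for illustration only: here b = a * (1 + gain * feedback), where `gain` stands in for the learned parameter, and c = b - a follows the subtraction described in Fig 2B. The function name `modulate` is hypothetical.

```python
def modulate(feedforward, feedback, gain):
    """Illustrative multiplicative feedback; the form of b is an assumption."""
    a = max(feedforward, 0.0)          # non-negative feedforward activity
    b = a * (1.0 + gain * feedback)    # feedback scales, but never creates, activity
    c = b - a                          # signal passed on to the next lower layer
    return b, c

silent = modulate(0.0, 5.0, 0.5)   # unit without feedforward drive
driven = modulate(2.0, 1.0, 0.5)   # well-driven unit receiving feedback
```

The property this sketch is meant to show is that feedback is modulatory: a unit with no feedforward drive stays silent however strong the feedback, whereas a driven unit is enhanced and contributes a positive segmentation signal c.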

Stimulus sets

Based on the MNIST dataset [44], we created an augmented stimulus set, SegMNIST (Fig 2A). We followed recent approaches to make MNIST-like stimulus sets more challenging [45–48] and cast it as an object localization task, where the relevant shape could appear at various locations in the image. The size of the image was increased to twice the size of the MNIST image (56x56 pixels instead of 28x28) and MNIST digits were translated, scaled, and positioned on top of a texture. To gain insight into figure-ground segregation processes, we also added textured squares and rectangles at different rotation angles with an orientation that differed from the background. Similar stimuli were used in previous monkey experiments on texture segregation [39,41].

Training of the feedforward and feedback pathways

The feedforward pathway was trained to classify objects of the SegMNIST stimuli using Adam [49] with a learning rate of 0.001, L2-regularization (weight decay = 0.0005), and as loss a sigmoid combined with binary cross-entropy (PyTorch’s BCEWithLogitsLoss) for 2,500 epochs (batch size = 1024). There were 12 object classes: a square, a rectangle and the 10 digits, with one to three objects in each image. The task of the network was to determine which object classes were present in the image. We trained the modulatory feedback pathway after the weights of the feedforward pathway had been fixed. The feedback pathway was trained on object segmentation using Adam [49] with a learning rate of 0.001, L2-regularization (weight decay = 0.0005), and as loss a sigmoid combined with binary cross-entropy (PyTorch’s BCEWithLogitsLoss) for 2,500 epochs (batch size = 256). The objective was to maximize the number of pixels in the image that were correctly classified (with activity 1 for pixels belonging to the attended class and 0 elsewhere). Because of the imbalance between the number of pixels in objects and in the background, we ignored background pixels when computing the error; including them would have biased the error and led to poor generalization. The network was trained using one GPU.
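
The segmentation loss with background pixels excluded can be sketched in plain Python. This is an illustration rather than the training code: `bce_with_logits` uses the usual numerically stable form of sigmoid plus binary cross-entropy, and `keep_mask` is a hypothetical per-pixel flag marking the pixels that count toward the error. How the authors built that mask is not specified in the text.

```python
import math

def bce_with_logits(logit, target):
    """Sigmoid + binary cross-entropy in the stable form max(x,0) - x*y + log(1+exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def masked_segmentation_loss(logits, targets, keep_mask):
    """Mean loss over the pixels flagged in keep_mask; other pixels are ignored."""
    losses = [bce_with_logits(x, y)
              for x, y, keep in zip(logits, targets, keep_mask) if keep]
    return sum(losses) / len(losses) if losses else 0.0

# Two pixels: one attended object pixel, one background pixel that is ignored
loss = masked_segmentation_loss([0.0, 10.0], [1.0, 0.0], [True, False])
```

In this toy case the large logit on the ignored background pixel does not contribute, so the loss equals that of the single object pixel.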

Simulating the activity propagation in V1m

To evaluate the influence of attentional selection on the time course of activity, we let the feedforward activity propagate across the layers in successive time steps. The initial feedback modulation could proceed in lower layers while the feedforward activity was propagating to the higher layers of the network (Fig 3). We simulated a single feedforward pass and a single feedback pass but did not model full recurrence. Our simulations address the situation in which a subject attends a specific shape. We initialized the network at the activity level elicited by a white noise stimulus and activated the attentional template of the to-be-selected shape in FCm. We then presented the stimulus and the activity propagated to the next higher and lower layer on every time step. We presented 1,000 stimuli (with the model attending to either figures or digits) and averaged the responses. The time-courses shown in Fig 3B,D were up-sampled (to 160 steps) and smoothed with a 3rd order Savitzky–Golay filter (7 timestep window) for visualization purposes.
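
The smoothing step can be illustrated with a hand-written 7-point, 3rd order Savitzky–Golay filter, whose standard convolution weights are (-2, 3, 6, 7, 6, 3, -2)/21. This sketch makes two assumptions that the text does not settle: the three samples at each edge are left unsmoothed, and the up-sampling to 160 steps is omitted because the interpolation method is not stated. In practice a library routine such as SciPy's `savgol_filter` would typically be used instead. The helper `step_to_ms` applies the approximate 66 ms per model step mentioned in the caption of Fig 3.

```python
SG7_CUBIC = [-2, 3, 6, 7, 6, 3, -2]  # standard 7-point cubic weights, to be divided by 21

def savgol7(signal):
    """Smooth interior samples with a 7-point cubic Savitzky-Golay filter."""
    half = 3
    out = list(signal)  # edge samples are copied unchanged (an assumption)
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG7_CUBIC, window)) / 21.0
    return out

def step_to_ms(step, ms_per_step=66):
    """Convert a model time step to approximate cortical time."""
    return step * ms_per_step

cubic = [float(i ** 3) for i in range(10)]
smoothed = savgol7(cubic)
```

A cubic filter reproduces any cubic polynomial exactly, so `smoothed` equals `cubic` here. This is why such a filter removes jitter while preserving the shape of transient peaks.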

Fig 3. V1m evolves figure-ground modulation by selective attention as observed in monkey V1.

A) Activity of V1 neurons in the study of Poort et al. [41], in which monkeys were cued to either detect texture-based square figures (top) or to perform another task in which they did not attend the figures (but traced a curve). V1 activity depended on whether the texture elements falling in the RF were part of the figure, the ground or the edge between figure and ground. It also depended on attention. There were three different phases of activity: a first transient response (top, left green arrow), a later phase in which figures elicited more activity than grounds (middle orange arrow) and an even later phase in which attention increased the figure response (right purple arrow; compare the distance between red and blue curves in the upper and lower panels). B) Response of feedback units in V1m when attention is directed to the figure (top) or to one of the digits (bottom). The activity was propagated for one connection at a time. Dashed regions represent SEM (across 1,000 stimuli with different textures). Every step in the model roughly corresponds to 66ms of processing time in the visual cortex. C) Space-time profile of figure-ground modulation (FGM: difference between the figure and background response) for V1 neurons when the monkey attended to the figure (top) or attended away (bottom). The horizontal axis represents the position of the RF, which either fell on the figure, the edge or the background. D) Space-time profile of V1m units when the model attended the square (top) or one of the digits (bottom).

https://doi.org/10.1371/journal.pone.0337087.g003

A Google Colab of the project with the model results and all units is available at https://colab.research.google.com/drive/13O0-4uYq3l1tZupVn-NLP8C-WvakajCG?usp=sharing.

Results

Modulatory feedback connections achieve accurate segmentations of attended objects

We trained an ANN to recognize and segment the digits 0 through 9, textured squares and rectangles in a display with a background of texture elements with an orientation that differed from the foreground squares and rectangles. First, we trained the feedforward path of the network to recognize these shapes. Then we trained the feedback pathway to attend and segment one of the shapes from the background. In this training phase, we used a task that was inspired by Poort et al. [41], who trained monkeys to either detect texture-based squares similar to those of Fig 2 or to perform another task in which they could ignore these texture-defined figures.

A few example stimuli and the model’s segmentation results are shown in Fig 2A. The stimuli (left column) were particularly challenging because the digits were variable and complex, the edges of the digits blended with the lines of the background texture and the squares and rectangles were defined by a difference in the orientation of the texture elements. Despite these challenges, the classification accuracy of the feedforward path of the model was nearly perfect. Specifically, the activation of the FCm units reflected the object classes with a mean cross-validated absolute error of 0.023 (this measure ranges between 0 when all objects are correctly classified in each image and 12 if none of the objects are correctly classified).
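
A measure with the stated range of 0 to 12 can be obtained by summing the absolute error over the 12 FCm units of each image and averaging across images. The sketch below assumes that reading; the paper does not spell out the formula, and the function name `classification_error` and the toy vectors are invented for illustration.

```python
def classification_error(outputs, targets):
    """Mean over images of the summed absolute error across the class units."""
    per_image = [sum(abs(o - t) for o, t in zip(out, tgt))
                 for out, tgt in zip(outputs, targets)]
    return sum(per_image) / len(per_image)

# Toy target: only class 0 is present in the image
target = [1.0] + [0.0] * 11
perfect = classification_error([target], [target])
worst = classification_error([[1.0 - t for t in target]], [target])
```

With this definition a perfect classifier scores 0 and a classifier that is wrong on every unit scores 12, which matches the range quoted in the text.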

We next examined if and how neurons of the feedback pathway were influenced by the presence of figure-ground stimuli and the shape that had been selected in FCm. The feedback pathway caused highly accurate segmentations. Responses of feedback units to image regions of the attended category were enhanced over the responses of units that responded to the background and also over the responses elicited by non-attended objects (yellow regions in Fig 2A denote attended regions, which have been labeled with enhanced activity; blue regions are labeled as background). Consider, for example, the left stimulus in the second row of Fig 2A that contained a square and two digits, a ‘5’ and a ‘6’. When the square was attended, by switching on the corresponding unit in FCm, the activity of V1m units with receptive fields on the square was enhanced. This response enhancement was confined to the square with a high spatial precision and it did not spill over into the background. However, if attention was directed to one of the digits, the responses to the square were weaker and the response enhancement was confined to the pixels of the appropriate digit. These highly accurate segmentations occurred for most of the images, so that the mean cross-validated pixel-wise absolute error was 0.0015 (the error ranges between 0 when all object pixels are correctly segmented and 1 if none of the object pixels is correctly segmented in each image; this measure does not consider background pixels). Furthermore, the proportion of background elements that were erroneously labeled as the attended class was very small (<0.1%).
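
The two segmentation measures described above can be sketched as follows. This is an illustration under assumptions, not the evaluation code: `object_mask` is a hypothetical per-pixel flag for pixels that belong to any object, and the 0.5 threshold used to decide that a background pixel was labeled as the attended class is assumed, because the text does not give one.

```python
def object_pixel_error(activity, target, object_mask):
    """Mean absolute error over object pixels only; background pixels are ignored."""
    errs = [abs(a - t) for a, t, m in zip(activity, target, object_mask) if m]
    return sum(errs) / len(errs)

def background_false_positive_rate(activity, object_mask, threshold=0.5):
    """Fraction of background pixels whose activity exceeds the (assumed) threshold."""
    background = [a for a, m in zip(activity, object_mask) if not m]
    return sum(a > threshold for a in background) / len(background)

# Four toy pixels: attended object, non-attended object, two background pixels
activity = [1.0, 0.0, 0.9, 0.1]
target = [1.0, 0.0, 0.0, 0.0]
object_mask = [True, True, False, False]

err = object_pixel_error(activity, target, object_mask)
fp_rate = background_false_positive_rate(activity, object_mask)
```

Reporting the two numbers separately is useful because the first one is blind to activity that leaks into the background, which is exactly what the second one captures.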

These results demonstrate that modulatory feedback connections can explain the selective labeling of low-level image elements of an attended object category with enhanced neuronal activity, just as is observed in the primary visual cortex of monkeys [15,16,41,43]. When we examined the tuning of model units to shape and orientation, we also observed strong similarities to the properties of neurons in the visual cortex of monkeys, and the same held true for surround suppression (Fig. S1,S2 in S1 File). Interestingly, tuning to the shape of objects was stronger in the feedback pathway in V4m and V2m than in the feedforward pathway, indicating that feedback connections carried information about the identity of the attended objects from higher regions (ITm and FCm) to lower regions (Fig. S2 in S1 File).

We next compared the time-course of the activity of model units to the neurophysiological results of Poort et al. [41]. In that study, monkeys either attended a texture-defined square figure, similar to the stimuli in Fig 2A, or performed another task in which they ignored the square (i.e., mentally tracing a curve). The main neurophysiological results are summarized in Fig 3A, showing the time-course of the response of V1 neurons with an RF on the center of the square figure (red trace), the edge between the square and the textured background (black trace) or the background (blue trace). The activity starts around 40ms after the presentation of the stimulus, when feedforward activations drive the V1 neurons. During this early response phase, the activity depends on the information in the RF, and after a short delay the activity of neurons with an RF on the edge between figure and ground is enhanced over the activity of neurons with an RF on the background. During a later response phase, around 100ms after the stimulus onset, the activity elicited by image elements that belong to the center of the figure also starts to be enhanced over the activity elicited by background elements. Finally, approximately 200ms after onset, the influence of attention to the figure (Fig 3A, top) kicks in. Attention to the figure causes a further enhancement of the activity of V1 neurons with an RF on the figure region, compared to when the monkey ignores the figure (Fig 3A, note that the separation between black and red traces is more pronounced in the lower panel). Although relatively small, the observed modulation was precisely correlated with the perception of the monkey, demonstrating very efficient and targeted top-down feedback to V1 neurons.

We examined the activity of V1m units in a version of the model in which we propagated activity in the feedforward and feedback directions by one connection at a time. Although the model task was not meant to exactly replicate the monkey experiment, it is notable that the overall activity profile of V1m units resembled the activity of neurons in monkey V1. The model units exhibited a first, transient peak response that was elicited by the appearance of texture elements in the RF (Fig 3B). The early response elicited by the edge (black in Fig 3B) was higher than that elicited by the figure and ground, and the response elicited by the figure was stronger than that elicited by the background. At a later phase, attention to the textured square enhanced the activity evoked by the figure further (Fig 3B, top), compared to when attention was directed to one of the digits (Fig 3B, bottom). The influence of attention on neuronal activity in the model feedback pathway was stronger than in monkey V1 and activity elicited by the center of the figure exceeded the activity elicited by the edge. This difference between the model and the neurophysiological results is explained by the model’s training objective, which enforced binary segmentation where figural regions reach a target activity of 1. However, it is of interest that the order of the effects of figure-ground organization and attention was similar to that in monkey V1. A coarse fit between the model and the neurophysiology is obtained if we assume that every timestep in the model corresponds to ~66ms of activity in the visual cortex.

Fig 3C,D compares the spatial profile of the figure-ground modulation (FGM), which is the difference in activity elicited by the figure and the background (subtraction of the blue curve from the red/black curve in Fig 3A,B), between neurons in monkey V1 (Fig 3C) and feedback units in area V1m of the model (Fig 3D). The monkeys saw a figure with a size of 4 degrees and the size of most V1 RFs was less than 1 degree. In the monkey experiment the location of the figure was varied so that the neurons’ RFs could either fall on the figure, the background or the edge. FGM was confined to the figure representation. If the monkey did not attend the figure, FGM was stronger at the boundary between figure and background than in the center of the figure. If the monkey attended the figure, however, FGM in the center of the figure was more pronounced and reached the same level as that at the boundary.
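
The FGM defined above is a pointwise subtraction of the background time course from the time course at each RF position. A minimal sketch is given below; the function name, the dictionary layout and the response values are all invented for illustration and are not data from the paper.

```python
def figure_ground_modulation(response_by_position, background_key="background"):
    """Subtract the background time course from the time course at every other RF position."""
    ground = response_by_position[background_key]
    return {pos: [r - g for r, g in zip(trace, ground)]
            for pos, trace in response_by_position.items()
            if pos != background_key}

# Toy time courses over three time steps (invented values)
responses = {
    "figure": [1.0, 2.0, 3.0],
    "edge": [1.0, 2.5, 3.0],
    "background": [1.0, 1.0, 1.0],
}
fgm = figure_ground_modulation(responses)
```

In this toy example FGM is zero at the first time step and grows later, first at the edge and then at the figure center, which is the qualitative pattern described for the case in which the figure is attended.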

Many of these observations also held true for the feedback units in V1m of the model, where FGM was also confined to the figure. If attention was directed to one of the digits, FGM was strongest at the boundaries of the square and weaker in the center of the figure. However, if the model attended the textured square, activity was also strong in the center of the figure. In the model, the activity reached a higher plateau, which is related to the training procedure in which the target activity level in V1m was 1, as described above.

Discussion

In this work, we built a biologically inspired ANN composed of a driving feedforward pathway and a modulatory feedback pathway to model the primate object processing pathway. We demonstrated that this simple architecture could shift selective attention to all the low-level features that belong to the same object, even if it was embedded in a dense texture, demonstrating that modulatory feedback permits the accurate segmentation of complex stimuli. We compared the activity of model units to the responses of neurons in the visual cortex of monkeys. Firstly, the model reproduced the spatial profile of neuronal activity in the primary visual cortex during image segmentation tasks, with a strong enhancement of the representation of image elements of figures relative to the background, even though the image elements of figure and ground were, on average, the same. Secondly, the time-course of activity of model units in V1m resembled the time-course of the response of V1 neurons. Activity started with a transient response, followed by figure-ground modulation and the influence of attention was expressed at an even later time-point, because feedforward processing had to propagate to FCm, which then had to reach back to V1m. The delayed attentional influence is in accordance with neurophysiological evidence about comparable top-down influences [50–53]. However, the attentional influence in V1m was stronger than that in monkey V1. A likely reason for this difference is that our model was trained on a binary segmentation task with a high target value for attended pixels, whereas V1 of animals contributes to many more tasks [17]. Thirdly, we found qualitative similarities between the properties of neurons in the visual cortex and in the model, including orientation tuning, surround suppression and shape-selectivity (S1, S2 Figs in S1 File). It is of interest that tuning to object identity was strongest in the feedback pathway in V4m and V2m, indicating that it was based on feedback from the higher model areas (Fig S2 in S1 File). These results thereby provide insight into why object classes can be decoded from the activity of neurons in lower-level visual areas of the primate brain [54–64].

There is neurophysiological evidence that modulatory feedback connections play a critical role in figure-ground segmentation in the visual cortex. Kirchberger et al. [65] investigated this with optogenetic silencing of activity in the visual cortex of mice. When they silenced higher visual cortical areas, the initial V1 response to visual stimuli remained intact, but the enhanced activity evoked by figures during a later phase was strongly reduced, paralleling findings in monkeys [66]. When the activity in V1 itself was silenced, mice showed impairments in both contrast detection and figure-ground perception. Crucially, when V1 silencing was delayed until after the initial visual response, the mice could detect simple visual stimuli but figure-ground perception was impaired. This result indicates that the later phase of V1 activity, during which figures elicit extra activity, is causally involved in figure-ground perception.

The present study focused on the relation between ANNs and mechanisms for image parsing in the brain. Our focus therefore differed from that of ANNs that aim to improve the state-of-the-art without considering biological plausibility. There are several artificial neural networks without feedback connections that accurately segment images, and some of them, including transformer models [28], do not even use recurrent connections. In contrast, we here modeled the segmentation processes that take place in the visual cortex, and our main findings demonstrate that modulatory feedback suffices for accurate image segmentation. Knowledge of brain-like mechanisms for vision may inspire improvements in the performance of ANNs in several ways. For example, the inclusion of feedback connections in ANNs not only increases the similarity with the computations in the brain [67,68], but also improves classification accuracy and noise resistance [45,69–71]. Similarly, incorporating horizontal connections, present between neurons in the same visual area, can help ANNs solve complex perceptual grouping tasks [72–75]. This exemplifies how brain mechanisms can inspire improvements in ANNs, leveraging a rich cross-fertilization between neuroscience and artificial intelligence [76].

Previous modeling studies addressing the neuronal mechanisms for image parsing and texture segregation in the visual cortex used hand-crafted connectivity schemes and activation functions [40,41]. The present model went beyond these previous studies by using harder segmentation tasks and a neural network training scheme with a loss function for the segmentation outcome. The training procedure resulted in internal representations that resemble, to some extent, those of the primate brain, without handcrafting. A related approach [34] carried out semantic segmentation of digits of the MNIST data set. Our work goes one step further by examining texture-defined figure-ground stimuli, and by implementing a biologically plausible form of top-down feedback. Indeed, in our model feedback was modulatory. The activity in the feedback pathway was gated by feedforward activation and the model compared the feedback activity to the feedforward activation to determine the magnitude of the segmentation signal (Fig 1D). The model could segregate orientation-defined figures and illustrated how top-down attention amplifies figure-ground signals, just as is observed in the visual cortex of monkeys [41]. Furthermore, we evaluated the effect of feedback connections on the tuning of units, providing predictions such as an increase in shape selectivity and surround suppression in the feedback pathway, which can be tested in future neurophysiological experiments. A limitation of the present approach is that the network only used a feedforward and a feedback pass, but did not include full recurrence, even though the segmentation results are likely to further improve object recognition, which, in turn, may further refine the segmentation results, especially in shallower networks [77].
In such fully recurrent networks, segmentation and image recognition interact: segmentation at lower levels aids object recognition in higher areas [78,79] and, vice versa, object recognition in higher areas helps with the segmentation of image elements in lower areas [19,80–82]. We here modeled shape-selective feedback in the ventral stream of primates, but note that areas of the dorsal stream and thalamic nuclei also send feedback to lower-level visual cortical areas and the LGN, which could play important, complementary roles in image segmentation [83–88]. Furthermore, we first trained the feedforward connections for classification and fixed those before we trained the feedback connections for the segmentation task. Such a strategy may be in accordance with the developing cortex, where feedforward connections are formed before the feedback connections [89,90]. However, to our knowledge, it is unknown whether critical periods, during which the plasticity of connections is highest, exhibit a comparable temporal offset.
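The difference between driving and modulatory feedback discussed above can be made concrete with a toy example. The sketch below is not the implementation of the model and does not reproduce its equations: it assumes, purely for illustration, that gating is a multiplication of the feedforward activity by a top-down gain and that the comparison between the two pathways is a ratio. The function names and all values are hypothetical.

```python
def modulatory_feedback(feedforward, top_down_gain):
    """Gate the feedback pathway by feedforward drive (assumed multiplicative).

    A unit without feedforward drive stays silent, however strong the
    top-down signal is, which is what makes the feedback modulatory.
    """
    return [ff * gain for ff, gain in zip(feedforward, top_down_gain)]


def segmentation_signal(feedforward, feedback):
    """Compare feedback with feedforward activity (assumed here: a ratio)."""
    return [fb / ff if ff > 0 else 0.0 for ff, fb in zip(feedforward, feedback)]


# Three hypothetical units: two driven by image elements, one without input.
feedforward = [1.0, 1.0, 0.0]
# Hypothetical top-down gain: high on the attended object, low elsewhere.
top_down_gain = [1.0, 0.2, 1.0]

feedback = modulatory_feedback(feedforward, top_down_gain)
signal = segmentation_signal(feedforward, feedback)
print(feedback, signal)
```

The third unit receives a strong top-down signal but no feedforward input, so its feedback activity remains zero. Under these assumptions, feedback can amplify the representation of image elements that are present but cannot create activity on its own.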

Overall, we provided a proof-of-concept demonstration that modulatory feedback connections provide a mechanism for the accurate segmentation of attended shapes. Our model represents a minimal architecture for object parsing in the brain, based on what we know about the roles of feedforward and feedback connections [36–38]. We expect that future models will improve our understanding of these mechanisms to further elucidate behavioral [91–94] and neural [95–98] signatures of object segmentation in natural scenes. Future studies might also expand the biological plausibility of neuronal networks for object-based attention by including horizontal connections and brain-like learning rules [99].

Supporting information

S1 File. Supplementary information, including additional analyses of the tuning and RF properties of the model neurons and the related methods.

https://doi.org/10.1371/journal.pone.0337087.s001

(DOCX)

Acknowledgments

We thank Jasper Poort for providing the code and pre-processed data from his experiment, and Alexander van Meegen for his early contribution to the model implementation.

References

  1. O’Craven KM, Downing PE, Kanwisher N. fMRI evidence for objects as the units of attentional selection. Nature. 1999;401(6753):584–7. pmid:10524624
  2. Duncan J. Selective attention and the organization of visual information. J Exp Psychol Gen. 1984.
  3. Kramer AF, Weber TA, Watson SE. Object-based attentional selection--grouped arrays or spatially invariant representations?: comment on Vecera and Farah (1994). J Exp Psychol Gen. 1997;126(1):3–13. pmid:9090141
  4. Behrmann M, Zemel RS, Mozer MC. Object-based attention and occlusion: evidence from normal participants and a computational model. J Exp Psychol Hum Percept Perform. 1998;24(4):1011–36. pmid:9706708
  5. Roelfsema PR. Solving the binding problem: Assemblies form when neurons enhance their firing rate-they don’t need to oscillate or synchronize. Neuron. 2023;111(7):1003–19. pmid:37023707
  6. Roelfsema PR, Houtkamp R. Incremental grouping of image elements in vision. Atten Percept Psychophys. 2011;73(8):2542–72. pmid:21901573
  7. Roelfsema PR. Cortical algorithms for perceptual grouping. Annu Rev Neurosci. 2006;29:203–27. pmid:16776584
  8. Van Essen DC, Anderson CH, Felleman DJ. Information processing in the primate visual system: an integrated systems perspective. Science. 1992;255(5043):419–23. pmid:1734518
  9. Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ. Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behavior. Nat Neurosci. 2019;22(6):974–83. pmid:31036945
  10. Thorat S, van Gerven M, Peelen M. The functional role of cue-driven feature-based feedback in object recognition. 2019. https://doi.org/10.32470/ccn.2018.1044-0
  11. Lindsay GW, Miller KD. How biological attention mechanisms improve task performance in a large-scale visual system model. Elife. 2018;7:e38105. pmid:30272560
  12. Felleman DJ, Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex. 1991;1(1):1–47. pmid:1822724
  13. Buffalo EA, Fries P, Landman R, Liang H, Desimone R. A backward progression of attentional effects in the ventral stream. Proc Natl Acad Sci U S A. 2010;107(1):361–5. pmid:20007766
  14. Poort J, Self MW, van Vugt B, Malkki H, Roelfsema PR. Texture segregation causes early figure enhancement and later ground suppression in areas V1 and V4 of visual cortex. Cereb Cortex. 2016;26(10):3964–76. pmid:27522074
  15. Lamme VA. The neurophysiology of figure-ground segregation in primary visual cortex. J Neurosci. 1995;15(2):1605–15. pmid:7869121
  16. Zipser K, Lamme VA, Schiller PH. Contextual modulation in primary visual cortex. J Neurosci. 1996;16(22):7376–89. pmid:8929444
  17. Roelfsema PR, de Lange FP. Early Visual Cortex as a Multiscale Cognitive Blackboard. Annu Rev Vis Sci. 2016;2:131–51. pmid:28532363
  18. Spitzer H, Desimone R, Moran J. Increased attention enhances both behavioral and neuronal performance. Science. 1988;240(4850):338–40. pmid:3353728
  19. Vecera SP, Farah MJ. Is visual image segmentation a bottom-up or an interactive process? Percept Psychophys. 1997;59(8):1280–96. pmid:9401461
  20. Carandini M, Demb JB, Mante V, Tolhurst DJ, Dan Y, Olshausen BA, et al. Do we know what the early visual system does? J Neurosci. 2005;25(46):10577–97. pmid:16291931
  21. Deco G, Rolls ET. A neurodynamical cortical model of visual attention and invariant object recognition. Vision Res. 2004;44(6):621–42. pmid:14693189
  22. Hamker FH. The reentry hypothesis: the putative interaction of the frontal eye field, ventrolateral prefrontal cortex, and areas V4, IT for attention and eye movement. Cereb Cortex. 2005;15(4):431–47. pmid:15749987
  23. van Der Velde F, de Kamps M. From knowing what to knowing where: modeling object-based attention with feedback disinhibition of activation. J Cogn Neurosci. 2001;13(4):479–91. pmid:11388921
  24. Craft E, Schütze H, Niebur E, von der Heydt R. A neural model of figure-ground organization. J Neurophysiol. 2007;97(6):4310–26. pmid:17442769
  25. Serre T. Deep Learning: The Good, the Bad, and the Ugly. Annu Rev Vis Sci. 2019;5:399–426. pmid:31394043
  26. Echeveste R, Aitchison L, Hennequin G, Lengyel M. Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nat Neurosci. 2020;23(9):1138–49. pmid:32778794
  27. Fox KJ, Birman D, Gardner JL. Gain, not concomitant changes in spatial receptive field properties, improves task performance in a neural network attention model. Elife. 2023;12:e78392. pmid:37184221
  28. Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L. Segment Anything. 2023. http://arxiv.org/abs/2304.02643
  29. Li H, Xiong P, An J, Wang L. Pyramid Attention Network for Semantic Segmentation. British Machine Vision Conference 2018, BMVC 2018. 2018. http://arxiv.org/abs/1805.10180
  30. Hong S, Noh H, Han B. Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation. Adv Neural Inf Process Syst. 2015;2015:1495–503.
  31. Yamins DLK, Hong H, Cadieu CF, Solomon EA, Seibert D, DiCarlo JJ. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A. 2014;111(23):8619–24. pmid:24812127
  32. Güçlü U, van Gerven MAJ. Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral Stream. J Neurosci. 2015;35(27):10005–14. pmid:26157000
  33. Cadena SA, Denfield GH, Walker EY, Gatys LA, Tolias AS, Bethge M, et al. Deep convolutional models improve predictions of macaque V1 responses to natural images. PLoS Comput Biol. 2019;15(4):e1006897. pmid:31013278
  34. Lei J, Benjamin AS, Kording KP. Object Based Attention Through Internal Gating. ArXiv. 2021. http://arxiv.org/abs/2106.04540
  35. Kirchberger L, Mukherjee S, Self MW, Roelfsema PR. Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Sci Adv. 2023;9(3):eadd2498. pmid:36662858
  36. Keller AJ, Roth MM, Scanziani M. Feedback generates a second receptive field in neurons of the visual cortex. Nature. 2020;582(7813):545–9. pmid:32499655
  37. van Kerkoerle T, Self MW, Roelfsema PR. Layer-specificity in the effects of attention and working memory on activity in primary visual cortex. Nat Commun. 2017;8:13804. pmid:28054544
  38. Self MW, van Kerkoerle T, Supèr H, Roelfsema PR. Distinct roles of the cortical layers of area V1 in figure-ground segregation. Curr Biol. 2013;23(21):2121–9. pmid:24139742
  39. Self MW, Kooijmans RN, Supèr H, Lamme VA, Roelfsema PR. Different glutamate receptors convey feedforward and recurrent processing in macaque V1. Proc Natl Acad Sci U S A. 2012;109(27):11031–6. pmid:22615394
  40. Roelfsema PR, Lamme VAF, Spekreijse H, Bosch H. Figure-ground segregation in a recurrent network architecture. J Cogn Neurosci. 2002;14(4):525–37. pmid:12126495
  41. Poort J, Raudies F, Wannig A, Lamme VAF, Neumann H, Roelfsema PR. The role of attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron. 2012;75(1):143–56. pmid:22794268
  42. Treue S, Martínez Trujillo JC. Feature-based attention influences motion processing gain in macaque visual cortex. Nature. 1999;399(6736):575–9. pmid:10376597
  43. Self MW, Jeurissen D, van Ham AF, van Vugt B, Poort J, Roelfsema PR. The Segmentation of Proto-Objects in the Monkey Primary Visual Cortex. Curr Biol. 2019;29(6):1019-1029.e4. pmid:30853432
  44. LeCun Y, Cortes C, Burges B. The MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. 2010. Accessed 2023 February 24.
  45. Spoerer CJ, McClure P, Kriegeskorte N. Recurrent Convolutional Neural Networks: A Better Model of Biological Object Recognition. Front Psychol. 2017;8:1551. pmid:28955272
  46. Ernst MR, Triesch J, Burwick T. Recurrent Connections Aid Occluded Object Recognition by Discounting Occluders. Lecture Notes in Computer Sci. Springer International Publishing. 2019. p. 294–305.
  47. Thorat S, Aldegheri G, Kietzmann TC. Category-orthogonal object features guide information processing in recurrent neural networks trained for object categorization. 2021. http://arxiv.org/abs/2111.07898
  48. Michaelis C, Bethge M, Ecker AS. One-Shot Segmentation in Clutter. 35th International Conference on Machine Learning, ICML 2018. 2018;8: 5718–5727. Available: http://arxiv.org/abs/1803.09597
  49. Kingma DP, Ba JL. Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. International Conference on Learning Representations, ICLR; 2015. Available: https://arxiv.org/abs/1412.6980v9
  50. Moore T, Fallah M. Microstimulation of the frontal eye field and its effects on covert spatial attention. J Neurophysiol. 2004;91(1):152–62. pmid:13679398
  51. van Vugt B, Dagnino B, Vartak D, Safaai H, Panzeri S, Dehaene S, et al. The threshold for conscious report: Signal loss and response bias in visual and frontal cortex. Science. 2018;360(6388):537–42. pmid:29567809
  52. Bichot NP, Heard MT, DeGennaro EM, Desimone R. A Source for Feature-Based Attention in the Prefrontal Cortex. Neuron. 2015;88(4):832–44. pmid:26526392
  53. Bichot NP, Xu R, Ghadooshahy A, Williams ML, Desimone R. The role of prefrontal cortex in the control of feature attention in area V4. Nat Commun. 2019;10(1):5727. pmid:31844117
  54. Muckli L, De Martino F, Vizioli L, Petro LS, Smith FW, Ugurbil K, et al. Contextual Feedback to Superficial Layers of V1. Curr Biol. 2015;25(20):2690–5. pmid:26441356
  55. Williams MA, Baker CI, Op de Beeck HP, Shim WM, Dang S, Triantafyllou C, et al. Feedback of visual object information to foveal retinotopic cortex. Nat Neurosci. 2008;11(12):1439–45. pmid:18978780
  56. Papale P, Leo A, Handjaras G, Cecchetti L, Pietrini P, Ricciardi E. Shape coding in occipito-temporal cortex relies on object silhouette, curvature, and medial axis. J Neurophysiol. 2020;124(6):1560–70. pmid:33052726
  57. Papale P, Betta M, Handjaras G, Malfatti G, Cecchetti L, Rampinini A, et al. Common spatiotemporal processing of visual features shapes object representation. Sci Rep. 2019;9(1):7601. pmid:31110195
  58. Papale P, Wang F, Morgan AT, Chen X, Gilhuis A, Petro LS, et al. The representation of occluded image regions in area V1 of monkeys and humans. Curr Biol. 2023;33(18):3865-3871.e3. pmid:37643620
  59. Papale P, Zuiderbaan W, Teeuwen RRM, Gilhuis A, Self MW, Roelfsema PR, et al. V1 neurons are tuned to perceptual borders in natural scenes. Proc Natl Acad Sci U S A. 2024;121(46):e2221623121. pmid:39495929
  60. Ricciardi E, Papale P, Cecchetti L, Pietrini P. Does (lack of) sight matter for V1? New light from the study of the blind brain. Neurosci Biobehav Rev. 2020;118:1–2. pmid:32711007
  61. Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, et al. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol. 2024;20(5):e1012058. pmid:38709818
  62. Papale P, Wang F, Self MW, Roelfsema PR. An extensive dataset of spiking activity to reveal the syntax of the ventral stream. Neuron. 2025;113(4):539-553.e5. pmid:39809277
  63. Jeurissen D, van Ham AF, Gilhuis A, Papale P, Roelfsema PR, Self MW. Border-ownership tuning determines the connectivity between V4 and V1 in the macaque visual system. Nat Commun. 2024;15(1):9115. pmid:39438464
  64. Seignette K, Jamann N, Papale P, Terra H, Porneso RO, de Kraker L, et al. Experience shapes chandelier cell function and structure in the visual cortex. Elife. 2024;12:RP91153. pmid:38192196
  65. Kirchberger L, Mukherjee S, Schnabel UH, van Beest E, Barsegyan A, Levelt CN. The essential role of feedback processing for figure-ground perception in mice. Sci Adv.
  66. Klink PC, Dagnino B, Gariel-Mathis M-A, Roelfsema PR. Distinct Feedforward and Feedback Effects of Microstimulation in Visual Cortex Reveal Neural Mechanisms of Texture Segregation. Neuron. 2017;95(1):209-220.e3. pmid:28625487
  67. Kietzmann TC, Spoerer CJ, Sörensen LKA, Cichy RM, Hauk O, Kriegeskorte N. Recurrence is required to capture the representational dynamics of the human visual system. Proc Natl Acad Sci U S A. 2019;116(43):21854–63. pmid:31591217
  68. Kubilius J, Schrimpf M, Nayebi A, Bear D, Yamins DLK, DiCarlo JJ. CORnet: Modeling the Neural Mechanisms of Core Object Recognition. bioRxiv. 2018. p. 408385. https://doi.org/10.1101/408385
  69. Jarvers C, Neumann H. Incorporating feedback in convolutional neural networks. Cognitive Computational Neuroscience. 2019.
  70. Yan S, Fang X, Xiao B, Rockwell H, Zhang Y, Lee TS. Recurrent Feedback Improves Feedforward Representations in Deep Neural Networks. ArXiv. 2019 [cited 1 Jan 2021]. Available: http://arxiv.org/abs/1912.10489
  71. Spoerer CJ, Kietzmann TC, Mehrer J, Charest I, Kriegeskorte N. Recurrent neural networks can explain flexible trading of speed and accuracy in biological vision. PLoS Comput Biol. 2020;16(10):e1008215. pmid:33006992
  72. Kim J, Linsley D, Thakkar K, Serre T. Disentangling neural mechanisms for perceptual grouping. ArXiv. 2019. http://arxiv.org/abs/1906.01558
  73. Linsley D, Kim J, Veerabadran V, Serre T. Learning long-range spatial dependencies with horizontal gated-recurrent units. Adv Neural Inf Process Syst. 2018;152–64.
  74. Linsley D, Kim J, Ashok A, Serre T. Recurrent neural circuits for contour detection. 8th International Conference on Learning Representations, ICLR 2020. 2020 [cited 25 Oct 2023]. Available: http://arxiv.org/abs/2010.15314
  75. Mollard S, Wacongne C, Bohte SM, Roelfsema PR. Recurrent neural networks that learn multi-step visual routines with reinforcement learning. PLoS Comput Biol. 2024;20(4):e1012030. pmid:38683837
  76. Doerig A, Sommers R, Seeliger K, Richards B, Ismael J, Lindsay G, et al. The neuroconnectionist research programme. ArXiv. http://arxiv.org/abs/2209.03718. 2022. Accessed 2022 November 11.
  77. Seijdel N, Tsakmakidis N, de Haan EHF, Bohte SM, Scholte HS. Depth in convolutional neural networks solves scene segmentation. PLoS Comput Biol. 2020;16(7):e1008022. pmid:32706770
  78. Walther D, Koch C. Modeling attention to salient proto-objects. Neural Netw. 2006;19(9):1395–407. pmid:17098563
  79. Stettler M, Francis G. Using a model of human visual perception to improve deep learning. Neural Netw. 2018;104:40–9. pmid:29705669
  80. Vecera SP, O’Reilly RC. Figure-ground organization and object recognition processes: an interactive account. J Exp Psychol Hum Percept Perform. 1998;24(2):441–62. pmid:9554093
  81. Peterson MA, Harvey EM, Weidenbacher HJ. Shape recognition contributions to figure-ground reversal: which route counts? J Exp Psychol Hum Percept Perform. 1991;17(4):1075–89. pmid:1837298
  82. Korjoukov I, Jeurissen D, Kloosterman NA, Verhoeven JE, Scholte HS, Roelfsema PR. The time course of perceptual grouping in natural scenes. Psychol Sci. 2012;23(12):1482–9. pmid:23137967
  83. Domijan D, Setić M. A feedback model of figure-ground assignment. J Vis. 2008;8(7):10.1-27. pmid:19146243
  84. Raudies F, Neumann H. A neural model of the temporal dynamics of figure-ground segregation in motion perception. Neural Netw. 2010;23(2):160–76. pmid:19931405
  85. Briggs F. Role of Feedback Connections in Central Visual Processing. Annu Rev Vis Sci. 2020;6:313–34. pmid:32552571
  86. Jones HE, Andolina IM, Shipp SD, Adams DL, Cudeiro J, Salt TE, et al. Figure-ground modulation in awake primate thalamus. Proc Natl Acad Sci U S A. 2015;112(22):7085–90. pmid:25901330
  87. Schmid D, Neumann H. Thalamo-Cortical Interaction for Incremental Binding in Mental Contour-Tracing. Cold Spring Harbor Laboratory. 2023.
  88. Schmid D, Neumann H. A model of thalamo-cortical interaction for incremental binding in mental contour-tracing. PLoS Comput Biol. 2025;21(5):e1012835. pmid:40338986
  89. Berezovskii VK, Nassi JJ, Born RT. Segregation of feedforward and feedback projections in mouse visual cortex. J Comp Neurol. 2011;519(18):3672–83. pmid:21618232
  90. Batardière A, Barone P, Knoblauch K, Giroud P, Berland M, Dumas A-M, et al. Early specification of the hierarchical organization of visual cortical areas in the macaque monkey. Cereb Cortex. 2002;12(5):453–65. pmid:11950763
  91. Neri P. Object segmentation controls image reconstruction from natural scenes. PLoS Biol. 2017;15(8):e1002611. pmid:28827801
  92. Neri P. Semantic control of feature extraction from natural scenes. J Neurosci. 2014;34(6):2374–88. pmid:24501376
  93. Zuiderbaan W, van Leeuwen J, Dumoulin SO. Change Blindness Is Influenced by Both Contrast Energy and Subjective Importance within Local Regions of the Image. Front Psychol. 2017;8:1718. pmid:29046655
  94. Fowlkes CC, Martin DR, Malik J. Local figure-ground cues are valid for natural images. J Vis. 2007;7(8):2. pmid:17685809
  95. Papale P, Leo A, Cecchetti L, Handjaras G, Kay KN, Pietrini P, et al. Foreground-Background Segmentation Revealed during Natural Image Viewing. eNeuro. 2018;5(3):ENEURO.0075-18.2018. pmid:29951579
  96. Williford JR, von der Heydt R. Figure-Ground Organization in Visual Cortex for Natural Scenes. eNeuro. 2016;3(6):ENEURO.0127-16.2016. pmid:28058269
  97. Hesse JK, Tsao DY. Consistency of Border-Ownership Cells across Artificial Stimuli, Natural Stimuli, and Stimuli with Ambiguous Contours. J Neurosci. 2016;36(44):11338–49. pmid:27807174
  98. Papale P, De Luca D, Roelfsema PR. Deep generative networks reveal the tuning of neurons in IT and predict their influence on visual perception. Cold Spring Harbor Laboratory. 2024.
  99. Pozzi I, Bohté SM, Roelfsema PR. Attention-gated brain propagation: How the brain can implement reward-based error backpropagation. Advances in Neural Information Processing Systems. 2020.
  100. van der Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Comput Sci Eng. 2011;13(2):22–30.
  101. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. ArXiv. 2019 [cited 29 Jan 2021]. Available: http://arxiv.org/abs/1912.01703