You prime what you code: The fAIM model of priming of pop-out

Our visual brain makes use of recent experience to interact with the visual world and efficiently select relevant information. This is exemplified by speeded search when target and distractor features repeat across trials versus when they switch, a phenomenon referred to as intertrial priming. Here, we present fAIM, a computational model that demonstrates how priming can be explained by a simple feature-weighting mechanism integrated into an established model of bottom-up vision. In fAIM, such modulations in feature gains are widespread and not restricted to one or a few features. Consequently, priming effects result from the overall tuning of visual features to the task at hand. Such tuning allows the model to reproduce priming for different types of stimuli, including typical stimulus dimensions such as ‘color’ and less obvious dimensions such as ‘spikiness’ of shapes. Moreover, the model explains some puzzling findings from the literature: it shows how priming can be found for target-distractor stimulus relations rather than for their absolute stimulus values per se, without an explicit representation of relations. Similarly, it simulates effects that have been taken to reflect a modulation of priming by an observer’s goals—without any representation of goals in the model. We conclude that priming is best considered a consequence of a general adaptation of the brain to visual input, and not a peculiarity of visual search.


Visual Features
The visual features were derived from 40,000 image patches of 31 × 31 pixels × RGB channels, generated by extracting 5 patches from each of 8,000 images. The images were drawn from the SUN database [3], but we excluded images that were black and white, that were less than 300 px wide or tall, or that were labeled 'outliers' in the dataset.
Features were computed by running JADE independent component analysis [4] (ICA), after the data had been whitened by a principal component analysis that retained only the components needed to account for at least 95% of the variance in the data. Different runs with newly sampled image patches resulted in highly similar features, though the number of features varied (54 ± 3). In the simulations reported here, a single feature space of 54 features was used.
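The feature-learning pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: random arrays stand in for the SUN image patches, the dimensions are shrunk for speed, and scikit-learn's FastICA stands in for JADE ICA, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Stand-in for the 40,000 flattened 31x31xRGB patches: 500 random
# 8x8 grayscale "patches" keep the example fast.
patches = rng.standard_normal((500, 64))

# Whiten with PCA, retaining only the components that together account
# for at least 95% of the variance in the data.
pca = PCA(n_components=0.95, whiten=True)
whitened = pca.fit_transform(patches)

# ICA on the whitened data. FastICA stands in for JADE;
# whiten=False because the data are already whitened.
ica = FastICA(whiten=False, max_iter=1000, random_state=0)
sources = ica.fit_transform(whitened)

# One learned feature per retained component.
n_features = sources.shape[1]
```

With real image patches, the number of retained components (and hence features) depends on the variance structure of the data, which is why repeated runs in the paper yielded 54 ± 3 features.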
Self-information and visual pop-out

Input images were decomposed into 31 × 31 patches. To correct for edge artifacts, images were reflected at their edges, although the stimuli in our images were placed at sufficient distance from this boundary not to be affected. Convolving the image with the ICA unmixing matrix yields the response of each feature at each location.
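The decomposition step can be sketched as below. All arrays are random stand-ins (the image, and a hypothetical unmixing matrix `unmix`); the point is the reflect-padding and the per-location dot products between 31 × 31 windows and the ICA filters.

```python
import numpy as np

rng = np.random.default_rng(1)
P = 31                      # patch size
H, W_img = 64, 64           # small stand-in image (grayscale for brevity)
n_feat = 54

image = rng.standard_normal((H, W_img))

# Reflect the image at its edges so every pixel has a full 31x31 neighborhood.
pad = P // 2
padded = np.pad(image, pad, mode="reflect")

# Stand-in ICA unmixing matrix: one row per feature, one column per patch pixel.
unmix = rng.standard_normal((n_feat, P * P))

# Response of each feature at each location: dot each window with each filter.
windows = np.lib.stride_tricks.sliding_window_view(padded, (P, P))
responses = windows.reshape(H, W_img, P * P) @ unmix.T   # shape (H, W, n_feat)
```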
These responses across images were summarized into a histogram of 1,000 bins for each feature channel. The histogram was smoothed by convolution with a Gaussian filter with a σ of 0.05·s, where s is the standard deviation computed across the bins of the histogram. The resulting bin values were then normalized to sum to 1.
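This histogram step can be sketched as follows (random responses stand in for the real channel responses; we read "s" as the standard deviation of the bin counts, which is one interpretation of the text):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(2)

# Stand-in responses of one feature channel across all images and locations.
responses = rng.standard_normal(100_000)

# 1,000-bin histogram of the channel's responses.
counts, edges = np.histogram(responses, bins=1000)

# Smooth with a Gaussian whose sigma is 0.05 times the s.d. across the
# bin values (assumed here to be the s.d. of the counts, in bin units).
s = counts.std()
smoothed = gaussian_filter1d(counts.astype(float), sigma=0.05 * s)

# Normalize the smoothed bin values to sum to 1, giving a probability table.
p = smoothed / smoothed.sum()
```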
The resulting bin values were used to determine the probability p_f(x) of all stimulus values at every location x. These were transformed into salience values, or self-information: S(x) = −log p(x). The insight that the salience value in AIM is composed of the independent contributions of all feature channels is clear when the salience computation is rewritten as S(x) = −Σ_f log p_f(x), where f refers to each feature channel; x refers to any location in the image and is omitted from the upcoming equations.
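Given per-channel probability tables, the salience computation S(x) = −Σ_f log p_f(x) amounts to a lookup and a sum. A minimal sketch, with random stand-ins for the probability tables and for the bin each response falls into:

```python
import numpy as np

rng = np.random.default_rng(3)
n_feat, n_bins = 54, 1000

# Stand-in per-channel probability histograms (each row sums to 1).
p = rng.random((n_feat, n_bins))
p /= p.sum(axis=1, keepdims=True)

# Stand-in bin index of each channel's response at each of 10x10 locations.
bin_idx = rng.integers(0, n_bins, size=(n_feat, 10, 10))

# Self-information per channel and location, summed over channels:
# S(x) = -sum_f log p_f(x)
info = -np.log(p[np.arange(n_feat)[:, None, None], bin_idx])
salience = info.sum(axis=0)          # salience map, shape (10, 10)
```

Because every p_f(x) is below 1, each channel contributes a positive amount of self-information, so rare (improbable) feature responses yield high salience.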
The fAIM model

fAIM generalizes this model and assumes that all feature channels have a dynamic weight or 'gain' g_f that determines their contribution to overall salience: S = −Σ_f g_f log p_f. The average salience level S̄ of each stimulus is computed, which is used to compute stimulus pop-out P as S̄_T − S̄_A. Here, the subscript T indicates that the Target salience is used, but pop-out can be computed for each distractor stimulus in the same manner.
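The gain-weighted salience and the pop-out score reduce to dot products. A minimal sketch, with random stand-ins for the per-channel self-information at the target and distractor locations:

```python
import numpy as np

rng = np.random.default_rng(4)
n_feat = 54

# Gains initialized at 1/54, so they sum to 1.
g = np.full(n_feat, 1.0 / n_feat)

# Stand-in per-channel self-information (-log p_f) averaged over the
# target stimulus and over the other (distractor) stimuli.
info_target = rng.random(n_feat)
info_distractors = rng.random(n_feat)

# Gain-weighted salience: S = sum_f g_f * (-log p_f)
S_T = g @ info_target
S_A = g @ info_distractors

# Pop-out of the target.
P = S_T - S_A
```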
As explained in the main text, the same computation within feature channels determines the intertrial gain modulation for each feature (Δg_f), with w_i = w_e = 1 for all simulations. The gain vector used on a subsequent trial resulted from adding Δg and bounding the gain values via a sigmoid function: g_f ← u / (1 + e^(−k(g_f + Δg_f − m))). The gains were thereby bounded between 0 and u = 2/x, with midpoint m = 1/x, where x = 54 was the number of feature channels in the visual representation. All gains were initialized at m, so channel gains summed to 1. The slope parameter k = 100 was chosen so that, for the present values, the sigmoid would only affect gains after repeated modulations in the same direction, as can be observed in Fig. 3B.
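The bounding step can be sketched with a logistic function parameterized by the stated u, m, and k (the Δg values here are hypothetical; computing Δg_f itself is described in the main text):

```python
import numpy as np

n_feat = 54
u = 2.0 / n_feat        # upper bound of the gains
m = 1.0 / n_feat        # sigmoid midpoint, and the initial gain value
k = 100.0               # slope: near-identity around m, saturating further out

def bound(g):
    # Logistic squashing of gains into the interval (0, u), centered on m.
    return u / (1.0 + np.exp(-k * (g - m)))

g = np.full(n_feat, m)          # all gains start at m, summing to 1
delta_g = np.zeros(n_feat)
delta_g[7] = 0.005              # hypothetical boost for one feature channel

g = bound(g + delta_g)
```

Note that bound(m) = u/2 = m, so unmodulated gains pass through unchanged; only gains pushed repeatedly in the same direction approach the bounds 0 or u.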
Our model offers a mechanistic, qualitative explanation of a wide range of priming effects. For the most part, varying the model parameters merely scales the effects found, which does not lead to conclusions different from those drawn in the present work.
However, future work could explore the quantitative relation between salience and search performance in more detail, and in that case these parameters could be varied to capture variability across paradigms or individuals. For example, the w e /w i ratio could be changed to capture dissociable effects of distractor inhibition and target facilitation in priming [1], and the slope parameter k might be modulated to capture variability in the time course of priming effects [2].