Primary visual cortex as a saliency map: parameter-free prediction of behavior from V1 physiology

It has been hypothesized that neural activities in the primary visual cortex (V1) represent a saliency map of the visual field to exogenously guide attention. This hypothesis has so far provided only qualitative predictions and their confirmations. We report this hypothesis' first quantitative prediction, derived without free parameters, and its confirmation by human behavioral data. The hypothesis provides a direct link between V1 neural responses to a visual location and the saliency of that location to guide attention exogenously. In a visual input containing many bars, one of them saliently different from all the other bars which are identical to each other, saliency at the singleton's location can be measured by the shortness of the reaction time in a visual search task to find the singleton. The hypothesis predicts quantitatively the whole distribution of the reaction times to find a singleton unique in color, orientation, and motion direction from the reaction times to find other types of singletons. The predicted distribution matches the experimentally observed distribution in all six human observers. A requirement for this successful prediction is a data-motivated assumption that V1 lacks neurons tuned simultaneously to color, orientation, and motion direction of visual inputs. Since evidence suggests that extrastriate cortices do have such neurons, we discuss the possibility that the extrastriate cortices play no role in guiding exogenous attention so that they can be devoted to other functional roles like visual decoding or endogenous attention.


Introduction
Spatial visual selection, often called spatial attentional selection, enables vision to select a visual location for detailed processing using limited cognitive resources [15]. Metaphorically, the selected location is said to be in the attentional spotlight, which typically coincides with the spatial zone centered on gaze position. Hence, a visual input outside the spotlight, e.g., a letter in a word on this page more than 10 letters from the current fixation location, is difficult to recognize. Therefore, if one is to find a particular word on this page, the reaction time (RT) to find this word will depend on how long it takes the spotlight to arrive at the word location. The spotlight can be guided by goal-dependent (or top-down, endogenous) mechanisms, such as when we direct our gaze to the right words while reading, or by goal-independent (or bottom-up, exogenous) mechanisms such as when we are distracted from reading by a sudden appearance of something in visual periphery. In this paper, an input is said to be salient when it strongly attracts attention by bottom-up mechanisms, and the degree of this attraction is defined as saliency. For example, an orientation singleton such as a vertical bar in a background of horizontal bars is salient, so is a color singleton such as a red dot among many green ones; and the location of such a singleton has a high saliency value.
Therefore, saliency of a visual location can often be measured by the shortness of the reaction time in a visual search to find a target at this location [37], provided that saliency, rather than top-down attention, is the dictating factor to guide the attentional spotlight. It can also be measured in attentional (exogenous) cueing effect in terms of the degree in which a salient location speeds up and/or improves visual discrimination of a probe presented immediately after the brief appearance of the salient cue [28,29].
Traditional views presume that higher brain areas such as those in the parietal and frontal parts of the brain are responsible for saliency, i.e., for guiding attention exogenously [37,7,40,15]. This belief was partly inspired by the observation that saliency is a general property that could arise from visual inputs with any kind of feature values (e.g., vertical or red) in any feature dimension (e.g., color, orientation, and motion) whereas each neuron in lower visual areas like the primary visual cortex is (more likely) tuned to specific feature values (e.g., a vertical orientation) rather than general visual features. However, it was proposed a decade ago [24,25] that the primary visual cortex (V1) computes a saliency map, such that the saliency value of a location is represented by the highest response among V1 neurons to this location relative to the highest responses to the other visual locations, regardless of the preferred features of neurons giving such responses. Although this V1 saliency hypothesis is a significant departure from traditional psychological theories, it has received substantial experimental support [48,19,47,16,43,41,44], detailed in [46]. In particular, behavioral data confirmed a surprising prediction from this hypothesis that an eye-of-origin singleton (e.g., an item uniquely shown to the left eye among other items shown to the right eye) that is hardly distinctive from other visual inputs can attract attention and gaze qualitatively just like a salient and highly distinctive orientation singleton does. In fact, observations [43,44] show that an eye-of-origin singleton can be even more salient than a very salient orientation singleton. This finding provides a hallmark of the saliency map in V1 because the eye-of-origin feature is not explicitly represented in any visual cortical area except V1. (Cortical neurons, except many in V1, are not tuned to the eye-of-origin feature [13,2], making this feature non-distinctive to perception.)
Functional magnetic resonance imaging and event related potential measurements also confirmed that, when top-down confounds are avoided or minimized, a salient location evokes brain activations in V1 but not in the parietal and frontal regions [41].

Figure 1: The V1 saliency hypothesis states that the bottom-up saliency of a location is represented by the maximum V1 response to this location. In this schematic for illustration, V1 is simplified to contain only two kinds of neurons, one tuned to color (their responses are visualized by the purple dots) and the other tuned to orientation (black dots). Each input bar evokes responses in a cell tuned to its color and another cell tuned to its orientation (indicated for two input bars by linking each bar to its two evoked responses by dotted lines), and the receptive fields of these two cells cover the same retinal location even though (for better visualization) the dots representing these cells are not exactly overlapping in the cortical map. Iso-feature suppression makes nearby V1 neurons tuned to similar features (e.g., similar color or similar orientation) suppress each other. The orientation singleton in this image evokes the highest V1 response because the orientation-tuned neuron responding to it escapes iso-orientation suppression. The color-tuned neuron responding to the singleton's color is under iso-color suppression. The saliency map is likely read out by the superior colliculus to execute gaze shifts to salient locations.
So far, the existing tests of the V1 saliency hypothesis have been qualitative. Here, we report its first quantitative prediction that is derived without free parameters. This prediction is of the distribution of the reaction times in a visual search for a singleton bar defined by its uniqueness in color, orientation, and motion direction among uniformly featured background bars. This prediction can be made because the hypothesis directly links the response properties of V1 neurons with the reaction times of visual searches. Specifically, according to the hypothesis, the saliency of a visual location is represented by the maximum of the responses of V1 neurons to this location, regardless of the input feature selectivity of the neurons concerned [24,25]. For example, a visual input in Fig. 1 contains many colored bars, each activates some V1 neurons tuned to its color and/or orientation. The highest response to each bar signals the saliency of its location according to the V1 hypothesis, regardless of whether the V1 neuron giving this (highest) response is tuned to the color or orientation (or both color and orientation) of the bar. These highest V1 responses for various visual locations thus represent a saliency map of the scene. This saliency map may potentially be read out by the superior colliculus, which receives mono-synaptic input from V1 and controls eye movement to execute the attentional selection [33]. If an observer searches for a uniquely oriented bar in the retinal image in Fig. 1, the reaction time to find this bar, associated with the saliency of the target location, should thus be associated with the highest V1 response to the target. In particular, a shorter reaction time should result from a larger value of the highest response to the search target (when the highest responses to various non-target locations are fixed). 
As will be explained in detail, a feature singleton, e.g., an orientation singleton, tends to be the most salient in a scene because it tends to evoke the highest V1 response to the scene due to iso-feature suppression [24], the mutual suppression between nearby neurons preferring the same or similar features [1]: iso-feature suppression makes neurons responding to a non-singleton item suppressed by other neurons responding to neighboring non-singletons sharing the same or similar features. As will be shown, the direct link between the reaction times and V1 responses assumed by the V1 saliency hypothesis, together with V1's neural response properties (in particular iso-feature suppression and feature selectivities by the neurons), enables a quantitative prediction on the distribution of the reaction times without any free parameters. Furthermore, we will show that this prediction matches behavioral reaction time data quantitatively.
In addition, this paper explores the implications of the confirmation of this quantitative prediction by experimental data. We will show that the prediction arises when the cortical area(s) responsible for computing saliency satisfies two requirements, one functional and one physiological.
The functional requirement is, as stated by the V1 saliency hypothesis, that the saliency of a location is signalled by the highest response to that location among the responses from the cortical neurons. The physiological requirement is that the saliency computing cortical area(s) should have the following properties found in V1: a neuron's response should be tuned to color, orientation, or motion direction, or tuned simultaneously to any two of these three feature dimensions; however, there should be few neurons tuned simultaneously to all the three feature dimensions [13,26,12] with the ubiquitously associated iso-feature suppression. Hence, the confirmation of the prediction enables us to identify possible candidate brain areas for saliency computation. In principle, if an extrastriate area also satisfies the physiological requirement, it might also play a role in saliency together with V1. We will discuss experimental evidence on whether the extrastriate cortical areas satisfy this physiological requirement and thus whether they can be excluded from playing a role in saliency. Parts of this work have been presented in abstract form elsewhere [50,45].

Results
In this section, starting from an overview of the background of V1 mechanisms and the V1 saliency hypothesis, we show a direct link between the reaction time to find a visual feature singleton in a homogeneous background (like that in Fig. 1) and the highest V1 response to this singleton.
From this link, we derive the quantitative prediction of the hypothesis and present its experimental test using behavioral data. In this process, we also present some related but spurious theoretical predictions that should be violated unless certain conditions on the V1 neural mechanisms hold.
These spurious predictions and their tests (falsification) by behavioral reaction time data not only help to provide further insights in the underlying neural mechanisms but also help to illustrate and verify our methods.
Iso-feature suppression between neurons as the mechanism for high saliencies of feature singletons
In the retinal image of Fig. 1, the location of an orientation singleton, a left-tilted bar in a background of right-tilted bars, is most salient. This is because a V1 neuron tuned to its orientation, with its receptive field covering the bar, responds more vigorously than any neuron responding to the background bars. Note that, throughout the paper, 'a neuron responding to a bar' means the most responsive neuron among a local population of neurons with similar input selectivities responding to this bar regardless of the number of neurons in this local population. The higher response to the orientation singleton is due to iso-orientation suppression between nearby neurons tuned to the same or similar orientations [1,18,23]. Hence, neurons responding to neighboring background bars suppress each other because they are tuned to the same or similar orientation, whereas the neuron responding to the orientation singleton escapes such suppression because it is tuned to a very different orientation.
In addition to the orientation feature, V1 neurons are also tuned to other input feature dimensions including color, motion direction, and eye of origin [14,26]. Hence, each colored bar in the retinal image of Fig. 1 evokes not only a response in a cell tuned to its orientation but also another response in another cell tuned to its color (omitting other input features for simplicity), this is indicated by the dotted lines linking the two example input bars and their respective evoked V1 responses. In general, there are many V1 neurons whose receptive fields cover the location of each visual input item (including neurons whose preferred orientation or color does not match the visual input feature), and only the highest response from these neurons represents the saliency of this location according to the V1 saliency hypothesis (note that this highest response is unlikely to be from a neuron whose preferred feature is not in the input item). In the example of Fig. 1, responses from the color tuned neurons to all bars suffer from iso-color suppression [39], which is analogous to iso-orientation suppression, since all bars have the same color. Focusing on V1 neurons tuned to color only and neurons tuned to orientation only for simplicity, the highest response evoked by the orientation singleton is in the orientation-tuned rather than the color-tuned cell, and this response alone (relative to the responses to the background bars) determines the saliency of the orientation singleton. Later in the paper, the notion that many V1 neurons respond to a single input location or item will be generalized to include neurons tuned to motion direction and neurons jointly tuned to multiple feature dimensions. Determining the highest V1 response to each input location will involve determining which of the many neurons whose receptive fields cover this location has the highest response.
Analogous to iso-orientation suppression and iso-color suppression, iso-motion-direction and iso-ocular-origin suppressions are also present in V1 [1,18,23,5,39,17], and we call them iso-feature suppression in general [24]. Accordingly, an input singleton in any of these feature dimensions should be salient (see Fig. 2B for a color singleton), since the neuron responding to the unique feature of the singleton escapes the iso-feature suppression of the neurons responding to the uniformly featured background items. This is consistent with known behavioral saliency and has led to the successful prediction of the salient singleton in eye-of-origin [43]. Iso-feature suppression is believed to be mediated by intra-cortical neural connections [31,9] linking neurons whose receptive fields are spatially nearby but not necessarily overlapping.
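As an illustration only, the following minimal sketch shows how iso-feature suppression can make a singleton location the most salient one. It is a toy, not the V1 model of the paper: the one-dimensional row of bars, the linear suppression rule, and the constants `BASE_RESPONSE` and `SUPPRESSION` are all assumptions introduced here.

```python
# Toy sketch of iso-feature suppression on a 1-D row of bars.
# All numbers and the linear suppression rule are hypothetical.
BASE_RESPONSE = 30.0  # assumed unsuppressed response (spikes/s)
SUPPRESSION = 10.0    # assumed suppression per same-feature neighbor


def responses(features, radius=1):
    """Response of the neuron tuned to each bar's feature value, reduced by
    suppression from neighbors within `radius` sharing that feature value."""
    out = []
    for i, feature in enumerate(features):
        lo, hi = max(0, i - radius), min(len(features), i + radius + 1)
        same = sum(1 for j in range(lo, hi) if j != i and features[j] == feature)
        out.append(BASE_RESPONSE - SUPPRESSION * same)
    return out


orientations = ["right"] * 4 + ["left"] + ["right"] * 4  # one orientation singleton
colors = ["green"] * 9                                    # uniform color

r_orient = responses(orientations)  # orientation-tuned neurons
r_color = responses(colors)         # color-tuned neurons

# Saliency at each location: the highest response over neuron types.
saliency = [max(o, c) for o, c in zip(r_orient, r_color)]
most_salient = max(range(len(saliency)), key=saliency.__getitem__)
print(most_salient, saliency)
```

In this toy, the orientation-tuned response at the singleton escapes suppression while the color-tuned response at the same location does not, so the maximum over neuron types is set by the orientation-tuned neuron.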
The feature-blind nature of saliency representation in V1
According to the V1 saliency hypothesis, it is only V1 response vigor that matters for saliency, and not the visual feature value concerned. Let us compare Fig. 2A and Fig. 2B: one has an orientation (O) singleton and one has a color (C) singleton, and they share the same background bars. In each image, the singleton should activate some neurons which are orientation tuned and some other neurons which are color tuned (for the moment, we omit for simplicity neurons tuned simultaneously to color and orientation). In Fig. 2A, the most activated neuron by the singleton is orientation tuned due to iso-orientation suppression; color tuned neurons responding to any bar, singleton or otherwise, suffer iso-color suppression. In Fig. 2B, the most activated neuron is color tuned instead; and the orientation tuned neurons responding to any bar suffer iso-orientation suppression. However, if the highest responses evoked by the two singletons are identical, then the two singletons are equally salient (assuming that the population responses to the background bars in the two images are identical), even though different singletons evoke this highest response in neurons tuned to different feature dimensions. Conversely, if the respective highest responses evoked by the two singletons are different from each other, the singleton evoking the higher response is more salient, regardless of which neurons give the highest responses. The feature-blind nature of this saliency representation in V1 enables the brain to have a bottom-up saliency map in V1, despite the feature tuning of V1 neurons, without resorting to higher cortical areas such as the frontal eye field or lateral-intraparietal cortex [10,15].

Linking V1 responses with reaction times
When the effect of top-down attentional guidance is negligible in a visual search task, a higher saliency at the target location should lead to a shorter reaction time to find the target, by the definition of saliency. In stimuli like those in Fig. 2, the feature singletons are so salient that we can assume that their saliency dictates the immediate attention shifts upon the appearance of the stimuli. It is typical that the first saccade after the appearance of such search images is directed to the feature singleton. However, the latency of this attentional shift is shorter for a more salient singleton. Assuming a fixed additional latency from the shift of attention to the singleton to an observer's response to report the singleton, the reaction time for the visual search task is then determined by the singleton's saliency.

Figure 2: Schematics of visual stimuli for singleton searches. A: orientation (O) singleton, whose saliency is signalled by a V1 neuron tuned to orientation. B: color (C) singleton, whose saliency is signalled by a V1 neuron tuned to color. C: color-orientation (CO) double-feature singleton, whose saliency is signalled by a V1 neuron tuned to color only, orientation only, or both color and orientation, whichever has the highest response. V1 responses to the unique feature of the singleton are free from the iso-feature (e.g., iso-orientation and iso-color) suppression that is present between nearby V1 neurons tuned to similar features. Due to iso-feature suppression, the most active response to each image is from a neuron responding to the singleton bar. This most activated neuron is tuned to orientation in image A, tuned to color in image B, and to color, orientation, or both features of the singleton for image C. The highest V1 response to the singleton signals the saliency of its location.
Let a visual image (or scene) have visual input items at n locations i = 1, 2, ..., n, and let $r_i$ be the highest V1 response (among multiple responses from multiple V1 neurons) evoked by location i. Then the saliency of location i is determined by the value of $r_i$ relative to the other values $r_j$ for $j \neq i$. This is because, according to the V1 saliency hypothesis, the saliency read-out process is like an auction for attention, with $r_i$ the bidding price for attention by location i, such that the location giving the highest bid is the most likely to win attention [42]. Let us order the $r_i$ such that

$$r_1 \geq r_2 \geq \dots \geq r_n, \qquad (1)$$

then the first location is the most salient in the scene. Formally, we can use a function g(.) to describe

$$\text{saliency at the most salient location} = g(r_1 \mid r_2, r_3, \dots, r_n). \qquad (2)$$
In this paper, we are concerned only with visual scenes like those in Fig. 2. Each such scene is called a feature singleton scene in this paper. It has one feature singleton in a background of many items which are identical to each other, and the feature singleton is far more salient than all other input items. Then, $r_1$ is the highest response evoked by the singleton and is substantially and significantly larger than any $r_i$ for i > 1. For example, the background bars may evoke $r_i$ (for i > 1) that are no more than 10 spikes/second whereas the singleton evokes an $r_1$ that is no less than 20 spikes/second. In such a feature singleton scene when n is very large (e.g., 660 in the visual stimuli we will use later), we can reasonably expect that $g(r_1 \mid r_2, r_3, \dots)$, the saliency of the singleton, depends on $(r_2, r_3, \dots)$ mainly through the statistical properties across the $r_i$'s rather than the exact value of each $r_i$ for i > 1. For example, the statistical properties of $(r_2, r_3, \dots, r_n)$ can be partly characterized by the average $\bar{r}$ and standard deviation $\sigma$ across the $r_i$'s for i > 1; then a singleton with a larger $(r_1 - \bar{r})/\sigma$, and perhaps also a larger $r_1/\bar{r}$, tends to be more salient [24]. More strictly, the function g(.) in equation (2) may also depend on the locations $x_i$ of visual inputs associated with $r_i$ for all i. However, this paper assumes that this dependence is negligible when we restrict our visual scenes to the singleton scenes satisfying the following: (1) the eccentricity of the singleton from the center of the visual field is fixed, (2) different $r_i$ and $r_j$ are sufficiently similar for all i > 1 and j > 1 ($j \neq i$) and the spatial distribution of the locations of the non-singleton items (whose $r_i$ are those with i > 1) is approximately fixed. This paper is concerned only with scenes which are assumed to satisfy these conditions.
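The two summary quantities mentioned above, the singleton's response relative to the background mean in units of the background standard deviation, and the ratio of the singleton's response to the background mean, can be computed as in the sketch below. The response values are made up for illustration, and the function name `singleton_distinctiveness` is hypothetical; only the scene size (660 items) is taken from the text.

```python
import statistics


def singleton_distinctiveness(r_singleton, r_background):
    """Return ((r1 - mean) / sd, r1 / mean) for a singleton response r1
    against the highest responses to the background items."""
    mean_bg = statistics.mean(r_background)
    sd_bg = statistics.pstdev(r_background)  # population standard deviation
    return (r_singleton - mean_bg) / sd_bg, r_singleton / mean_bg


# Hypothetical background: 659 items with responses near 9 spikes/s,
# so that the scene has n = 660 items including the singleton.
background = [8.0, 10.0] * 329 + [9.0]

z_strong, ratio_strong = singleton_distinctiveness(30.0, background)
z_weak, ratio_weak = singleton_distinctiveness(20.0, background)
print(z_strong > z_weak, ratio_strong > ratio_weak)
```

With the background held fixed, both quantities grow with the singleton's response, which is the sense in which saliency can later be treated as a function of that response alone.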
Therefore, given a fixed invariant background response distribution shared by a set of feature singleton scenes, we can assume that the saliency of the singleton can be approximately seen as depending only on the highest response $r_1$ to the singleton. Then, we can omit the explicit expression of $(r_2, r_3, \dots)$ in equation (2) and write (still using the same notation g(.) for convenience)

$$\text{the saliency of the singleton location} = g(r_1,\ \text{the highest response to the singleton}). \qquad (3)$$

The function g(r) monotonically increases with r, and its exact dependence on r is determined by the properties of the invariant background response distribution. Since a larger saliency at the location of the singleton should give a shorter reaction time to find it (assuming again that top-down factors are negligible), we can write this reaction time also directly as a function of the highest response to the singleton:

$$\text{the reaction time to find a feature singleton} = f(r_1,\ \text{the highest response to the singleton}), \qquad (4)$$

in which f(.) is a monotonically decreasing function of its argument. The exact form of f(.) should depend on the invariant background response distribution and on the saliency read-out system. It can also depend on the observer (e.g., some observers can respond faster than others). We will see that these details about f(.) do not matter in our study. Regardless of these details, among feature singleton scenes sharing an invariant background response distribution, two singletons evoking the same highest V1 response should require the same reaction time to find them for a given observer, at least statistically, and the singleton evoking a larger V1 response (the highest response) should require a shorter reaction time. With this, the reaction time of a visual search for a feature singleton is directly linked to the V1 responses.
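A small sketch may help make the feature-blind link between the highest response and the reaction time concrete. The specific form of `reaction_time` below (a constant plus a term inversely proportional to the response) is purely an assumption for illustration; the text only requires f(.) to be monotonically decreasing, and the response values are invented.

```python
def reaction_time(r1, t0=250.0, k=6000.0):
    """Hypothetical monotonically decreasing f: reaction time (ms) as a
    function of the highest response r1 (spikes/s) to the singleton."""
    return t0 + k / r1


def singleton_rt(responses_by_neuron):
    """Reaction time depends only on the highest response to the singleton,
    not on which kind of neuron gives that response."""
    return reaction_time(max(responses_by_neuron.values()))


rt_O = singleton_rt({"orientation": 40.0, "color": 10.0})     # O singleton
rt_C = singleton_rt({"orientation": 10.0, "color": 40.0})     # C singleton
rt_weak = singleton_rt({"orientation": 25.0, "color": 10.0})  # weaker O singleton
print(rt_O, rt_C, rt_weak)
```

The first two singletons evoke the same highest response in different neuron types and so share a reaction time, while the weaker singleton takes longer to find.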

A race model
Let us apply equation (4) to the singleton scenes like those in Fig. 2, when these scenes share an invariant background response distribution. For ease of argument, we start with a simplified toy V1 which is assumed to have only two kinds of neurons, one tuned to color only and one tuned to orientation only. This assumption is untrue in the real V1; we make it temporarily in this toy V1 to illustrate the method. Furthermore, we assume that V1 responses are deterministic rather than stochastic given a visual input. (These simplifications will be removed later.) Let $r_O$ or $r_C$, respectively, denote the response of the orientation tuned neuron or the color tuned neuron to the singleton in Fig. 2A or Fig. 2B, respectively. Due to iso-feature suppression, $r_O$ and $r_C$ are also the highest responses to the respective singletons. Let $RT_O$ and $RT_C$ denote the reaction times to find the orientation and color singletons, respectively. Then, according to equation (4),

$$RT_O = f(r_O), \qquad RT_C = f(r_C). \qquad (5)$$
Consider now the case that the singleton bar is unique in both orientation and color, as in Fig. 2C. In this toy V1, the response $r_O$ of the orientation tuned neuron to the singleton should be identical in Fig. 2A and Fig. 2C, since this neuron is not tuned to color. Analogously, the response $r_C$ of the color tuned neuron to the singleton should be identical in Fig. 2B and Fig. 2C. Hence, the maximum V1 response to the singleton in Fig. 2C is $\max(r_C, r_O)$ (where max(.) means to take the maximum value among the arguments), and the reaction time $RT_{CO}$ to find the double-feature singleton is

$$RT_{CO} = f(\max(r_C, r_O)) = \min(f(r_C), f(r_O)) = \min(RT_C, RT_O), \qquad (7)$$

when we combine equations (4) and (5) and the fact that f(.) is a monotonically decreasing function (min(.) means to take the minimum value of the arguments).
Hence, in the toy V1 which has only neurons tuned to orientation only and neurons tuned to color only but no neurons tuned to both, the V1 saliency hypothesis predicts that the double-feature singleton should be as salient as the more salient of the two single-feature singletons, such that the reaction time to find the CO singleton is the shorter one of the reaction times for the single-feature singletons. If for example RT C = 400 millisecond (ms) and RT O = 500 ms, then RT CO = 400 ms.
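The deterministic race equality follows from nothing more than f(.) being monotonically decreasing, which can be checked numerically. In the sketch below, the form of `f` and the response values are assumptions chosen so that the reaction times reproduce the 400 ms and 500 ms example above.

```python
def f(r, t0=250.0, k=6000.0):
    """Hypothetical monotonically decreasing map from response to RT (ms)."""
    return t0 + k / r


# Assumed responses (spikes/s) of the color-only and orientation-only neurons.
r_C, r_O = 40.0, 24.0

RT_C, RT_O = f(r_C), f(r_O)  # single-feature singletons
RT_CO = f(max(r_C, r_O))     # toy V1: highest response to the CO singleton

# For any decreasing f, f(max(a, b)) equals min(f(a), f(b)).
assert RT_CO == min(RT_C, RT_O)
print(RT_C, RT_O, RT_CO)  # 400.0 500.0 400.0
```

Any other decreasing choice of `f` would change the individual reaction times but not the equality itself.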
Equation (7) describes the deterministic version of a race model [30], which is often used to model a behavioral reaction time as the shortest of the reaction times of two or more underlying processes. For our example, it is as if the reaction time for the CO singleton is the winning reaction time in a race between two racers with their respective reaction times. We note that this race model [...] In any case, [...] independently of each other for the responses to the double-feature singleton, then the stochastic version of the race model (in equation (7)) is

$$RT_{CO} = \min(RT_C, RT_O), \qquad (9)$$

in which $RT_C$ and $RT_O$ are independent random samples from their respective distributions. For example, if the averages of $RT_C$ and $RT_O$ are 400 and 500 ms, respectively, the average of $RT_{CO}$ will be shorter than 400 ms by this stochastic race model, since each sample of $RT_{CO}$ is the race winner of the two samples $RT_C$ and $RT_O$. This reflects statistical facilitation in this race model between the two single-feature singletons. For simplicity, we use

$$RT_{CO} \overset{P}{=} \min(RT_C, RT_O)$$

as a shorthand for equation (9), with the notation $x \overset{P}{=} y$ to mean that x and y have the same probability distribution.
The race model, or race equality, $RT_{CO} \overset{P}{=} \min(RT_C, RT_O)$ is a prediction of the V1 saliency hypothesis if one were hypothetically to assume a toy V1 in which there is no V1 neuron which can respond more vigorously to the double-feature singleton than the orientation-only tuned neuron and the color-only tuned neuron. This assumption is wrong, even though it enables us to predict the distribution of $RT_{CO}$ from those of $RT_C$ and $RT_O$. Next we show that the predicted distribution of $RT_{CO}$ does not agree with the behavioral data previously collected by Koene and Zhaoping [19] (see Methods section).
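To make the stochastic race model concrete, the sketch below builds a predicted distribution of the double-feature reaction time from two sets of single-feature reaction times. It is not the procedure used in the paper: taking the minimum over all pairs of samples is a choice made here to realize independent draws from the two empirical distributions, the function name `race_prediction` is hypothetical, and the Gaussian samples are synthetic stand-ins for behavioral data.

```python
import random


def race_prediction(rt_a, rt_b):
    """Predicted RT samples under the race model: the minimum over every
    pairing of one sample from rt_a with one sample from rt_b, i.e.
    independent draws from the two empirical distributions."""
    return sorted(min(a, b) for a in rt_a for b in rt_b)


def mean(xs):
    return sum(xs) / len(xs)


# Synthetic single-feature reaction times in ms (NOT real data).
rng = random.Random(1)
rt_C = [rng.gauss(400.0, 50.0) for _ in range(300)]
rt_O = [rng.gauss(500.0, 50.0) for _ in range(300)]

rt_CO_pred = race_prediction(rt_C, rt_O)

# Statistical facilitation: the predicted mean is shorter than both means.
print(mean(rt_CO_pred) < min(mean(rt_C), mean(rt_O)))
```

The predicted samples could then be compared with observed double-feature reaction times using any two-sample distribution test; which test is appropriate is left open here.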
Each of the above equalities should hold when V1 is assumed to have no neurons (i.e., the CM, MO, or even CMO neurons) that are tuned to more than one feature dimension and can respond more vigorously to the corresponding double-feature (or triple-feature) singleton than they do to the [...]

Figure 4: Schematics of the seven kinds of feature singletons. Each bar is colored green or purple (of the same luminance in the behavioral experiment), tilted to the left or right from vertical by the same absolute tilt angle, and moving to the left or right at the same speed. The motion direction is schematically illustrated by an arrow pointing to the left or right. Under each schematic, the non-trivial neural responses (i.e., the responses expected to be substantially higher than the responses to the background bars) evoked by the singleton are listed.

Figure 5: The observed and predicted distributions of reaction times for a double- or triple-feature singleton, using four different race models (race equalities), $RT_{CO} \overset{P}{=} \min(RT_C, RT_O)$, $RT_{CM} \overset{P}{=} \min(RT_C, RT_M)$, $RT_{MO} \overset{P}{=} \min(RT_M, RT_O)$, and $RT_{CMO} \overset{P}{=} \min(RT_C, RT_M, RT_O)$, each a race between the corresponding racers whose reaction times are those of the corresponding single-feature singletons. The data are from the same subject SA already shown in Fig. 3; panel A shows the same information as that in the bottom panel of Fig. 3. In each panel, the distributions of the predicted and the observed reaction times are significantly different from each other.
V1 neurons tuned conjunctively to color and orientation predict that $RT_{CO}$ is likely shorter than predicted by the race model
Here we show that, because real V1 contains neurons (which we call CO neurons) that are tuned simultaneously to color and orientation [26], the $RT_{CO}$ predicted using the race equality $RT_{CO} \overset{P}{=} \min(RT_C, RT_O)$ can be longer than the observed $RT_{CO}$. Neurons tuned to color or orientation only are referred to as C or O neurons. Iso-feature suppression implies that the CO neuron responds more vigorously to a CO singleton than to a background bar. Let $r_{CO}$ denote the response of the CO neuron; the maximum response to the CO singleton is then $\max(r_C, r_O, r_{CO})$, and according to equation (3),

$$RT_{CO} = f(\max(r_C, r_O, r_{CO})).$$

Additionally, the CO neuron's responses to the single-feature C and O singletons are also likely higher than its response to a background bar. For example, a CO neuron tuned to green color and left tilt will respond to a green, left-tilted bar when this bar is a C singleton (in a background of purple, left-tilted bars), or an O singleton (in a background of green, right-tilted bars), or a CO singleton (in a background of purple, right-tilted bars). The response level of the CO neuron is likely to distinguish between the three types of singletons, since it is under iso-orientation suppression when the bar is a C singleton, under iso-color suppression when the bar is an O singleton, and is free from iso-feature suppression when the bar is a CO singleton. To distinguish these responses, we use $r^{CO}_{\alpha}$ to denote the response of the CO neuron to a singleton $\alpha$ = C, O, or CO. For completeness, we use $r^{CO}_{B}$ to denote the CO neuron's response to a background bar. Since $r^{CO}_{B}$ suffers from both iso-color and iso-orientation suppression, it is likely that $r^{CO}_{B} < r^{CO}_{\alpha}$ for $\alpha$ = C, O, and CO.
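The effect of adding a conjunctively tuned CO neuron can be sketched numerically: the reaction time to the CO singleton can only stay the same as, or fall below, the race-model value computed from the C and O neurons alone. The form of `f` and all response values below are assumptions for illustration.

```python
def f(r, t0=250.0, k=6000.0):
    """Hypothetical monotonically decreasing map from response to RT (ms)."""
    return t0 + k / r


# Assumed responses (spikes/s) of the C and O neurons to a CO singleton.
r_C, r_O = 40.0, 24.0
rt_race = f(max(r_C, r_O))  # race-model value without CO neurons

# Try a weak, a matching, and a strong CO neuron response.
rt_with_co = {r_CO: f(max(r_C, r_O, r_CO)) for r_CO in (10.0, 40.0, 60.0)}

for r_CO, rt in rt_with_co.items():
    # Adding one more candidate to the max can never lengthen the RT.
    assert rt <= rt_race
    print(r_CO, rt, "shorter than race value" if rt < rt_race else "equal to race value")
```

Only when the CO neuron's response exceeds both the C and the O responses does the reaction time fall below the race-model value, which is the case the heading of this subsection refers to.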
To be consistent and systematic, we similarly use $r^{C}_{\alpha}$ and $r^{O}_{\alpha}$ to denote the C and O neural responses to a singleton bar $\alpha$ = C, O, and CO or a background bar $\alpha$ = B. For example, the responses of the C neuron to the four kinds of input bars are written as $r^{C}_{C}$, $r^{C}_{O}$, $r^{C}_{CO}$, and $r^{C}_{B}$. We have previously ignored $r^{C}_{O}$ and identified $r^{C}_{CO}$ with $r^{C}_{C}$ since we argued that

$$r^{C}_{CO} = r^{C}_{C}, \qquad r^{C}_{O} = r^{C}_{B}$$

(ignoring response stochasticity for simplicity), because a C neuron's response should only be affected by the presence or absence of iso-color suppression, which makes the orientation feature irrelevant. Similarly, the O neuron has the following two distinct levels of responses,

$$r^{O}_{CO} = r^{O}_{O}, \qquad r^{O}_{C} = r^{O}_{B}.$$

We will refer to neural responses such as $r^{C}_{O}$ and $r^{O}_{C}$ that can be equated with the same neurons' responses to a background bar as trivial responses.
Note that the meaning of, e.g., C in a mathematical expression here depends on whether it is a superscript or a subscript. As a superscript in, e.g., $r^{C}$, it means that the neuron giving the response is tuned to the color (C) feature; as a subscript in, e.g., $r^{O}_{C}$ or $RT_{C}$, it means that the visual input bar evoking the response or reaction time is a color (C) singleton. For simplicity and without loss of validity, we always ignore responses from neurons not tuned to the feature(s) of the bars, since their responses will always be smaller and will not affect the saliency values dictated by the maximum response to each location.
Combining equation (3) with the equations above, we have Since a C singleton bar is more salient than a background bar, by V1 saliency hypothesis, its maxi- equations (17)(18) to have The above two equalities (compare them with equations (17) and (18) This can be seen by reminding ourselves that a r O C (= r O B ) is a trivial response (i.e., statistically the same as the neuron's response to a background bar) to a C singleton whereas r C O (= r C B ) is a trivial response to an O singleton. Continuing from equation (20), This equality is the extension of equation (21) to multiple (two or more) reaction times (for multiple singletons, each alone in one input scene), and holds for all our singleton scenes. It will be used to derive other race equalities.
This requirement can be met either by inequality (24) or by equation (25), the latter meaning that $r^{CO}_{CO}$ and $\max(r^{CO}_{C}, r^{CO}_{O})$ have the same distribution. Note that inequality (24) can be satisfied when the CO responses $r^{CO}_{C}$, $r^{CO}_{O}$, and $r^{CO}_{CO}$ are negligible relative to the C and O responses $r^{C}_{C}$ and $r^{O}_{O}$.
The two conditions, equations (24) and (25), can both be satisfied when CO neurons are absent. In this paper, a prediction, e.g., a predicted equality such as $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$, is called a spurious prediction if the neural properties upon which it relies (such as the two conditions above) are either known to be violated in V1 or of largely uncertain presence in V1. Whether the neural properties required for a spurious prediction can be satisfied may depend on the individual observer, since neural and behavioral sensitivities and feature selectivities are likely specific to each individual (e.g., some observers may be more color weak than others).
Meanwhile, the race equality $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$ is likely broken when the CO neurons are present. Iso-feature suppression makes it likely that the CO neuron's response to the CO singleton exceeds, in ensemble average, its responses to the C and O singletons, where $\langle x \rangle$ means the ensemble average of $x$. If so, the equality $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$ should be replaced by the race inequality $RT_{CO} < \min(RT_C, RT_O)$. Hence, the V1 saliency hypothesis makes the qualitative prediction that $RT_{CO}$ is likely to be statistically shorter than predicted by the race model.

Similarly, V1 also contains MO neurons that are tuned simultaneously to orientation and motion direction [13]. Hence, the inequality $RT_{MO} < \min(RT_M, RT_O)$, analogous to $RT_{CO} < \min(RT_C, RT_O)$, is likely to hold. However, V1 is reported to contain few CM neurons that are tuned simultaneously to color and motion direction [12], although there are conflicting reports [12,27,36].

The inequality $RT_{\alpha\alpha'} < \min(RT_{\alpha}, RT_{\alpha'})$ for $\alpha$ or $\alpha'$ = C, M, or O and $\alpha \neq \alpha'$ is called a double-feature advantage or redundancy gain, and has been observed previously. Focusing on the time bins for the shortest reaction times, Krummenacher et al. [20] showed that the density of $RT_{CO}$ in these bins was more than the sum of the densities of the racers $RT_C$ and $RT_O$. Related observations were reported by Koene and Zhaoping [19] and by Zhaoping and Zhe [49].
Assuming that V1 lacks CMO neurons, the V1 saliency hypothesis gives the race equality

$$\min(RT_{CMO}, RT_C, RT_M, RT_O) \stackrel{P}{=} \min(RT_{CM}, RT_{CO}, RT_{MO}), \qquad (30)$$

see Methods for its proof. The left side of the equality is the race outcome from four racers with their respective reaction times $RT_{CMO}$, $RT_C$, $RT_M$, and $RT_O$, and the right side is the race outcome of another three racers with their respective reaction times $RT_{CM}$, $RT_{CO}$, and $RT_{MO}$. This race equality thus states that the two different races produce the same distribution of winning reaction times. Since we are quite confident about the condition behind this equality (that CMO cells are absent in V1), we call this a non-spurious race equality. It enables us to predict the distribution of $RT_{CMO}$ from those of the other six types of reaction times. We call this prediction our non-spurious prediction, which can be compared with behavioral data as shown next.
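The logic of the non-spurious race equality can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the paper: the Poisson response model and all firing rates in `RATES` are hypothetical, neurons are limited to C, M, O, CM, CO, and MO types (no CMO neurons), and a neuron's response is assumed to depend only on those unique features of the singleton to which the neuron is tuned. Because reaction time decreases monotonically with the highest response, the race winner (shortest reaction time) corresponds to the highest response, so the sketch compares the two races through their highest responses.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TRIALS = 200_000  # simulated trials per singleton type

# Hypothetical mean responses (spikes/s), keyed by
# (neuron tuning, unique singleton features that the neuron is tuned to).
RATES = {
    ("C", "C"): 20, ("M", "M"): 18, ("O", "O"): 22,
    ("CM", "C"): 8, ("CM", "M"): 9, ("CM", "CM"): 15,
    ("CO", "C"): 10, ("CO", "O"): 11, ("CO", "CO"): 24,
    ("MO", "M"): 9, ("MO", "O"): 12, ("MO", "MO"): 23,
}
NEURONS = ["C", "M", "O", "CM", "CO", "MO"]  # assumption: no CMO neurons


def max_response(singleton):
    """Sample the highest non-trivial response to a singleton, per trial."""
    best = np.zeros(N_TRIALS)
    for neuron in NEURONS:
        # Only unique features the neuron is tuned to affect its response.
        seen = "".join(f for f in neuron if f in singleton)
        if seen:  # otherwise the response is trivial and is ignored
            sample = rng.poisson(RATES[(neuron, seen)], N_TRIALS)
            best = np.maximum(best, sample)
    return best


def race(*singletons):
    """Race among singleton types: the highest response wins."""
    return np.max([max_response(s) for s in singletons], axis=0)


left = race("CMO", "C", "M", "O")   # four racers
right = race("CM", "CO", "MO")      # three racers
print("mean winning response:", left.mean(), right.mean())
```

Under these assumptions both races draw on the same collection of non-trivial responses, so the two printed means should agree up to sampling noise. Adding a hypothetical CMO neuron that responds to the CMO singleton would add a racer to the left side only and break the agreement.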
The non-spurious race equality holds across all six observers

One may ask whether our non-spurious equality (equation (30)) is hard to falsify because it has a different and more complex structure than our spurious equalities such as $RT_{\alpha\alpha'} \stackrel{P}{=} \min(RT_{\alpha}, RT_{\alpha'})$. In the non-spurious equality (equation (30)), the $RT_{CMO}$ to be predicted has to race with some other types of reaction times to contribute to the equality, making its prediction more complex (see Methods). To show that this complexity in the race equality does not prevent a falsification of a spurious equality, we create three new spurious equalities that have the same complexity as our non-spurious equality but can be falsified by our behavioral data. Our non-spurious equality and these three newly created spurious equalities are listed in equations (31)-(34).

Figure 7: Observed and predicted distributions of $RT_{CMO}$ using the non-spurious race equality for six observers, including observer SA whose details are shown in Fig. 6. The predictions agree with data for all observers, as indicated by p > 0.05.

times, implying that the original equality is not violated for the shortest reaction times, which are most likely to be race winners (for the non-spurious equality) to be influential. Among all six observers, each corollary spurious equality is broken in fewer observers than its original spurious equality. Nevertheless, the breaking of the corollary spurious equalities, which are just as complex as our non-spurious equality, demonstrates that the complexity of a race equality is insufficient to prevent a falsification.
Figure 8: The predicted and observed $P(RT_{CMO})$ from the non-spurious equality and three spurious ones sharing similar complexities, listed in equations (31)-(34). Panels: A, from the non-spurious equality ($RE_1$); B, from the spurious equality $RE_6$; C, from the spurious equality $RE_7$; D, from the spurious equality $RE_8$. These equalities are also denoted by $RE_1$, $RE_6$, $RE_7$, and $RE_8$ in Table 1.
Qualitative conclusions from our reaction time data despite sensitivity of some findings to parameter variations in our data analysis method

In this section, some general statistics of our findings across 5 × 4 × 4 = 80 (or 5 × 4 × 4 × 4 = 320 for the more complex equalities) variations of the technical parameters are presented. In particular, we show the number of observers whose data break each spurious or non-spurious race equality, averaged across the variations of the technical parameters in the testing method.
For convenience, Table 1 lists the race equalities, denoted by $RE_i$ for i = 1-8. For each equality, one of the reaction times involved is designated as the one whose distribution will be predicted from those of the other reaction times using the equality. This designated reaction time is named $RT_{goal}$ in Table 1. It is always the one for the singleton with the largest number of unique features; it thus tends to be the shortest reaction time and so is more precisely determined, by the nature of the race(s), from the other reaction times involved in the race equality. Hence $RT_{CMO}$ is the $RT_{goal}$ for all race equalities except $RE_i$ with i = 2-4, whose $RT_{goal}$ are $RT_{\alpha\alpha'}$ for $\alpha\alpha'$ = CO, MO, and CM, respectively.

Whether a race equality can be falsified by data from a particular observer depends on several factors. First, as mentioned before, it may depend on the observer, as there may be inter-observer differences in V1 properties and visual sensitivities. For example, some observers may be more color weak than others. Second, even when a race equality is truly false for a particular observer, it may appear to hold from this observer's behavioral data when there are not enough samples of reaction time data to achieve sufficient statistical power for revealing a difference between prediction and observation (especially when the deviation from a race equality is small). Meanwhile, even when a race equality is fundamentally true, there is a 5% chance of finding it accidentally broken by behavioral data. This is because, by definition (see Methods), a null hypothesis proclaiming the race equality is declared false when the distance D between the predicted (by the race equality) and observed distributions of reaction times is larger than 95% of the random samples of the distances D in the situation when the null hypothesis strictly holds. Third, empirically, we observed that on some occasions the technical parameters in our procedure can also affect whether a race equality is falsified by data.

Figure 9: Fractions of tests showing violation of race equalities for individual observers (color coded), in one panel for $RE_i$ with i = 1, 2, 3, 4 and another for $RE_i$ with i = 5, 6, 7, 8. Each plotted value is the fraction of the tests of a race equality in which the equality is falsified for a given observer. Each observer is coded by a unique color: red, white, green, blue, cyan, magenta, yellow, or black (our example observer SA is coded by red). Different tests of a race equality use different sets of parameters in the testing method to include all possible combinations of the parameter values specified in the Methods section. Each race equality is tested on six observers except for $RE_i$ for i = 2-4, which include two additional observers coded by yellow and black data bars, respectively. Each equality $RE_i$ for i = 2-4 and its corollary equality $RE_{i+4}$ are vertically aligned for ease of comparison.
Given a race equality and an observer, if parameter variations for the tests do not sensitively affect the qualitative outcome of the test, then the fraction of all the (80 or 320) tests in which the equality is found broken should be close to 1 or 0 to indicate that the race equality is consistently broken or maintained, respectively. Fig. 9 plots these fractions across observers and race equalities.
Among 54 different combinations of observers and race equalities, 34 give this fraction as either larger than 95% or smaller than 5%, and 11 have this fraction closer to 50% than to either 100% or 0%. The sensitivity of the test to the test parameters is mainly caused by sensitivity to the metric used to measure the difference between the predicted and observed distributions of reaction times.
We found that for some observers in some race equalities, e.g., observers marked by white, blue, and cyan color for RE 2 , a race equality is consistently broken using one metric and consistently maintained using another metric, (almost) regardless of the variations of the other parameters for the tests.
Since each observer has a 5% chance of accidentally breaking a true race equality, one expects that, among N = 6 or 8 observers, an average of 0.05 · N = 0.3 or 0.4 observers, respectively, will break a true race equality accidentally. More generally, there is a chance of $\frac{N!}{n!(N-n)!}\, 0.05^{n}\, 0.95^{N-n}$ that n out of N observers will break this true race equality accidentally. Accordingly, for six observers, there is a chance of 27%, 3%, or 0.2%, respectively, that at least one, two, or three observers accidentally break a true race equality; for eight observers, the corresponding chance becomes 34%, 6%, or 0.6%, respectively. Therefore, if more than one or two out of six or eight observers, respectively, break a race equality, we say that the equality is broken or incorrect, since such a high tendency of equality breaking can happen only with a chance of less than 0.05 for a truly correct race equality.

Figure 10: Average numbers of observers to break various race equalities (horizontal axis: race equalities), shown as blue data points whose error bars denote standard deviations. The non-spurious race equality is $RE_1$. Data from 6 observers were tested for race equalities $RE_1$ and $RE_i$ for i ≥ 5, and data from 8 observers were tested for $RE_i$ for i = 2-4. Applying a test of a given race equality to all the observers gives a number of observers breaking this equality, and the average of this number over 80 (for $RE_i$ with i = 2-5) or 320 (for $RE_1$ and $RE_i$ with i > 5) tests, each characterized by a unique set of parameters in the testing method, gives the blue data point for this equality. The background shadings visualize the probabilities of at least a certain number of observers breaking a true race equality accidentally; probabilities larger than 0.05 are visualized by shades with a red hue.
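The chance levels quoted above follow from the binomial formula and can be recomputed directly. The following is a minimal sketch; the function name `prob_at_least` is illustrative and not part of the original analysis.

```python
from math import comb


def prob_at_least(k, n_observers, alpha=0.05):
    """Chance that at least k of n_observers accidentally break a true
    race equality, when each does so independently with chance alpha."""
    return sum(
        comb(n_observers, n) * alpha**n * (1 - alpha) ** (n_observers - n)
        for n in range(k, n_observers + 1)
    )


for n_obs in (6, 8):
    chances = [prob_at_least(k, n_obs) for k in (1, 2, 3)]
    print(n_obs, [f"{100 * c:.1f}%" for c in chances])
```

Rounded as in the text, the values are 27%, 3%, and 0.2% for six observers and 34%, 6%, and 0.6% for eight observers.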
Note that the number of observers in this probability representation is an integer, whereas the data points are generally non-integers since they are averages of integers.

This finding is consistent with the physiological evidence that there are CO and MO neurons in V1 [13,26]. Third, the spurious prediction $RE_4$ for the equality $RT_{CM} \stackrel{P}{=} \min(RT_C, RT_M)$ is marginally broken, or not as seriously broken as the spurious predictions $RE_2$ and $RE_3$, since it is broken by an average of only 2.4 out of eight observers. This is consistent with the idea that V1 has fewer CM neurons than CO and MO neurons, and with the controversy in experimental reports [12,27,36] regarding the presence or absence of CM cells. Fourth, the spurious prediction RE 5 for equality

We note that our non-spurious $RE_1$ and the spurious $RE_i$ for i = 6-8 have very similar structures and use the same technical procedure to predict $RT_{CMO}$ from the same set of other types of reaction times. Hence, a clear rejection of race equality $RE_7$ by our data indicates that our data have sufficient statistical power to reject our non-spurious equality $RE_1$ if it were as clearly incorrect as $RE_7$. Therefore, our non-spurious V1 prediction is confirmed at least within the resolution provided by the statistical power of our data. This resolution is manifested in Fig. 8: the data can clearly distinguish between the two reaction time distributions depicted by the red and blue curves in Fig. 8B or Fig. 8C, but not in Fig. 8A or Fig. 8D.

Discussion
The main finding

Our non-spurious prediction is confirmed quantitatively with behavioral data in all six observers, such that the distribution of $RT_{CMO}$ can be quantitatively predicted from those of the other types of reaction times of the same observer without any free parameters. This prediction is derived using the following four essential ingredients: (1) the V1 saliency hypothesis that the highest V1 neural response to a location relative to the highest V1 responses to other locations signals this location's saliency, (2) the feature-tuned neural interaction, in particular iso-feature suppression, that depends on the preferred features of the interacting neurons (e.g., whether the neurons have similar preferred features) and causes higher responses to feature singletons, (3) the data-inspired assumption that V1 does not have CMO neurons tuned simultaneously to color, motion direction, and orientation, and (4) the monotonic link (within the definition of saliency) between a higher saliency of a location and a shorter saliency-dictated reaction time to find a target at this location. Hence, our finding supports the direct functional link between the saliency of a visual location and the maximum (rather than, e.g., a summation) of the neural responses to this location, as prescribed by the V1 saliency hypothesis. Additionally, it means that saliency computation (at least for our singleton scenes) essentially employs only the following neural mechanisms: feature-tuned interaction between neighboring neurons (in particular iso-feature suppression) and a lack of CMO neurons, both available in V1; mechanisms that are absent in V1 are not needed.

The supporting findings
In addition, the following qualitative findings are obtained.  Table 1. These relationships include the relative degrees of spuriousness between predictions and the dependence of some predictions on the non-spuriousness of some other predictions and certain properties of behavioral reaction times. The outcomes of testing the other five predictions using behavioral reaction times are consistent with the predicted relationships, lending further support to the idea that the saliency-dictated behavioral reaction times are indeed directly linked with the V1 neural responses as prescribed by the V1 saliency hypothesis.

Implications for the V1 saliency hypothesis
Previously, the V1 saliency hypothesis provided only qualitative predictions. An example is the prediction that an ocular singleton should be salient [43], another is the prediction that a very salient border between two textures of oblique bars can be made largely non-salient by a superposed checkerboard pattern of horizontal and vertical bars (in a way unexpected from traditional saliency models) [47]. The first one qualitatively predicts that the reaction time to find a visual search target should be shorter when this target is also an ocular singleton, but it cannot quantitatively predict how much shorter this reaction time should be. Similarly, the second one predicts that the reaction time to locate the texture border should be substantially prolonged, but not how much prolonged, by the presence of the superposing texture. Confirmations of these qualitative predictions not only support the V1 saliency hypothesis linking V1 neural responses to behavioral saliency, but also support the idea that (V1) neural mechanisms employed in the derivations of the predictions, in particular the iso-feature suppression, are used for the saliency computation. However, they cannot conclude whether additional mechanisms not yet considered, particularly the more complex mechanisms available only in higher brain centers, might also contribute to the saliency computation.
In contrast, if a prediction specifies not only that one reaction time should be qualitatively shorter than another, but also that it should be quantitatively shorter by, say, 20%, and if data reveal instead that the first reaction time is only 10% shorter, then additional mechanisms for saliency computation must be called for. Now, the quantitative agreement between our non-spurious prediction and the reaction time data without any free parameters enables us to conclude that saliency computation requires essentially no other neural mechanisms than the feature-tuned interactions between neurons and a lack of CMO neurons, both of which are V1 properties.
We should keep in mind that some other mechanistic ingredients or assumptions were omitted in our closing sentence in the last paragraph. Let us articulate and remind ourselves of these other ingredients, which have been explicit or implicit in this paper. One is the assumption that the response fluctuations in different types of neurons to a single input item are independent of each other. Hence, for instance, fluctuations in the responses of the C, O, and CO neurons to the CO singleton are assumed to be independent of each other. A related assumption is that the fluctuations of the responses to different input items in a scene are sufficiently independent of each other, so that we can approximately treat the statistical properties of the responses to the background bars as independent of the responses to the singleton. Another simplification is the assumption that the response of a neuron to a singleton is independent of whether this singleton is unique in a feature dimension to which this neuron is not tuned. For example, we assume that there is no statistical difference between $r^{C}_{C}$, $r^{C}_{CO}$, and $r^{C}_{CMO}$, or between $r^{CO}_{C}$ and $r^{CO}_{CM}$, or between $r^{C}_{B}$ and $r^{C}_{O}$. This assumption may not be strictly true given the known activity normalization in cortical responses [11], although it may be seen as an approximation. Of course, treating the population responses to the background bars as having the same statistical property regardless of the type of the singleton in our singleton scenes is another simplification, in fact only an approximation; it enabled us to write equation (3). Meanwhile, equation (3) led to equation (4) by an implicit assumption that fluctuations in the saliency readout to motor responses are negligible (this might be more likely valid for bottom-up than top-down responses).
Furthermore, we are assuming that perceptual learning by the observers to do visual search is negligible over the course of the data taking, so that the monotonic function relating V1 responses to reaction times is fixed. The above simplifications or idealizations were made to keep our question focused on the most essential mechanisms. That our prediction agrees quantitatively with data suggests that the above simplifications or idealizations are sufficiently good approximations within the resolution that can be discerned by our data.

Implications for the role of extrastriate cortices
An important question is whether extrastriate cortices, i.e., cortical areas beyond V1, might also contribute to computing saliency. This question is important since, if these cortical areas could be excluded from determining saliency, future investigation of the extrastriate cortices could focus on their role in other functions. From the discussions in the previous sub-section, extrastriate cortices could contribute to computing saliency if they possess the mechanistic ingredients of feature-tuned interaction (in particular iso-feature suppression) between neighboring neurons and a lack of CMO tuned cells. If so, we could simply extend the hypothesized link between the highest neural response to a location and the saliency of this location to extrastriate cortices, which also project to the superior colliculus and so can influence eye movements.
However, extrastriate cortices contain CMO neurons (private communication from Stewart Shipp, 2011). For example, Burkhalter and van Essen [2] observed that, in V2 and VP, many cells were feature selective in multiple feature dimensions, including orientation, color, and motion direction, and that the incidence of selectivity in multiple dimensions was approximately that which would be expected if the probabilities of occurrence of different selectivities in any given cell were independent of one another. These observations imply that triple-feature tuned CMO cells are present even though they are fewer than double- or single-feature tuned cells. In fact, since they observed that most neurons are tuned to orientation and most neurons are tuned to color, the probability that a cell is a CMO cell must be no less than 25% of the probability of this cell being tuned to direction of motion (M). Similar conclusions in V2 are reached by other investigations [8,35], although the numerical probability of a neuron being a CMO neuron depends on the criteria used to classify whether a neuron is tuned to a feature dimension. In addition, unlike the case in V1, where the presence of CM neurons is controversial, V2 is known to have CM neurons, in addition to having CO and MO neurons [8,36,35]. Some of these CM, CO, and MO neurons (which are defined experimentally as being tuned to the two specified feature dimensions simultaneously without restrictions on the neuron's selectivity in the other feature dimensions) in V2 can well be CMO neurons, especially when the chance for a neuron to be tuned to another feature dimension is independent of the other feature dimensions to which this neuron is already tuned. Selectivity to conjunctions of more than two types of features in extrastriate cortices is consistent with general observations that neurons in cortical areas beyond V1 tend to have more complex and specialized visual receptive fields.
According to our analysis in the Methods section, if a cortex containing the saliency map had CMO neurons, then, statistically, $RT_{CMO}$ would likely be smaller than predicted by our non-spurious race equality.

Further discussions assuming no role in saliency by the extrastriate cortices

Although the current study cannot firmly establish the possibility that extrastriate cortices play no role in the saliency function, the implication of such a possibility is so non-trivial that we discuss it here at the end of this paper.
Traditionally, it has been thought that the control of the direction of attention, including exogenous attention, rests on a network of neural circuits comprising frontal and parietal areas, including the frontal eye field and intraparietal areas [3,10,15]. The role of subcortical areas such as the superior colliculus has also been suggested [21], although it is likely to merely implement attentional Indeed, if we compare V1 with extrastriate cortical areas, the neural activities in the former are more associated with sensory inputs than perception (i.e., outcomes of visual decoding) and less influenced by top-down attention, whereas those in the latter are more associated with perception rather than sensory inputs and more influenced by top-down attention [4]. For example, lesions in V4 impair visual selection of only non-salient objects [34] disfavored by exogenous selection, demonstrating an involvement of V4 in endogenous selection. Equally, neural responses in V4 but not V1 to binocularly rivalrous inputs are dominated by perceived input rather than the retinal images [22], contrasting the involvement of V4 and V1 in perceptual decoding.
Identifying V1's role in exogenous selection thus helps to crystallize the research questions and pave the way for investigating extrastriate cortical areas.

Behavioral data to test various race equalities
We test predictions of various race equalities using behavioral data previously collected by Koene and Zhaoping [19]. Six observers (three of them male) have completed the experiment with reaction time data on all the seven singleton types. Two additional observers (one of them male) however lacked data on RT CMO (since they completed only an earlier version of the experiment), hence their data will only be used to test the race equalities not involving RT CMO (these equalities were the focus of Koene and Zhaoping's study). More details about the experiment can be found in the original paper [19], which did not publish or use the RT CMO data.
The behavioral experiment was designed such that there is a symmetry between the two distinct feature values in any feature dimension, C, O, or M. For example, the two color features, green and purple, are equally luminant, so that it is reasonable to assume that the two C singleton stimuli, one a green bar in a background of purple bars and the other a purple bar in a background of green bars, evoke the same population response levels at least in a statistical sense. More explicitly, we assume that the response level to the color singleton is drawn from the same distribution regardless of whether the singleton is green or purple, even though the most responsive neurons to the two singletons differ in their color preference. Furthermore, it is also reasonable to assume that the population responses to the background bars are statistically the same, so that the two stimuli share the same invariant background response distribution, even though the two backgrounds activate different neural populations. Then, we can treat the two color singleton stimuli the same in terms of saliency, which is feature-blind once the response levels are given. Therefore, given an observer, our data analysis pools all the $RT_C$ data samples into a single pool regardless of whether the singleton is green or purple. Analogously, it is reasonable to assume that all singleton scenes share the same invariant background response distribution regardless of the singleton type.

Proof of the non-spurious race equality in equation (30)

To prove this equality between the two races, $\min(RT_{CMO}, RT_C, RT_M, RT_O)$ and $\min(RT_{CM}, RT_{CO}, RT_{MO})$, we start from equation (4) and proceed analogously to equation (17). The derivation uses equalities that, analogous to equations (15)(16), arise because (due to iso-feature suppression) a neuron's responses to a singleton bar and a background bar are the same unless the singleton is unique in at least one of the feature dimensions to which this neuron is tuned.
Then, analogous to equation (20), we can ignore all the trivial response levels $r^{X}_{B}$, so that each race is determined by the non-trivial responses only. Similarly, a unique feature is again treated as a background feature for any neuron not tuned to the corresponding feature dimension. In Fig. 4, the non-trivial responses that determine each singleton's reaction time are listed under the corresponding schematic. The resulting expression for a race should not be simplified by deleting repeated terms, because the neural responses are stochastic. For example, two occurrences of $r^{C}_{C}$ should be understood as two independent random samples of $r^{C}_{C}$ from its probability distribution. If $r^{C}_{C}$ follows a Poisson distribution with an average of 20 spikes/second, the two occurrences of $r^{C}_{C}$ jointly contribute to the race through the maximum of the two random samples from this distribution, and this maximum is on average larger than 20 spikes/second.
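The Poisson example above is easy to check numerically. The sketch below draws independent samples at the 20 spikes/second rate mentioned in the text; the number of trials and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 20            # mean response in spikes/second, as in the example
n_trials = 100_000   # arbitrary number of simulated trials

# One occurrence of the response versus two independent occurrences.
single = rng.poisson(rate, n_trials)
pair_max = np.maximum(rng.poisson(rate, n_trials),
                      rng.poisson(rate, n_trials))

print("mean of one sample:        ", single.mean())
print("mean of max of two samples:", pair_max.mean())
```

The mean of the maximum exceeds the mean of a single sample, which is why a repeated term in a race cannot be replaced by a single copy.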
Methods to test a race equality

Fig. 11 outlines our methods to test each race equality against behavioral data, using the equality $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$ as an example. Briefly, given a race equality, the distribution of $RT_{goal}$, the designated type of reaction times in a given equality (e.g., $RT_{CO}$ is the $RT_{goal}$ in race equality $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$), is predicted from the behaviorally observed distributions of the other reaction times in this equality. The predicted distribution is compared with the behaviorally observed distribution of $RT_{goal}$, and a distance D between these two distributions is calculated. Ideally, a zero D means that the race equality agrees with data. However, this distance D is typically nonzero even when a race equality does hold, since finite numbers of reaction time data samples can only approximately represent the underlying distributions of various reaction times. A statistical test is devised to give a p value in testing the null hypothesis of the race equality, such that p is the probability that the distance D between the predicted and observed $RT_{goal}$ should be at least as big as observed if the race equality holds. A p > 0.05 is chosen to suggest that the race equality is consistent with behavioral data. Our testing methods involve several components, which are represented by various boxes in Fig. 11. The rest of the Methods section describes the details of each of these components for interested readers.
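The final step of the test, turning an observed distance D into a p value, can be sketched as follows. This is a simplified illustration rather than the full procedure: it assumes the "null" samples of D have already been generated, uses the 1-norm as one possible distance between binned distributions, and the function names `l1_distance` and `p_value` are hypothetical.

```python
import numpy as np


def l1_distance(P, Q):
    """1-norm distance between two binned distributions."""
    return float(np.sum(np.abs(np.asarray(P) - np.asarray(Q))))


def p_value(d_observed, d_null):
    """Fraction of null distances at least as large as the observed one."""
    return float(np.mean(np.asarray(d_null) >= d_observed))


# Toy usage with made-up numbers (not data from the experiment).
predicted = [0.50, 0.50]
observed = [0.75, 0.25]
d_obs = l1_distance(predicted, observed)
print(d_obs, p_value(d_obs, d_null=[0.1, 0.3, 0.5, 0.7]))
```

With this convention the race equality is declared consistent with the data when the returned p value exceeds 0.05.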

Methods to predict a distribution of reaction times from a race equality
Here we describe the method component in box (1) of Fig. 11. A race equality enables us to predict the distribution of one type of reaction times in the race equality from those of the other reaction times in the race equality. The predicted reaction time, denoted by $RT_{goal}$, is designated as $RT_{CMO}$ for all race equalities except $RE_i$ with i = 2-4 (see Table 1).

Figure 11, boxes (5)-(8): Obtain the null distributions
(5) Get null distributions of reaction times, one for each type of the reaction times in the race equality, such that these distributions not only satisfy the race equality but also are the most likely to generate the observed data samples of the reaction times.
(6) Generate "null" samples of reaction times from the null distributions, as many random samples as in the observed behavioral data for each type of reaction times in the race equality.
(7) Generate a "null" sample of the D value, using the same procedure as in (1) and (2) but with the "null" samples of reaction times, which simulate behavioral data collected in a situation when the null hypothesis holds.
(8) Generate $P_{null}(D)$, the null distribution of D values, using the "null" samples of D by repeating (6) and (7).
In $RE_i$ for i = 2-5, the distribution of $RT_{goal} = RT_1$ can be predicted directly as the distribution of the race outcome $RT_2$. Given any race, e.g., $\min(RT_C, RT_O)$ or a three-racer race, samples of the race outcome $RT_2$ are obtained from the samples of the racers by equation (41). This predicted $RT_{goal}$ distribution can then be compared with the distribution of the collected behavioral data samples of $RT_{goal}$.
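One way to obtain the distribution of a race outcome from the racers' distributions is sketched below. It does not reproduce the sampling recipe of equation (41); instead it assumes independent racers and works on binned distributions, using the fact that the probability that the winner finishes in bin i or later is the product of the corresponding probabilities for the racers. The function names and the toy samples are illustrative only.

```python
import numpy as np


def binned(samples, edges):
    """Histogram of reaction time samples, normalized to sum to one."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / counts.sum()


def race_distribution(racer_dists):
    """Binned distribution of the winner among independent racers."""
    n_bins = len(racer_dists[0])
    surv = np.ones(n_bins + 1)  # P(winner finishes in bin i or later)
    for dist in racer_dists:
        tail = np.append(np.cumsum(dist[::-1])[::-1], 0.0)
        surv *= tail
    return surv[:-1] - surv[1:]


# Toy usage with made-up reaction times in seconds.
edges = np.array([0.4, 0.6, 0.8, 1.0])
rt_c = binned([0.45, 0.55, 0.65, 0.85], edges)
rt_o = binned([0.50, 0.70, 0.75, 0.90], edges)
print(race_distribution([rt_c, rt_o]))
```

The returned vector can then be compared bin by bin with the observed distribution of $RT_{goal}$.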
The predictions of $RT_{\text{goal}}$ in $RE_1$ and $RE_i$ for $i$ = 6-8 use a different and more complex method.
In these races, $RT_{\text{goal}}$ is always $RT_{CMO}$, and $RT_1$ can be written as
$$RT_1 \equiv \min(RT_{CMO}, RT_{\text{part}}),$$
where $RT_{\text{part}}$ is the winner reaction time from another race involving only the racers in the $RT_1$ race other than the racer $RT_{CMO}$. Explicitly, for $RE_1$, ... (... have been tried to test the robustness of our conclusion to these details). For any particular $RT_\alpha$ (with $\alpha$ denoting a singleton type or a race winner), if $n_i$ is the number of the $RT_\alpha$ samples in the $i$th time bin, the distribution of $RT_\alpha$ across the time bins is described by an $N$-dimensional vector whose $i$th component is $n_i / \sum_j n_j$.
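The binned distribution $n_i / \sum_j n_j$ can be sketched in a few lines of Python. This is only an illustration: the function name `binned_distribution`, the sample values, and the bin boundaries are hypothetical, and samples falling outside the boundaries are simply ignored here, which is an assumption rather than something stated in the text.

```python
from bisect import bisect_right

def binned_distribution(samples, boundaries):
    """Fraction of samples in each bin [boundaries[i], boundaries[i+1])."""
    n_bins = len(boundaries) - 1
    counts = [0] * n_bins
    for rt in samples:
        i = bisect_right(boundaries, rt) - 1  # index of the bin containing rt
        if 0 <= i < n_bins:                   # assumption: drop out-of-range samples
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the bins")
    return [c / total for c in counts]

# Hypothetical reaction times (seconds) and boundaries t_0 < ... < t_N with N = 3.
rts = [0.42, 0.55, 0.61, 0.48, 0.90, 0.73, 0.58, 0.66]
t = [0.4, 0.5, 0.6, 1.0]
dist = binned_distribution(rts, t)
print(dist)
```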
Let $N$-dimensional vectors $P \equiv (P_1, P_2, ..., P_N)$ and $Q \equiv (Q_1, Q_2, ..., Q_N)$ denote the distributions of $RT_1$ and $RT_2$, respectively, in these time bins, and let $p$ and $q$ denote the distributions of $RT_{CMO}$ and $RT_{\text{part}}$, respectively. Because $RT_1 \equiv \min(RT_{CMO}, RT_{\text{part}})$, each $P_i$ is determined by $p$ and $q$; $RT_1 \stackrel{P}{=} RT_2$ then means $P_i = Q_i$, which is a linear equation in $p$ once $q$ and $Q$ are known. From reaction time data on $RT_C$, $RT_M$, $RT_O$, $RT_{CM}$, $RT_{CO}$, and $RT_{MO}$, the samples for $RT_{\text{part}}$ and $RT_2$ can be obtained by equation (41) to construct the distributions $q$ and $Q$. Then, $p$ can be predicted as the solution to this linear equation, provided that the solution satisfies the probability constraints $p_i \ge 0$ and $\sum_i p_i = 1$. If the solution violates $p_i \ge 0$ or $\sum_i p_i = 1$ (this can happen, for example, when $q_i > Q_i$ for some $i$ because of sampling noise arising from the limited data samples and/or because the race equality does not actually hold), then the predicted $p$ is chosen as the one that minimizes a distance between $P$ and $Q$ while satisfying the constraints $p_i \ge 0$ and $\sum_i p_i = 1$ (through an optimization procedure, e.g., via the fmincon routine in MATLAB). The following four distance measures (between $P$ and $Q$) were tried separately to test the robustness of our conclusion in the paper:
(1) $|P - Q|^2$, the squared Hemming distance;
(2) the Hellinger distance;
(3) $\sum_i |P_i - Q_i|$, the 1-norm distance; and
(4) $\sum_i \max(Q_i, \epsilon) \log \frac{\max(Q_i, \epsilon)}{\max(P_i, \epsilon)}$, with a given $\epsilon \ll 10^{-100}$. (44)
The last distance would be the Kullback-Leibler divergence if all $P_i$ and $Q_i$ were larger than the very small $\epsilon$.
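To make the relation between $P$, $p$, and $q$ concrete, the sketch below assumes that the two racers are statistically independent, in which case $P_i = p_i \sum_{j \ge i} q_j + q_i \sum_{j > i} p_j$. Under that assumption, the probability that neither racer has finished by the end of bin $i$ factorizes, so $p$ can be recovered from $Q$ and $q$ by a ratio of tail sums. The function names and the three-bin numbers are hypothetical, and the fallback to constrained optimization (fmincon in the text) is only flagged, not implemented.

```python
def race_winner_distribution(p, q):
    """Binned distribution of min(X, Y) for independent X ~ p and Y ~ q."""
    P = []
    for i in range(len(p)):
        q_tail_incl = sum(q[i:])       # sum over j >= i of q_j
        p_tail_excl = sum(p[i + 1:])   # sum over j > i of p_j
        P.append(p[i] * q_tail_incl + q[i] * p_tail_excl)
    return P

def solve_for_p(Q, q):
    """Solve race_winner_distribution(p, q) == Q for p via tail sums."""
    p = []
    tail_p_prev = 1.0                  # mass of p at or beyond the current bin
    for i in range(len(Q)):
        tail_Q = sum(Q[i + 1:])
        tail_q = sum(q[i + 1:])
        # Assumption: when q has no mass left, put the remaining p mass here.
        tail_p = tail_Q / tail_q if tail_q > 0 else 0.0
        p.append(tail_p_prev - tail_p)
        tail_p_prev = tail_p
    return p

def satisfies_constraints(p, tol=1e-9):
    """Check p_i >= 0 and sum_i p_i = 1 within a tolerance."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol

# Hypothetical three-bin distributions (not data from the paper).
p_true = [0.2, 0.5, 0.3]
q = [0.1, 0.3, 0.6]
Q = race_winner_distribution(p_true, q)
p_solved = solve_for_p(Q, q)

# A case with q_1 > Q_1: the linear solution has a negative entry, so a
# constrained minimization of a distance between P and Q would be needed.
p_bad = solve_for_p([0.05, 0.45, 0.50], q)
print(p_solved, satisfies_constraints(p_solved))
print(p_bad, satisfies_constraints(p_bad))
```

In the first case the solution reproduces `p_true`; in the second the first component is negative, which is the situation the text handles by constrained optimization.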
Unless interested in repeating the procedure in this method, readers may wish to skip the rest of this paragraph, which describes how the $t_i$'s are determined for each race equality $RE_j$ for $j = 1, 2, ..., 8$.
Given a subject and a race equality, all the behavioral data samples of the reaction times for all the singleton types involved in this race equality are put into a single pool. This pool of samples was divided into $L = 100$ time bins bounded by time boundaries denoted as $T_i$'s, ordered as
$$T_0 < T_1 < T_2 < ... < T_L, \qquad (45)$$
whose values are chosen such that all bins contain (as closely as possible) an equal number of samples of reaction times from this pool. For reasons that will be clear in the next method section, each $t_i$ is chosen from among these $T_j$'s as follows. Let $RT(\max)$ and $RT(\min)$ denote the largest and smallest reaction times, respectively, among all the behavioral data samples of $RT_{\text{goal}}$, $RT_2$, and (for $RE_1$ and $RE_i$ for $i$ = 6-8) $RT_{\text{part}}$. Given $(T_0, T_1, ..., T_L)$, $t_0$ is the largest $T_j$ smaller than $RT(\min)$ and $t_N$ is the smallest $T_j$ larger than $RT(\max)$. Then, let $RT'(\max)$ and $RT'(\min)$ denote the largest and smallest $RT_{\text{goal}}$ behavioral data samples, respectively. If $RT'(\min) > RT(\min)$ and the largest $T_j$ smaller than $RT'(\min)$ is larger than $t_0$, then this $T_j$ is assigned to $t_1$. If $RT'(\max) < RT(\max)$ and the smallest $T_j$ larger than $RT'(\max)$ is smaller than $t_N$, then this $T_j$ is assigned to $t_{N-1}$.
Depending on whether $t_1$ and $t_{N-1}$ have just been assigned, there are now $N' = N - 1$, $N - 2$, or $N - 3$ unassigned $t_i$ values, which will be assigned in ascending order as $\tau_1 < \tau_2 < ... < \tau_{N'}$.
Each $\tau_i$ is the $T_j$ value that has not yet been assigned to any $\tau_k$ with $k < i$ and that is closest to the value $\tau'_i$, where $\tau'_i$ is larger than a fraction $F_i$ (with $F_1 < F_2 < ... < F_{N'}$) of the $RT_{\text{goal}}$ data samples. Our data analysis tried each of the following four ways to choose the $F_i$'s to see whether this paper's conclusion is robust against variations in these details. One is to choose $F_i = i/(N' + 1)$. The other three use a form of $F_i$ in which $\text{erf}(\cdot)$ is the error function and $x_F > 0$ is a parameter with value $x_F = 1.25$, $1.35$, or $1.45$.
The statistical test for the hypothesis that the predicted and the behaviorally observed distributions arise from the same underlying entity
We cannot use the Kolmogorov-Smirnov test to see whether $RT_1$ samples and $RT_2$ samples are generated from the same underlying distribution, because the samples of at least $RT_2$ are not independently generated, owing to the underlying race between the racers. Hence, we devised the following statistical test of whether the predicted and observed distributions of $RT_{\text{goal}}$ arise from the same underlying entity. This section details the methods in boxes (2)-(8) of Fig. 11.
The method (box (2) of Fig. 11) to measure the distance $D$ between the predicted and observed distributions is as follows. Given an observer and a race equality, let $p$ and $\hat{p}$ denote the predicted and observed distributions of $RT_{\text{goal}}$ (in the time bins used for predicting the distribution of the reaction times), respectively. Let $D$ denote the difference between them. This difference is measured by one of the four distance metrics listed in equation (44), substituting $\hat{p}$ and $p$ for $P$ and $Q$, respectively. All the metrics have been tried to test the robustness of our conclusion.
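The four distance metrics could be written along the following lines. This is a hedged sketch: the function names are hypothetical, the Hellinger distance is given in its standard normalized form (the paper's exact expression may differ), and the floor `EPS = 1e-300` is just one convenient value satisfying $\epsilon \ll 10^{-100}$.

```python
import math

EPS = 1e-300  # assumption: any tiny positive floor far below 1e-100

def squared_difference(P, Q):
    """Metric (1): |P - Q|^2, the sum of squared componentwise differences."""
    return sum((a - b) ** 2 for a, b in zip(P, Q))

def hellinger(P, Q):
    """Metric (2), standard form; assumes nonnegative entries."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(P, Q))
    return math.sqrt(0.5 * s)

def one_norm(P, Q):
    """Metric (3): sum of |P_i - Q_i|."""
    return sum(abs(a - b) for a, b in zip(P, Q))

def floored_kl(P, Q, eps=EPS):
    """Metric (4): sum of max(Q_i, eps) * log(max(Q_i, eps) / max(P_i, eps))."""
    total = 0.0
    for a, b in zip(P, Q):
        a, b = max(a, eps), max(b, eps)
        total += b * math.log(b / a)
    return total

# Hypothetical distributions over three time bins.
P = [0.5, 0.5, 0.0]
Q = [0.25, 0.25, 0.5]
for metric in (squared_difference, hellinger, one_norm, floored_kl):
    print(metric.__name__, metric(P, Q))
```

The floor in the last metric keeps the logarithm finite when a bin is empty in one distribution but not in the other, as with the third bin of `P` here.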
To test whether $\hat{p}$ and $p$ are statistically indistinguishable from each other, we generated $m = 500$ other, simulated, distances $D$, each from a set of simulated samples of reaction times (with as many simulated samples as in the real behavioral data for each type of reaction time) collected from a simulated behavioral experiment in a hypothetical situation in which the race equality holds while the distribution of the simulated samples of reaction times resembles that of the real behavioral reaction time samples. Given the fixed time boundaries $T_0 < T_1 < T_2 < ... < T_L$ obtained from the real behavioral data, the procedure to obtain a (simulated) $D$ value using the simulated samples of reaction times is the same as that to get the real $D$ value when the real reaction time samples are used. The p value of the statistical test is the fraction of the simulated $D$ values that are larger than the real $D$ value (obtained using the real behavioral data); a p < 1/m = 0.002 is given when no simulated $D$ is larger than the real $D$. When p < 0.05, our predicted and observed distributions of $RT_{\text{goal}}$ are said to be significantly different from each other, i.e., not arising from the same underlying entity, and we declare that the race equality is not consistent with behavioral data.
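The p value defined here is a simple counting statistic, which the following sketch illustrates. The function name and the list of simulated distances are hypothetical placeholders, not values from the study; the second return value marks the case that the text reports as p < 1/m.

```python
def monte_carlo_p_value(d_real, d_simulated):
    """Fraction of simulated D values strictly larger than the real D.

    Returns (p, below_resolution); below_resolution is True when no simulated
    D exceeds the real one, to be reported as p < 1/m.
    """
    m = len(d_simulated)
    n_larger = sum(1 for d in d_simulated if d > d_real)
    return n_larger / m, n_larger == 0

# Hypothetical stand-in for m = 500 simulated distances.
simulated = [i / 500 for i in range(500)]
p_value, below_resolution = monte_carlo_p_value(0.9, simulated)
p_extreme, extreme_flag = monte_carlo_p_value(2.0, simulated)
print(p_value, below_resolution)
print(p_extreme, extreme_flag)
```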
Each set of simulated samples of reaction times (for a given race equality), serving as data collected in a simulated behavioral experiment in a hypothetical situation in which the race equality holds, is generated as follows (box (6) of Fig. 11). First, we should have already constructed (as detailed in the next paragraph) a set of probability distributions of the corresponding set of reaction times involved in the race equality. For example, for race equality $RT_{CO} \stackrel{P}{=} \min(RT_C, RT_O)$, we should already have available a set of three distributions, one each for $RT_{CO}$, $RT_C$, and $RT_O$. This set of distributions is such that, first, it actually satisfies the race equality and, second, given the constraint that the race equality is satisfied, the distributions are the most likely to be the underlying distributions from which the behaviorally observed samples of reaction times could be generated. These distributions are called the null distributions of reaction times for the race equality concerned. From each of these distributions, as many simulated samples of reaction times as the corresponding real behavioral samples of reaction times (for a particular singleton type) are generated as random samples.
The null distributions of reaction times are constructed as follows (box (5) of Fig. 11). Given a subject and a race equality, the real behavioral reaction times for all singleton types involved in the race equality are discretized into $L = 100$ time bins using time boundaries $T_0 < T_1 < ... < T_L$ as described in and around equation (45). For each singleton type $\alpha$, let $n_\alpha \equiv [(n_\alpha)_1, (n_\alpha)_2, ..., (n_\alpha)_L]$, with $n_{\alpha i} \equiv (n_\alpha)_i$, be the histogram of the real behavioral $RT_\alpha$ samples in these time bins. The likelihood, or probability, that an underlying distribution $\hat{p}_\alpha \equiv (\hat{p}_{\alpha 1}, \hat{p}_{\alpha 2}, ..., \hat{p}_{\alpha L})$ of the reaction times over these time bins is the generator of this histogram $n_\alpha$ is
$$\text{likelihood}(\hat{p}_\alpha) \propto \prod_{i=1}^{L} (\hat{p}_{\alpha i})^{n_{\alpha i}},$$
whose logarithm is
$$\ln(\text{likelihood}(\hat{p}_\alpha)) = \sum_{i=1}^{L} n_{\alpha i} \ln \hat{p}_{\alpha i} + \text{constant}.$$
We construct hypothetical probability distributions, $\hat{p}_\alpha$, one for each $\alpha$ of the singleton types involved in the race equality, such that the log-likelihood
$$\sum_\alpha \ln(\text{likelihood}(\hat{p}_\alpha)) = \sum_\alpha \sum_{i=1}^{L} n_{\alpha i} \ln \hat{p}_{\alpha i} + \text{constant} \qquad (49)$$
is maximized, subject to the constraints that the race equality $RT_1 \stackrel{P}{=} RT_2$ (which takes a form like equations (42) and (43)) is satisfied by these $\hat{p}_\alpha$'s and, for each $\alpha$, $\sum_{i=1}^{L} \hat{p}_{\alpha i} = 1$ and $\hat{p}_{\alpha i} \ge 0$.
Again, this can be achieved through an optimization procedure (e.g., using fmincon in MATLAB).
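The objective in equation (49) is straightforward to evaluate, as the sketch below shows for hypothetical histograms of two singleton types. It computes only the log-likelihood up to the additive constant; the constrained maximization itself (with the race equality as a nonlinear constraint) needs a general optimizer such as the fmincon routine mentioned above and is not attempted here. All names and counts are illustrative assumptions.

```python
import math

def total_log_likelihood(histograms, distributions):
    """Sum over alpha and bins of n_{alpha,i} * ln(p_{alpha,i}), constant dropped."""
    total = 0.0
    for alpha, counts in histograms.items():
        probs = distributions[alpha]
        for n_i, p_i in zip(counts, probs):
            if n_i > 0:
                if p_i <= 0.0:
                    return float("-inf")  # observed data impossible under p
                total += n_i * math.log(p_i)
    return total

# Hypothetical histograms over L = 3 bins for two singleton types.
histograms = {"C": [3, 5, 2], "O": [1, 4, 5]}
empirical = {a: [n / sum(c) for n in c] for a, c in histograms.items()}
uniform = {a: [1.0 / len(c)] * len(c) for a, c in histograms.items()}

ll_empirical = total_log_likelihood(histograms, empirical)
ll_uniform = total_log_likelihood(histograms, uniform)
print(ll_empirical, ll_uniform)
```

Without the race-equality constraint, the empirical frequencies maximize this objective; the constraint generally moves the maximizer away from them, which is why the text then checks that the resulting distributions still resemble the histograms.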
We verified that the resulting $\hat{p}_\alpha$'s indeed satisfy the race equality $RT_1 \stackrel{P}{=} RT_2$ and sufficiently resemble the respective histograms of behavioral data $RT_\alpha$. Then, for each singleton type $\alpha$, a probability distribution of reaction times over the continuous time duration $(T_0, T_L)$ is constructed from $\hat{p}_\alpha$ such that the probability density within the time window $[T_{i-1}, T_i)$ is uniform and equal to $\hat{p}_{\alpha i}/(T_i - T_{i-1})$.
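Drawing simulated reaction times from such a piecewise-uniform density can be done in two steps: pick a time bin with probability $\hat{p}_{\alpha i}$, then draw uniformly within that bin. The sketch below assumes this two-step scheme; the function name, the seed, and the three-bin distribution are hypothetical.

```python
import random

def sample_piecewise_uniform(probs, boundaries, n_samples, rng):
    """Draw samples whose density is probs[i] / (T_{i+1} - T_i) inside bin i."""
    bins = range(len(probs))
    samples = []
    for _ in range(n_samples):
        i = rng.choices(bins, weights=probs, k=1)[0]                   # choose a bin
        samples.append(rng.uniform(boundaries[i], boundaries[i + 1]))  # position in bin
    return samples

# Hypothetical null distribution over three bins; the middle bin is empty.
probs = [0.5, 0.0, 0.5]
T = [0.3, 0.5, 0.7, 1.1]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
null_samples = sample_piecewise_uniform(probs, T, 200, rng)
print(min(null_samples), max(null_samples))
```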
Note that the time boundaries, the $t_i$'s, for the coarser time bins used in predicting the distribution of $RT_{\text{goal}}$ (box (1) of Fig. 11) are chosen from among the time boundaries, the $T_j$'s, for the finer time bins of the null probability distributions, which strictly satisfy the race equality. Hence, the race equality remains satisfied when each $\hat{p}_\alpha$ is viewed through the coarser time bins used for predicting the $RT_{\text{goal}}$ distribution.
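This property can be checked numerically on a toy case. The sketch assumes independent racers, so that the binned distribution of the race winner follows from those of the racers, and it merges fine bins into coarse ones whose boundaries are a subset of the fine boundaries. The distributions and the grouping are hypothetical.

```python
def race_winner_distribution(p, q):
    """Binned distribution of min(X, Y) for independent X ~ p and Y ~ q."""
    return [p[i] * sum(q[i:]) + q[i] * sum(p[i + 1:]) for i in range(len(p))]

def coarsen(fine, group_sizes):
    """Merge consecutive fine bins; group_sizes[k] fine bins form coarse bin k."""
    coarse, start = [], 0
    for size in group_sizes:
        coarse.append(sum(fine[start:start + size]))
        start += size
    return coarse

# Hypothetical distributions over five fine bins, merged into three coarse bins.
p_fine = [0.1, 0.1, 0.3, 0.2, 0.3]
q_fine = [0.2, 0.2, 0.1, 0.4, 0.1]
groups = [2, 1, 2]

winner_then_coarsen = coarsen(race_winner_distribution(p_fine, q_fine), groups)
coarsen_then_winner = race_winner_distribution(coarsen(p_fine, groups),
                                               coarsen(q_fine, groups))
print(winner_then_coarsen)
print(coarsen_then_winner)
```

Both orders give the same coarse distribution, because which coarse bin the winner falls in depends only on which coarse bin each racer falls in.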