Electrophysiological Correlates of Object Location and Object Identity Processing in Spatial Scenes

The ability to quickly detect changes in our surroundings has been crucial to human adaptation and survival. In everyday life we often need to identify whether an object is new and whether an object has changed its location. In the current event-related potential (ERP) study we investigated the electrophysiological correlates and the time course of detecting different types of changes in an object's location and identity. In a delayed match-to-sample task participants had to indicate whether two consecutive scenes, each containing a road, a house, and two objects, were the same or different. In six randomly intermixed conditions the second scene was identical, one of the objects had changed its identity, one of the objects had changed its location, or the objects had switched locations. The results reveal different time courses for the processing of identity and location changes in spatial scenes. Whereas location changes elicited a posterior N2 effect, indicating early mismatch detection, followed by a P3 effect reflecting post-perceptual processing, identity changes elicited an anterior N3 effect, which was delayed and functionally distinct from the N2 effect found for the location changes. The condition in which two objects switched position elicited a late ERP effect, reflected by a P3 effect similar to that obtained for the location changes. In sum, this study is the first to cohesively show different time courses for the processing of location changes, identity changes, and object switches in spatial scenes, which manifest themselves in different electrophysiological correlates.


Introduction
Ungerleider and Mishkin's theory [1], which proposes that the ventral stream is more active for processing object identity while the dorsal stream is more active for processing spatial information, has been confirmed by many functional magnetic resonance imaging (fMRI) studies and electrophysiological studies in macaques [2][3][4][5][6][7][8].
In contrast, other studies in both macaques and humans question the validity of a strict dorsal/ventral dual-stream model, showing that both streams are active in both identity and location processing [9][10][11][12]. However, several event-related potential (ERP) studies have revealed different time courses for the processing of object location and object identity information, with some, but not all, favoring spatially based information over identity based information [13][14][15]. Furthermore, multiple different ERP correlates have been suggested to be related to the processing of object location and object identity, including N2 and P3 effects [13,14,16,17,18,19]. The posterior N2 effect has been found to be related to the detection of a change [20][21][22], and in a review article by Koivisto and Revonsuo [23] it has been interpreted as an instance of the visual awareness negativity (VAN). This VAN is related to the moment an individual becomes aware of a change with respect to visual information held in memory [23,24]. Posterior N2 effects are often followed by a P3 effect, which has been shown to be related to confidence of the response [21] and to conscious post-perceptual processing [23,24]. On the other hand, the N2 effect has also been found to be followed by a second sustained lateralized negativity when objects have to be encoded in greater detail [25].
All of the neuropsychological studies mentioned above have focused on visual object identification and location processing on blank backgrounds. However, there is reason to believe that objects are processed differently when presented in an environment [26,27]. Murphy et al. [28] performed an ERP study on locating and identifying objects in a spatial environment, and found that changes in object identity elicited earlier P3 effects than changes in object location. However, in their study, a stronger emphasis was placed on the identity of the objects than on their location. After studying the stimuli, during the test phase, participants had to indicate whether the identity of the object was the same as or different from the studied object, regardless of whether the object was in a different location compared to the study phase. In the present ERP study we investigated the time course of detecting different types of changes in an object's location and identity when objects were presented in an environment. We adopted a paradigm in which equal emphasis was placed on both object location and object identity. In addition, we investigated the possible segregation of these kinds of change detection in terms of electrophysiological correlates. Furthermore, we were also interested in the electrophysiological signature of the processing of instances where two objects switch location, in which both spatial and featural information need to be integrated for the detection of a change.
To unravel the time course and ERP components related to detecting changes in object identities and object locations in an environment, we presented objects in contextually rich spatial scenes containing a house, a road, and two objects. In order to equally emphasize object identity and object location, we contrasted them directly in a single task in which, using a delayed response paradigm, participants had to decide whether a second scene was identical to a first scene or not. Six conditions were randomly presented in this task (Figure 1A). In (a) the match condition, the second scene was identical to the first scene, in (b) the side change condition, one object's location changed sides in the visual display, in (c) the depth change condition, one object's location was moved in depth, in (d) the disappearance condition, one of the objects disappeared, in (e) the identity change condition, one of the objects was replaced by another object, and in (f) the switch condition, the objects switched position, measuring object-to-location binding. Conditions (b), (c), and (d) were all considered location change conditions, since one of the objects left its initial location, while in conditions (e) and (f) both initial locations remained occupied by an object.
Our hypothesis regarding the time course of detecting location and identity changes was that location changes would be processed earlier than identity changes [13,14]. This is in line with a recent neuronal model of scene processing which proposes that global scene layout is processed first to form a hypothesis on target location, after which this location is processed more deeply to identify the object [29,30]. Both location changes and identity changes were hypothesized to elicit earlier effects than the object switch condition, since information about the objects' identities needs to be bound to their respective locations in order to detect a change, and attention to both objects is needed to do so [31]. We expected time course differences between conditions to manifest themselves at the level of the N2 component, reflecting visual change detection processes [20][21][22], and the P3 component, reflecting post-perceptual processing [23,24]. Furthermore, we hypothesized that object location and object identity processing in scenes would manifest themselves in different ERP components [2][3][4][5][6][7][8]13,19].

Ethics Statement
All participants provided written informed consent in accordance with the Declaration of Helsinki. This study was approved by the local ethics committee (Commissie Mensgebonden Onderzoek region Arnhem-Nijmegen, The Netherlands).

Participants
Twenty-three paid volunteers participated in this study, after having given written informed consent. Three participants were excluded from the analyses, two due to problems with the recording computer, and one because of a large number of errors (deviating more than 3 SD from the mean). Thus, twenty participants (10 men, 10 women) remained in the sample. They were all right-handed as assessed by a self-report questionnaire. Their mean age was 22.7 years (ranging from 18.2 to 28.5) and they were mostly undergraduate students from the Radboud University Nijmegen. Participants had normal or corrected-to-normal vision and were not color blind.

Stimuli
The stimuli consisted of computer-generated environments created with Google SketchUp™ (see Figure 1A). This program allows for the creation of 3D environments. These environments are scaled, so that one can measure 'real life' sizes of objects and distances between objects in the environment. For example, the front wall of the house measured 6 by 10.5 m. Two objects were placed along the road, which in half of the trials ran from the bottom left of the scene, and in the other half from the bottom right of the scene, up to the house centered just above the middle of the scene. The objects could be placed at 4 locations along the road. These locations were either 20 or 38 meters in front of the house (in real size) and were placed at 5 meters horizontal distance from the road (measured from the middle of the object) at both sides. All objects were familiar objects scaled to their normal size. The locations of the objects were counterbalanced. The viewpoint was 72.83 meters from the house, with a viewing angle of 36.6 degrees (10.75 m high) centered at the middle of the front door of the house.
In total, 45 different objects were used to create the stimuli. The object names and real sizes of the objects are reported in Table S1. They were paired differently in each trial, to construct a total of 360 trials. In addition, 4 different objects were used to construct four practice trials. The experiment contained six conditions: a) match, b) side change, c) depth change, d) disappearance, e) identity change, and f) switch (see Figure 1A). In the match condition, the second scene (S2) was the same as the first scene (S1). In the side change condition, one of the objects was moved to the other side of the road in S2, but stayed at the original distance from the house (either 20 or 38 m). In the depth change condition, one of the objects was moved to a new position along the road (from 20 to 38 m or vice versa), but stayed at the same side of the road. In the disappearance condition, one of the objects disappeared. In the identity change condition, one of the objects was replaced by another object. Finally, in the switch condition, the objects from S1 switched positions with each other. For illustration purposes, all scenes in Figure 1A were derived from the same sample scene, but in the experiment no scene appeared in more than one trial. Side change, depth change, and disappearance are all location change conditions, i.e., one of the locations occupied by an object in S1 is not occupied in S2. The scenes were sized 16.7 by 10.6 cm and displayed on a 27 × 34 cm screen to keep eye movements as limited as possible. The remaining part of the screen was black.

Procedure
Participants were equipped with an electrode cap and seated in an electrically shielded room in front of a computer screen at a distance of approximately 60 cm. After they were comfortably seated, the task instruction was visually displayed on the screen. Participants were told that they would be shown two consecutive scenes and they had to press a mouse button after the second picture disappeared to indicate whether the scenes were the same (left button) or different (right button). They were instructed to blink during the presentation of a blinking star displayed in between trials and to blink as little as possible during the rest of the trial. They were also instructed to keep their gaze fixed on the middle of the screen (where the fixation cross was presented), but that they could move their eyes if it was otherwise impossible to see the whole scene.
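As a rough check on how much of the visual field the scenes occupied, the reported scene size (16.7 by 10.6 cm) and viewing distance (approximately 60 cm) can be converted into visual angle. The sketch below is illustrative only and is not part of the original analysis; the function name `visual_angle_deg` is hypothetical, and it assumes the scene was centred on the line of sight and that the distance was exactly 60 cm.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) of an extent centred on the line of sight."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))


# Values taken from the Stimuli and Procedure sections.
VIEWING_DISTANCE_CM = 60.0            # reported as "approximately 60 cm"
SCENE_W_CM, SCENE_H_CM = 16.7, 10.6   # displayed scene size

width_deg = visual_angle_deg(SCENE_W_CM, VIEWING_DISTANCE_CM)
height_deg = visual_angle_deg(SCENE_H_CM, VIEWING_DISTANCE_CM)
print(f"Scene subtends about {width_deg:.1f} x {height_deg:.1f} degrees")
```

Under these assumptions the scene spans roughly 16 by 10 degrees of visual angle, which is consistent with the instruction that participants could move their eyes if they could not otherwise see the whole scene.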
The time course of a trial is shown in Figure 1B. First, a fixation cross was displayed for 1000 ms, followed by presentation of S1 for 1200 milliseconds. After that a fixation cross was shown which was jittered in duration between 1000 and 2000 milliseconds. Then S2 was presented for 2000 milliseconds followed by the words 'Kies nu' ('Choose now') which indicated that the participant should respond. After the participant had responded, a blinking star was displayed for 2000 ms, indicating that the participants were allowed to blink. All conditions apart from the match condition contained 45 trials. In the latter condition 135 trials were presented to avoid a bias towards pressing the 'different' button.
Four practice trials were used to familiarize the participants with the experimental setting, after which they could ask questions to the experimenter. Next, the experiment started when the participant pressed a button. In total, 5 blocks of 72 trials each were presented to the participant. Each block lasted about 10 minutes and there were breaks in between the blocks.
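The trial counts and block length reported above can be cross-checked with a few lines of arithmetic. The following sketch is only an illustration: it assumes that the jittered fixation interval averaged 1500 ms (the midpoint of the reported 1000 to 2000 ms range) and it ignores the time participants took to respond, which the text does not report.

```python
# Condition counts from the Procedure section.
CHANGE_CONDITIONS = ["side change", "depth change", "disappearance",
                     "identity change", "switch"]
TRIALS_PER_CHANGE_CONDITION = 45
MATCH_TRIALS = 135
N_BLOCKS = 5

change_trials = TRIALS_PER_CHANGE_CONDITION * len(CHANGE_CONDITIONS)
total_trials = change_trials + MATCH_TRIALS
trials_per_block = total_trials // N_BLOCKS
proportion_same = MATCH_TRIALS / total_trials

# Trial timeline in milliseconds (response time excluded).
FIXATION_MS = 1000
S1_MS = 1200
MEAN_JITTER_MS = (1000 + 2000) / 2   # assumption: jitter centred on the midpoint
S2_MS = 2000
BLINK_MS = 2000

mean_trial_ms = FIXATION_MS + S1_MS + MEAN_JITTER_MS + S2_MS + BLINK_MS
block_minutes = trials_per_block * mean_trial_ms / 60_000

print(total_trials, trials_per_block, round(proportion_same, 3))
print(f"Block duration without response time: {block_minutes:.2f} min")
```

The totals agree with the 360 trials and 5 blocks of 72 trials described in the text, and the estimated block duration before adding response time is a little over 9 minutes, in line with the reported duration of about 10 minutes per block.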

EEG Recordings and Analyses
EEG data were recorded with a 64-electrode equidistant actiCAP (Brain Products GmbH) referenced to the left mastoid (Figure 2). Signals were passed through a BrainAmp DC amplifier (Brain Products GmbH) and were recorded on-line with a sampling rate of 1000 Hz. Measured activity was filtered on-line using a 200 Hz low-pass filter and a time constant of 10 s. To measure eye movements, an additional electrode was placed below the left eye. Impedance was kept below 20 kΩ, which is a standard setting in active electrode recording. Data were analyzed in the Matlab-based open source program Fieldtrip [32]. EEG signals were re-referenced to the mean of the left and right mastoid. The signals were screened manually for movement and muscle artifacts and then corrected for horizontal and vertical eye movements by employing the Independent Component Analysis (ICA) method [33]. Signals were filtered with a 0.1 to 40 Hz bandpass filter. Then EEG data were segmented from 200 ms before to 1200 ms after the onset of S2. Segments were baseline-corrected by subtracting the mean amplitude in the −200 to 0 ms prestimulus interval. Only correct trials were analyzed. For each electrode, ERPs were computed by averaging the segments of artifact-free trials per condition. A mean of 40 to 42 trials per condition was available for the computation of the ERPs, with a minimum of 32 epochs per condition per subject. In the match condition, only every third trial was included in the final analysis to keep the number of trials in each condition similar. Mean amplitudes of four latency windows surrounding the peak latency of distinct ERP components were entered into our analyses. Based on our hypotheses, analyses were conducted in latency windows surrounding the N2 (190-250 ms) and P3 (300-500 ms) peaks. In addition, based on visual inspection, mean amplitudes in N1 (80-140 ms) and N3 (250-400 ms) latency windows were analyzed.
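The baseline correction and the mean-amplitude measure described above can be illustrated with a minimal sketch. This is not the authors' Fieldtrip pipeline: the function names are hypothetical, the epoch is synthetic (a constant 5 µV offset with a 2 µV negative deflection placed in the N2 window purely for demonstration), and treating each latency window as half-open (start included, end excluded) is an assumption, since the text does not specify how window edges were handled.

```python
from statistics import fmean

# One sample per millisecond at the reported 1000 Hz sampling rate,
# covering the epoch from 200 ms before to 1200 ms after S2 onset.
TIMES_MS = list(range(-200, 1200))

# Latency windows reported in the text (ms).
WINDOWS_MS = {"N1": (80, 140), "N2": (190, 250), "N3": (250, 400), "P3": (300, 500)}


def baseline_correct(epoch, times, start=-200, end=0):
    """Subtract the mean prestimulus amplitude from every sample."""
    baseline = fmean(v for v, t in zip(epoch, times) if start <= t < end)
    return [v - baseline for v in epoch]


def mean_amplitude(epoch, times, window):
    """Mean amplitude within a latency window (start inclusive, end exclusive)."""
    lo, hi = window
    return fmean(v for v, t in zip(epoch, times) if lo <= t < hi)


# Synthetic single-channel epoch in microvolts, for demonstration only.
raw_epoch = [5.0 + (-2.0 if 190 <= t < 250 else 0.0) for t in TIMES_MS]

corrected = baseline_correct(raw_epoch, TIMES_MS)
amplitudes = {name: mean_amplitude(corrected, TIMES_MS, w)
              for name, w in WINDOWS_MS.items()}
print(amplitudes)

# Keeping every third match trial reduces 135 trials to 45,
# matching the trial count of the other conditions.
match_subsample = list(range(135))[::3]
print(len(match_subsample))
```

In this toy example the constant offset is removed by the baseline correction, so only the N2 window retains a non-zero mean amplitude. In the actual analysis these window means would be computed per electrode and condition on the averaged ERPs before being entered into the ANOVAs.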

Behavioral Performance
Participants' performances were highly accurate. Percentages of correctly answered trials per condition are reported in Table 1. Reaction times were not analyzed because participants were instructed to give a delayed response 2 seconds after the stimuli were presented.

Event-related Potentials
The ERP waveforms for all conditions are shown in Figures 3A and 3B. The figures show that the waveforms of the side change, depth change, and disappearance conditions all deviate from the match condition around 200 ms after stimulus onset (Figure 3A), while the waveforms for the identity change and switch conditions deviate from the match condition later, at around 300 ms after stimulus onset (Figure 3B). However, the identity change and switch conditions do not resemble each other: the identity change condition deviates negatively from the match condition, whereas the switch condition shows a somewhat delayed positive effect compared to the match condition.
Omnibus analyses. As a first step, omnibus ANOVAs with the within-subject factors Condition (6 levels) and Region (5 levels) were carried out on the preselected time windows. The statistical results are shown in Table 2. In the N1 latency window (80-140 ms), a fronto-central negativity was found in all conditions, which did not differ between conditions. In the N2 latency window (190-250 ms), a fronto-central negativity was elicited, together with a posteriorly distributed positivity. The ANOVA did not reveal a main effect of Condition, but a significant interaction between Condition and Region was obtained. In the N3 latency window (250-400 ms), a negativity can be seen for the identity change condition compared to the match condition, in the absence of a negativity in the other conditions. This was confirmed by a main effect of both Condition and Region, and an interaction between these factors. In the P3 latency window, Figure 3 illustrates a positive component at central and posterior electrodes in all conditions. The ANOVA revealed a main effect of Condition, a main effect of Region, and an interaction between these two factors. Next, separate ANOVAs were carried out in the N2 and P3 time windows, comparing each condition separately to the match condition. Whenever a significant effect of Condition or an interaction between Condition and Region was found, differences between conditions were tested separately for each region. Results of these analyses are reported in Tables 3 and 4. In the N3 time window an ANOVA was carried out comparing the identity change to the match, since only the identity change showed a negative effect compared to the match.
N2 latency window. Scalp distributions of all non-matching conditions minus the match condition are shown in Figure 4. Statistical results are reported in Table 3. While the analyses of the side change, depth change, and disappearance conditions all yielded an interaction between Condition and Region, the identity change and switch conditions resulted neither in an effect of Condition nor in an interaction between Condition and Region. Figure 4 shows that the depth change and disappearance conditions elicited an anterior P2 and a posterior N2 compared to the match condition in this time window, while the identity change and switch conditions did not differ from the match, which was confirmed by the post-hoc analyses. For the side change condition, the interaction was also significant and a pattern similar to that in the depth change and disappearance conditions can be observed. However, here post-hoc tests failed to reach significance.
N3 latency window. In the N3 latency window, we only observed a negativity in the identity change condition compared to the match condition. Statistical analysis revealed a marginally significant effect of Condition (F (1,19)
P3 latency window. Statistical results of the ANOVAs in the P3 window are reported in Table 4. Scalp distributions of differences between the non-matching conditions and the match condition are shown in Figure 4. The results revealed a significant main effect of Condition for all contrasts involving the match condition except for the identity change versus match. In addition, significant interactions between Condition and Region were found for all comparisons against the match. For the side change, depth change, and disappearance conditions the P3 effect was significant in all regions, with a maximum over the central areas (see Figure 4). In the switch condition, the P3 effect had a less widespread distribution across the scalp, confirmed by significant main effects in all but the left anterior region. In the identity change condition, significant effects of Condition were only found in the anterior regions. However, as can be seen in Figure 4, this is due to a negativity of the identity change compared to the match rather than a P3 effect. In a slightly later time window, the identity change does seem to show a P3 effect compared to the match condition. Additional statistical analyses of the 450-650 ms latency window revealed a significantly larger amplitude in the identity change condition than in the match condition at central and posterior sites (all p < .005).
Side change, depth change, and disappearance. We were interested in differences between the different location change conditions. A difference between side and depth change versus disappearance would indicate that the scenes are processed further after one detects that the object has disappeared from its original location. A difference between side change and depth change could indicate a difference between categorical and coordinate changes [35,36].
The three location change conditions were tested against each other in the N2 and P3 time windows. In the N2 window no effect of Condition and no interaction between Condition and Region were obtained. In the P3 time window, no main effect of Condition, but a significant effect of Region (F (4,76) = 52.71, p < .001), and an interaction between Condition and Region (F (8,152) = 6.51, p < .001) were found. Side change elicited a larger P3 effect than depth change in the right posterior region (F (1,19) = 5.83, p = .026). The P3 in the disappearance condition was more right lateralized than in the side change and depth change conditions. This was confirmed by a larger positivity at the left posterior sites for both side and depth change compared to disappearance (F (1,19) = 21.64, p < .001, and F (1,19) = 13.01, p = .002, respectively).
Identity change vs. switch. The identity change and switch were tested in a Condition (2) × Region (5) repeated measures ANOVA in the N2, N3 and P3 latency windows. In the N2 latency window, no effect of Condition and no interaction between Condition and Region were found. In the N3 latency window, effects of Condition (F (1,19) = 11.26, p = .003) and Region (F (4,76) = 76.10, p < .001) were obtained, but no interaction between the two factors. Identity change elicited a larger anterior negativity and a smaller posterior positivity than the switch. In the P3 latency window effects of Condition

Discussion
In the present ERP study, we investigated the time course and electrophysiological correlates of processing object location and object identity in spatial scenes. In a delayed match-to-sample task participants had to indicate whether two consecutive scenes, both containing a road, a house, and two objects, were either the same or different. The scenes were identical, one of the objects had changed identity, one of the objects had changed location, or the objects had switched positions. The results show that location changes are detected earlier than identity changes, which in turn are detected faster than object switches. These effects are reflected by modulations of the N2, N3, and P3 components.
Relative to the match, all location change conditions revealed a posterior N2 effect in the 190-250 ms latency window, although in the side change condition this effect did not reach statistical significance (see Figure 4). The observed N2 effect reflects the detection of a change in a visual stimulus and could be interpreted as an instance of the VAN [21][22][23][24]. These results show that in this early time window, location changes have already been detected,  while identity changes and object switches have not. Importantly, the effect in this time window cannot be explained by differences in visual attention between the conditions, as the conditions were randomly intermixed in this study. Moreover, it has been shown that differences in attention to visual stimuli are reflected in an N1 effect [37], and we did not find an effect of condition in the N1 window. Therefore, we interpret the posterior N2 effects as reflecting the detection of a visual change.
The results show that ERPs to identity changes deviate somewhat later from the match condition than location changes, resulting in an N3 effect in the 250-400 ms latency window. One may argue that this negative effect reflects a delayed VAN, as it  has previously been shown that the latency of the VAN can be delayed until 460 ms depending on the contrast between the stimuli [23]. However, the N3 effect in our study reaches its maximum amplitude at anterior electrode sites whereas the VAN has been shown to have a posterior scalp distribution. These differences in scalp distribution indicate that the N3 effect is functionally different from the N2 effect in the location change conditions. Indeed, a similar anteriorly distributed N3 effect has previously been found to reflect processing of object specific representations [20,38]. Only after objects presented in the scene have been identified and compared to object representations in memory, a difference can be detected. These results show that object specific processing within an environment takes place within 400 ms after presentation of the second stimulus.
The 300-to-500 ms time window revealed P3 effects for several conditions. Amplitude modulations of the P3 component were found for location changes and the switch compared to the match and identity change. The effect was even larger for the location changes compared to the switch. These results are in line with previous studies presenting objects on blank screens showing larger P3 amplitudes for location changes relative to matches [16], and a larger amplitude for location changes than identity changes [17]. The switch condition did not deviate from the match until this time window, suggesting that the switch was detected only after 300-500 ms, and thus later than the location change and identity change. This is in line with the theory that both information about location and identity of the object have to be bound together, and this requires integration of information processed in the dorsal and ventral stream [1].
The only condition that did not show a larger P3 compared to the match in the 300 to 500 ms latency range, was the identity change condition. However, the identity change did elicit a small P3 effect in a slightly later time window. This later occurrence of the positive effect in the identity change condition is possibly due to the elicitation of an overlapping N3 component in the identity change condition. Alternatively, the apparent P3 latency difference between the identity change condition and the location change conditions may also reflect a longer duration of perceptual and decision-related processing in the identity change condition [21,28,39,40]. This would be an indication that the detection of the identity changes requires more effort than the detection of location changes. In addition, irrespective of the latency difference, the P3 amplitude of the identity change appears to be smaller than the amplitude of the location changes. The P3 amplitude has previously been shown to be inversely related to the difficulty of stimulus evaluation [39,40] and to the confidence of the response [21,41], providing further evidence for more effortful processing of identity changes relative to location changes.
The present study also provided an opportunity to test for more subtle differences between multiple types of location changes. We, therefore, compared the side change, depth change, and disappearance condition with each other. Results show that only the P3 component was modulated differently in these conditions. An effect was shown for side change and depth change compared to disappearance. While the P3 was centrally distributed for the side and depth change, in the disappearance condition it was more right lateralized. This implies that scenes in which an object has moved to another place are processed differently from a scene in which one of the objects has disappeared. Moreover, it shows that despite being able to perform the task while only attending to the locations that were previously filled with an object, participants processed the whole scene.
Side change and depth change also differed slightly in P3 effect size, with side change eliciting a larger P3 effect than depth change only in the right posterior region. This might be due to a difference between categorical and coordinate processing, which has been shown to modulate P3 amplitude [35], and in the present study would be reflected in the side versus depth change. On the other hand, participants may have perceived both the side and the depth change as a categorical change, since there were only two possible depths, which could be coded as near and far. Alternatively, the difference in P3 amplitude may be due to the relative difficulty of detecting a depth change compared to a side change. Whereas in the depth change condition the object changed visual appearance, i.e. it changed size, in the side change condition the visual appearance remained similar, making change detection in the depth change condition harder [40,42]. Also, the side change condition resulted in a larger difference in the retinal image compared to the depth change condition. Future research controlling for the degree of change in the retinal image may elucidate the nature of this small P3 effect.
When comparing the identity change to the switch, we found an N3 effect similar to the one obtained in the identity change versus match comparison. We also found a larger P3 amplitude for the switch compared to the identity change. These findings imply that two objects switching position is not merely processed as two objects changing identity at two distinct locations, but can be considered a functionally distinct category of change detection.
In sum, our results show that when objects are presented within a spatial scene, the time courses of detecting location changes, identity changes, and object switches differ from each other. Location changes were already detected within 250 ms after presentation of the second stimulus, as shown by a modulation of a posteriorly distributed N2 component, followed by a P3 effect. Detection of identity changes occurred later, but within 400 ms, and elicited an anterior N3 effect, followed by a delayed and reduced P3 effect. In contrast, object switches were detected within 500 ms, as reflected in the modulation of the P3 only, resembling the P3 effect found for location changes. These differences in neural correlates between the detection of location changes and the detection of identity changes are in line with results showing that these types of information are processed in different visual streams [2][3][4][5][6][7][8]. The ERP time courses of the different contrasts suggest that location changes are easiest to detect, followed by identity changes and finally object switches.
It may be argued that these findings are inherent to the processing of the scenes in our paradigm. If one is aware that possible changes always involve the objects in the scene, one will direct attention to the two locations that previously contained an object. A location change can then be detected directly by noticing that the object has disappeared, whereas for detecting an identity change, the object first needs to be processed further in terms of object representations. For the detection of an object switch, spatial and featural information first need to be combined, causing a delay in the detection of these changes. Alternatively, the observation of time course differences for location and identity processing may translate to ecologically valid environments, where location changes of objects in the immediate environment (e.g. an object approaching) are detected before the object's identity. This has been proposed in two visual search models and confirmed by behavioral, MEG and effective connectivity results, showing that low spatial frequency components of a scene are processed first, guiding spatial attention to an object that is processed subsequently [27,28,43,44,45]. However, in order to strengthen the claim that location change is processed before identity change, replication of a temporal advantage of location over identity in different spatial environments would be beneficial.
To conclude, human beings are able to detect changes in an object's location, changes in an object's identity, and switches of objects extremely fast, even when these objects are embedded in a context. Our results suggest that location changes are detected faster than identity changes. To appropriately act and function in our environment is crucial to human adaptation and survival. The fast processing of changes in our surroundings constitutes the first steps of a human neural mechanism that allows adjustment of behavior. Being aware of changes in an object's location may indeed be more important for accurate adaptation of behavior than changes in an object's identity. For example, fast detection and processing of the locations and changes in location of objects such as vehicles are important in everyday life when crossing the street. This study is the first to cohesively show that the human neural system is fine-tuned to change detection and moreover differentiates between different types of changes in an environment.

Supporting Information
Table S1 Pictures of the objects used in constructing the stimuli and their size in the virtual 3D environment. (DOCX)