Processing spatial configurations in visuospatial working memory is influenced by shifts of overt visual attention

J. David Timm; Frank Papenmeier

doi:10.1371/journal.pone.0281445

Abstract

When memorizing multiple objects, humans process them in relation to each other, which gives rise to a configuration benefit. Shifts of overt visual attention through eye movements might influence the processing of such spatial configurations. Whereas some research suggests that overt visual attention aids the processing of spatial representations, other research suggests a snapshot-like processing of spatial configurations that likely does not rely on eye movements. In the first experiment, we compared an enforced fixation condition and a free view condition regarding configurational effects. Participants encoded objects’ locations and were asked to detect location changes at retrieval. One object was displaced in half of the trials and was either accompanied by a configuration or displayed alone. In the second experiment, we expanded this idea by enforcing fixation during different task phases, namely encoding, maintenance, and retrieval. We investigated whether a fixed gaze during one specific phase drives the influence of eye movements on the processing of spatial configurations. We observed reliable configuration benefits in the free view conditions. Whereas a fixed gaze throughout the whole trial reduced the effect, enforced fixation during individual task phases did not eliminate the configuration benefit. Our findings suggest that although the processing of spatial configurations in memory is supported by the ability to perform shifts of overt visual attention, configurational processing does not rely on these shifts occurring throughout the task. Our results indicate a reciprocal relationship between visuospatial working memory and eye movements.

Citation: Timm JD, Papenmeier F (2023) Processing spatial configurations in visuospatial working memory is influenced by shifts of overt visual attention. PLoS ONE 18(2): e0281445. https://doi.org/10.1371/journal.pone.0281445

Editor: Alessandra S. Souza, University of Zurich, SWITZERLAND

Received: January 31, 2022; Accepted: January 24, 2023; Published: February 9, 2023

Copyright: © 2023 Timm, Papenmeier. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data files are available on OSF: https://osf.io/gw7v5.

Funding: This research project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project number 357136437 (https://www.dfg.de/) to FP. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Single objects are remembered together with their surrounding objects in visuospatial working memory (VSWM), which gives rise to a spatial configuration benefit [1]. That is, a change of an object’s location is detected more easily when the probed object is accompanied at retrieval by the objects maintained in parallel than when the probed object is shown alone. Even when participants are instructed to memorize multiple single objects individually, the global spatial configuration of all encoded objects is processed automatically [1, 2]. Rendering a part of this global configuration relevant by a retro cue during maintenance results in similar configuration benefits for the relevant partial configurations [3–5]. Configuration benefits are not limited to featureless objects [1, 2] but also arise with natural stimuli [6, 7]. While it is well established that spatial configurations are represented in VSWM, the conditions under which they are represented are still not well understood. With the present research, we addressed this issue by studying the influence of eye-movements on the representation of spatial configurations in VSWM.

The performance and programming of saccadic eye-movements are strongly interconnected with visual attention and visuospatial working memory. For example, performing a saccadic eye-movement to one location inevitably directs covert visual attention to that very same location, making it impossible to saccade to one target while attending to another [8, 9]. Similarly, limited working memory resources are directed towards saccade targets, improving the precision of their representation in VSWM [10], and some research postulates that shifts of overt visual attention determine the composition of VSWM contents [11]. Conversely, shifts of overt visual attention by means of saccadic eye-movements require an operating memory store for target selection, the maintenance of information across saccades, object correspondence, and gaze correction [12]. While the strong interconnection between saccadic eye-movements and VSWM has been shown for the representation of single objects, we extend this research by focusing on the relation between eye-movements and the representation of spatial configurations. We do so by first reviewing previous research on the relation between eye-movements and VSWM in general and then focusing on the potential influence of eye-movements on the representation of spatial configurations in VSWM in particular.

Previous research established the relation between eye-movements and VSWM either by asking participants to explicitly perform task-unrelated eye-movements [13] or by preventing participants from performing potentially task-related eye-movements through fixation instructions [14–16]. For example, when participants concurrently memorize spatial information and try to detect a shape change of a visual object on the screen, memory for the spatial information is reduced when the shape moves, and thus prompts task-unrelated voluntary eye-movements, rather than remaining stationary [13]. It is assumed that task-unrelated voluntary eye-movements interfere with the maintenance of spatial information [13]. The role of eye-movements for maintenance in VSWM was further investigated by instructing participants either to free view during maintenance or to fixate a specific location, such as the fixation cross, during maintenance [14–16]. The free-view eye-movement patterns suggest that participants use eye-movements for rehearsal during maintenance by fixating previously occupied target locations [14–16]. Whereas asking participants to fixate the task-unrelated fixation cross instead of free viewing reduces memory performance [14, 15], instructing participants to fixate a single self-chosen location instead of free viewing during maintenance causes no reduction of memory performance [16]. The latter finding was attributed to participants still being able to rehearse the object locations with covert visual attention. This idea is supported by research showing that forcing covert visual attention to the fixation cross with a dual task during maintenance reduces memory performance to a similar extent as instructing participants to actually fixate the fixation cross [14, 15].
It is important to note, however, that while instructions to fixate the fixation cross cause a general drop in memory performance, the specific functional role of free-view eye-movements during maintenance is still not well understood. For example, fixation during maintenance does not seem to increase the rate of time-based forgetting due to decay [14], and fixating, during maintenance, objects that subsequently change at test does not necessarily lead to better change detection for those objects [15].

Regarding the potential influence of eye-movements on the representation of spatial configurations in VSWM, it seems useful to first consider the processes underlying the representation of inter-object relations in VSWM. Two competing accounts of the storage and processing of spatial configurations in VSWM can be identified in the literature. One account assumes a rather rigid, snapshot-like representation of spatial configurations in VSWM, whereas the other assumes a more flexible representation.

The snapshot account is based on the observation that only the global spatial configuration of all encoded objects, but not a partial spatial configuration of a subset of encoded objects, results in a memory benefit [1, 2]. This snapshot might be represented in a separate, view-dependent snapshot store in VSWM that is also used for spatial navigation [2, 17]. This idea fits with the finding that spatial configurations are represented in a view-dependent manner, with configuration benefits disappearing when viewpoint changes of 30 or 60 degrees are introduced between encoding and retrieval [7]. Further, location change detection is impaired by changes to task-irrelevant features that destroy the perceptual grouping of the memorized objects [18].

The flexible account suggests that spatial configurations are stored interdependently in VSWM. It is based on the idea that the representation of inter-object relations such as summary statistics (e.g., the mean color or location across individual object representations) might result from a hierarchical representation of features in VSWM, with individual objects being organized in higher-order hierarchical clusters [19–22]. This more flexible account is in accordance with recent findings showing that the influence of inter-object relations on memory representations can be manipulated by shifting visual attention with retro cues during maintenance, that is, after encoding has already been completed. This holds both for the representation of object features such as orientation [23] and for object locations as represented in spatial configurations [3–5]. Regarding the latter, it was shown that the presentation of valid retro cues during maintenance caused a reorganization of spatial configurations toward the relevant, probed partial configuration [3–5].

Whereas the snapshot account suggests a rather holistic representation of spatial configurations, the flexible account emphasizes the role of individual objects as nodes of the higher-order representations. Applying both accounts to the potential role of eye-movements in the representation of spatial configurations in VSWM, one might predict based on the snapshot account that eye-movements are rather unimportant or even harmful for the representation of spatial configurations. That is, the spatial configuration benefit should be stronger in a fixation condition than in a condition with free eye-movements. Based on the flexible account, in contrast, one might predict that eye-movements, and thus shifts of overt (and covert) attention, have a strong influence on the representation of individual objects and thus potentially also on their inter-object relations. Thus, the spatial configuration benefit should be stronger with free eye-movements than in a fixation condition. There is some initial evidence supporting each view. In accordance with the former view, the implicit learning of spatial configurations within the contextual cueing paradigm is not only possible without eye-movements but even superior to learning with eye-movements [24]. In support of the latter view, an experiment designed to test the influence of retro cues and set size on the reorganization of spatial configurations, which also manipulated eye-movements, observed a configuration benefit for the group of participants allowed to move their eyes freely but not for the group required to maintain fixation [3].

In our research, we focused on the role of eye-movements for the representation of spatial configurations in VSWM. That is, we were interested in the conditions under which the spatial relations between objects are utilized in VSWM, such that a location change is more easily detected at test when the probed object is presented together with the global spatial configuration of all objects rather than alone. Prior related work studied memory for object locations within a spatial grid [14, 25, 26] and showed that the presence of the spatial grid during maintenance supports the rehearsal of object locations, whereas its absence results in decay. Interestingly, a fixation instruction as compared with a free view instruction resulted in a general drop in memory performance that was stronger for the conditions with the spatial grid than for those without it [14]. Nonetheless, the benefit of the spatial grid remained even with the fixation instruction. Note, however, that there are important differences between this prior work on environmental support during maintenance [14] and our present experiments. Our research focused on the representation of inter-object relations between memorized objects, whereas this prior work focused on the influence of environmental support (a spatial grid of target and distractor locations) on rehearsal during maintenance. Accordingly, our manipulation of the spatial configuration concerned the presence of inter-object relations at test, whereas the research on environmental support concerned the presence of a spatial grid during maintenance. This makes it difficult to derive clear predictions for our present experiments from this prior work.

With the present set of two experiments, we aimed to investigate the role of eye-movements in the processing of spatial configurations in VSWM. We manipulated eye-movements within participants rather than between participants as in [3], which provides a stronger test of whether eye-movements, rather than incidental cognitive differences between groups of participants, drive the encoding of spatial configurations. Further, in Experiment 2 we manipulated eye-movements in a more fine-grained manner, separately for the encoding, maintenance, and retrieval phases, in order to shed light on the interrelation between eye-movements and the representation of spatial configurations in VSWM.

Experiment 1

With this experiment, we investigated the influence of shifts of overt visual attention on the processing of spatial configurations in VSWM. To this end, we manipulated whether participants had to maintain fixation or were allowed to move their eyes freely, and we measured the benefit of configurational information at retrieval on memory performance in a location change detection task.

Method

We conducted the experiment, including sample size and analyses, as preregistered on OSF: https://osf.io/gw7v5.

Participants

We used the R package powerbydesign [27] to conduct a power analysis based on our previous experiments [3, 4]. Our goal was to obtain a power of at least .90 at the conventional alpha level of .05. This resulted in a sample size of 56. Participants received course credit or monetary compensation (2€ per 15 min) for taking part. We preregistered the following exclusion criteria: participants identified as not doing the task (always pressing the same button or performing at a level that does not deviate from chance) were to be removed from the data set and replaced by new participants, as were any participants not completing the whole experiment. Eight participants had to be replaced because their performance did not deviate from chance. Participants had normal or corrected-to-normal vision, and their age ranged from 18 to 31 years (M = 23.9 years, SD = 3.3 years). Up to two participants were tested at the same time on different laptop computers. Simultaneous testing and the compensation rate were the same in both experiments. The research was conducted in accordance with APA standards for the ethical treatment of participants and with the approval of the institutional review board of the University of Tübingen. All participants provided written informed consent.
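The preregistered criterion of performance that "does not deviate from chance" can be operationalized in several ways, and the text does not state which test was used. The sketch below assumes an exact one-sided binomial test of the number of correct responses against 50% (old and new probes occurred equally often) at α = .05, combined with a check for always pressing the same key. The function names, the key coding, and the choice of test are illustrative assumptions, not the authors' procedure.

```python
from math import comb


def p_above_chance(n_correct: int, n_trials: int, p_chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value, P(X >= n_correct), under chance responding."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def flag_for_replacement(responses, correct, alpha=0.05):
    """Return True if a participant meets the (assumed) exclusion criteria.

    responses: response keys per trial, e.g. "1"/"9" (hypothetical coding)
    correct:   booleans per trial, True for a correct response
    """
    always_same_key = len(set(responses)) == 1
    not_above_chance = p_above_chance(sum(correct), len(correct)) >= alpha
    return always_same_key or not_above_chance


# Example: 160 trials, 120 correct, both keys used -> clearly above chance
responses = ["1", "9"] * 80
correct = [True] * 120 + [False] * 40
print(flag_for_replacement(responses, correct))  # False
```

An exact test avoids the normal approximation; with 160 trials either choice would give very similar cutoffs.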

Materials

We presented eight grey squares (RGB color hex code: #777777) on a white background (#FFFFFF) on a 15.6” computer screen (Dell Precision M4800) and recorded eye movements with an SMI iView RED250 mobile eye tracker at a sampling rate of 250 Hz. The experiment was programmed with PsychoPy 1.85.6 [28]. Each square measured 0.8° × 0.8° (degrees of visual angle). The objects could appear in an 18° × 18° centered array (black outline: #000000). A black fixation cross (0.5° × 0.5°, color: #000000) was presented in the center of this array. We generated random object locations for each trial with a minimum center-to-center distance of twice the diameter of a square and a minimum distance of one square diameter from the center of the array. Participants’ heads were stabilized with a chin rest.
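The placement constraints above can be met with simple rejection sampling. The sketch below is a minimal illustration, not the PsychoPy code used in the study; it assumes that the "diameter" of a square is its 0.8° side length (so the minimum pairwise distance is 1.6° and the minimum distance from the center is 0.8°) and that squares must lie fully inside the array outline. Coordinates are in degrees with the origin at the array center.

```python
import math
import random

# Values from the text; ASSUMPTION: a square's "diameter" is its 0.8° side length.
SQUARE_DEG = 0.8
ARRAY_DEG = 18.0
N_OBJECTS = 8
MIN_PAIR_DIST = 2 * SQUARE_DEG    # minimum center-to-center distance between squares
MIN_CENTER_DIST = 1 * SQUARE_DEG  # minimum distance from the array center (fixation cross)


def generate_locations(rng: random.Random, n: int = N_OBJECTS, max_draws: int = 100_000):
    """Rejection sampling of n square centers (degrees, origin at the array center)."""
    # ASSUMPTION: squares must lie completely inside the array outline.
    half = ARRAY_DEG / 2 - SQUARE_DEG / 2
    locations = []
    for _ in range(max_draws):
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        if math.hypot(x, y) < MIN_CENTER_DIST:
            continue  # too close to the fixation cross
        if all(math.hypot(x - px, y - py) >= MIN_PAIR_DIST for px, py in locations):
            locations.append((x, y))
            if len(locations) == n:
                return locations
    raise RuntimeError("could not place all objects; relax the constraints")


print(generate_locations(random.Random(1)))
```

Accepting candidates one at a time is the simplest approach; with only eight small squares in a large array, few candidates are rejected.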

Procedure

Participants performed a location change detection task (see Fig 1). During encoding, all objects were shown for 2000 ms. A maintenance phase of 2000 ms followed, during which no objects were visible. At retrieval, either all objects reappeared (global configuration) or only the probed object reappeared (single), and the probed object was marked by a red outline (#FF0000). We manipulated whether the object probed at retrieval appeared at a new or at its old position (new/old). Participants pressed the respective keyboard button (1/9) to indicate whether the probed object had changed its location.


Fig 1. Location change detection task.

Note that the fixation cross is not displayed in this figure for illustrative purposes.

https://doi.org/10.1371/journal.pone.0281445.g001

Trials were presented in randomized order within two blocks of 80 trials each, leading to 160 trials in total. One block was performed without any restrictions regarding eye movements, while the other block was performed with a fixed gaze. Participants started each trial by fixating the central fixation cross, which was visible throughout the whole trial, for 250 ms. In the enforced fixation condition, a trial was aborted and repeated at the end of the block if participants did not hold fixation throughout the trial, that is, if gaze samples were recorded outside an invisible circle with a radius of 1.5° around the fixation cross for more than 250 ms. Within the enforced fixation block, a mean of 22.6% (SD = 14.6%) of the presented trials were aborted and repeated across participants. We counterbalanced block order, so that half of the participants performed the free view block first and the fixed gaze block second, and the other half performed the blocks in the reverse order. All other conditions occurred equally often within each block and, thus, also within the whole experiment. At the beginning of the experiment, participants performed a practice block containing one trial per possible condition (4 trials) under the eye movement condition (with/without) of their first block. At the beginning of the second block, they performed another practice block under the other eye movement condition. Participants were not aware that the possibility of shifting overt visual attention would change until this second practice block. The whole experiment took approximately 45 minutes.
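The gaze-contingent abort rule and the repetition of aborted trials can be sketched as follows. At 250 Hz a sample arrives every 4 ms, so a 250 ms excursion spans roughly 62 samples. The sketch assumes that the 250 ms must be one continuous excursion outside the 1.5° circle (the text does not say whether time outside was cumulative), that gaze is given in degrees relative to the fixation cross, and that `run_trial` is a hypothetical callable returning True for a completed trial. It is not the authors' PsychoPy implementation.

```python
import math
from collections import deque

FIXATION_RADIUS_DEG = 1.5  # invisible circle around the fixation cross (from the text)
MAX_OUTSIDE_MS = 250.0     # tolerated time outside the circle (from the text)


def fixation_broken(samples, radius=FIXATION_RADIUS_DEG, max_outside_ms=MAX_OUTSIDE_MS):
    """Return True if gaze stays outside the circle for more than max_outside_ms.

    samples: iterable of (timestamp_ms, x_deg, y_deg), gaze relative to the fixation cross.
    ASSUMPTION: the limit applies to one continuous excursion, not cumulative time.
    """
    outside_since = None
    for t, x, y in samples:
        if math.hypot(x, y) > radius:
            if outside_since is None:
                outside_since = t  # excursion starts
            if t - outside_since > max_outside_ms:
                return True
        else:
            outside_since = None  # gaze returned: reset the timer
    return False


def run_block(trials, run_trial):
    """Run trials in order; aborted trials are appended to the end of the block.

    run_trial: hypothetical callable returning True if the trial was completed.
    Returns the completed trials in presentation order and the number of aborts.
    """
    queue = deque(trials)
    completed, n_aborted = [], 0
    while queue:
        trial = queue.popleft()
        if run_trial(trial):
            completed.append(trial)
        else:
            queue.append(trial)  # repeat at the end of the block
            n_aborted += 1
    return completed, n_aborted
```

The per-participant abort rate reported above would then correspond to `n_aborted` divided by the number of presented trials (completed plus aborted) in the enforced fixation block.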

Results and discussion

The data and the analysis can be obtained from https://osf.io/rqejx/. We performed the analyses in accordance with our preregistration. As the dependent measure, we calculated sensitivity (according to signal detection theory) across the responses to the old and new probe location trials [29]. Sensitivity is defined as $d' = z(p_{hits}) - z(p_{fa})$, where $z$ is the inverse of the standard normal cumulative distribution function, $p_{hits}$ is the proportion of hits, and $p_{fa}$ is the proportion of false alarms [29]. Hits refer to the accurate detection of old locations, and false alarms refer to responding “old” to a new location. Because sensitivity cannot be calculated when $p_{hits}$ or $p_{fa}$ equals 0.0 or 1.0, we corrected such values to the proportions corresponding to half a trial correct or half a trial incorrect, respectively (i.e., $0.5/N$ and $(N - 0.5)/N$ for $N$ trials). Trials with response times exceeding 10 seconds were removed before the analysis (0.08%).
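The sensitivity computation described above can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf` provides the inverse normal CDF $z$. The trial-record format (dictionary keys `probe`, `response`, `rt_s`) is a hypothetical choice for illustration; the authors' own analysis script is available on OSF.

```python
from statistics import NormalDist


def corrected_rate(count: int, n: int) -> float:
    """Proportion count/n, with 0 and 1 replaced by half a trial (0.5/n and (n - 0.5)/n)."""
    if count == 0:
        return 0.5 / n
    if count == n:
        return (n - 0.5) / n
    return count / n


def d_prime(n_hits: int, n_old: int, n_false_alarms: int, n_new: int) -> float:
    """d' = z(p_hits) - z(p_fa), with z the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(corrected_rate(n_hits, n_old)) - z(corrected_rate(n_false_alarms, n_new))


def d_prime_from_trials(trials, max_rt_s: float = 10.0) -> float:
    """trials: dicts with 'probe' and 'response' ('old'/'new') and 'rt_s' (hypothetical keys)."""
    kept = [t for t in trials if t["rt_s"] <= max_rt_s]  # drop responses slower than 10 s
    old = [t for t in kept if t["probe"] == "old"]
    new = [t for t in kept if t["probe"] == "new"]
    hits = sum(t["response"] == "old" for t in old)          # "old" to an old location
    false_alarms = sum(t["response"] == "old" for t in new)  # "old" to a new location
    return d_prime(hits, len(old), false_alarms, len(new))
```

In Experiment 1, each combination of eye movement condition and configuration comprised 40 trials (20 old, 20 new probes), so with this correction a perfect score is capped at $z(0.975) - z(0.025) \approx 3.92$.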

We compared location change detection performance, as indicated by the sensitivity measure, across conditions using a 2 (eye movements: free view, fixation; within) × 2 (configuration: global, single; within) repeated-measures ANOVA (see Fig 2). Importantly, there was a significant interaction between eye movements and spatial configuration, F(1, 55) = 7.30, p = .009, η_p² = .12. That is, the configuration benefit was stronger in the free view trials than in the trials with enforced fixation. This suggests that the ability to plan and perform shifts of overt visual attention by eye movements supports the processing of spatial configurations. Furthermore, there was a significant main effect of eye movements, F(1, 55) = 15.74, p < .001, η_p² = .22, and a significant main effect of configuration, F(1, 55) = 44.24, p < .001, η_p² = .45. To follow up on the interaction, we further examined the conditions with t-tests (see Table 1).
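The reported effect sizes can be checked from the F statistics alone, because for an effect with $df_{effect}$ numerator degrees of freedom $\eta_p^2 = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})$. Moreover, since both factors have two levels, each effect of this ANOVA is a single contrast, and its F equals the squared one-sample t statistic of the per-participant contrast scores (for the interaction: the configuration benefit under free view minus the configuration benefit under fixation). The sketch below illustrates both points; the function names and the per-participant input format are hypothetical, and it is not the authors' analysis script.

```python
import math
from statistics import mean, stdev


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return f_value * df_effect / (f_value * df_effect + df_error)


def interaction_f(free_global, free_single, fix_global, fix_single):
    """F for the 2 x 2 within-subjects interaction from per-participant d' values.

    With 1 numerator df, the repeated-measures F equals the squared one-sample
    t statistic of the per-participant contrast scores.
    """
    contrasts = [
        (fg - fs) - (xg - xs)  # benefit under free view minus benefit under fixation
        for fg, fs, xg, xs in zip(free_global, free_single, fix_global, fix_single)
    ]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t**2, (1, n - 1)


# Reported F values from Experiment 1 (df_error = 55)
for label, f in [("interaction", 7.30), ("eye movements", 15.74), ("configuration", 44.24)]:
    print(label, round(partial_eta_squared(f, 1, 55), 2))  # 0.12, 0.22, 0.45
```

With $df_{error} = 54$, the same relation reproduces the η_p² values of the block-order analysis reported below.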


Fig 2. Sensitivity (d’) across conditions for Experiment 1.

Error bars indicate the standard error of the mean (SEM).

https://doi.org/10.1371/journal.pone.0281445.g002


Table 1. p-values of t-tests for configurations across conditions in Experiment 1.

https://doi.org/10.1371/journal.pone.0281445.t001

To investigate the influence of block order, we performed an additional exploratory mixed ANOVA with the factors eye movements (free view, fixation; within), configuration (global, single; within) as well as the group factor block order (fixation/free view, free view/fixation; between). The main effects of spatial configuration, F(1, 54) = 43.92, p < .001, η_p² = .45, and eye movements, F(1, 54) = 15.65, p < .001, η_p² = .22, as well as their interaction, F(1, 54) = 7.39, p = .009, η_p² = .12, remained significant. Importantly, neither the main effect of block order, F(1, 54) = 0.51, p = .480, η_p² = .01, nor any interaction effect including block order, all F(1, 54)s ≤ 1.67, ps ≥ .202, were significant. Therefore, block order did not influence our results.

In the preregistrations of both experiments, we stated that we would compare fixation and saccade parameters between hit and false alarm trials in the free view condition. This analysis is not informative for the present research question, but for the sake of completeness we present it in S1 Appendix.

To summarize, the general ability to perform eye-movements (as compared with enforced fixation) had a significant effect on the processing of spatial configurations. The configuration benefit was larger in the free view condition than under enforced fixation. However, we still observed a configuration benefit with enforced fixation. This argues against the strong assumption that the configuration benefit depends on eye-movements. Rather, participants utilized the global spatial configuration of the objects to improve their memory performance, and being able to perform free eye movements enhanced the processing of spatial configurations in VSWM.

Experiment 2

With our first experiment, we investigated the influence of overt visual attention on the processing of spatial configurations in VSWM. As predicted, the configuration benefit was stronger in the free view trials than with enforced fixation. What remained unresolved, however, was whether it was the general ability to perform free shifts of overt visual attention or rather the free distribution of overt visual attention during a specific task phase, such as encoding, maintenance, or retrieval, that supported the processing of spatial configurations. Previous research suggests that eye-movements during specific phases, such as maintenance [13–15], might aid memory performance. That is, participants tend to move their eyes to previously encoded locations during maintenance, and fixating the fixation cross during maintenance reduces memory performance [14, 15]. Therefore, in contrast to Experiment 1, we did not enforce fixation throughout the whole trial in Experiment 2 but only during a specific phase (i.e., encoding, maintenance, or retrieval). By doing so, we investigated whether enforcing fixation in just one of the three phases would suffice to reduce the configuration benefit as compared with a free view condition.