Multiple Strategies for Spatial Integration of 2D Layouts within Working Memory

Tobias Meilinger; Katsumi Watanabe

doi:10.1371/journal.pone.0154088

Abstract

Prior results on the spatial integration of layouts within a room differed regarding the reference frame that participants used for integration. We asked whether these differences also occur when integrating 2D screen views and, if so, what the reasons for this might be. In four experiments we showed that integrating reference frames varied as a function of task familiarity combined with processing time, cues for spatial transformation, and information about action requirements paralleling results in the 3D case. Participants saw part of an object layout in screen 1, another part in screen 2, and reacted on the integrated layout in screen 3. Layout presentations between two screens coincided or differed in orientation. Aligning misaligned screens for integration is known to increase errors/latencies. The error/latency pattern was thus indicative of the reference frame used for integration. We showed that task familiarity combined with self-paced learning, visual updating, and knowing from where to act prioritized the integration within the reference frame of the initial presentation, which was updated later, and from where participants acted respectively. Participants also heavily relied on layout intrinsic frames. The results show how humans flexibly adjust their integration strategy to a wide variety of conditions.

Citation: Meilinger T, Watanabe K (2016) Multiple Strategies for Spatial Integration of 2D Layouts within Working Memory. PLoS ONE 11(4): e0154088. https://doi.org/10.1371/journal.pone.0154088

Editor: Marko Nardini, Durham University, UNITED KINGDOM

Received: July 30, 2015; Accepted: April 8, 2016; Published: April 21, 2016

Copyright: © 2016 Meilinger, Watanabe. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by the Japanese Society for the Promotion of Science and the Humboldt Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In daily life, people experience rooms, buildings or neighborhoods, but also information on displays from successive gazes and views. Although they might never see the whole environment at once, they nevertheless develop a grasp of its overall structure. To form this, observers must spatially integrate the separately experienced spatial information into a common reference frame [1–3]. The underlying processes are still widely unknown.

At least two levels of spatial integration can be distinguished: integration across gazes into a common view and integration across views. For example, when learning an object layout from a single viewpoint, multiple eye fixations may eventually be integrated into a common view. Typically, visual information largely overlaps between gazes. Such overlap is not necessarily present when integrating multiple views, for example, opposite room views. Furthermore, integration across gazes happens on a short time scale and is usually examined within 2D screens [4–6]. Integration across views may happen in 2D (e.g., on a screen) and in 3D, both on a short and long time scale (e.g., when moving within a room or when revisiting a room). The present study examined view integration in 2D within working memory taking into consideration the results on spatial integration obtained from 3D short- and long-term learning [1,3,7–10].

Unless given incentives and sufficient time to integrate beforehand, spatial integration is mentally costly: performance is better when acting on separately learned spatial layouts compared to acting on the combination of the two layouts [11–15]. One part of integration costs involves transforming misaligned spatial information into a common reference frame [3].

One key issue in the integration of spatial information is the question of which reference frame (i.e., coordinate system) is used for integration. For integration across gazes, retinal and non-retinal (i.e., head-, torso-, or environment-based) reference frames may be used. Despite the subjective constancy of our surrounding world, it has been shown that the whole visual field is not automatically integrated across gazes [16]. However, adaptations to line orientation, form, or faces persist across gazes [17], thus indicating the automatic updating of certain attended features across gazes. Visual landmarks are used as an environmental reference to locate objects across gazes [5,6]. If sufficient time is given, participants can integrate object locations across gazes within a single view and memorize it [2].

When integrating across views, multiple solutions for integrating reference frames are possible. For example, when learning two misaligned layouts (layout 1 and layout 2), integration could happen in the reference frame of layout 1 or layout 2, or in the reference frame from which the information is used afterwards. Additionally, an independent reference frame might be used; for example, one along a very salient orientation, such as the main axis of the surrounding room. Prior research showed indications of all four cases [1,3,7–10]. Why would participants use such a wide variety of reference frames for integration? Prior examinations differed considerably in their methodology, using different stimuli and learning times, or providing the possibility to update information between presentations. The present study aimed to construct a single setup within which these factors could be isolated from each other, varied, and tested for their role in triggering a certain reference frame for integration. In the following, we review prior research, isolate potentially crucial factors, describe a setup within which these factors can be tested, report on experiments that examined these variations, and discuss their implications.

Evidence for integrating within the reference frame of earlier experienced information comes from studies in which participants first learned layout 1 separately and then layouts 1 and 2 together [1,7–10]. Subsequent memory tests involving both layouts required integrating information from both layouts. For example, Kelly and McNamara [8] had participants learn a layout of objects on the floor. Afterwards, a second layout was added and both were learned together from the same or from a different viewpoint. Subsequent imagined perspective-taking tasks indicated that participants used the reference frame of the first layout (i.e., its learning perspective) to also encode information from the second layout. Similar results were obtained within a virtual environment [7]. These studies also tested conditions in which the second viewpoint was more salient than the first viewpoint (e.g., because of its initial structure or because it was aligned with the walls of the surrounding room). Here, spatial information might be reorganized toward the reference frame of the second viewpoint [7,8]. Participants also used the orientation of the overall layout for reference (i.e., the form of layout 1 and 2 seen together) rather than the intrinsic orientation of the individual layouts when learning from one viewpoint [1].

Results suggest that participants used a reference frame established on prior information for encoding subsequently added information unless another reference frame was very salient in which case it was used instead (e.g., the main intrinsic orientation of the overall layout or the surrounding room). One characteristic of these tasks is familiarity. Participants learned object layouts in the order of minutes, which were usually terminated only after reaching a criterion level in a pointing task. Therefore, participants extensively familiarized themselves with the layout and sometimes also the pointing task. This familiarity likely yielded a well-established memory trace and only afterwards did they experience additional information with the prior layout always visible. As a first goal, we examined whether familiarity with a task together with familiarity with the layout as established by the self-determined learning time would result in prioritizing earlier reference frames.

In the aforementioned experiments, participants used earlier established reference frames or later salient ones. Salient orientations may be ones aligned with a grid layout that is easy to verbalize, for example, by rows and columns [18], or ones aligned with the surrounding walls [19]. The opposite case of excluding any salient orientations that could provide a reference frame for integration is rather difficult, if not impossible. Therefore, in prior studies, salient orientations coincided either with earlier presented information or with later presented information. To account for this shortcoming, our second goal was to balance salient layout orientations with the use of earlier and later reference frames by aligning the layout equally often with each reference frame.

Another characteristic of prior studies is that both layouts could be learned together during the second presentation when both were visible. However, often, relevant spatial information cannot be perceived at the same time; for example, when integrating the front and back view of a room, a house, or views of multiple rooms. These views are experienced separately and must be brought together without profiting from having them present within a single view. Our third goal, therefore, was to examine whether prior results also generalize to spatial integration of separate experiences.

Prior experiments indicated the use of earlier and salient later reference frames. However, one study also suggested integration within the reference frame in which participants acted [3]. In this study, participants were required to plan and walk the shortest path across floor tiles that lit up briefly during two presentations. Between presentations, participants changed their viewpoint by walking around the tiles. Participants were thus able to update the reference frames of prior experiences to their later viewpoint (i.e., they could memorize the tiles relative to their body and actualize these locations while walking around the tiles). Participants integrated the two views upon the lit-up tiles in the reference frame in which they conducted the task (i.e., in which they started walking across the tiles). This acting reference frame was either the frame of the second presentation or it was the viewpoint of the first presentation when they walked back to it before acting. It remains unknown why participants used the acting reference frame for integration. Did they do so because they always updated all spatial information and continued doing so until they acted, or because they knew beforehand within which reference frame they would act afterwards and thus transformed the spatial information into this reference frame, maybe even during the presentation of the tiles? As the fourth goal, both possibilities were examined within the present study. We also disentangled the reference frames of acting from the first and second presentation.

In order to examine these questions, we conducted four experiments in which participants were required to integrate separate layout parts. Experiment 1 acted as a baseline. Here, we wanted to see which reference frames participants spontaneously used. Experiments 2–4 examined whether this baseline pattern could be influenced by familiarity, updating, and knowing within which orientation to act afterwards. In Experiment 2, participants familiar with the task could self-determine how long they watched each stimulus. We expected them to prioritize the reference frame of earlier presented information. In Experiment 3, the layout visibly moved and rotated from its position during the first presentation to its position and orientation during the second presentation. The reference frame orientation of the first presentation could thus be updated to the orientation of the second presentation. Here, we expected participants to prioritize the reference frame of the later presentation. In Experiment 4, participants knew beforehand in which reference frame orientation they had to act. Here, we expected reliance on the acting reference frame.

Experiment 1: Baseline

Experiment 1 was conducted to obtain a baseline for spatial integration strategies in our spatial integration task.

Methods

Participants.

A total of 16 women and 21 men participated in the experiment. They were on average 21.6 years old (SD = 3.1). Three additional participants (two women) were not significantly better than the chance rate (see below) and were not included. Participants were recruited through a university participant panel. They gave written informed consent before conducting the experiment and were paid for their participation. The experimental procedure was approved by the institutional review board of The University of Tokyo.

Materials and tasks.

As illustrated in Fig 1, the participants' task was to determine the form of a layout of three objects that were presented in two parts on a 5 × 5 cell grid. The layout consisted of a “∧” sign, here called triangle, a rectangle with a small bar perpendicular to one of its sides, and a circle with a dot at the center. The three objects formed an L-shape with the rectangle placed left or right of the triangle and the circle above or below the triangle. A trial consisted of three screens within which the symbols were displayed. In screen 1, the rectangle and triangle were visible; in screen 2 the triangle and circle; and, in screen 3, only the rectangle. Participants used the triangle in screens 1 and 2 to infer the location of the circle relative to the rectangle. For this, they had to take the orientation of the triangle into account as the overall layout might have rotated between screens. Between screens 2 and 3, it might have rotated again. In screen 3, a mouse pointer appeared in the middle of the rectangle and participants were asked to click on the grid cell containing the circle using the mouse. We measured latency and correct responses.

Fig 1. Illustration of three example trials used in the experiment.

In screens 1 and 2, participants saw two parts of a layout. They integrated these parts to indicate the location of the circle relative to the rectangle presented in the center of screen 3. The correct location is displayed by the grey circle, which was not visible in the experiment. The presented layouts had the same orientation in all, none, or two out of the three screens. For example, in the top line the layout is oriented upwards in screens 1 and 3, but to the left in screen 2. The timeline at the bottom indicates presentation times for each screen and for a blank white screen in between. In screen 3, we measured the time until participants clicked the grid cell in which they located the circle.

https://doi.org/10.1371/journal.pone.0154088.g001

We used three objects because this is the minimum number that allows examining the integration of two separate presentations. The grid allowed us to score responses exactly (i.e., as indicating the correct or a wrong grid cell) and to assign the eye fixations measured throughout each trial to grid cells. A layout with non-adjacent objects could have posed extra difficulty, which we avoided. Three objects in a row are adjacent to each other as well. However, to avoid adding variation to the form of the layout, we only used the L-shape. Due to the L-shape, the circle was always in the cell left-above, left-below, right-above, or right-below the rectangle. Therefore, the chance level was defined as 25%. This is a conservative estimate, as participants could click anywhere on the screen, which would have resulted in a much lower chance level. However, the chance rate was only used to identify participants with major problems in the task and, for this, 25% seemed more realistic.

Intrinsic orientation.

In screen 1, the rectangle and triangle were always displayed next to each other at one side of the grid (i.e., left as in Fig 1, or right, top, or bottom). Throughout the experimental trials, their relative orientation was always as displayed in Fig 1. Although they always pointed in the same direction, the common direction varied relative to the screen. Thus, in screen 1 they jointly pointed to the top, as in screen 1 of Fig 1, or to the right, left, or bottom. The triangle was equally often left and right of the rectangle, and the circle was equally often above and below the triangle. We did not change the relative orientation of the rectangle and triangle to each other. As indicated in the test experiments, that would have been too demanding for many participants. As a consequence, the overall layout showed a clear intrinsic orientation throughout all of the experiment’s trials.

Please note that each trial consisted of learning a layout. In many prior experiments, participants learned only a single overall layout and their acquired knowledge was tested multiple times. The present approach has the advantage in that salient intrinsic orientations can be balanced with earlier, later, and acting reference frames.

Orientation match conditions.

As displayed in Fig 1, layout presentations between two screens coincided or differed in orientation. We examined all five possibilities of matching and non-matching orientations between the three screens: all three screen orientations matched (condition “all same”), all screen orientations differed (condition “all different”), or the orientations of two screens matched each other, but not with the third screen: screens 1 and 3 had the same orientation, but differed from the orientation of screen 2 (condition “1 & 3 same”; see Fig 1, first line); screens 1 and 2 had the same orientation, which differed from the orientation of screen 3 (condition “1 & 2 same”; see Fig 1, second line); or screens 2 and 3 had the same orientation, which differed from the orientation of the layout in screen 1 (condition “2 & 3 same”; see Fig 1, third line). Participants had to integrate the differently oriented layout parts within a single reference frame in order to fulfill the task. To do so, differently oriented presentations had to be aligned with each other. This alignment could happen in different reference frames: the reference frame within which earlier information was presented (i.e., the reference frame of screen 1 or RF1), the reference frame where later information was presented (i.e., the reference frame of screen 2 or RF2), or the reference frame within which participants acted (i.e., the reference frame of screen 3 or RF3). Irrespective of the reference frame in which the two layouts were integrated, the integrated layout had to be transformed to the reference frame of screen 3 from which the answer was given. This potentially required transformations to align the layout parts for integration and for giving answers. Aligning misaligned spatial information is known to increase errors/latencies [2,3], and the required translations were identical in all orientation match conditions. The error/latency pattern between the five conditions was therefore indicative of the required alignment costs and of the reference frame used for integration.
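The five orientation match conditions can be expressed as a simple classification of the three screen orientations. The following is a minimal sketch, not the authors' experiment code; the function name `match_condition` and the representation of orientations as degrees are assumptions made for illustration.

```python
def match_condition(o1, o2, o3):
    """Label a trial by which of the three screen orientations coincide.

    o1, o2, o3 are the layout orientations in screens 1-3, given in
    degrees (a hypothetical encoding chosen for this sketch).
    """
    o1, o2, o3 = (o % 360 for o in (o1, o2, o3))
    if o1 == o2 == o3:
        return "all same"
    if o1 == o3:
        return "1 & 3 same"
    if o1 == o2:
        return "1 & 2 same"
    if o2 == o3:
        return "2 & 3 same"
    return "all different"
```

For instance, a layout shown upright in screens 1 and 3 but rotated in screen 2 falls into the “1 & 3 same” condition.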

The following predictions are illustrated in Fig 2. When integrating spatial information in the acting reference frame (i.e., RF3, Fig 2 bottom line), information from RF1 and RF2 must each be rotated into RF3. If RF3 is identical to RF1 and RF2 in the “all same” condition, no rotation costs occur and participants should perform best. One rotation is required if RF3 is identical to either RF1 or RF2 as in the “1 & 3 same” and “2 & 3 same” conditions. Participants should perform second best. Two rotations are required if RF3 is different from both RF1 and RF2 as in the “1 & 2 same” condition and in the “all different” condition; participants should perform worst. These transformations can occur only within screen 3, as participants know the RF3 orientation only after screen 3 onset. Therefore, the following pattern is predicted for both errors and latency: all same < 1 & 3, 2 & 3 < 1 & 2, or all different.

Fig 2. Illustration of required reference frame transformations.

Spatial information presented in screen 1 and 2 is integrated within the reference frame of screen 1 (top), screen 2 (middle), or screen 3 (bottom). Regardless of which reference frame participants integrated, they always reacted on the integrated layout in screen 3. Arrows indicate required transformations. Transformation costs for dotted arrows might be negligible due to updating or sufficient time.

https://doi.org/10.1371/journal.pone.0154088.g002

When integrating within RF1 in which earlier information was presented, information from RF2 must be transformed into RF1 and, from there, into RF3 (Fig 2 top row). Conditions in which RF1 and RF2 as well as RF1 and RF3 are identical should profit, yielding the following error/latency pattern: all same < 1 & 2, 1 & 3 < 2 & 3, or all different. To integrate in the later RF2, information from RF1 is transformed into RF2 and, from there, into RF3 (Fig 2 middle row). Conditions in which RF1 and RF2 and RF2 and RF3 are identical should profit, yielding the following error/latency pattern: all same < 1 & 2, 2 & 3 < 1 & 3, or all different.
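The three predicted patterns follow from counting how many reference frame pairs along the transformation path are misaligned. The sketch below illustrates this counting logic under simplifying assumptions: each misaligned step is treated as one equally costly rotation, and the example orientations in `CONDITIONS` are hypothetical values chosen only to instantiate each condition.

```python
# Hypothetical example orientations (screen 1, screen 2, screen 3) in degrees.
CONDITIONS = {
    "all same": (0, 0, 0),
    "1 & 2 same": (0, 0, 90),
    "1 & 3 same": (0, 90, 0),
    "2 & 3 same": (0, 90, 90),
    "all different": (0, 90, 180),
}


def rotation_count(orientations, integration_frame):
    """Count required rotations when integrating in RF1, RF2, or RF3."""
    rf = dict(zip((1, 2, 3), orientations))
    if integration_frame == 3:
        # Both layout parts are rotated directly into the acting frame.
        return int(rf[1] != rf[3]) + int(rf[2] != rf[3])
    # Otherwise the other part is rotated into the integration frame,
    # and the integrated layout is then rotated into RF3 for responding.
    other = 2 if integration_frame == 1 else 1
    return int(rf[other] != rf[integration_frame]) + int(rf[integration_frame] != rf[3])


predicted = {
    frame: {name: rotation_count(o, frame) for name, o in CONDITIONS.items()}
    for frame in (1, 2, 3)
}
```

Under these assumptions, `predicted[1]` assigns one rotation to “1 & 2 same” and “1 & 3 same” and two rotations to “2 & 3 same” and “all different”, which reproduces the ordering described for integration within RF1.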

Rotation center conditions.

From screen 1 to screen 2, the layout either rotated 90° clockwise, 90° counterclockwise (as in Fig 1), or it did not rotate at all. As displayed in Fig 3, the layout rotated either around the screen center (i.e., the middle cell of the grid; “screen rotation”) or rotated around the center of the layout (i.e., the grid point between the rectangle, triangle, and circle; “layout rotation”). The motivation for this variation was to ascertain whether participants used screen-relative coordinates or layout-relative ones (i.e., where the origin of their reference frame was located). In case participants used screen-coordinates, rotation around the screen center should be easier. If they relied on layout-based reference frames, rotation around the layout center might have been easier. In case of no rotation, the layout stayed either at the same spot, which worked as a control condition for “layout rotation,” or the layout moved to the location where it would be after rotating around the screen center, only without rotating. From screens 2 to 3, the layout always moved from its location at one side of the screen to the center. In neither case was an object presented at a location where another object was presented on the screen before.
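The difference between the two rotation centers can be made concrete by rotating grid cell coordinates around a pivot. This is an illustrative sketch only: the coordinate convention (cell indices 0–4 with the y-axis pointing upwards), the example cell positions, and the function name `rotate90` are assumptions, not details taken from the experiment software.

```python
def rotate90(cell, pivot, clockwise=False):
    """Rotate a grid cell by 90 degrees around a pivot point.

    Coordinates are (x, y) with y pointing upwards (an assumed convention).
    """
    dx, dy = cell[0] - pivot[0], cell[1] - pivot[1]
    if clockwise:
        return (pivot[0] + dy, pivot[1] - dx)
    return (pivot[0] - dy, pivot[1] + dx)


# Hypothetical L-shaped layout at the left side of a 5 x 5 grid.
layout = {"rectangle": (0, 1), "triangle": (1, 1), "circle": (1, 2)}

screen_center = (2, 2)      # middle cell of the grid ("screen rotation")
layout_center = (0.5, 1.5)  # grid point between the objects ("layout rotation")

screen_rotated = {k: rotate90(v, screen_center) for k, v in layout.items()}
layout_rotated = {k: rotate90(v, layout_center) for k, v in layout.items()}
```

In this example, rotating around the layout center keeps the objects within the same 2 × 2 block of cells, whereas rotating around the screen center moves the whole layout to a different side of the grid.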

Fig 3. Illustration of layout and screen rotations.

Between screens 1 and 2, the presented layout rotated either around the grid cross between the three layout objects or around the center of the screen. Rotation center points are indicated by the black dots, which were not visible during the experiment.

https://doi.org/10.1371/journal.pone.0154088.g003

Condition balancing.

The main variation of the experiments was the orientation match. A pairwise balance was used with rotation center, layout form (i.e., which of the four layouts was used), layout orientation at the start (i.e., whether the rectangle and triangle pointed upwards, downwards, or to the left or right), and rotation direction (clockwise vs. counterclockwise between screens 1 and 2). From all possible trials, a random subset of 60 trials that fulfilled these balancing constraints was chosen. We used these 60 trials in the experiments.

Timing.

A trial started with a fixation cross presented for 1500 ms. As also displayed in Fig 1, stimuli in screens 1 and 2 were presented for 2000 ms each. This duration ensured that participants had sufficient time to encode the stimuli into working memory as we were not interested in encoding processes [20–22]. Between stimuli presentations, participants saw a blank screen for 500 ms. If participants did not react within 10 seconds at screen 3, this was considered a miss and the next trial was started. The next trial always followed immediately after the previous one.

Setup and procedure.

Participants sat in front of a CRT monitor. The experiment was presented on a rectangular 29 × 29 cm area in the center of the monitor screen with a resolution of 1024 × 1024 pixels. Participants put their heads on a chinrest so their eye height was in the middle of the screen 58 cm away. The experiment ran on a MacBook Pro with Matlab using the Psychophysics and Eyelink toolbox extensions [23]. The code is available upon request.

Participants received written and verbal instructions. They trained for the task for as long as they wanted on a different set of trials. During training, the experimenter ensured that participants understood the task. Then, the eye tracker was calibrated and the experiment started. Trials followed each other and were presented in a random order that was determined individually for each participant. Participants were instructed to react as quickly and accurately as possible. After the experiment, participants completed a questionnaire asking their age, subjective sense of direction, and whether they applied certain subjective strategies (e.g., verbalizing the layout or mentally rotating it). The overall procedure lasted about 30 min.

Eye tracking.

We recorded eye fixations within single grid cells along a trial using an individually calibrated Eyelink 1000 running at 500 Hz. The automatic fixation extraction provided by the software from SR Research was used. We employed an extraction setting optimized for cognitive experiments, which pools micro-saccades into longer fixations. If a fixated cell was occupied by a layout object (currently visible or not), the fixation was assigned to this object. For example, looking at the correct location of the non-visible circle in screen 3 was considered a circle fixation. Within each screen, we analyzed fixation sequences across objects, ignoring fixations at non-object locations and multiple subsequent fixations at the same location. In the following, we describe fixation patterns from Experiment 1, which were also largely representative of the other experiments.
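The reduction of raw fixations to object fixation sequences can be sketched as follows. This is a minimal illustration, not the analysis code used in the study; the function name, the cell encoding, and the order of the two steps (first dropping non-object fixations, then collapsing repeats) are assumptions.

```python
def object_fixation_sequence(fixated_cells, object_cells):
    """Reduce a series of fixated grid cells to a sequence of fixated objects.

    fixated_cells: list of (x, y) grid cells in temporal order.
    object_cells: dict mapping a grid cell to the object located there,
                  whether or not that object is currently visible.
    """
    sequence = []
    for cell in fixated_cells:
        obj = object_cells.get(cell)
        if obj is None:
            continue  # ignore fixations at non-object locations
        if not sequence or sequence[-1] != obj:
            sequence.append(obj)  # collapse subsequent fixations on the same object
    return sequence


# Hypothetical example: object locations and fixated cells within one screen.
objects = {(0, 1): "rectangle", (1, 1): "triangle", (1, 2): "circle"}
fixations = [(0, 1), (0, 1), (3, 3), (1, 1), (0, 1)]
sequence = object_fixation_sequence(fixations, objects)
```

Here the resulting sequence is rectangle, triangle, rectangle, which corresponds to looking at the rectangle, then the triangle, and back again.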

In screen 1, the rectangle and triangle were displayed. In most cases, participants fixated only on the rectangle (27% of the cases), looked at the rectangle and then at the triangle (24%), or additionally went back to the rectangle (31%) and sometimes forth to the triangle again (7%). Rarely, participants looked only at the triangle (4%) or at the triangle and then the rectangle afterwards (2%).

In screen 2, the triangle and the target circle were visible. Sequences included looking only at the triangle (16%), circle (14%), or continuously looking between the two (27%). In 43% of the cases, participants initially (8%) or eventually looked at the correct location of the rectangle, which was not displayed. This suggests that, at least in some cases, participants had determined the whole layout already in screen 2. However, rectangle fixations were not associated with higher accuracy or faster reactions afterwards, Fs < 1.

In screen 3, the rectangle was always displayed in the center of the screen, but its orientation differed, which was crucial for the task. In most cases, participants looked at the rectangle first, and then continued looking at the correct location(s) of the triangle and/or the circle (40%) or looked at other screen locations (37%) where they might have assumed the layout. Looking at correct (non-visible) layout locations was associated with fewer errors, F(1, 1574) = 57.6, p < .001, and quicker reaction times, F(1, 1293) = 36.7, p < .001, whereas looking at other locations corresponded with higher error rates, F(1, 1571) = 136, p < .001, and latencies, F(1, 1290) = 102, p < .001. Participants seemed to have looked at non-visible layout locations. Those with the wrong conception of the layout might have been more unsure about the target location and thus took longer to react. Sometimes, participants first gazed at the (correct) locations of the triangle (13%) or the circle (10%). However, as we recorded only the first new fixation location after the onset of screen 3, participants might have looked at the upcoming location of the rectangle even before it was displayed and continued from there to their “first” fixation on the triangle or circle. Notably, only in 0.1% of the cases did participants look only at the rectangle and nowhere else on the screen. The tasks were thus not solved independently of eye fixations at task-relevant locations. The overall pattern was also similar in the other experiments. We did not find stable differences in fixation patterns, fixation frequency, or duration across experiments as a function of experimental variations.

Data analysis and design.

Not reacting within 10 s (average 0.08–0.36% per experiment) and clicking on the wrong grid cell were considered as error responses. If a participant’s hit rate was not significantly higher than the chance rate of 25% (defined as randomly guessing one of the four grid cells cornering the rectangle in screen 3), their data were not analyzed. We used latency data from correct trials and deleted values deviating more than 3 SD from the overall mean (1–2% per experiment).
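The two exclusion criteria can be illustrated with a short sketch. The specific test against chance is not named in the text, so the one-sided exact binomial test below is an assumption, as are the significance level of .05, the use of the sample standard deviation, and all function names.

```python
from math import comb
from statistics import mean, stdev


def binomial_tail(k, n, p):
    """Exact probability of k or more hits in n trials with hit probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_hits_above_chance(n=60, chance=0.25, alpha=0.05):
    """Smallest hit count that is significantly above chance (assumed one-sided test)."""
    for k in range(n + 1):
        if binomial_tail(k, n, chance) < alpha:
            return k
    return None


def trim_latencies(latencies, criterion=3.0):
    """Drop latencies deviating more than `criterion` SDs from the overall mean."""
    m, sd = mean(latencies), stdev(latencies)
    return [x for x in latencies if abs(x - m) <= criterion * sd]


threshold = min_hits_above_chance()
trimmed = trim_latencies([1.0] * 20 + [10.0])
```

With 60 trials per participant and a chance rate of 25%, `threshold` gives the number of correct responses a participant would need under this assumed test; the toy latency list shows a single extreme value being removed.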

Errors and latencies were submitted to a linear mixed model analysis with the within-participants factors for orientation match (5 levels) and rotation center (screen vs. layout). Each of the 10 conditions was repeated 6 times. We used planned pairwise comparisons between the orientation match conditions to examine the predicted patterns. Observing the multiple predicted differences of a pattern in random data is highly unlikely. As predictions were correlated (e.g., all patterns predicted best performance for the “all same” and worst performance for the “all different” condition), pairwise comparisons between conditions also differentiated more clearly between the individual patterns than overall similarity with a predicted pattern. For example, condition “1 & 3 same” is predicted to show better performance than “2 & 3 same” when integrating in the earlier reference frame, worse performance when integrating in the later reference frame, and no difference when integrating in the reference frame of acting.

In order to estimate the use of the layout intrinsic reference frames, we compared layout orientations within screen 3 with each other using a within-participant linear mixed model analysis (4 orientations). A layout orientation with a rectangle (and triangle) pointing upwards as in screen 1 of Fig 1 was arbitrarily defined as 0°. The layout in screen 3 in the first line of Fig 1 was along the intrinsic layout orientation, while the orientations in screen 3 lines 2 and 3 were not. Full-factorial crossing of test orientation with orientation match and rotation center would have resulted in unequal cell numbers. Crossing was also not possible in Experiment 4, but we wanted to conduct the same analysis in each experiment. Therefore, we did not use full factorial crossing of these three factors. When conducting the analysis in Experiments 1–3, the resulting patterns were highly similar.

Compared to an ANOVA, linear mixed model analysis is less restrictive regarding distribution assumptions [24]. Commonly accepted effect sizes for linear mixed models are not yet available. Thus, we report partial eta squared (η_p²) derived from data aggregated per participant and the respective condition. Unless otherwise explicitly mentioned, all significant results at p < .05 are reported.

All relevant data are within the supporting information S1 Dataset.

Results

Early, late, and acting reference frames.

As shown in Fig 4 (top row), the mean error rates, F(4, 2154) = 11.1, p < .001, η_p² = .12, and mean latencies, F(4, 1696.3) = 13.0, p < .001, η_p² = .15, differed depending on which screen orientations matched. For latency, this was qualified by an interaction with the rotation center, F(4, 1694) = 4.04, p = .003, η_p² = .10; this did not change any main effect of the orientation match. Planned pairwise comparisons showed that performance in the “all same” condition was quicker, Fs > 17.7, ps < .001, η_p² > .15, and more accurate than in the other conditions, Fs > 11.7, ps < .002, η_p² > .13. When layout orientations differed in all screens, participants responded more slowly than when at least two orientations coincided, Fs > 6.73, ps < .010, η_p² > .06. The results indicated that the participants were sensitive to the amount of rotation between screens. However, none of the predicted strategies clearly emerged. A visual inspection of the individual performance patterns suggests that roughly equal proportions of participants showed patterns resembling integration within earlier, later, and acting reference frames. While this is not a statistically reliable assignment, it suggests that there was some variability in how individual participants solved the task.

Fig 4. Reference frames used for integration as indicated by orientation match conditions.

Mean error rate (left), latency (right), and standard errors as estimated from the marginal means are displayed. Asterisks and daggers indicate significant differences in pairwise comparison.

https://doi.org/10.1371/journal.pone.0154088.g004

Layout intrinsic reference frames.

When looking at layout intrinsic alignment effects, there was a strong effect of layout orientation during testing on latency, F(3, 1700) = 42.4, p < .001, η_p² = .39. When the rectangle was presented upright (cf. the orientation in screen 3, top line of Fig 1), participants reacted more quickly than when the rectangle pointed to the left or right, Fs > 14.0, ps < .001, η_p² > .18, which in turn yielded quicker reactions than when the rectangle pointed downwards, Fs > 39.4, ps < .001, η_p² > .35. The more the layout orientation deviated from being upright, the slower the participants reacted. This indicated that the participants strongly relied on layout intrinsic reference frames.
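The notion of deviation from upright can be made explicit with a small sketch. It assumes that test orientations are coded in degrees with 0° as upright, as defined in the Methods; which of 90° and 270° corresponds to left or right is not specified in the text, and the function name is hypothetical.

```python
# Minimal sketch: smallest angular distance between a layout orientation and
# the upright (0 degrees) orientation. The degree coding is an assumption.

def deviation_from_upright(orientation_deg):
    """Return the deviation from upright in the range 0 to 180 degrees."""
    angle = orientation_deg % 360
    return min(angle, 360 - angle)


for orientation in (0, 90, 180, 270):
    print(orientation, deviation_from_upright(orientation))
```

With this coding, the two sideways orientations share the same deviation and the downward orientation has the largest one, which corresponds to the ordering of latencies reported above.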

Discussion

The participants in Experiment 1 relied on layout intrinsic reference frames. This is in line with prior integration experiments [7,8] and spatial learning in general (see McNamara et al., 2008 for an overview), in which salient intrinsic reference frames were widely used if present. This generalizes findings from learning object layouts within a room to learning layouts presented on a screen for a much shorter time. Importantly, the present results show that intrinsic reference frames play a role not only in experiments in which a single layout was learned or in which layout 1 was learned first and layouts 1 and 2 were then learned together. They also matter when integrating separate experiences that were never seen together within a single view.

Participants were sensitive to the amount of layout rotation throughout a trial. However, there was no clear prioritization of the reference frame within which to integrate information: earlier, later, or acting. Individual participants seemed to rely more on specific strategies. Had participants relied exclusively on intrinsic reference frames, no effect of layout rotations would have been observed.

The results of Experiment 1 provided a baseline for further comparisons. The setup itself did not prioritize integration in early, late, or acting reference frames. In the subsequent experiments (Experiments 2 to 4), we changed the circumstances so as to introduce such prioritization, based on the considerations established in the introduction.

Experiment 2: Familiarity

Preferences for early reference frames were all found in experiments in which participants learned object layouts within minutes rather than seconds [7–10]. Their acquired knowledge was often tested for accuracy before they proceeded to the test phase. Consequently, participants were familiar with the task and layout. We introduced these circumstances to the present setup to see whether participants would now preferably integrate in the earlier reference frame as well.