Figures
Abstract
During locomotion, the visual system can factor out the motion component caused by observer locomotion from the complex target flow vector to obtain the world-relative target motion. This process, which has been termed flow parsing, is known to be incomplete, but viewing with both eyes could potentially aid in this task. Binocular disparity and binocular summation could both improve performance when viewing with both eyes. To separate binocular disparity from binocular summation and analyse how each affects flow parsing, we tested detection and discrimination thresholds under three viewing conditions: stereoscopic, synoptic (binocular but without disparity) and monocular. Experiment 1 tested motion detection during simulated forward self-motion and when stationary. Experiments 2 and 3 tested motion discrimination in forward and backward self-motion and stationary conditions. We found that binocular disparity significantly improved detection thresholds and discrimination biases, at the cost of lower precision. Binocular summation only significantly improved detection thresholds when stationary. It did not significantly affect detection thresholds during locomotion, discrimination biases, or discrimination precision. Our results indicate that both binocular summation and binocular disparity contribute to motion detection and motion discrimination, but they affect performance differently while stationary and during locomotion.
Citation: Guo H, Allison RS (2024) Binocular contributions to motion detection and motion discrimination during locomotion. PLoS ONE 19(12): e0315392. https://doi.org/10.1371/journal.pone.0315392
Editor: Guido Maiello, Justus Liebig Universitat Giessen, GERMANY
Received: June 6, 2024; Accepted: November 25, 2024; Published: December 20, 2024
Copyright: © 2024 Guo, Allison. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data are publicly available on the Borealis Dataverse with doi:10.5683/SP3/5RSPKG. However, human participant data will be completely anonymized and all personally identifiable information will be erased, as stated in the ethics protocol and the informed consent.
Funding: This study was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN-2020-06061). The funders had no role in the design, data collection, data analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
During locomotion, the directions of objects in the surrounding environment change, creating a motion pattern in our visual field, which is known as optic flow [1]. When an object in the scene also moves, its flow pattern is complex, due to both the object motion and the locomotion of the observer. Nevertheless, the target motion can still be accurately perceived by a moving observer. For example, a sports player can often intercept a moving ball successfully. The flow parsing hypothesis [2–4] suggests that our visual system decomposes the composite flow to estimate the scene-relative object motion. Since its proposal, multiple studies have reported results consistent with the flow parsing hypothesis [2, 4–10].
Flow parsing gain is defined as the ratio of the subtracted flow vector to the locomotion component of the target motion vector [6] (in simpler words, the self-motion component that is subtracted divided by the actual self-motion component). The flow parsing gain serves as a measure of flow parsing effectiveness [6, 8, 9]. When gain = 1, flow parsing is complete and all of the self-motion component is factored out; when gain = 0, no flow parsing occurs. The gain has been reported to be generally below one [6, 8–10] (except for extreme cases such as a very short observation period [11]), suggesting that flow parsing is incomplete, at least in the experimental conditions of these studies. The flow parsing gain has been found to be unaffected by object speed or optic flow speed [6].
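The gain definition above can be sketched in a few lines of code. This is a minimal illustration of the idea, not the analysis code of the studies cited; the function name and the 6.28°/s flow value are placeholders:

```python
def perceived_scene_motion(target_flow, self_motion, gain):
    """Flow parsing sketch: subtract a scaled copy of the self-motion
    component from the target's image-flow vector (deg/s).
    gain = 1 -> complete parsing; gain = 0 -> no parsing at all."""
    return tuple(t - gain * s for t, s in zip(target_flow, self_motion))

# A scene-stationary target: its image flow is entirely due to self-motion.
self_motion = (6.28, 0.0)        # outward flow during locomotion, deg/s
print(perceived_scene_motion(self_motion, self_motion, gain=1.0))  # (0.0, 0.0)

# An incomplete gain leaves residual outward drift for a stationary target.
residual = perceived_scene_motion(self_motion, self_motion, gain=0.8)
print(round(residual[0], 3))     # 1.256
```

This is why a gain below one predicts that scene-stationary objects appear to drift against the direction of self-motion-induced flow.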
In the flow parsing process, optic flow plays a primary role in estimating scene-relative object movements [7]. However, integration with other sensory inputs may occur as well. MacNeilage et al. [12] showed that even vestibular inputs can be utilised for flow parsing. The role of binocular disparity in flow parsing has also been investigated sporadically. In one of the earliest flow parsing studies, Rushton and Warren [3] used lateral self-motion to show that observers were able to detect and discriminate target motion during simulated locomotion. In their stimulus [3], the target (a probe) was rendered either closer or farther than the fixation dot, and to correctly parse the moving direction of the target, observers had to use the correct distance order between the fixation dot and the target dot. This is because a lateral head movement produces motion parallax in opposite directions for a target closer to the observer than fixation compared to one further away than the fixation. Binocular disparity was chosen as the cue to the depth order in their experiment. Their observers could discriminate the target motion direction, which indicated that the visual system can capitalise on binocular disparity in this task. However, the stimulus was carefully chosen so that binocular disparity was the only available cue that could indicate the relative distances among their conditions. In the real world and in a more realistic VR scene, there could be multiple distance and depth cues, and some are monocular cues, such as object size and object height in the visual field. It is unknown whether observers still rely on binocular disparity in more realistic scenes, where multiple and potentially redundant cues are available.
Flow parsing involves removing the self-motion flow component from the target flow vector. When the locomotor speed increases, the optic flow increases and thus what needs to be cancelled increases. For accurate flow parsing, the discounted flow vector should scale with optic flow speed, which in turn scales with the locomotion speed. This is indeed what the visual system does, as shown by Niehorster and Li [6]. Binocular disparity has been found to enhance vection strength, perceived locomotion speed and perceived flow speed [13, 14]. If stereoscopic viewing changes the perceived locomotor speed, then this might impact flow parsing if, for example, this estimated locomotor speed is used to compute the component to be subtracted. Thus, stereoscopic information could have an indirect effect, even if the optic flow pattern is unchanged.
In sum, there is evidence that flow parsing is a multisensory task [11, 12] in which binocular disparity has a role [3]. The locomotion component of the flow vector varies with distance, which can be estimated from the disparity [13, 14]. However, to the best of our knowledge, no study has compared monocular and binocular viewing conditions to closely examine how binocular viewing affects flow parsing. Realistic, complex scenes provide more depth cues than binocular disparity. Moreover, besides the provision of depth and distance cues, there can be other ways in which binocular viewing affects perception. Binocular summation is the term used to describe the improved performance when perceiving with two eyes than with one eye, and its proposed mechanisms include probability summation, neural summation, and improved accommodation [15]. Thus, a performance difference between monocular and binocular viewing conditions could be caused by binocular summation, binocular disparity, or both. Our experiments aimed to measure the effects of binocular viewing while carefully separating the possible effects caused by binocular summation and binocular disparity.
Experiment 1
In this experiment, participants detected and indicated the moving target among four possible candidates. We measured detection thresholds with an adaptive staircase method. We hypothesise that both binocular summation and binocular disparity benefit the motion detection task by reducing the detection thresholds.
Materials and methods
Participants.
Twelve participants, aged 19–54 years (M = 30, SD = 10.08, eight males, four females), were recruited. The recruitment period started 1 October 2021, and ended 31 December 2021. Participants were students and vision scientists from York University, including the authors. All participants, except for the authors, were naive to the purposes of the experiment. All participants had normal or corrected to normal visual acuity, and had stereo acuity better than 100 arcseconds (tested with the Stereo Fly Test from Stereo Optical). Their interpupillary distances (IPD) were measured, and the system was adjusted accordingly. Participants provided informed consent. The study was approved by the Office of Research Ethics (ORE), and followed extra health and safety guidelines during the COVID-19 pandemic.
Apparatus.
The experiment was conducted in the wide-field stereoscopic environment (WISE) at York University (Fig 1). WISE is a unique VR environment designed for vision experiments. It consists of a wide concave projection screen which covers almost the entire field of view of the user, and a movable platform with a viewer’s seat. The image on the projection screen was projected and hardware-blended by 8 Christie Mirage WU-L stereoscopic projectors, each controlled by a client computer (HP Z820 workstation with an nVidia Quadro K5000 graphics card). The projectors had a refresh rate of 120 Hz, providing 60 Hz per eye when viewing stereoscopically with active shutter glasses. The lenses of the active shutter glasses were opaque half of the time, so the luminance was greatly reduced. To optimise the visual performance of the display and eliminate possible distractors, lights in the lab were turned off when WISE was in use. The frame timings of all projectors and the active shutter glasses were hardware-synchronised. Six WorldViz PPT infrared camera sensors tracked the two infrared markers mounted on the active shutter glasses. The observer’s head positions, computed by WorldViz PPT Studio, were passed to the VR software via VRPN (Virtual-Reality Peripheral Network) for real-time rendering based on the calculated eye positions. The program for rendering the virtual environment was created with WorldViz Vizard, and the experiment was coded using Python 3 and the Psychopy package [16].
The user sits in the seat and views the projection screen, on which eight stereoscopic projectors project images at 120 Hz, or 60 Hz per eye when viewed stereoscopically with active shutter glasses. Almost invisible in the figure, six infrared sensors are also installed for head tracking or motion tracking. The seating platform can be removed and replaced with a standing platform or a treadmill, but for the experiments in this paper the platform was in place as shown.
Stimuli and procedure.
An example stimulus is shown in Fig 2. Participants were presented with a virtual environment consisting of a background scene, four balls, and a fixation cross. On each trial, either a simulated forward locomotion or a stationary scene was presented for 0.5 s. The background scene was a hallway with a ceiling, a floor and pillars on both sides. The floor and ceiling had a grid pattern. The height of the ceiling was 3.4 m and the simulated eye height was 1.7 m. The fixation cross was placed at eye height in the centre of the observer’s FOV, 20 m in front of the observer. Participants were instructed to always keep fixation on the fixation cross whenever it was visible. Four golden balls (targets), which were 0.1 m in diameter, were placed in the four quadrants of the visual field, so that one target was inside each quadrant. Their distances to the sagittal plane and their distances to the horizontal plane that contained the cyclopean eye were both 0.4 m, and the initial distance to the frontoparallel plane that contained all four objects was 3 m. The eccentricity of each target was 10.7°, and the size was 3.7°, at the start of each trial. For a typical 60 mm IPD, the target has an initial horizontal binocular parallax of 0.95°, and the change in horizontal disparity due to locomotion would be 0.32° for a stationary target. On each trial, one of the targets moved relative to the scene in one of the following four directions:
- Approaching: moving in the direction which pointed from the front to the back of the observer.
- Receding: moving in the direction which pointed from the back to the front of the observer.
- Expanding: moving in the frontoparallel direction which pointed away from the centre of the participant’s visual field.
- Contracting: moving in the frontoparallel direction which pointed toward the centre of the participant’s visual field.
Since forward motion was simulated, a stationary, expanding or approaching target moved away from the focus of expansion (FOE) of the optic flow field. A receding or contracting target would only move toward the FOE if its speed was large enough, but this rarely happened as the staircase converged to the threshold.
Each trial was divided into the following three phases. The next phase started immediately after the previous phase ended, and the next trial started immediately after the third phase of the previous trial ended.
- Standby phase: During this phase, the background scene and the fixation cross were rendered. No target was visible, and the viewpoint was stationary. The duration of this phase was 0.5 second.
- Motion phase: Once this phase started, all four targets appeared. A random target moved in one of the four possible directions. In the locomotion condition, the viewpoint started moving in the forward direction. The duration of this phase was also 0.5 second. The motion speed of the moving ball was controlled by the staircase algorithm (see below).
- Response phase: The entire scene was covered with a grey colour, producing a blank display. The trial was paused until the participant responded. Then, this trial would end, and the next trial would begin immediately.
The task for the participants was to detect the moving target and report which one of the four targets moved during the trial. The probability of each target moving, and therefore chance performance, was 25%. The participants reported their choice by pressing the corresponding button on the controller. They did not need to specify in which direction they saw the target moving. We used the target motion speed threshold to measure participants’ motion detection performance, expecting the relationship between speed and detectability to be monotonic within the range of motion used in this experiment: better performance corresponded to detecting the motion of the correct target at a lower speed. We adopted an adaptive staircase algorithm, QUEST [17], to estimate the threshold. Since the task was a four-alternative forced choice (4AFC) task, the staircases targeted the 62.5% point on the psychometric function, which is the midpoint between 25% (guessing) and 100% (perfectly correct). One staircase of 50 trials was employed for each motion direction, making a total of 200 trials in a block. The staircases were interleaved randomly and the target to move was selected randomly, so that neither the moving target nor its moving direction was predictable. Participants were tested in two locomotion conditions:
- Forward locomotion: the viewpoint moved forward at 1.4 m/s during the motion phase. This caused scene-stationary targets to move away from the focus of expansion at approximately 6.28°/s.
- Stationary: the viewpoint was always stationary.
and three viewing conditions:
- Stereoscopic: the stereoscopic scene was rendered with a binocular disparity corresponding to the participant’s IPD.
- Synoptic: the same scene, as it would be seen from a hypothetical central (cyclopean) nodal point, was presented to both eyes. Since the left eye and the right eye were presented the same images, no binocular disparity was introduced.
- Monocular: the right eye was presented the same scene as in the stereoscopic condition; the left eye saw only a grey field of similar brightness to the right-eye scene.
Each participant was tested in all three viewing conditions and two locomotion conditions across six blocks. The order of the blocks was counterbalanced across participants to mitigate possible order effects. Before the first block, participants were given 10 practice trials to familiarise themselves with the task, under the same conditions as their first block. Participants were allowed to practise more trials if they wished. The experimental blocks were self-paced, and most were finished within 10–15 minutes. Participants had a two-minute break between blocks and were allowed to take a longer break if desired.
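The 62.5% staircase target used in the detection task can be made concrete with a toy psychometric function. The logistic form below is an illustration only (QUEST assumes a Weibull function internally); it simply shows that for a 4AFC task the halfway point between chance (25%) and perfect performance lies at 62.5%:

```python
import math

def p_correct(speed, threshold, slope=1.0, guess=0.25):
    """Toy psychometric function for a 4AFC task: performance rises from
    the guess rate (25%) towards 100%; the threshold parameter marks the
    halfway point, guess + (1 - guess) / 2 = 62.5%."""
    core = 1.0 / (1.0 + math.exp(-slope * (speed - threshold)))
    return guess + (1.0 - guess) * core

target = 0.25 + (1.0 - 0.25) / 2
print(target)                          # 0.625
print(p_correct(1.0, threshold=1.0))   # 0.625 at the threshold speed
```

Targeting this midpoint keeps the staircase in the steepest, most informative region of the 4AFC psychometric function.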
Data analysis.
For each staircase, a threshold of target motion speed (v1, in m/s) was estimated as a measure of performance. To translate it into an angular metric, we obtained the angular motion of the target by calculating the angle (α1, in degrees) between the egocentric direction of the target at the start and that at the end of the motion phase, if the target moved at v1. Similarly, let α0 be the angular motion of a scene-stationary target under the same locomotion. We define α0 and α1 to be positive when the movement vector points outward, and negative when it points inward. Directions can be assigned in this way because the optic flow patterns caused by target motions and locomotion were always radial. A simple example of a positive α0 and α1 is a scene-stationary target during forward locomotion, in which α0 and α1 are both approximately 3.14°. The (angular) relative motion threshold (rmt) was defined as the following:

rmt = α1 − α0
Thus, no matter whether the observer stays stationary or moves forward, rmt remains a measurement of target movement relative to the scene, and at the same time it is an angular term, suitable for studying visual perception.
We can also express the threshold in terms of angular speed by taking the rmt per second. Thus, the (relative) target motion speed threshold is:

rmst = rmt / t = (α1 − α0) / t (1)

where t = 0.5 s is the duration of the motion phase.
The thresholds that are mentioned in the following sections are all rmst, and their unit is °/s, unless otherwise specified. The threshold for each condition was calculated for each participant, and an example is shown in Fig 3. To analyse performance, the mean rmst values across the locomotion and viewing conditions were compared. We expect that (1) the synoptic detection threshold is lower than the monocular detection threshold, and that (2) the stereoscopic detection threshold is lower than the synoptic detection threshold.
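The angular figures quoted above can be cross-checked from the stimulus geometry in the Methods: a target offset 0.4 m laterally and 0.4 m vertically from the line of sight, initially 3 m ahead, while the simulated observer advances 0.7 m (1.4 m/s for 0.5 s). This sketch is ours, not the authors' code, but it reproduces the ≈3.14° displacement and ≈6.28°/s flow speed reported for a scene-stationary target:

```python
import math

# Stimulus geometry from the Methods: targets offset 0.4 m laterally and
# 0.4 m vertically from the cyclopean line of sight, initially 3 m ahead.
offset = math.hypot(0.4, 0.4)            # radial offset, ~0.566 m
depth0 = 3.0                             # depth at motion onset (m)
depth1 = depth0 - 1.4 * 0.5              # after 0.5 s at 1.4 m/s

ecc0 = math.degrees(math.atan2(offset, depth0))   # ~10.7 deg, as reported
ecc1 = math.degrees(math.atan2(offset, depth1))
alpha0 = ecc1 - ecc0                     # angular motion of a static target
print(round(alpha0, 2), round(alpha0 / 0.5, 2))   # ~3.14 deg, ~6.28 deg/s
```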
This figure shows the data for contracting target movements, during forward locomotion of a participant. Left: The target motion speed as a function of trial number. Centre: The fitted psychometric functions in the three viewing conditions. Dashed lines show the fitted 62.5% thresholds. Each dot represents 10 trials. Right: Thresholds shown by viewing condition. Error bars show the 90% confidence intervals.
A three-factor (locomotion × direction × viewing condition) repeated-measures ANOVA and paired t-tests were performed for comparing means. The ANOVA and paired t-tests were performed in R with the anova_test and pairwise_t_test functions from the rstatix package. We used the “auto” option provided by anova_test for correction of the degrees of freedom (DF). This means that the Greenhouse-Geisser correction was applied when the sphericity assumption was not met. The t-tests were two-sided, and the p values were adjusted with the Holm method.
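The analyses were run in R, but the Holm adjustment applied to the pairwise t-tests is simple enough to sketch in Python. This step-down procedure (which should match R's p.adjust(method = "holm")) sorts the p values, scales each by the number of remaining tests, and enforces monotonicity:

```python
def holm_adjust(pvals):
    """Holm step-down adjustment: multiply the i-th smallest p value by
    (m - i), take a running maximum to keep the sequence monotone, and
    cap each adjusted value at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

print([round(p, 4) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```

Note how the largest raw p value (0.04) is pulled up to 0.06 by the monotonicity constraint.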
Results
Effects of target motion direction and its interaction with locomotion condition.
As depicted in Fig 4, thresholds were much smaller when observers were stationary, indicating a strong main effect of locomotion, which was supported by the ANOVA (F(1,11) = 102.028, p < .001). The ANOVA results did not support a significant main effect of target motion direction (F(1.05,11.5) = 1.695, p = .219); however, there was a significant interaction between target motion direction and locomotion (F(1.06,11.66) = 5.995, p = .030).
Target detection thresholds, measured by taking the velocity difference between the moving target’s motion speed vector and the motion speed vector of a stationary target. Lower and upper box boundaries represent the 25th (Q1) and 75th (Q3) percentiles respectively, the line inside the box shows the median, and the red dot represents the mean. Lower and upper error lines show Q1 − 1.5 × interquartile range (IQR) and Q3 + 1.5 × IQR. We use a logarithmic scale to show the data from both conditions.
To analyse the interaction between directions and locomotion, the directions can be grouped either by the apparent optic flow patterns (contracting or expanding), or by the planes the direction vectors lie in (parasagittal or frontoparallel). Threshold comparisons by optic flow patterns are presented in Fig 5. Paired t-tests show that when stationary, inward motion thresholds were significantly lower (t(71) = 11.9, padj < .001) than outward motion thresholds. However, during forward locomotion, inward motion thresholds were significantly higher (t(71) = 4.24, padj < .001). While stationary, the more sensitive detection of centripetal motions we observed is consistent with previous studies [18, 19]. During forward locomotion, the observers were less sensitive to centripetal motions, possibly because the globally available expansive optic flow suppressed the responses of centripetal-tuned neurons. When comparing thresholds between parasagittal and frontoparallel target motions, the difference was not significant (stationary t(71) = .667, padj = .508; locomotion t(71) = .146, padj = .884).
Target detection thresholds grouped by their optic flow direction. Approaching or expanding motions are classified as outward. Receding or contracting motions are classified as inward. When stationary, observers detected inward motions better, but during forward locomotion, they detected outward motions better. For explanation of box plot see Fig 4.
Effects of viewing condition.
The results grouped by viewing condition are shown in Fig 6. ANOVA indicated a significant main effect of viewing condition (F(2,22) = 19.183, p < .001) as well as an interaction between viewing condition and locomotion (F(2,22) = 19.409, p < .001). No significant interaction between viewing condition and target motion direction was found (F(6,66) = 1.973, p = .082). Since we found an interaction, we performed paired t-tests separately for the stationary and locomotion conditions. When stationary, the monocular threshold was higher than both the synoptic (t(47) = 5.088, padj < .001) and stereoscopic (t(47) = 4.475, padj < .001) thresholds. The stereoscopic threshold was 12% higher than the synoptic threshold (t(47) = 2.565, padj = .014). However, during locomotion, the monocular and synoptic thresholds did not differ significantly (t(47) = 1.829, padj = .074), but the stereoscopic threshold was 24% lower than the synoptic threshold (t(47) = -5.738, padj < .001) and 29% lower than the monocular threshold (t(47) = -6.342, padj < .001).
Refer to Fig 4 for explanation for box plot.
Discussion
Tests of our main hypotheses and analysis of the interaction effect between the target motion direction and the locomotion condition suggest that (1) the participants were better at detecting inward (receding or contracting) motions when stationary, and (2) better at detecting outward (approaching or expanding) motions when moving forward. In our stimulus, when stationary, inward moving targets produced centripetal motion, while outward moving targets produced centrifugal motion. The first finding is consistent with the centripetal bias in sensitivity [18, 19]. In the forward-locomotion condition, an expanding optic flow was presented. The second finding is consistent with the study by Royden and Moore [20], which found that faster outward motions are detected more easily than slower outward motions in an expanding optic flow field.
Our finding of a facilitatory effect of binocular viewing was in line with existing evidence [3, 4] that binocular disparity can be integrated in the flow parsing task, and we extended it to linear forward locomotion. Furthermore, we separately examined the effects of binocular summation and binocular disparity. The observer’s performance could benefit from binocular viewing in at least two ways: binocular summation [15] of the monocular cues, and the integration of an extra depth cue, binocular disparity. We attempted to separate binocular disparity and binocular summation by testing three levels of viewing condition (monocular, synoptic and stereoscopic). The advantage of synoptic viewing over monocular viewing when stationary can be explained by binocular summation, and the advantage of stereoscopic viewing over synoptic viewing during locomotion can be explained by facilitation from binocular disparity. Our results suggest that binocular viewing always had a significant facilitatory effect on motion detection regardless of locomotion condition, but the cause of the facilitation differed: when stationary, the improvement reflects binocular summation; during locomotion, binocular disparity. When stationary, all motion in the optic array was caused by object motion. The motion signals were available monocularly and could be strengthened by binocular summation, which explains why binocular summation facilitated this task. The depth information inferred from binocular disparity would not be helpful when detecting relative image motion by comparison of the four motion signals. During locomotion, accurate flow parsing requires some information about, or an assumption of, the distance. For an observer moving forward at a linear speed v, the angular velocity ω of an object at an eccentricity of ϕ and a distance of r is:
ω = v · sin(ϕ) / r (2)
Thus, the optic flow vector ω is inversely proportional to the distance r. In the flow parsing process, ω is what needs to be factored out. Binocular disparity could improve the accuracy of r to enable a more accurate flow parsing.
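The inverse scaling with distance in Eq 2 can be checked numerically. The values below are illustrative only (the 12.3° eccentricity is borrowed from the experiment 2 geometry as an example):

```python
import math

def flow_speed(v, ecc_deg, r):
    """Angular speed (deg/s) of a scene point at eccentricity ecc_deg and
    distance r, for an observer translating forward at v m/s (Eq 2)."""
    omega = v * math.sin(math.radians(ecc_deg)) / r   # rad/s
    return math.degrees(omega)

# Doubling the distance halves the flow component that must be parsed out,
# which is why a distance error propagates directly into parsing error.
w_near = flow_speed(1.4, 12.3, 3.0)
w_far = flow_speed(1.4, 12.3, 6.0)
print(round(w_near / w_far, 3))   # 2.0
```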
Experiment 2
In this experiment, the participant discriminated target motion directions during forward or backward locomotion, or when stationary. Similarly to experiment 1, we hypothesise that both binocular summation and binocular disparity facilitate the flow parsing task. To be specific, we hypothesise that there will be (1) an improvement in precision and accuracy between monocular and synoptic conditions, and (2) an improvement in precision and accuracy between synoptic and stereoscopic conditions.
Materials and methods
Experiment 2 was conducted with the same apparatus as experiment 1.
Participants.
Twelve participants aged 22 to 54 years (M = 29.8, SD = 9.47, 8 males, 4 females) were recruited the same way as in experiment 1, from 1 February 2022 to 31 March 2022. Eight participants also participated in experiment 1.
Stimuli and procedure.
An example scene is shown in Fig 7. The background scene of experiment 2, which was the same as that of experiment 1, included the pillars, the ceiling and the floor. In experiment 2, there was only one target (the golden ball), which appeared at eye level, either to the right or to the left. On each trial, participants experienced a visual simulation of moving forward, moving backward, or staying stationary. For forward-moving trials, the target was initially 3 m away, with an eccentricity of 12.3°. In the backward-moving trials, the viewpoint started at the end position of the forward trials and moved backward. On the stationary trials, the viewpoint was kept at the midpoint of the start and end position of a forward-moving trial. This was to keep the average distance of the target the same across all three conditions. The trial procedure and the viewing conditions were similar to those in experiment 1. However, the target in experiment 2 only moved parallel to the sagittal axis, either approaching or receding. The task was also different. In experiment 2 participants reported in which direction (approaching or receding) the target moved relative to the scene by pressing a corresponding button on the controller.
In experiment 2 we used transformed up-down staircases [21] to estimate thresholds. Two staircases were used for each condition. The “receding” staircase targeted the 71% receding response threshold with a two-up one-down staircase and started with a receding motion. The “approaching” staircase targeted the 29% receding response threshold and started with an approaching motion using a one-up two-down staircase. We chose this method because we were interested in both the bias and the precision (sensitivity) of the task. Bayesian adaptive staircase methods, such as QUEST [17] and Psi [22], converge quickly for finding bias thresholds [21], but they are not always suitable for estimating sensitivity [23]. With two thresholds obtained, we could estimate the bias by taking the mid-point and estimate the sensitivity by taking the difference.
Each staircase started at a random speed between 1 m/s and 2 m/s, and their step sizes started at 1 m/s and halved at each reversal. The staircases were terminated after both 40 trials and 7 reversals were completed.
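The transformed up-down rule with reversal-halved steps can be sketched as a small state machine. This is a simplified illustration in the spirit of Levitt's method, not the experiment code; the class name and the mapping of responses onto "down" steps are ours:

```python
class TransformedStaircase:
    """Sketch of a transformed up-down staircase (after Levitt, 1971).
    With down_rule=2, two consecutive 'down' responses are required
    before the level steps down (converging near the 71% point for that
    response); the step size halves at every reversal."""
    def __init__(self, start_level, start_step, down_rule=2):
        self.level = start_level
        self.step = start_step
        self.down_rule = down_rule
        self.run = 0          # consecutive 'down' responses so far
        self.last_dir = 0     # -1 = last move was down, +1 = up
        self.reversals = 0

    def update(self, response_down):
        if response_down:
            self.run += 1
            if self.run >= self.down_rule:
                self.run = 0
                self._move(-1)
        else:
            self.run = 0
            self._move(+1)

    def _move(self, direction):
        if self.last_dir != 0 and direction != self.last_dir:
            self.reversals += 1
            self.step /= 2    # halve the step at each reversal
        self.last_dir = direction
        self.level += direction * self.step
```

Swapping the up and down rules yields the complementary (29%) track, and the calling loop would terminate once both 40 trials and 7 reversals are reached, as in the experiment.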
Data analysis.
A 3 × 3 (locomotion condition × viewing condition) repeated-measures ANOVA and paired t-tests were performed. While the target could move forward or backward, it could only move on one axis, and the point of subjective equality (PSE) should be unique for a fixed locomotion condition and viewing condition. Therefore, the target direction is not an independent factor in experiment 2, unlike experiment 1. The packages, correction and adjustment methods used were the same as experiment 1, unless specifically noted otherwise.
We converted the thresholds into angular terms as in experiment 1 (see Eq 1). Let rmst29 and rmst71 be the 29% and 71% thresholds; we define our bias and precision as the following:

bias = (rmst29 + rmst71) / 2
precision = (rmst71 − rmst29) / 2
Since the rmst were defined as speed thresholds without directions, and we have two directions (approaching and receding) in experiment 2, we added a sign for bias. We define sign = 1 if the bias is outward (relative to the flow vector of a stationary target), and sign = −1 if the bias is inward. We observed that in our results, forward locomotion always induced an inward (thus negative) bias and backward locomotion always induced an outward (positive) bias. An example result is shown in Fig 8. When comparing bias magnitudes, the sign of the bias was flipped for the forward locomotion condition to enable comparison of magnitudes amongst the conditions. In the following subsections, both bias and precision comparisons are in angular terms.
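The bias and precision computation, including the magnitude comparison after the sign flip, can be sketched as follows (the threshold values are hypothetical, not data from the study):

```python
def bias_and_precision(rmst29, rmst71):
    """Bias is the midpoint of the signed 29% and 71% thresholds (the
    point of subjective equality); precision is half their difference,
    with smaller values indicating higher precision."""
    bias = (rmst29 + rmst71) / 2
    precision = abs(rmst71 - rmst29) / 2
    return bias, precision

# Hypothetical thresholds shifted inward, as during forward locomotion:
bias, precision = bias_and_precision(-1.5, 0.5)
print(bias, precision)   # -0.5 1.0  (negative = inward bias, sign = -1)

# For magnitude comparisons, the forward-locomotion sign is flipped:
print(abs(bias))         # 0.5
```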
This figure shows example data of a participant during backward locomotion. Top left: The target motion speed as a function of trial number. The ordinate axis is the scene-relative speed, which can be either positive or negative. Positive means the target moves in the direction towards the observer in the scene. Top right: The fitted psychometric functions in the three viewing conditions. Each dot represents data from 10 trials. Note that the abscissa shows egocentric motion speed, and positive means outward. Bottom left: Biases. Bottom right: Precision (sensitivity). Error bars show 90% confidence intervals.
Results
Effects of locomotion direction.
The results are depicted in Fig 9. The ANOVA test showed a significant main effect of locomotion condition on bias (F(2,22) = 714.693, p < .001). The F value is extremely large because, in the stationary condition, the biases were very close to zero, showing an exceptionally high accuracy for motion discrimination compared to the other two conditions (Forward vs. Stationary: t(35) = 43.954, padj < .001; Backward vs. Stationary: t(35) = 41.002, padj < .001). Moreover, forward biases were slightly (11.97%) greater than backward biases (t(35) = 6.946, padj < .001).
Top left: biases, which were calculated by taking the average of the 29% and 71% thresholds. Top right: precision, obtained by taking half of the difference between the 29% and 71% thresholds. Bottom: biases and precisions by viewing conditions, with significance annotations. “.”: .05 ≤ p < .1, “*”: .01 ≤ p < .05. Error bars represent standard errors.
A significant main effect of locomotion condition on precision (F(1.31,14.64) = 23.361, p < .001) was revealed by the ANOVA. The stationary precision was better (59.44% smaller in value) than the forward precision (t(35) = 4.930, padj < .001), and the forward precision was better (32.86% smaller in value) than the backward precision (t(35) = 3.746, padj < .001).
Effects of viewing condition.
Our hypotheses required a planned contrast among the conditions. We found that the stereoscopic bias was 8.40% lower than the monocular bias (t(35) = 2.803, padj = .025) and 9.22% lower than the synoptic bias (t(35) = 2.731, padj = .025), but we did not find a significant difference between the synoptic and monocular conditions (t(35) = .841, padj = .406). However, the ANOVA did not support a significant main effect of viewing condition on bias at α = .05 (F(1.07,11.79) = 4.116, p = .063).
The interaction effect on bias between locomotion and viewing conditions was not significant (F(2.01,22.16) = 2.359, p = .117).
Viewing condition had a significant main effect on precision (F(2,22) = 3.672, p = .042). Specifically, stereoscopic precision was significantly worse (26.6% greater in value) than monocular precision (t(35) = 2.674, padj = .011). The stereoscopic precision was also worse than synoptic precision, but this difference was not significant (t(35) = 1.829, padj = .076). Synoptic and monocular precisions were not significantly different (t(35) = .192, padj = .849).
We did not find a significant interaction effect on precision between viewing condition and locomotion condition (F(2.46,27.08) = 2.287, p = .111).
Discussion
The bias and the flow parsing gain.
The large bias indicates that the background motion caused by locomotion was not fully compensated when parsing the target motion. Our definition of bias is linked to the flow parsing gain [6, 8, 9]. The flow parsing gain g was defined by Niehorster and Li [6] as the following (note that symbols are changed):
g = 1 − |ωa| / |ωf| (3)
where ωf is the self-motion component of the target's motion; in other words, it is the target optic flow if the target were actually stationary relative to the scene. ωa is the "nulling component" which has to be added to the stimulus to nullify the apparent target motion in the direction of ωf. Both ωa and ωf are angular motion vectors describing the rate of change in egocentric direction, and in our study they also describe retinal motions because the eyes were fixated. A perfect observer fully compensates for ωf and thus has no bias: |ωa| = 0 and g = 1.
Our bias is the same as ωa. Thus,
g = 1 − |bias| / |ωf| (4)
Let ωd be the flow vector for a target at PSE. Thus, ωd = ωf + ωa (as illustrated in Fig 10), and we have:
g = 1 − |ωd − ωf| / |ωf| (5)
The discounted flow vector, ωd, is the flow vector of a target when this target is perceived as scene-stationary. ωf is the hypothetical flow vector if the target were stationary in the scene, or the “self-motion component” of the target motion. ωa is the bias caused by scene-relative target motions, which can be obtained by subtracting ωf from ωd. All vectors describe the motions or motion components of a single target. They were drawn in layers for visibility.
As one would imagine, g is defined for forward and backward locomotion conditions only, but undefined for the stationary condition, because |ωf| = 0. In experiment 2, gforward = 21.07% and gbackward = 29.51%. Both are less than 1, consistent with the observation by Niehorster and Li [6].
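Given Eq 4, computing the gain from a measured bias is straightforward; a minimal sketch (the numbers are hypothetical, chosen only to land near the forward gain reported above):

```python
def flow_parsing_gain(bias, omega_f):
    """Flow parsing gain per Eq 4: g = 1 - |bias| / |omega_f|.

    bias and omega_f are angular speeds (e.g. deg/s). omega_f must be
    non-zero, which is why the gain is undefined in the stationary condition.
    """
    if omega_f == 0:
        raise ValueError("gain undefined when |omega_f| = 0 (stationary condition)")
    return 1.0 - abs(bias) / abs(omega_f)

# Hypothetical values: a 0.79 deg/s inward bias against a 1 deg/s self-motion component
g = flow_parsing_gain(bias=-0.79, omega_f=1.0)  # incomplete flow parsing, g well below 1
```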
The gains were much smaller than 1 (a gain of 1 means perfectly accurate motion discrimination) because the biases were large. A large bias could be caused by an underestimated locomotion speed. For a slower locomotion (thus a smaller ωf), assuming a roughly constant g, the visual system requires a smaller ωd to perceive the target as stationary. Thus, if the locomotion speed is underestimated, the target motion ωd will be smaller than ideal. In our study, ωd was determined by observer behaviour, but ωf was calculated from the stimulus. Thus, an underestimated locomotion speed could cause a low gain g and a large bias ωa.
The bias could also be caused by an overestimated target distance. As explained earlier (see Eq 2), the magnitude of the optic flow vector is inversely proportional to distance; that is, a farther target has a smaller ωf. Assuming a roughly constant flow parsing gain, the visual system will perceive a smaller ωd as stationary. Again, ωf is the ideal value calculated from the stimulus, so it does not scale with perceived distance. Thus, an overestimated target distance could also cause a low gain g and a large bias ωa.
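Both accounts reduce to the same arithmetic: if the visual system registers only a fraction k of the true self-motion component, whether because locomotion speed is underestimated or because an overestimated distance shrinks the expected flow (flow magnitude scales with 1/distance), then even perfect compensation of the registered component yields a low measured gain. A toy model (k is an illustrative scaling, not a fitted value):

```python
def measured_bias_and_gain(k, omega_f_true=1.0):
    """Toy model: the visual system registers omega_f_reg = k * omega_f_true
    and compensates it fully, so the PSE flow is omega_d = k * omega_f_true.
    The experimenter, who knows the true omega_f, then measures the bias
    omega_a = omega_d - omega_f and the gain g = 1 - |omega_a| / |omega_f| (Eq 5).
    """
    omega_d = k * omega_f_true
    bias = omega_d - omega_f_true
    gain = 1.0 - abs(bias) / abs(omega_f_true)
    return bias, gain

# Registering only 21% of the self-motion component reproduces a gain near 0.21
bias21, gain21 = measured_bias_and_gain(0.21)
```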
It is worth noting that the underestimation of locomotor speed and overestimation of distance need not be consciously perceived by the observer. They can be “registered” properties (e.g. see the “registered distance” mentioned in [24, 25]), unconscious interpretations of the stimulus made by the visual system, or some assumptions taken by the visual system, for the purpose of performing flow parsing.
The PSE is the point where the target is equally likely to be perceived as moving forward or backward. Participants were instructed to respond only with forward or backward. The target moved either forward or backward, with no lateral motion component, and indeed looked like it moved sagittally. However, forward and backward movements share the same flow pattern as lateral movement when the target is at eye level. Thus, the bias might also be induced by a component of the flow vector that was perceived as a lateral target motion.
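The ambiguity is easy to verify with a pinhole-projection sketch: for a target on the eye-level horizontal meridian, both sagittal (in-depth) and lateral motion produce purely horizontal image flow in the same direction, so the flow vector alone cannot distinguish them (the geometry and speeds below are made up for illustration):

```python
def image_x(X, Z, f=1.0):
    """Horizontal image coordinate of an eye-level point (Y = 0) at lateral
    offset X and depth Z, under pinhole projection with focal length f."""
    return f * X / Z

X, Z, dt = 0.5, 3.0, 1e-3  # target 0.5 m to the right, 3 m away

# Approaching at 1 m/s: depth shrinks, so the image point drifts outward
v_depth = (image_x(X, Z - 1.0 * dt) - image_x(X, Z)) / dt
# Moving laterally (rightward) at 0.12 m/s: the image point also drifts outward
v_lateral = (image_x(X + 0.12 * dt, Z) - image_x(X, Z)) / dt
# Both image velocities are horizontal and outward; only their magnitudes differ
```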
The role of locomotion condition.
The observers were much less accurate at direction discrimination with a non-zero optic flow field (consistent with either forward or backward locomotion) than when stationary. When stationary, there was hardly any bias. During locomotion, there was a bias in the locomotion direction (e.g., when moving forward, the target PSE was shifted in the forward, that is, the receding, direction). Moreover, the backward bias was slightly smaller than the forward bias. The bias is the magnitude of the vector difference between the perceived self-motion component and the true self-motion component of the target retinal motion. We intentionally kept the forward and the backward locomotion the same (swapping only the start and end points), so that the target would have kept the same average distance if the biases were zero. It is worth noting that the biases in the locomotion direction caused the target to be closer on average during backward locomotion.
The role of binocular disparity.
Binocular disparity significantly reduced the bias, but binocular summation did not. At the distance of the target (about 3 metres), binocular disparity is a strong relative depth cue. The lack of binocular disparity in the non-stereoscopic conditions could diminish the perceived depth between the target and background and cause the target to look farther away. As explained previously, an overestimated target distance is associated with a greater bias.
Moreover, as we explained previously, the bias is consistent with an underestimated locomotion. It has been found that stereoscopic viewing can enhance vection strength and perceived self-motion speed [13, 14]. Our result of a reduced bias in the stereoscopic condition is consistent with previous studies, indicating that a more accurate stereoscopic perception of locomotion might indeed be utilised in the flow parsing process.
A radial flow vector is ambiguous in three-dimensional space. Such a flow vector at eye level can arise from either lateral or sagittal object motion. Binocular disparity provides additional motion-in-depth cues [26], potentially aiding the motion perception process.
Surprisingly, the sensitivity, or precision, was worse in the stereoscopic condition, inconsistent with previous research which suggested that precision is often better in binocular conditions [27–29].
Experiment 3
One drawback of the stimulus of experiment 2 was that the target sometimes entered the blind spot of one eye, which was more likely during forward locomotion. This means that in some trials the target might be visible to only one eye in the synoptic and stereoscopic conditions, or not be seen at all in the monocular condition, which was undesirable. In experiment 3, we fixed this by changing the initial target eccentricity to 8.5°. At this eccentricity, the target stayed out of the blind spot (retinal motion was mostly < 3°).
Moreover, experiment 3 also offered a chance to reproduce the results of experiment 2 at a smaller target eccentricity. There is evidence that optic flow is processed differently at different eccentricities [4, 30–32]. As eccentricity increases, targets become harder to detect [32], sensitivity to radial optic flow decreases [30], and perceived inward target motion exceeds outward motion [4]. Experiment 3 could help us examine whether the results of experiment 2, including the opposite effects that binocular disparity had on bias and on precision, still hold when the target eccentricity changes.
Materials and methods
Results
The results are shown in Fig 11. The data analysis steps were the same as in experiment 2.
Top left: biases in angular unit (°/s), which were calculated by taking the average of the 29% and 71% thresholds. Top right: precision, obtained by taking half of the differences between the 29% and 71% thresholds. Bottom: biases and precisions by viewing conditions, with significance annotations. “*”: .01 ≤ p < .05, “**”: .001 ≤ p < .01, “***”: p < .001.
Effects of locomotion direction.
The effect of locomotion direction on bias can be observed in Fig 11. The repeated measures ANOVA revealed a significant main effect of locomotion direction (F(2,22) = 1152.381, p < .001). Specifically, the stationary bias was very close to zero, much lower than the backward bias (t(35) = 45.920, padj < .001), and the backward bias was slightly (4.52%) lower than the forward bias (t(35) = 2.755, padj = .009).
The ANOVA also showed a main effect of locomotion on precision (F(2,22) = 5.334, p = .013). The stationary precision was lower (better) than the forward precision (t(35) = 2.549, padj = .031), and the forward precision was lower (better) than the backward precision (t(35) = 2.310, padj = .031).
Effects of viewing condition.
Similar to experiment 2, we performed t-tests to compare between the viewing conditions to examine our main hypotheses. The stereoscopic bias was lower than both the monocular bias (t(35) = 3.384, padj = .005) and the synoptic bias (t(35) = 3.300, padj = .005). The synoptic bias and the monocular bias were not significantly different (t(35) = .414, padj = .681).
The stereoscopic precision was not significantly different from the monocular precision (t(35) = 0.877, padj = .582), but significantly worse (24.12% higher) than the synoptic precision (t(35) = 4.055, padj < .001). The t-test did not find a significant difference between the monocular and the synoptic precision (t(35) = 1.073, padj = .582).
We also performed an ANOVA. There was a significant main effect of viewing condition on bias (F(2,22) = 7.483, p < .001). The ANOVA did not find a significant main effect of viewing condition on precision (F(1.27,13.97) = 2.037, p = .175).
Interaction effects between locomotion and viewing conditions.
A significant interaction effect on bias was found between locomotion and viewing condition (F(4,44) = 5.402, p = .001). We conducted subsidiary ANOVAs for each locomotion direction. The effect of viewing condition was significant during backward locomotion (F(2,22) = 10.666, p < .001), but not significant during forward locomotion (F(1.16,12.79) = 3.267, p = .090) or when stationary (F(2,22) = .502, p = .612). Post-hoc t-tests showed that during backward locomotion, the monocular bias was significantly higher than the stereoscopic bias (t(11) = 3.028, padj = .023) and not significantly different from the synoptic bias (t(11) = −0.969, padj = .353), and the stereoscopic bias was significantly lower than the synoptic bias (t(11) = −3.997, padj = .006).
There was no significant interaction effect between locomotion and viewing condition on precision (F(2.29,25.21) = 1.181, p = .328).
Discussion
In experiment 3, the stereoscopic bias was lower than the synoptic bias, and the synoptic bias was not significantly different from the monocular bias. That is, we again found evidence that binocular disparity decreased the flow parsing bias, while binocular summation did not change the bias significantly. These effects were the same as in experiment 2.
There were some minor differences between the results of experiments 2 and 3. For example, the difference between the stereoscopic and synoptic precisions was not significant in experiment 2 but was significant in experiment 3. Such differences are not too surprising, given that these were different experiments and tested different participants.
Furthermore, the bias was much smaller in experiment 3. This is consistent with the previous finding that bias increases as target eccentricity increases [33].
To conclude, experiment 3 suggested that our main finding, namely that binocular disparity decreased the flow parsing bias and worsened precision while binocular summation changed neither significantly, still holds at a smaller target eccentricity. The results of experiment 3 also indicated that the blind spot was unlikely to have had a major impact on the results of experiment 2.
General discussion
Binocular disparity and binocular summation
The discrimination biases in experiments 2 and 3 were both improved by binocular disparity. The detection threshold in experiment 1 was improved by binocular disparity only during locomotion.
In experiment 1, a useful strategy would be to compare the four targets and look for the one that stands out (differs). Monocular cues are effective in this task, and since binocular summation strengthens monocular cues, performance was improved by binocular summation when stationary. During locomotion, distance becomes more relevant, as the observer needs to factor out the self-motion component of the target motion to obtain the scene-relative target motion (i.e. flow parsing). Binocular disparity is a cue to motion in depth [26]. This might explain why binocular disparity, rather than binocular summation, aids in this process.
In experiments 2 and 3, binocular disparity significantly decreased the motion discrimination biases but binocular summation did not. Stereopsis might provide a more accurate depth estimate between the target and the background. The binocular disparity-defined distance could be utilised in the flow parsing process [3], which may explain the smaller stereoscopic bias. Binocular motion-in-depth signals [26] can describe the sagittal-parallel target movement more explicitly, such that, when combined with monocular cues, a smaller flow vector component might be incorrectly mapped to a lateral target motion. Alternatively, it is possible that the stereoscopic condition provided an improved perception of locomotion [14], lowering the bias.
An important motivation of our experiment was that the stimulus of Rushton and Warren [3] was carefully designed to force observers to rely on binocular disparity, and we wanted to know whether the visual system still relies on binocular disparity when other monocular cues are available. Our stimulus differed in that (1) the target's retinal size and direction changed in accordance with its distance and movement, (2) we used a more realistic scene, and (3) the target had a non-zero retinal flow vector. These signals are monocularly available and should be strengthened by binocular summation. However, we only observed improved performance in the synoptic condition relative to the monocular condition in experiment 1, not in experiments 2 or 3. This could mean that these monocular cues are relied on more heavily in detection tasks than in discrimination tasks, while binocular disparity is relied on more heavily in discrimination tasks than in detection tasks.
It is worth noting that even in the stereoscopic condition, the bias was relatively large compared to the locomotor speed during locomotion. This means flow-parsing tasks may be difficult in VR environments, even in a scene that looks more realistic than kinematograms.
Binocular disparity did not improve motion discrimination precision
Previous research concluded that the precision of stereoscopic depth judgements is better than that of monocular depth judgements [27–29]. The precision results of experiment 1 were consistent with this previous research, while the results of experiments 2 and 3 indicated otherwise. How could binocular disparity possibly hamper precision? The percept under the monocular and synoptic conditions was biased. The decrease in precision could be the result of an unexpected addition of the correct binocular disparity.
We discussed in the discussion section for experiment 2 how binocular disparity could indicate a closer target distance by signalling a more evident depth difference between the target and background. Although binocular disparity can help in estimating distances, it does not provide absolute distance [15]. Inherently, the change in disparity (or the interocular velocity difference) does not specify a precise change in absolute distance. A visual system which uses binocular disparity to estimate distance suffers from this non-determinacy. However, in the synoptic condition, all disparities are zero, so the disparity change is also zero. This might explain the slight sensitivity drop in the stereoscopic condition. It might also serve as evidence that the visual system tries to utilise binocular disparity to solve flow parsing, and that it prioritises accuracy over precision.
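This scaling problem can be made concrete with the small-angle approximation for the rate of change of horizontal disparity, dδ/dt ≈ −I·v/Z², where I is the interocular distance, v the speed in depth, and Z the viewing distance. Because Z enters quadratically and is not itself specified by disparity, quite different speed/distance pairs produce identical disparity rates. A sketch (I = 0.064 m is an assumed typical interocular distance; all numbers are illustrative):

```python
def disparity_rate(v, Z, I=0.064):
    """Approximate rate of change of horizontal disparity (rad/s) for a target
    receding at v (m/s) at distance Z (m), with interocular distance I (m):
    d(delta)/dt ≈ -I * v / Z**2 (small-angle approximation)."""
    return -I * v / Z ** 2

# The same disparity rate is consistent with many speed/distance combinations:
r_near = disparity_rate(v=0.5, Z=3.0)  # slower target, nearer
r_far = disparity_rate(v=2.0, Z=6.0)   # four times the speed at twice the distance
```

With Z unknown, r_near and r_far are indistinguishable, so a disparity-based estimate of speed in depth inherits the uncertainty of the distance estimate.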
The reason that experiment 1 was immune to these confusions could be a difference in strategy. In experiment 1, there were four targets, three stationary and one moving; thus, both moving and stationary examples were given. Neither a specific speed estimate nor a distance estimate of the targets was required, and the observers only needed to compare the relative movements among the examples. In experiments 2 and 3, only one target was presented at a time. Thus, observers could not directly obtain a veridical reference of a stationary target from the stimulus, and they had to generate one themselves.
Summary
We compared target motion detection and discrimination performance during locomotion in three viewing conditions to examine the roles of binocular disparity and binocular summation in this process. Experiment 1 found interactions: when stationary, binocular summation significantly improved the detection threshold, but binocular disparity did not; during forward locomotion, binocular disparity significantly improved the detection threshold, but binocular summation did not. Experiments 2 and 3 showed that binocular disparity significantly decreased the discrimination bias, but binocular summation did not. Moreover, stereoscopic precision was worse than monocular and synoptic precisions, possibly because a non-zero depth between the target and the background was indicated by binocular disparity.
Supporting information
S1 Video. Sample video for experiment 1.
Shows sample trials of Experiment 1. Note that the sample videos are modified and designed to be viewed from conventional devices such as a computer monitor or a smartphone, so they do not differentiate the stereoscopic, synoptic or monocular viewing conditions or mirror the experimental field of view.
https://doi.org/10.1371/journal.pone.0315392.s001
(MKV)
S2 Video. Sample video for experiment 2.
Shows sample trials of Experiment 2.
https://doi.org/10.1371/journal.pone.0315392.s002
(MKV)
S3 Video. Sample video for experiment 3.
Shows sample trials of Experiment 3.
https://doi.org/10.1371/journal.pone.0315392.s003
(MKV)
References
- 1. Gibson JJ. The perception of the visual world. Boston: Houghton Mifflin; 1950.
- 2. Dupin L, Wexler M. Motion perception by a moving observer in a three-dimensional environment. Journal of Vision. 2013;13(2):15. pmid:23397040
- 3. Rushton SK, Warren PA. Moving observers, relative retinal motion and the detection of object movement. Current Biology. 2005 Jul 26;15(14): 542–543. pmid:16051158
- 4. Warren PA, Rushton SK. Optic flow processing for the assessment of object movement during ego movement. Current Biology. 2009 Dec;19(18): 1555–1560. pmid:19699091
- 5. Matsumiya K, Ando H. World-centered perception of 3D object motion during visually guided self-motion. Journal of Vision. 2009 Jan;9(15). pmid:19271885
- 6. Niehorster DC, Li L. Accuracy and tuning of flow parsing for visual perception of object motion during self-motion. i-Perception. 2017 May 18. pmid:28567272
- 7. Rushton SK, Niehorster DC, Warren PA, Li L. The primary role of flow processing in the identification of scene-relative object movement. Journal of Neuroscience. 2018 Feb 14;38(7).
- 8. Falconbridge M, Stamps RL, Edwards M, Badcock DR. Target motion misjudgments reflect a misperception of the background; revealed using continuous psychophysics. i-Perception. 2023;14(6). pmid:38680843
- 9. Layton OW, Parade MS, Fajen BR. The accuracy of object motion perception during locomotion. Frontiers in Psychology. 2023 Jan;13. pmid:36710725
- 10. Xie M, Niehorster DC, Lappe M, Li L. Roles of visual and non-visual information in the perception of scene-relative object motion during walking. Journal of Vision. 2020 Oct;20(15). pmid:33052410
- 11. Noel JP, Bill J, Ding H, Vastola J, Deangelis GC, Angelaki DE, Drugowitsch J. Causal inference during closed-loop navigation: Parsing of self- and object-motion. Philosophical Transactions of the Royal Society B: Biological Sciences. 2023 Oct;378 (1886).
- 12. MacNeilage PR, Zhang Z, DeAngelis GC, Angelaki DE. Vestibular facilitation of optic flow parsing. PLoS ONE. 2012 Jul 2. pmid:22768345
- 13. Palmisano S. Consistent stereoscopic information increases the perceived speed of vection in depth. Perception. 2002;31(4). pmid:12018791
- 14. Palmisano S, Summersby S, Davies RG, Kim J. Stereoscopic advantages for vection induced by radial, circular, and spiral optic flows. Journal of Vision. 2016;16(7). pmid:27832269
- 15. Howard IP, Rogers BJ. Binocular summation, masking, and transfer. In: Perceiving in depth, volume 2: stereoscopic vision. Oxford University Press; 2012.
- 16. Peirce J, Gray JR, Simpson S, MacAskill M, Höchenberger R, Sogo H, et al. PsychoPy2: Experiments in behavior made easy. Behavior Research Methods. 2019 Feb;51(1): 195–203. pmid:30734206
- 17. Watson AB. QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics. 1983 Mar;33(2):113–120.
- 18. Edwards M, Badcock DR. Asymmetries in the sensitivity to motion in depth: a centripetal bias. Perception. 1993;22(9):1013–1023. pmid:8041584
- 19. Raymond JE. Directional anisotropy of motion sensitivity across the visual field. Vision Research. 1994;34(8):1029–1037. pmid:8160412
- 20. Royden C, Moore K. Use of speed cues in the detection of moving objects by moving observers. Vision research. 2012 Apr 15;59:17–24. pmid:22406544
- 21. Leek M. Adaptive procedures in psychophysical research. Perception & Psychophysics. 2001 Nov;63(8):1279–1292. pmid:11800457
- 22. Kontsevich LL, Tyler CW. Bayesian adaptive estimation of psychometric slope and threshold. Vision Research. 1999;39(16):2729–2737. pmid:10492833
- 23. Paire A, Hillairet de Boisferon A, Paeye C. Empirical validation of QUEST+ in PSE and JND estimations in visual discrimination tasks. Behavior Research Methods. 2023;55(8):3984–4001. pmid:36538168
- 24. Rock I, Ebenholtz S. The relational determination of perceived size. Psychological Review. 1959;66(6):387–401. pmid:14438104
- 25. Kaufman L, Rock I. The Moon Illusion, I. Science. 1962 Jun;136(3520):953–961.
- 26. Allison RS, Howard I. Stereoscopic motion in depth. Vision in 3D Environments. 2011;416:163–186.
- 27. Allison RS, Gillam BJ, Vecellio E. Binocular depth discrimination and estimation beyond interaction space. Journal of Vision. 2009 Jan;9(1):10. pmid:19271880
- 28. McKee SP, Taylor DG. The precision of binocular and monocular depth judgments in natural settings. Journal of Vision. 2010 Aug;10(10):5. pmid:20884470
- 29. Hartle B, Wilcox L. Cue vetoing in depth estimation: Physical and virtual stimuli. Vision Research. 2021;188:51–64. pmid:34289419
- 30. Atchley P, Andersen G. The effect of age, retinal eccentricity, and speed on the detection of optic flow components. Psychology and Aging. 1998;13(2):297–308. pmid:9640589
- 31. McManus M, D’Amour S, Harris L. Using optic flow in the far peripheral field. Journal of Vision. 2017 Jul;17(8):1–11. pmid:28672369
- 32. Song J, Bennett P, Sekuler A, Sun H. The effect of apparent distance on peripheral target detection. Journal of Vision. 2021 Sep;21(8). pmid:34495294
- 33. Peltier NE, Angelaki DE, DeAngelis GC. Optic flow parsing in the macaque monkey. Journal of Vision. 2020 Oct;20(8). pmid:33016983