
## Abstract

We measured perceived depth from the optic flow (a) when showing a stationary physical or virtual object to observers who moved their head at a normal or slower speed, and (b) when simulating the same optic flow on a computer and presenting it to stationary observers. Our results show that perceived surface slant is systematically distorted, for both the active and the passive viewing of physical or virtual surfaces. These distortions are modulated by head translation speed, with perceived slant increasing directly with the local velocity gradient of the optic flow. This empirical result allows us to determine the relative merits of two alternative approaches aimed at explaining perceived surface slant in active vision: an “inverse optics” model that takes head motion information into account, and a probabilistic model that ignores extra-retinal signals. We compare these two approaches within the framework of the Bayesian theory. The “inverse optics” Bayesian model produces veridical slant estimates if the optic flow and the head translation velocity are measured with no error; because of the influence of a “prior” for flatness, the slant estimates become systematically biased as the measurement errors increase. The Bayesian model, which ignores the observer's motion, always produces distorted estimates of surface slant. Interestingly, the predictions of this second model, not those of the first one, are consistent with our empirical findings. The present results suggest that (a) in active vision perceived surface slant may be the product of probabilistic processes which do not guarantee the correct solution, and (b) extra-retinal signals may be mainly used for a better measurement of retinal information.

**Citation:** Caudek C, Fantoni C, Domini F (2011) Bayesian Modeling of Perceived Surface Slant from Actively-Generated and Passively-Observed Optic Flow. PLoS ONE 6(4): e18731. https://doi.org/10.1371/journal.pone.0018731

**Editor:** Hans P. Op de Beeck, University of Leuven, Belgium

**Received:** November 18, 2010; **Accepted:** March 11, 2011; **Published:** April 14, 2011

**Copyright:** © 2011 Caudek et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Funding:** The authors have no support or funding to report.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

The current models of active Structure-from-Motion (SfM) are based on the Helmholtzian account of perception as inverse inference [1]–[5]. According to this approach, the goal of the perceptual system is to infer from the sensory evidence the environmental three-dimensional (3D) shape most likely to have produced the sensory experience. To achieve this goal, the current approach inverts the generative model for the optic flow. Mathematically, this corresponds to an application of Bayes' rule in which first-order optic flow information is combined with information about the observer's motion provided by proprioceptive and vestibular signals [6]. The solution of this “inverse-optics” problem can produce the correct result if some assumptions about the distal objects are satisfied and if the extra-retinal signals are measured with high precision [3].

An alternative theory hypothesizes that the visual system estimates the metric properties of local surface orientation by using retinal information alone. Retinal information “directly” specifies the 3D affine properties of the distal object (such as the parallelism of local surface patches or the relative-distance intervals in parallel directions), but it does not allow a unique determination of its Euclidean metric properties, such as slant [7] (see Figure 1). It has been proposed [8] that the perception of local surface slant can be understood in terms of a probabilistic estimation process. Consider a property $\rho$ of the optic flow which is related to the distal property $\sigma$ through a one-to-many mapping. Any estimate $f(\rho)$ based solely on $\rho$ will produce an error $e_i = \sigma_i - f(\rho)$. Through learning, however, the visual system may select the function $f$ that minimizes $\sum_i e_i^2$, where $i$ indexes the instances $\sigma_i$ that could have produced $\rho$. This approach has proven adequate to explain human passive SfM, but it could be applied to active SfM as well.

A set of circular patches is used to illustrate the slant ($\sigma$) and tilt ($\tau$) components of surface orientation [40]. The line at the center of each patch is aligned in the direction of the surface normal. The slant $\sigma$ is defined by the tangent of the angle between the normal to the surface and the line of sight. The tilt $\tau$ is defined as the angle between the $x$-axis of the image plane and the projection into the image plane of the normal to the surface. The dotted lines identify patches with the same slant but different tilt magnitudes.

A fundamental difference between these two approaches is that only the first one makes use of information about ego-motion. This difference is very important: empirical results show that perceived surface tilt depends critically on ego-motion information. A particularly convincing demonstration in this respect has been provided by [9]. In the “active” condition, the observer translated while fixating a planar surface with 90° tilt undergoing rotation about the horizontal axis. The rotation of the surface was paired with the observer's motion, so as to generate a pure compression optic flow. In the “passive” condition, the same optic flow was “replayed” to a stationary observer. Tilt perception was veridical when ego-motion information was available (perceived tilt was 90°), but not for the passive observer (perceived tilt was either 0° or 180°) – see also [10]–[13].

The purpose of the present investigation is to determine whether observers use information about the speed of head motion to estimate surface slant. To this purpose, we compared the judgments of local surface slant provided by active and passive observers to the estimates provided by two Bayesian models. The two models were constructed (a) by taking into account information about the observer head motion, and (b) without taking into account information about the observer head motion. The empirical data were obtained by asking observers to judge the local slant of virtual and physical planar surfaces from the optic flows generated by normal or slower head translation velocities.

### Surface slant and first-order optic flow

Consider a coordinate system centered at the observer's viewpoint, with the $z$-axis orthogonal to the observer's frontal-parallel plane (see Figure 2). Suppose that the observer fixates the surface's point located at $(0, 0, z_f)$, where $z_f$ is the viewing distance. If the observer translates in a direction orthogonal to the line of sight, with translational velocity $T$, or the surface rotates with angular velocity $\omega$, then the texture elements on the surface will project onto the image plane a velocity field which can be locally described by the following equation:

$$v \approx \omega_r \frac{z}{z_f}, \qquad (1)$$

where $v$ is the retinal angular velocity, $\omega_r$ is the angular velocity resulting from the relative rotation between the observer and the surface, and $z$ is the relative depth of each surface point with respect to the fixation point.

In the present investigation, we only consider planar surfaces slanted by an angle $\theta$ along the vertical ($y$) dimension. Such surfaces are defined by the equation

$$z = \sigma\, z_f\, y, \qquad (2)$$

where $\sigma = \tan\theta$ and $y$ is the vertical elevation (expressed in units of visual angle) of a generic feature point. Substituting Eq. 2 in Eq. 1 gives:

$$v \approx \sigma\, \omega_r\, y. \qquad (3)$$

The deformation (*def*) component (*i.e.*, the gradient) of the velocity field – which is zero along the horizontal dimension for our stimuli – is given by

$$def = \frac{\partial v}{\partial y} = \sigma\, \omega_r. \qquad (4)$$

Eq. 4 is a good approximation of the local velocity field produced by a surface patch subtending up to 8° of visual angle. Importantly, Eq. 4 reveals that the gradient of the velocity field is *not* sufficient to specify the slant $\sigma$ of the surface. In order to specify $\sigma$, in fact, knowledge of the relative rotation $\omega_r$ between the observer and the planar surface is required. Note that, in general, $\omega_r$ depends both on the surface's rotation $\omega$ about the vertical axis and on the translation of the observer:

$$\omega_r = \omega + \omega_T, \qquad (5)$$

where $\omega_T$ denotes the relative angular velocity of the surface resulting from the movement of the observer in an egocentric reference frame.
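Under the small-field approximation above (assuming $v \approx \sigma\,\omega_r\,y$, so that $def = \partial v / \partial y = \sigma\,\omega_r$), the deformation can be recovered numerically from a sampled velocity field. A minimal sketch, with illustrative stimulus values:

```python
import numpy as np

# Assumed small-field model (Eqs. 1-4): a plane with slant sigma = tan(theta),
# rotating at omega_r rad/s relative to the observer, projects a local
# velocity field v(y) = sigma * omega_r * y, with y the angular elevation.
def velocity_field(y, sigma, omega_r):
    """Local retinal velocity (rad/s) at angular elevation y (rad)."""
    return sigma * omega_r * y

sigma = np.tan(np.deg2rad(50))   # slant of 50 deg, in tangent units
omega_r = 0.32                   # relative rotation (rad/s), illustrative

# Numerically estimate the vertical velocity gradient (def).
y = np.linspace(-0.05, 0.05, 101)   # roughly +/- 3 deg of visual angle
v = velocity_field(y, sigma, omega_r)
def_est = np.gradient(v, y).mean()

print(def_est, sigma * omega_r)  # the two values match: def = sigma * omega_r
```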

Top panel: bird's-eye view of the viewing geometry. The $x$ and $z$ axes represent the horizontal image dimension and the line of sight, respectively; $z_f$ is the viewing distance, $T$ is the horizontal head translation velocity, and $\omega_T$ is the relative angular velocity produced by the motion of the observer's head. Bottom panel: side view of the viewing geometry. $z$ represents the relative depth of the point **P** with respect to the fixation point; $y$ represents the elevation of the point **P** with respect to the optical axis; $\sigma$ is the tangent of the angle between the surface (represented by the red segment) and the fronto-parallel plane.

In general, the ambiguity of *def* could be solved if the visual
system were able to accurately measure the second-order optic flow
(*i.e.*, the image accelerations), but several studies show
that this is not the case [14]–[18]. Alternatively,
*def* can be disambiguated by combining the information
provided by the first-order optic flow and the extra-retinal signals, if some
assumptions are met (see next Section).

### Bayesian slant estimation from retinal, vestibular, and proprioceptive information

The ambiguity of *def* can be overcome by the active observer under the assumption that the object is stationary – a reasonable assumption in many real-world situations [12]. If the object is stationary, $\omega = 0$ and the relative rotation between the observer and the surface is determined entirely by (and is equal-and-opposite to) the observer's motion: $\omega_r = \omega_T$. If information about $\omega_T$ is obtained from proprioceptive and vestibular signals, it is thus possible to estimate $\sigma = def / \omega_r$.
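If the object is stationary and $\omega_r$ is known from extra-retinal signals, recovering slant from $def = \sigma\,\omega_r$ reduces to a division. A minimal sketch with illustrative numbers:

```python
# Noise-free "inverse optics" estimate: with a stationary object,
# def = sigma * omega_r and omega_r is known from vestibular and
# proprioceptive signals, so slant follows by division.
# Both values below are illustrative.
def_measured = 1.81   # measured deformation (1/s)
omega_r = 0.32        # relative rotation from extra-retinal signals (rad/s)

sigma_hat = def_measured / omega_r   # estimated tan(slant)
print(sigma_hat)                     # about 5.66, i.e. a slant near 80 deg
```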

The Bayesian model presented by Colas and collaborators formalizes this idea [6]. The uncertainty in the estimation of the relative motion $\Omega_r$ is described by a Gaussian distribution centered at $\omega_T$ and having an arbitrary standard deviation $\sigma_{\Omega}$ (here and in the following we use capital letters to indicate random variables). The spread of this Gaussian distribution encodes the noise in the measurement of the vestibular and proprioceptive signals and the possibility that the surface undergoes a rotation independent from the observer's motion. By centering this probability distribution at $\omega_T$, Colas et al. implement the *stationarity assumption*, that is, they favor the solutions in which the optic flow is produced by the observer's motion [12]. Colas et al. also consider the possibility that the optic flow is not measured accurately, or is produced by some degree of non-rigid motion. Under these circumstances, the surface slant $\sigma$ combined with the relative motion $\omega_r$ does not produce a unique *def* value. This further source of uncertainty is described by a Gaussian distribution centered at $\sigma\,\omega_r$ with an arbitrary standard deviation $\sigma_{def}$. By centering this probability distribution at $\sigma\,\omega_r$, Colas et al. implement the *rigidity assumption*, that is, they favor the solutions in which *def* is produced by a rigid rotation. A further assumption is that the slant of the surface does not depend on the relative motion between the surface and the observer.

Under these assumptions, the problem of estimating local surface slant given the knowledge of *def* and $\omega_T$ (the observer's motion) becomes the problem of identifying the density function $p(\sigma \mid def, \omega_T)$. This probability density function can be found through Bayes' theorem by applying the rules of marginalization and probability decomposition.

From the definition of conditional probability,

$$p(\sigma \mid def, \omega_T) \propto p(\sigma, def \mid \omega_T); \qquad (6)$$

by marginalizing over $\omega_r$, we obtain

$$p(\sigma, def \mid \omega_T) = \int p(\sigma, def, \omega_r \mid \omega_T)\, d\omega_r. \qquad (7)$$

By the chain rule, we can write

$$p(\sigma, def, \omega_r \mid \omega_T) = p(def \mid \sigma, \omega_r, \omega_T)\; p(\sigma \mid \omega_r, \omega_T)\; p(\omega_r \mid \omega_T). \qquad (8)$$

Moreover,

$$p(def \mid \sigma, \omega_r, \omega_T) = p(def \mid \sigma, \omega_r), \qquad (9)$$

because, under the rigidity assumption, *def* depends only on the distal slant $\sigma$ and the relative rotation $\omega_r$;

$$p(\sigma \mid \omega_r, \omega_T) = p(\sigma), \qquad (10)$$

because surface slant is independent from the observer's relative motion and from the egocentric motion; and

$$p(\sigma, \omega_r \mid \omega_T) = p(\sigma \mid \omega_r, \omega_T)\; p(\omega_r \mid \omega_T), \qquad (11)$$

because of the chain rule. Therefore, $p(\sigma, def, \omega_r \mid \omega_T)$ can be re-written as

$$p(\sigma, def, \omega_r \mid \omega_T) = p(def \mid \sigma, \omega_r)\; p(\sigma)\; p(\omega_r \mid \omega_T). \qquad (12)$$

By virtue of Eq. 12, Eq. 7 takes the form

$$p(\sigma, def \mid \omega_T) = \int p(def \mid \sigma, \omega_r)\; p(\sigma)\; p(\omega_r \mid \omega_T)\, d\omega_r, \qquad (13)$$

so that

$$p(\sigma \mid def, \omega_T) \propto p(\sigma) \int p(def \mid \sigma, \omega_r)\; p(\omega_r \mid \omega_T)\, d\omega_r. \qquad (14)$$

In conclusion, Eq. 14 provides a possible solution to the “inverse optics” problem of estimating local surface slant from the deformation of the optic flow (see Figure 3). If $\omega_T$ and *def* are measured with no error, then $p(\sigma \mid def, \omega_T)$ peaks at the true slant value ($\sigma = def/\omega_r$) when the distal surface is stationary. In the presence of measurement errors, instead, the estimated slant will be biased. The magnitude of this bias depends on the precision with which $\omega_T$ (the observer's motion) and *def* are measured: the larger $\sigma_{\Omega}$, the larger the under-estimation of slant.
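The marginalization in Eq. 14 can be evaluated numerically on a grid. The sketch below applies the Gaussian noise models described above; the noise parameters (`sd_omega`, `sd_def`, `sd_prior`) are illustrative assumptions, not the values used in the paper:

```python
import numpy as np

def gauss(x, mu, sd):
    """Unnormalized Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2)

def slant_posterior(def_val, omega_T, sd_omega=0.05, sd_def=0.02, sd_prior=2.0):
    """Grid evaluation of p(sigma | def, omega_T) following Eq. 14."""
    sigma = np.linspace(0.01, 10.0, 2000)   # candidate slants (tangent units)
    omega = np.linspace(0.01, 1.0, 2000)    # candidate relative rotations (rad/s)
    S, W = np.meshgrid(sigma, omega, indexing="ij")
    joint = (gauss(def_val, S * W, sd_def)  # likelihood: def = sigma * omega_r + noise
             * gauss(W, omega_T, sd_omega)  # stationarity prior, centered on omega_T
             * gauss(S, 0.0, sd_prior))     # (half-)Gaussian prior for frontal-parallel
    post = joint.sum(axis=1)                # marginalize over omega_r
    return sigma, post / post.sum()

# Static plane slanted 80 deg, normal head speed (omega_T = 0.32 rad/s):
true_sigma = np.tan(np.deg2rad(80))
sigma, post = slant_posterior(def_val=true_sigma * 0.32, omega_T=0.32)
median = sigma[np.searchsorted(post.cumsum(), 0.5)]
print(median)   # below tan(80 deg) ~ 5.67: the flatness prior biases the estimate
```

With `sd_omega` set close to zero the median approaches the veridical value, illustrating how the bias of Eq. 14 grows with the measurement error on the extra-retinal signal.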

Intensity corresponds to probability. The values reported in the plot refer to the case of a static plane slanted by 80° ($\sigma = \tan 80°$) around the horizontal axis and viewed by an active observer performing a lateral head translation at a speed that produces a relative angular-rotation velocity of 0.32 rad/s ($\omega_T = 0.32$). **Panels a–e**: method for calculating the posterior distribution. **a**. Prior for frontal-parallel surfaces, modeled as a (half) Gaussian distribution of $\sigma$ centered at zero. **b**. Likelihood function $p(def \mid \sigma, \omega_r)$, generated by assuming that the *def* measurements are corrupted by Gaussian noise. **c**. Uncertainty of the relative rotation $\Omega_r$ between the observer and the planar surface, modeled as a Gaussian distribution centered on the true value $\omega_T$. **d**. Product of the likelihood, the prior for $\sigma$, and the prior for $\omega_r$. **e**. Posterior distribution produced by the marginalization over $\omega_r$. The median of the posterior distribution (dotted line) gives the optimal estimate of surface slant based on the knowledge of *def* and $\omega_T$. The model's prediction (the value 5 in the figure) gets closer and closer to the “true” value of the slant ($\tan 80° \approx 5.67$) as $\sigma_{\Omega}$ decreases.

### Bayesian slant estimation from retinal information alone

We propose that the visual system estimates surface slant without considering the information about head translation velocity (see Figure 4). With reference to the Bayesian model discussed in the previous section, this means that $p(\omega_r \mid \omega_T) = p(\omega_r)$, where $p(\omega_r)$ is the *a priori* distribution of a random variable $\Omega_r$ representing the amount of relative rotation between the observer and the surface. In this case, Eq. 14 reduces to

$$p(\sigma \mid def) \propto p(\sigma) \int p(def \mid \sigma, \omega_r)\; p(\omega_r)\, d\omega_r. \qquad (15)$$

The values reported in the figure refer to the stimulus conditions used in Figure 3. Differently from the “inverse-optics” approach, in this case the distribution $p(\omega_r)$ is non-informative. Note that, after computing the product of the likelihood, the prior for $\sigma$, and the prior for $\omega_r$, the marginalization over $\omega_r$ produces a posterior distribution that is very different from what is shown in Figure 3. The estimate of surface slant is given by the median of the posterior distribution (dotted line). The posterior median corresponds to the point on the hyperbola $\sigma\,\omega_r = def$ closest to the origin of the Cartesian axes. The Bayesian posterior median estimator (which is equal to $k\sqrt{def}$) is represented by the red dot.

Domini and Caudek showed that this account is sufficient for predicting perceived slant from the optic flow in the case of the passive observer [19]–[28]. They showed that the center of mass of the distribution described by Eq. 15 is equal to $k\sqrt{def}$, with $k$ depending on the spreads of the prior distributions of $\Sigma$ and of $\Omega_r$ [8]. The center of mass as an estimate of $\sigma$ is equivalent to the posterior median, which is the Bayes estimator for the absolute-error loss. Indeed, it has been shown that Eq. 15 is a particular case of Eq. 14: the two accounts are indistinguishable when information about the head's translation is unavailable, as in the case of the passive observer [6]. Eqs. 14 and 15, instead, make different predictions for the active observer, when head translation velocity is manipulated.

The importance of *def* for the perceptual recovery of local surface slant from the optic flow has been highlighted by [8], [24]. A given *def* value is compatible with a one-parameter family of $(\sigma, \omega_r)$ pairs, where $\sigma$ is the surface slant and $\omega_r$ the relative angular rotation, but not all possible $(\sigma, \omega_r)$ pairs are equally likely. Under the assumption of uniform prior distributions (bounded between 0 and $\sigma_{max}$, and between 0 and $\omega_{max}$) for $\Sigma$ and $\Omega_r$, the conditional probability of a $(\sigma, \omega_r)$ pair given *def* is not uniform, but has a maximum at $\sigma = k\sqrt{def}$, with $k = \sqrt{\sigma_{max}/\omega_{max}}$ (see [24]).
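This $k\sqrt{def}$ behavior can be checked numerically. The sketch below assumes bounded uniform priors for $\Sigma$ and $\Omega_r$ (the bounds $\sigma_{max} = 10$ and $\omega_{max} = 1$ rad/s are illustrative choices) and verifies that the posterior median of Eq. 15 scales with the square root of *def*:

```python
import numpy as np

def gauss(x, mu, sd):
    """Unnormalized Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2)

def median_slant(def_val, sigma_max=10.0, omega_max=1.0, sd_def=0.02):
    """Posterior median of Eq. 15 under bounded uniform priors (illustrative)."""
    sigma = np.linspace(0.01, sigma_max, 1500)
    omega = np.linspace(0.0, omega_max, 1500)
    S, W = np.meshgrid(sigma, omega, indexing="ij")
    like = gauss(def_val, S * W, sd_def)   # likelihood of the measured def
    post = like.sum(axis=1)                # uniform priors: marginalize directly
    post /= post.sum()
    return sigma[np.searchsorted(post.cumsum(), 0.5)]

# Quadrupling def should roughly double the estimated slant
# (posterior median = k * sqrt(def), with k = sqrt(sigma_max / omega_max)):
m1, m4 = median_slant(0.2), median_slant(0.8)
print(m1, m4, m4 / m1)   # medians near sqrt(2) and sqrt(8); ratio near 2
```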

### Rationale of the Experiments

Eqs. 14 and 15 provide two alternative models for the perceptual derivation of surface slant from the optic flow in active vision. The purpose of the present investigation is to contrast them by comparing their predictions to the behavioral data obtained when head translation velocity is manipulated.

In the present experiments, observers were required to produce two different head translation velocities. The first was comparable to the peak horizontal head velocity during normal locomotion [29]; the second was 80% slower. This experimental manipulation (a) does not affect the estimate of local surface slant according to Eq. 14 (by assuming that the measurement noise of $\Omega_r$ remains unaltered), and (b) can strongly affect the estimate of local surface slant according to Eq. 15 (because *def* is proportional to $\omega_r$, which in turn is proportional to head translation velocity).
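Worked out with the relative rotation rates produced by the two instructed speeds (0.32 and 0.07 rad/s, the empirical averages reported for the normal and slow conditions), Eq. 15 predicts a fixed proportional reduction of perceived slant, while Eq. 14 predicts no change:

```python
import math

# Under Eq. 15, perceived slant ~ k * sqrt(def) = k * sqrt(sigma * omega_r),
# so scaling omega_r scales perceived slant by the square root of the ratio.
omega_fast, omega_slow = 0.32, 0.07   # empirical average rotation rates (rad/s)

ratio = math.sqrt(omega_slow / omega_fast)
print(f"predicted slant reduction at slow speed: {1 - ratio:.0%}")
```

The predicted reduction (about 53%) is in the range of the 51–75% reductions reported in the Results.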

## Results

### Perceived surface slant

Active and passive observers judged the perceived slant of virtual or physical planar surfaces. The results indicate that the judgments made by the observers are systematically biased by the head translation velocity (see Figure 5). The same qualitative trends are found for the active and passive viewing of a virtual surface, and for the active viewing of a physical surface.

Fast and slow head translation velocities are coded by red and black, respectively. The slant values are expressed in terms of the tangent. The dashed lines indicate veridical performance. Vertical bars represent ±1 standard error of the mean. (a) *Passive viewing of a virtual surface*. The Slant × Velocity interaction is significant (test statistic = 51.04, p = .001). The marginal effect of the head's translation velocity is significant (7.48, p = .006): on average, the amount of reported slant is 51% lower for the slow than for the fast head translation velocity. (b) *Active viewing of a virtual surface*. The Slant × Velocity interaction is significant (47.31, p = .001): the effect of simulated slant on the response is larger for the faster head translation velocity. The marginal effect of head translation velocity is significant (5.62, p = .018): on average, the amount of reported slant is 60% lower for the slow than for the fast head translation velocity. (c) *Active viewing of a physical surface*. The Slant × Velocity interaction is significant (40.81, p = .001). The marginal effect of head translation velocity is significant (12.96, p = .001): on average, the amount of reported slant is 75% lower for the slow than for the fast head translation velocity.

According to the Bayesian model described in Eq. 15, perceived surface slant depends only on the square root of *def*. For the active and passive viewing of virtual planar surfaces, the observers' judgments of slant complied with this prediction (see Figure 6). For the active viewing of physical surfaces, $\sqrt{def}$ was not the only determinant of the perceptual response, but the additional contribution of the head translation velocity was negligible. In the present investigation, therefore, there is no evidence that simulated slant contributes to the perceptual response beyond what $\sqrt{def}$ can explain.

Fast and slow head translation velocities are coded by red and black, respectively. The slant values are expressed in terms of the tangent. Vertical bars represent ±1 standard error of the mean. (a) *Passive viewing of a virtual surface*. There is no significant interaction between $\sqrt{def}$ and head translation Velocity (0.43, p = .513), and no significant effect of head translation Velocity (0.57, p = .449). In a no-intercept model with $\sqrt{def}$ as the only predictor, the slope is 1.04 (20.95, p = .001), with $R^2$ = .74. No improvement of fit is found if simulated slant is added to such a model (p = .455). For a baseline model with only the individual differences, $R^2$ is equal to .45. (b) *Active viewing of a virtual surface*. There is no significant interaction between $\sqrt{def}$ and head translation Velocity (2.50, p = .114), and no significant effect of head translation Velocity (0.25, p = .620). In a no-intercept model with $\sqrt{def}$ as the only predictor, the slope is 1.88 (16.46, p = .001), with $R^2$ = .69. No improvement of fit is found if simulated Slant is added to such a model (−0.22). For a baseline model with only the individual differences, $R^2$ is equal to .33. When analyzing together the data of the passive and active viewing of the virtual surfaces, significant effects are found for $\sqrt{def}$ (p = .001) and condition (indicating a smaller response for the passive observer, p = .001). For this model, $R^2$ is equal to .90. When controlling for $\sqrt{def}$, the effect of simulated Slant is not significant. For a baseline model with only the individual differences, $R^2$ is equal to .43. The estimated standard deviation of the residuals is equal to 0.218 on the scale of the tangent of the slant angle. Given that mean perceived slant is 0.493, the coefficient of variation is equal to 0.218/0.493 ≈ 0.44. (c) *Active viewing of a physical surface*. For a no-intercept model with the square root of *def* as the only predictor, $R^2$ = .72. For a baseline model with only the individual differences, $R^2$ = .44. If head translation velocity is added to the model including *def* as predictor, $R^2$ increases to .74; $R^2$ increases to .75 if the interaction between the two predictors is allowed. Even though this increase in the model's fit is statistically significant (30.15, p = .001), the effect size is very small. No improvement of fit is found when adding the simulated Slant predictor (0.42, p = .518). In the simpler (no-intercept) model with the $\sqrt{def}$ predictor, the slope is 2.41 (12.29, p = .001).

### Implementation of the Bayesian models

#### Estimation of surface slant by taking into account head translation velocity.

Figure 7 illustrates the process of slant estimation according to Eq. 14: the Bayesian estimator is plotted as a function of simulated slant (bottom left panel) and as a function of $\sqrt{def}$ (bottom right panel). When the head translation velocity is measured accurately, the Bayesian estimator is veridical; as the uncertainty $\sigma_{\Omega}$ about $\omega_T$ increases, the Bayesian estimator becomes more and more biased. The slant estimates expressed as a function of $\sqrt{def}$ lie on two different curves, regardless of the size of $\sigma_{\Omega}$ (bottom right panel). The offset between these two curves depends on the amount of measurement error and on the mean of the distribution $p(\omega_r \mid \omega_T)$, which depends on the head translation velocity.

**Top panels**: posterior distributions $p(\sigma, \omega_r \mid def, \omega_T)$ for five simulated slant magnitudes. Intensity corresponds to probability. The posterior distributions are computed by setting the mean of the Gaussian distribution $p(\omega_r \mid \omega_T)$ to the values of 0.32 or 0.07 rad/s. These values correspond to the empirical average of, respectively, the normal (right) or slow (left) head translation velocity in our experiment. **Middle panels**: marginalization of the posterior distributions over $\omega_r$. The simulated slant magnitudes are coded by color, ranging from black (20°) to saturated blue/red (80°). The dotted lines identify the medians of the marginalized posterior distributions. **Bottom panels**: optimal estimates of surface slant plotted as a function of the simulated slant magnitudes (left) and of $\sqrt{def}$ (right). Full lines: results obtained by setting $\sigma_{\Omega} = 0$; dashed lines: results obtained with larger values of $\sigma_{\Omega}$. Blue and red colors code the slow and fast head translation velocities, respectively.

#### Estimation of surface slant from retinal information alone.

Figure 8 illustrates the process of slant estimation according to Eq. 15: the Bayesian estimator is plotted as a function of simulated slant (bottom left panel) and as a function of $\sqrt{def}$ (bottom right panel). When plotted as a function of $\sqrt{def}$, the slant estimates for the two translation velocities lie on the same straight line. The parameter $k$ is the slope of the linear relation between perceived slant and $\sqrt{def}$.

The Monte Carlo simulation was carried out by setting the stimulus parameters to the same values as in Figure 7. **Top panels**: posterior distributions $p(\sigma, \omega_r \mid def)$. The outputs of the simulations for the slow and fast head translation velocities are shown on the left and on the right, respectively. **Middle panels**: marginalization of the posterior distributions over $\omega_r$. The dotted lines identify the medians of the marginalized posterior distributions. **Bottom panels**: estimates of surface slant plotted as a function of simulated slant (left) and as a function of $\sqrt{def}$ (right). Blue and red colors code the slow and fast head translation velocities, respectively. The parameter $k$ represents the square root of the ratio between the standard deviations of the prior distributions for $\Sigma$ and $\Omega_r$.

#### Perceived surface slant and Bayesian modeling.

Eq. 15 offers a clear advantage over Eq. 14 in predicting the observers' responses (see Figure 9). If the uncertainty about $\omega_T$ is not negligible, the Bayesian estimates of Eq. 14 expressed as a function of $\sqrt{def}$ lie on two separate curves and are unable to reproduce the qualitative trends in the experimental data (see Figures 5–8). This lack of fit can be contrasted with the excellent correspondence between the slant estimates of Eq. 15 and the observers' judgments.

Points of the same color represent different simulated Slant magnitudes. The predicted values are obtained by setting $k$ (the square root of the ratio between the standard deviations of the prior distributions for $\Sigma$ and $\Omega_r$) equal to the slope of the linear relation between perceived slant and $\sqrt{def}$ in each viewing condition. AVV: Active Viewing of a Virtual surface; PVV: Passive Viewing of a Virtual surface; AVP: Active Viewing of a Physical surface. For the data in the figure, the least-squares regression line has an intercept of −0.03, 95% *C.I.* [−0.09, 0.02], and a slope of 1.07, 95% *C.I.* [0.97, 1.16]; $R^2$ = .95.

## Discussion

Under some assumptions, the optic flow can be used, together with other signals, to
infer both the ordinal properties (*e.g.*, tilt) and the Euclidean
metric properties (*e.g.*, slant) of the visual scene. By using
sophisticated head-tracking techniques with high spatiotemporal resolution, we
manipulated the information content of the stimuli to generate optic flows
corresponding to (a) the active viewing of a virtual surface, (b) the passive
viewing of a virtual surface, and (c) the active viewing of a physical surface. We
also varied the head translation velocity (normal or slower). The observers'
judgments of perceived surface slant were then compared to the Bayesian estimates
computed with and without taking into account the translational velocity of the head
(Eqs. 14 and 15, Figures 7 and
8).

The observers' responses are markedly different from the Bayesian estimates derived by combining optic flow and head velocity information (Eq. 14, Figures 5 and 6). The empirical data from the active and passive viewing of virtual planar surfaces, conversely, are consistent with the Bayesian estimates computed without considering head velocity information (Eq. 15, Figure 9).

For the slant judgments of physical planar surfaces, the Bayesian model of Eq. 15 explains a large amount of the variance, but a very small portion of additional variance is accounted for by the head translation velocity (Figure 6, bottom panel). The Bayesian model of Eq. 14, which takes head velocity into account, fits the data much worse. In the present research, this effect is small but warrants further investigation. In a follow-up experiment (not described here), we found that the monocular cues provided by our physical stimuli were not sufficient for an immobile observer to successfully discriminate between two surfaces slanted +45° or −45° (surface tilt was constant). Together with the findings of our main experiment, these results suggest that, although uninformative by themselves, monocular cues can produce some form of “enhancement” of the perceptual response when they are presented together with the optic flow and with vestibular and proprioceptive information [30].

The slope of the linear relation between perceived surface slant and $\sqrt{def}$ varies across the three viewing conditions: it is shallowest for the passive viewing of a virtual planar surface, it increases for the active viewing of a virtual surface, and it is largest for the active viewing of a physical surface. We may expect a different visual performance for passive and active SfM, and for virtual and physical stimuli. The present results suggest, however, that more complete stimulus information does not necessarily result in better (more veridical) performance: *a stronger effect of def does not guarantee* a more accurate response. Perceived slant is strongly affected by *def* despite the fact that there is no “one-to-one” correspondence between *def* and distal surface slant.

Animal studies [31] and human experiments [32], [33] identify MT (MT+ in humans) as the brain area involved in SfM processing. It has also been shown that MST integrates MT inputs with vestibular signals originating from a different (currently unidentified) neural pathway [34], [35]. The integration of visual and vestibular information in MSTd is consistent with both the Bayesian models discussed here (Eqs. 14 and 15). Such integration could mean that (a) the visual system uses extra-retinal signals to discount head motion from the optic flow in order to encode a world-centered representation of the 3D objects [11], or (b) non-visual information about self-motion is used as a retinal stabilization factor for a better measurement of the optic flow [36], [37]. The present behavioral results, however, favor the second hypothesis.

In conclusion, the present data and simulations do not so much indicate that, by disregarding vestibular and proprioceptive information, the visual system uses a suboptimal strategy for estimating surface slant from the self-generated optic flow. Instead, they suggest that, even though it does not always guarantee a veridical solution to the SfM problem, the mapping between the deformation component of the optic flow and perceived surface slant may be the most efficient choice for a biological system [38], [39]. An issue that remains to be investigated is whether and how learning provides effective visual and haptic feedback for scaling *def* information.

## Methods

### Ethics Statement

Experiments were undertaken with the understanding and written consent of each subject, with the approval of the Comitato Etico per la Sperimentazione con l'Essere Umano of the University of Trento, and in compliance with national legislation and the Code of Ethical Principles for Medical Research Involving Human Subjects of the World Medical Association (Declaration of Helsinki).

### Participants

Thirty-four undergraduate students at the University of Parma, Italy, participated in this experiment. All participants were naïve to the purposes of the experiment and had normal or corrected-to-normal vision.

### Apparatus

The orientation of the participant's head and the translational head displacements were recorded by an Optotrak 3020 Certus system. Two sensors recovered the 3D position data of two infrared emitting diodes (markers on an eyeglass frame) aligned with the observer's inter-ocular axis. The signals emitted by the markers were used to calculate the $x$, $y$, $z$ coordinates of the observer's viewpoint in order to update the geometrical projection of a random-dot planar surface in real time. Displays were monocularly viewed through a high-quality front-silvered mirror (150 × 150 mm) placed at eye height in front of the observer's central viewing position and slanted 45° away from the monitor and the observer's inter-ocular axis. The effective distance from the pupil to the center of the screen was 860 mm. Only the central portion of the surface was left visible to the observer through a black mask with an irregularly-shaped central aperture (about 70 × 70 mm) superimposed on the screen. A chin-rest was used to prevent head movements in the passive-vision condition.

A custom Visual C++ program supported by OpenGL libraries and Optotrak API routines was used for stimulus presentation and response recording. The same program also controlled the orientation of a physical planar surface that, in a separate block of trials, was placed at a distance of 760 mm in front of the observer. The boundary of the physical surface was occluded by the same mask used for the virtual displays. This aperture was closed when the surface's orientation was changed.

### Stimuli

The simulated displays were random arrangements of antialiased red dots (1 × 1 mm) simulating the projection of a static planar surface centered on the image screen, with a variable slant about the horizontal axis (virtual planar surfaces: 20°, 35°, 50°, 65°, and 80°; physical planar surfaces: 10°, 20°, 40°, and 50°). The surface tilt was constant (90°). About 100 dots were visible through the irregular aperture occluding the outer part of the screen. To remove texture (non-motion) cues, the dots were randomly distributed in the projected image (not on the simulated surface). On each frame of the stimulus sequence, the 2D arrangement of the dots was varied depending on the observer's head position and orientation with respect to the simulated surface. The dots on the simulated planar surface were projected onto the image plane (CRT screen) by using a generalized perspective pinhole model with the observer's right eye position as the center of projection. The position of the observer's right eye was sampled at the same rate as the monitor refresh and stimulus update rate.
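
For illustration, the head-contingent rendering step can be sketched as a pinhole projection of each dot toward the tracked eye position. This is a minimal sketch with hypothetical function names (the actual program used Visual C++ and OpenGL); it also shows why dots rendered on the screen plane itself carry no parallax:

```python
def project_point(p, eye, screen_z):
    """Project a 3D point p onto the screen plane z = screen_z, using the
    tracked eye position as the center of projection (pinhole model)."""
    px, py, pz = p
    ex, ey, ez = eye
    t = (screen_z - ez) / (pz - ez)  # ray parameter where the eye->p ray meets the screen
    return (ex + t * (px - ex), ey + t * (py - ey))

# A dot lying on the screen plane projects to itself for any eye position,
# whereas a dot off the screen plane shifts with the eye (motion parallax).
on_screen = project_point((10.0, 0.0, 860.0), (50.0, 0.0, 0.0), 860.0)
off_screen = project_point((10.0, 0.0, 900.0), (50.0, 0.0, 0.0), 860.0)
```

Re-running this projection at every Optotrak sample, with the eye position updated, yields the head-contingent optic flow described above.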

The translation of the observer's head produced a relative rotation of the simulated planar surface of about 3.32° about the vertical axis, regardless of surface slant. The maximum lateral head shift was equal to 50 mm. In the passive-vision condition, the optic flows were generated by replaying the 2D transformations generated by the corresponding active-vision trials. The horizontal translation component of the optic flow was removed by assuming that the cyclopean line of sight of the active observers was always aligned with the centre of the planar surface, regardless of actual head position and surface slant [37].

The physical planar surface was painted black and randomly covered with phosphorescent dots. Compared with the virtual surface, the physical surface was covered by larger (about 5 mm), irregularly shaped dots at a lower density (about 13 dots were visible through the irregular aperture), which provided texture cues (*i.e.*, dot foreshortening) consistent with a slanted 3D planar surface. Given the smaller viewing distance (760 mm), the constant amount of lateral head shift produced a relative rotation of the surface about the vertical axis of 3.76°.
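
As a consistency check on the geometry, the relative rotation equivalent to a lateral head shift follows from atan(shift/distance). A minimal sketch, assuming only the head shift and viewing distances reported above:

```python
import math

def equivalent_rotation_deg(head_shift_mm, distance_mm):
    # A lateral eye displacement relative to a stationary surface is
    # geometrically equivalent to rotating the surface about the vertical
    # axis by atan(shift / distance).
    return math.degrees(math.atan(head_shift_mm / distance_mm))

virtual = equivalent_rotation_deg(50, 860)   # CRT setup, 860 mm viewing distance
physical = equivalent_rotation_deg(50, 760)  # physical surface, 760 mm viewing distance
```

This yields about 3.33° for the virtual setup and 3.76° for the physical one, matching the reported relative rotations to within 0.01°.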

During the experiment, the room was completely dark. Peak head translation velocity was either 285.6 mm/s or 57.7 mm/s. Depending on the head translation velocity, the stimulus was visible on each trial for about 3.0 s or 11.1 s.

### Design

Each observer participated in three experimental blocks in the following order: Active-Vision with a Virtual surface (AVV), Passive-Vision with a Virtual surface (PVV), and Active-Vision with a Physical surface (AVP). Participants were randomly assigned either to the “normal” or to the “slow” head translation velocity condition. Each AVV and PVV block comprised 25 trials (5 repetitions of 5 simulated slant magnitudes). The AVP block comprised 16 trials (4 repetitions of 4 slant magnitudes). In the PVV block, the stimuli generated in the AVV block were shown again in random order. The completion of each block of trials required about 30 minutes.

### Procedure

Participants were tested individually in total darkness, so that only the stimulus displays shown on the CRT screen, or the luminous dots on the physical surface, were visible. In the AVV and AVP blocks, observers viewed the stimuli while making lateral back-and-forth head translations. The observer's head was supported by a horizontally extended chin-rest allowing lateral movements of 60 mm. Acoustic feedback signaled whether the average head-shift velocity fell outside the range of 83 ± 40 mm/s (“normal” speed) or 20 ± 10 mm/s (“slow” speed). The stimulus display appeared on the screen when participants completed 2 consecutive back-and-forth translations at the required velocity and disappeared after 5.5 back-and-forth translations. After the stimulus disappeared, participants stopped moving their head and provided a verbal judgment of the amount of perceived surface slant (0° indicating a fronto-parallel surface, 90° indicating a surface parallel to the x, z plane) – see Figure 10. In the PVV condition, participants were required to remain still for the entire duration of each trial.

Schematic representation of one trial of our experiment. Here we consider the case of the active viewing of a virtual surface.
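
The velocity criterion used for the acoustic feedback can be sketched as follows (the function names and the sampling interval are ours, for illustration only; the actual criterion was computed from the tracked marker positions):

```python
def mean_speed(positions_mm, dt_s):
    """Average absolute lateral speed (mm/s) from head positions sampled every dt_s seconds."""
    steps = [abs(b - a) for a, b in zip(positions_mm, positions_mm[1:])]
    return sum(steps) / (len(steps) * dt_s)

def within_window(speed, target, tol):
    """True if the average speed falls inside target +/- tol (no feedback tone)."""
    return abs(speed - target) <= tol

# Example: a uniform 50 mm sweep completed in 0.6 s is about 83 mm/s,
# which falls inside the "normal" window of 83 +/- 40 mm/s.
samples = [i * (50 / 6) for i in range(7)]   # 7 positions, dt = 0.1 s
ok = within_window(mean_speed(samples, 0.1), 83, 40)
```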

Each experimental session was preceded by a preparatory session in which the participant's inter-pupillary distance was measured, the instructions were provided, and training in both the appropriate head translation velocity and the magnitude estimation task was given. Participants were trained in the magnitude estimation task by completing two blocks of 20 trials each. In one block, they were required to generate an angle between two segments on a computer screen after being prompted by a random number in the range 0°–360°. In the other block, they were required to estimate a random angle depicted on the screen. The relationship between the response and the test values was analyzed with a linear regression. Only participants who met the performance criteria of a slope in the interval [0.9, 1.1] and an intercept in the interval [−0.3, 0.3] entered the experimental session.
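
The screening rule amounts to an ordinary least-squares regression of the estimated angles on the test angles, followed by a threshold check on the coefficients. A minimal sketch (function names are ours):

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def passes_screening(test_angles, responses):
    """Admit a participant only if slope is in [0.9, 1.1] and intercept in [-0.3, 0.3]."""
    slope, intercept = fit_line(test_angles, responses)
    return 0.9 <= slope <= 1.1 and -0.3 <= intercept <= 0.3
```

An accurate responder (responses equal to the test angles) passes the criterion, whereas one who systematically compresses angles by half does not.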

The maximum value of *def* was extracted in each trial from the
instantaneous profile of the deformation component of the optic flow by
following the procedure illustrated in Figure 11. These *def* values
were then used to test the prediction of Eq. 15.

The maximum of this temporal profile (averaged across the
*def* cycles) was computed for each trial of the
experiment and was used to test the Bayesian model of Eq. 15.

### Statistical Analyses

Statistical analyses were performed by means of Linear Mixed-Effects models with participants as random effects and *def*, simulated slant, and head translation velocity (“normal”, “slow”) as fixed effects. We evaluated significance by computing the deviance statistic (minus 2 times the log-likelihood; the change in deviance is distributed as chi-square, with degrees of freedom equal to the number of parameters deleted from the model) and with the help of 10,000 samples from the posterior distributions of the coefficients obtained by Markov chain Monte Carlo sampling. From these samples, we obtained the 95% Highest Posterior Density confidence intervals and the corresponding two-tailed *p*-values. Several indexes have been proposed to measure the prediction power and the goodness-of-fit of linear mixed models (*e.g.*, Sun, Zhu, Kramer, Yang, Song, Piepho, & Yu, 2010). Here, we measure the goodness of fit as the concordance correlation coefficient $r_c = 1 - \sum_i (y_i - \hat{y}_i)^2 / [\sum_i (y_i - \bar{y})^2 + \sum_i (\hat{y}_i - \bar{\hat{y}})^2 + n(\bar{y} - \bar{\hat{y}})^2]$, where $y$ is an $n \times 1$ vector of observed values, $\hat{y}$ are the fitted values, $\bar{y}$ is the mean of $y$, and $\bar{\hat{y}}$ is the mean of $\hat{y}$ (Vonesh, Chinchilli, & Pu, 1996). The $r_c$ statistic can be interpreted as a measure of the degree of agreement between the observed values and the predicted values. The possible values of $r_c$ lie in the range $[-1, 1]$.
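
For concreteness, this agreement statistic can be computed as follows (a minimal sketch of the Vonesh–Chinchilli–Pu concordance correlation coefficient):

```python
def concordance_rc(y, yhat):
    """Concordance correlation between observed values y and fitted values yhat.
    Returns 1 for a perfect fit and decreases toward -1 as agreement worsens."""
    n = len(y)
    ybar = sum(y) / n          # mean of the observed values
    fbar = sum(yhat) / n       # mean of the fitted values
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    denom = (sum((a - ybar) ** 2 for a in y)
             + sum((b - fbar) ** 2 for b in yhat)
             + n * (ybar - fbar) ** 2)
    return 1 - sse / denom
```

A perfect fit gives $r_c = 1$; perfectly anticorrelated predictions give $r_c = -1$.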

## Acknowledgments

We thank John A. Assad for helpful comments on a previous version of the manuscript.

## Author Contributions

Conceived and designed the experiments: CC CF FD. Performed the experiments: CF CC FD. Analyzed the data: CC FD CF. Wrote the paper: CC FD CF.

## References

- 1. Rao R, Olshausen B, Lewicki M (2002) Probabilistic Models of the Brain: Perception and Neural Function. Cambridge, MA: MIT Press.
- 2. von Helmholtz H (2002) Handbook of physiological optics. New York: Dover.
- 3. Wexler M, van Boxtel JJ (2005) Depth perception by the active observer. Trends in Cognitive Sciences 9: 431–438.
- 4. Yuille A, Kersten D (2006) Vision as Bayesian inference: analysis by synthesis? Trends in Cognitive Sciences 10: 301–308.
- 5. Zhong H, Cornilleau-Peres V, Cheong LF, Yeow M, Droulez J (2006) The visual perception of plane tilt from motion in small field and large field: psychophysics and theory. Vision Research 46: 3494–3513.
- 6. Colas F, Droulez J, Wexler M, Bessire P (2007) A unified probabilistic model of the perception of three-dimensional structure from optic flow. Biological Cybernetics 97: 461–477.
- 7. Koenderink JJ, van Doorn AJ (1991) Affine structure from motion. Journal of the Optical Society of America A 8: 377–385.
- 8. Domini F, Caudek C (2003) 3-D structure perceived from dynamic information: a new theory. Trends in Cognitive Sciences 7: 444–449.
- 9. Wexler M (2003) Voluntary head movement and allocentric perception of space. Psychological Science 14: 340–346.
- 10. Wexler M, Lamouret I, Droulez J (2001) The stationarity hypothesis: an allocentric criterion in visual perception. Vision Research 41: 3023–3037.
- 11. van Boxtel JJ, Wexler M, Droulez J (2003) Perception of plane orientation from self-generated and passively observed optic flow. Journal of Vision 3: 318–332.
- 12. Wexler M, Panerai F, Lamouret I, Droulez J (2001) Self-motion and the perception of stationary objects. Nature 409: 85–88.
- 13. Zhong H, Cornilleau-Peres V, Cheong LF, Droulez J (2000) Visual encoding of tilt from optic flow: psychophysics and computational modelling. In: Vernon D, editor. Computer Vision – ECCV 2000. Berlin/Heidelberg: Springer, volume 1843 of Lecture Notes in Computer Science. pp. 800–816.
- 14. Calderone JB, Kaiser MK (1989) Visual acceleration detection: effect of sign and motion orientation. Perception & Psychophysics 45: 391–394.
- 15. Snowden RJ, Braddick OJ (1991) The temporal integration and resolution of velocity signals. Vision Research 31: 907–914.
- 16. Todd JT (1981) Visual information about moving objects. Journal of Experimental Psychology: Human Perception and Performance 7: 795–810.
- 17. Watamaniuk SNJ, Heinen SJ (2003) Perceptual and oculomotor evidence of limitations on processing accelerating motion. Journal of Vision 3: 698–709.
- 18. Werkhoven P, Snippe HP, Toet A (1992) Visual processing of optic acceleration. Vision Research 32: 2313–2329.
- 19. Caudek C, Domini F (1998) Perceived orientation of axis of rotation in structure-from-motion. Journal of Experimental Psychology: Human Perception and Performance 24: 609–621.
- 20. Caudek C, Rubin N (2001) Segmentation in structure from motion: modeling and psychophysics. Vision Research 41: 2715–2732.
- 21. Caudek C, Domini F, Di Luca M (2002) Short-term temporal recruitment in structure from motion. Vision Research 42: 1213–1223.
- 22. Caudek C, Proffitt DR (1993) Depth perception in motion parallax and stereokinesis. Journal of Experimental Psychology: Human Perception and Performance 19: 32–47.
- 23. Domini F, Braunstein ML (1998) Recovery of 3-D structure from motion is neither Euclidean nor affine. Journal of Experimental Psychology: Human Perception and Performance 24: 1273–1295.
- 24. Domini F, Caudek C (1999) Perceiving surface slant from deformation of optic flow. Journal of Experimental Psychology: Human Perception and Performance 25: 426–444.
- 25. Domini F, Caudek C (2003) Recovering slant and angular velocity from a linear velocity field: modeling and psychophysics. Vision Research 43: 1753–1764.
- 26. Domini F, Caudek C, Proffitt DR (1997) Misperceptions of angular velocities influence the perception of rigidity in the kinetic depth effect. Journal of Experimental Psychology: Human Perception and Performance 23: 1111–1129.
- 27. Domini F, Caudek C, Richman S (1998) Distortions of depth-order relations and parallelism in structure from motion. Perception & Psychophysics 60: 1164–1174.
- 28. Domini F, Caudek C, Turner J, Favretto A (1998) Distortions of depth-order relations and parallelism in structure from motion. Perception & Psychophysics 60: 1164–1174.
- 29. Moore S, Hirasaki E, Raphan T, Cohen B (2008) The human vestibulo-ocular reflex during linear locomotion. In: Goebel J, Highstein S, editors. The vestibular labyrinth in health and disease. New York: New York Academy of Sciences.
- 30. Domini F, Caudek C, Tassinari H (2006) Stereo and motion information are not independently processed by the visual system. Vision Research 46: 1707–1723.
- 31. Bradley DC, Chang GC, Andersen RA (1998) Encoding of three-dimensional structure-from-motion by primate area MT neurons. Nature 392: 714–717.
- 32. Orban GA, Sunaert S, Todd JT, van Hecke P, Marchal G (1999) Human cortical regions involved in extracting depth from motion. Neuron 24: 929–940.
- 33. Vanduffel W, Fize D, Peuskens H, Denys K, Sunaert S, et al. (2002) Extracting 3D from motion: differences in human and monkey intraparietal cortex. Science 298: 413–415.
- 34. Assad JA, Maunsell JH (1995) Neural correlates of inferred motion in primate posterior parietal cortex. Nature 373: 518–521.
- 35. Gu Y, Angelaki DE, DeAngelis GC (2008) Neural correlates of multisensory cue integration in macaque area MSTd. Nature Neuroscience 11: 1201–1210.
- 36. Cornilleau-Peres V, Droulez J (1994) The visual perception of three-dimensional shape from self-motion and object motion. Vision Research 34: 2331–2336.
- 37. Fantoni C, Caudek C, Domini F (2010) Systematic distortions of perceived planar surface motion in active vision. Journal of Vision 10: 1–20.
- 38. Di Luca M, Caudek C, Domini F (2010) Inconsistency of perceived 3D shape. Vision Research 50: 1519–1531.
- 39. Thaler L, Goodale MA (2010) Beyond distance and direction: the brain represents target locations non-metrically. Journal of Vision 10: 1–27.
- 40. Stevens KA (1983) Slant-tilt: the visual encoding of surface orientation. Biological Cybernetics 46: 183–195.