Natural scene statistics predict how humans pool information across space in surface tilt estimation

doi:10.1371/journal.pcbi.1007947

Fig 1.

3D surface orientation is fully described by slant and tilt.

Slant is the angle indicating how much a surface is rotated out of the frontoparallel plane. Tilt is the direction of slant, as quantified by the angle between the x-axis in the frontoparallel plane and the surface normal projected into the frontoparallel plane. A Signed tilt, defined on [0°,360°), and unsigned slant. B Unsigned tilt, defined on [0°,180°), and signed slant.
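
The two parameterizations in panels A and B can be illustrated with a short sketch. It assumes a coordinate convention that the caption does not state (z is the viewing axis, so the frontoparallel plane is the x-y plane), and the function names are hypothetical.

```python
import numpy as np

def slant_tilt_from_normal(normal):
    """Return (unsigned slant, signed tilt) in degrees for a 3D surface normal.

    Assumed convention: z is the viewing axis, so the frontoparallel
    plane is the x-y plane. Tilt is undefined for a frontoparallel
    surface (slant 0); this sketch returns 0 in that case.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Slant: angle between the normal and the viewing axis, on [0, 90].
    slant = np.degrees(np.arccos(np.clip(abs(n[2]), 0.0, 1.0)))
    # Tilt: angle of the normal's projection into the x-y plane, on [0, 360).
    tilt = np.degrees(np.arctan2(n[1], n[0])) % 360.0
    return slant, tilt

def to_unsigned_tilt(slant, tilt):
    """Convert (unsigned slant, signed tilt) to (signed slant, unsigned tilt).

    Tilts on [180, 360) are folded onto [0, 180) and the sign of the
    slant is flipped, so the pair still describes the same surface.
    """
    if tilt >= 180.0:
        return -slant, tilt - 180.0
    return slant, tilt
```

For example, a normal of (0, -1, 1) has a slant of 45° and a signed tilt of 270°, which corresponds to a signed slant of -45° and an unsigned tilt of 90°.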

Fig 2.

Spatial statistics of tilt in natural scenes.

A Stereo images and stereo distance maps of real-world scenes. The distance data are co-registered to the image data at each pixel. B Groundtruth tilt corresponding to the image in A. Groundtruth tilt at each pixel is computed directly from the data in the distance maps. C Prior distribution of groundtruth tilt, computed from 600 million groundtruth tilt samples in the natural scene database. D Mean absolute tilt difference from the center target tilt as a function of spatial location. The color represents the mean absolute tilt difference, averaged across all pixels in all images in the natural scene database. E Mean absolute tilt difference conditioned on the groundtruth tilt at the target location.
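
The statistics in panels B and D can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used to build the database: tilt is taken as the direction of the distance gradient, image rows are treated as the y-axis, and all function names are hypothetical.

```python
import numpy as np

def tilt_from_range(range_map):
    """Signed tilt in degrees, on [0, 360), at each pixel of a distance map.

    Assumption: tilt is the direction of the local distance gradient,
    with array rows treated as y and array columns as x.
    """
    dy, dx = np.gradient(np.asarray(range_map, dtype=float))
    return np.degrees(np.arctan2(dy, dx)) % 360.0

def abs_tilt_difference(tilt_a, tilt_b, period=360.0):
    """Smallest absolute angular difference between two tilts, in degrees."""
    d = np.abs(np.asarray(tilt_a, dtype=float) - np.asarray(tilt_b, dtype=float)) % period
    return np.minimum(d, period - d)

def mean_abs_diff_from_center(tilt_patches):
    """Mean absolute tilt difference from the center pixel at each location.

    tilt_patches has shape (n_patches, height, width). The result has
    shape (height, width) and averages over patches, as in panel D.
    """
    patches = np.asarray(tilt_patches, dtype=float)
    _, h, w = patches.shape
    center = patches[:, h // 2, w // 2]
    diff = abs_tilt_difference(patches, center[:, None, None])
    return diff.mean(axis=0)
```

Conditioning on the groundtruth tilt at the target location (panel E) would amount to calling `mean_abs_diff_from_center` separately on the subset of patches whose center tilt falls in each tilt bin.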

Fig 3.

Human tilt estimation experiment.

A Human observers binocularly viewed real-world scenes through a circular aperture with a 3° diameter that was positioned stereoscopically in front of the scene. B Example stimuli. Left-eye, right-eye, and left-eye images (for both uncrossed and crossed fusion). The patches are surrounded by a graphical probe (white circle and three tick marks). Observers rotated the probe to align the middle tick mark with the perceived tilt direction for the surface at the very center of the window; note that when the middle tick mark is aligned with the perceived tilt direction, the other two tick marks are aligned with the perceived slant axis.

Fig 4.

Constructing local and global models of tilt estimation.

A Image cues and groundtruth tilt in natural scenes. Image cues are derived directly from photographic stereo images (top). Groundtruth tilt at each pixel is computed directly from the range data (cf. Fig 2A). Here, groundtruth tilt is depicted with local surface normals instead of a colormap (cf. Fig 2B). B The local model estimates tilt based on local image cues. Local estimates are obtained via lookup tables that store conditional means (i.e., posterior means) given all possible combinations of three quantized unsigned image cue values (i.e., 64³ unique cue combinations), and one quantized signed image cue value (i.e., 64 unique cue values), as computed from the natural image database. We have previously verified that quantizing the cue values is not a primary limiting factor on the performance of the model [16]. C Pooling local estimates in a spatial pooling region centered on a target location. D Each global estimate is obtained by pooling local estimates over a spatial neighborhood. Each local estimate is obtained by combining cues that are computed from multiple pixels in the image. Note that the area of the image that contributes to the global estimate is slightly larger than the purported area of the global pooling region, because each local estimate is computed from image gradients across an image region with non-zero spatial extent.
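
The lookup-table idea in panel B can be sketched as below. This is a minimal illustration, not the authors' implementation: it covers only the three-cue table, it assumes the conditional mean of a circular variable is computed as a circular mean, and the names `quantize` and `build_lookup_table` are hypothetical.

```python
import numpy as np

def quantize(cues, n_bins, lo, hi):
    """Map cue values in [lo, hi] to integer bin indices 0..n_bins-1."""
    scaled = (np.asarray(cues, dtype=float) - lo) / (hi - lo)
    return np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)

def build_lookup_table(cue_bins, tilt_deg, n_bins=64):
    """Table of conditional mean tilts for every combination of three cues.

    cue_bins: integer array of shape (n_samples, 3) with quantized cues.
    tilt_deg: groundtruth tilt for each sample, in degrees.
    Returns an (n_bins, n_bins, n_bins) array holding the circular mean
    tilt in degrees for each cue combination (NaN where no samples fall).
    """
    cue_bins = np.asarray(cue_bins)
    flat = np.ravel_multi_index(tuple(cue_bins.T), (n_bins,) * 3)
    theta = np.radians(np.asarray(tilt_deg, dtype=float))
    size = n_bins ** 3
    # Accumulate sine and cosine sums per cue combination.
    s = np.bincount(flat, weights=np.sin(theta), minlength=size)
    c = np.bincount(flat, weights=np.cos(theta), minlength=size)
    counts = np.bincount(flat, minlength=size)
    table = np.degrees(np.arctan2(s, c)) % 360.0
    table[counts == 0] = np.nan
    return table.reshape((n_bins,) * 3)
```

With a table built this way, a local estimate for a new image location would be read out as `table[b0, b1, b2]`, where `b0`, `b1`, and `b2` are that location's quantized cue values.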

Fig 5.

Local and global models for tilt estimation.

A The local model obtains a local tilt estimate given three local image cues. B The fixed circular pooling model uses a circular pooling region with the same size for all target groundtruth tilts (cf. Fig 2D). C The adaptive elliptical pooling model uses an adaptive pooling region with a different size, aspect ratio, and orientation for each groundtruth tilt (cf. Fig 2E). As the average area of the adaptive elliptical pooling region changes, the relative area, orientation, and aspect ratio of the pooling regions are held fixed.
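
The difference between the pooling models in panels B and C comes down to the shape of the region over which local estimates are combined. The sketch below assumes that pooling is an unweighted circular mean of the local estimates inside the region, which the caption does not specify; the function names and the pixel-based parameterization are hypothetical.

```python
import numpy as np

def elliptical_mask(shape, center, semi_major, semi_minor, orientation_deg):
    """Boolean mask of pixels inside an ellipse centered on the target.

    center is (row, col); semi-axes are in pixels; orientation_deg is
    the angle of the major axis relative to the column (x) axis.
    A fixed circular region is the case semi_major == semi_minor.
    """
    rows, cols = np.indices(shape)
    y = rows - center[0]
    x = cols - center[1]
    phi = np.radians(orientation_deg)
    u = x * np.cos(phi) + y * np.sin(phi)    # coordinate along the major axis
    v = -x * np.sin(phi) + y * np.cos(phi)   # coordinate along the minor axis
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0

def pool_tilt(local_estimates_deg, mask):
    """Global estimate: circular mean of the local estimates inside the mask."""
    theta = np.radians(np.asarray(local_estimates_deg, dtype=float)[mask])
    mean_angle = np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
    return np.degrees(mean_angle) % 360.0
```

In an adaptive scheme, the semi-axes and orientation passed to `elliptical_mask` would be chosen per tilt, whereas the fixed circular model would use one radius everywhere.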

Fig 6.

Conditional distributions of groundtruth tilt given the value of the tilt estimate.

Each subplot shows the distribution of groundtruth tilt τ given a particular local estimate value τ̂. For example, the fifth subplot in the first row shows the distribution of groundtruth tilts given that the local tilt estimate had a value of 120° (i.e., p(τ | τ̂ = 120°)). The fact that the conditional distributions of groundtruth tilt are approximately shift-invariant indicates that each local tilt estimate, regardless of its value, provides approximately equally reliable information about groundtruth tilt. Gray regions represent 95% confidence intervals from Monte Carlo simulations of 1000 experimental datasets. Confidence intervals at non-cardinal tilts (e.g., τ̂ = 30°, 60°, 120°, 150°, etc.) are larger in part because the local model produces fewer non-cardinal tilt estimates, in keeping with the prior probability distribution over tilt, which has peaks at the cardinal tilts (e.g., τ = 0°, 90°, etc.; see Fig 2C).

Fig 7.

Groundtruth tilt estimation error from the global model with fixed circular pooling.

Mean estimation error is plotted as a function of the diameter of the pooling region. Mean estimation errors are computed across all tilts. The black dashed line indicates the mean estimation error for the local model; the local model does not pool local estimates and thus has a pooling diameter of 0°. The gray dashed line indicates the estimation error for a “local” model that computes the image cues from an area matched to that implicitly used by the best global model (see Discussion). Monte Carlo simulations on 1000 randomly sampled stimulus sets were used to obtain 95% confidence intervals on the mean estimation error (gray area). Data from Exp 1 and Exp 2 are shown in the left and right columns, respectively.
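
One way to obtain confidence intervals of this kind is to resample stimulus sets and recompute the mean error on each. The sketch below assumes sampling with replacement from a pool of per-stimulus absolute errors, which may differ from the sampling scheme actually used; `monte_carlo_ci` and its parameters are hypothetical.

```python
import numpy as np

def monte_carlo_ci(errors, n_sets=1000, set_size=None, level=95.0, seed=0):
    """Mean error and a percentile confidence interval from resampled sets.

    errors: per-stimulus absolute estimation errors, in degrees.
    n_sets: number of randomly sampled stimulus sets.
    set_size: stimuli per set (defaults to the number of errors given).
    Returns (mean, lower, upper).
    """
    rng = np.random.default_rng(seed)
    errors = np.asarray(errors, dtype=float)
    if set_size is None:
        set_size = errors.size
    # Each row indexes one randomly sampled stimulus set (with replacement).
    idx = rng.integers(0, errors.size, size=(n_sets, set_size))
    set_means = errors[idx].mean(axis=1)
    tail = (100.0 - level) / 2.0
    lower, upper = np.percentile(set_means, [tail, 100.0 - tail])
    return errors.mean(), lower, upper
```

Running this once per pooling diameter would give the mean error curve and the gray band around it.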

Fig 8.

Human prediction error from the global model with fixed circular pooling.

Mean prediction error is plotted as a function of the diameter of the pooling region. Mean prediction errors are computed across all tilts and human observers. The black dashed line indicates the mean prediction error for the local model. The gray dashed line indicates the prediction error for a “local” model that computes the image cues from an area matched to that implicitly used by the best global model (see Discussion). Data from Exp 1 and Exp 2 are shown in the left and right columns, respectively.

Fig 9.

Estimation error of adaptive elliptical pooling model.

A The adaptive elliptical pooling areas dictated by target tilt. B The relative elliptical pooling area for different target tilts. As the average equivalent diameter increases or decreases, the relative sizes of the pooling areas remain in a fixed proportion. C Estimation error (model estimate vs. groundtruth tilt) as a function of equivalent diameter. The insets show simulation results that compare performance of the adaptive elliptical pooling model vs. the fixed circular pooling model on 1000 matched randomly sampled stimulus sets. Computing the estimation errors on matched stimulus sets isolates the impact of the model and prevents stimulus variability from unduly affecting the results. The adaptive pooling model (blue) outperforms the fixed circular pooling model (black) on nearly all stimulus sets (i.e., the data lie below the positive diagonal). D Simulation results, just as in the C insets, except that estimation error is shown as a function of groundtruth tilt (subpanels). The fact that the majority of points lie below the dashed unity line indicates that adaptive elliptical pooling outperforms fixed circular pooling in groundtruth tilt estimation at all groundtruth tilts.
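
The scaling described in panel B can be made concrete with a small sketch. It assumes that "equivalent diameter" means the diameter of a circle with the same area as the ellipse, and that the family of ellipses is rescaled by one common factor so that the mean equivalent diameter hits a target value; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def equivalent_diameter(semi_major, semi_minor):
    """Diameter of the circle whose area equals that of the ellipse.

    Ellipse area is pi * a * b and circle area is pi * (d / 2) ** 2,
    so d = 2 * sqrt(a * b).
    """
    a = np.asarray(semi_major, dtype=float)
    b = np.asarray(semi_minor, dtype=float)
    return 2.0 * np.sqrt(a * b)

def rescale_ellipses(semi_major, semi_minor, target_mean_diameter):
    """Scale all ellipses by one factor to reach a target mean diameter.

    A common scale factor leaves each ellipse's aspect ratio and the
    ratios between pooling areas unchanged.
    """
    a = np.asarray(semi_major, dtype=float)
    b = np.asarray(semi_minor, dtype=float)
    scale = target_mean_diameter / equivalent_diameter(a, b).mean()
    return a * scale, b * scale
```

Because every semi-axis is multiplied by the same factor, the pooling regions for different target tilts keep their relative sizes as the average equivalent diameter is swept.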

Fig 10.

Prediction error of adaptive elliptical pooling model.

Human prediction error (model estimate vs. human estimate) is plotted (blue) as a function of pooling area (i.e., equivalent diameter). For comparative purposes, performance is also plotted for the fixed circular pooling model (black; same data as Fig 8).

Fig 11.

Pooling sizes from natural scene statistics vs. pooling sizes that maximize estimation and prediction performance.

Adaptive pooling regions derived from natural scene statistics predict the pooling regions that independently maximize performance at each groundtruth tilt. A Equivalent pooling diameters fit to the natural scene statistics (black; same data as Fig 9B) and equivalent pooling diameters that minimize estimation error are plotted as a function of groundtruth tilt. The left and right columns represent data from Exp 1 and Exp 2, respectively. B Best estimation diameters are correlated with the diameters fit to the natural scene statistics. C Equivalent pooling diameters fit to the natural scene statistics (black) and equivalent pooling diameters that minimize prediction error (blue), plotted as a function of groundtruth tilt. D Best prediction diameters are correlated with the diameters fit to the natural scene statistics. All correlations were significant at the level of p < 0.05; all but one were significant at the level of p < 0.001.

Fig 12.

Pooling sizes that maximize groundtruth tilt estimation performance vs. pooling sizes that maximize the prediction of human performance.

Pooling diameters that maximize estimation performance predict those that maximize the prediction of human performance. Each data point represents the diameter that maximizes performance for a different groundtruth tilt at the target location (cf. Fig 11). The actual sizes of the pooling regions that maximize estimation performance are similar to the sizes that maximize the prediction of human performance.
