
Fig 1.

Examples of second-order boundary stimuli, with varying texture density and modulation depth.

(a) Top: A natural occlusion boundary formed by animal fur (foreground, lower right) occluding the forest floor (background, upper left). This boundary has similar average luminance on both sides, but clearly visible differences in texture. Bottom: A contrast-defined boundary (used in Experiment 3) with identical mean luminance in each region, but different contrasts of the texture elements. In this example the contrast modulation envelope has a left-oblique (-45 deg. w.r.t. vertical) orientation. (b) Examples of contrast-modulated micropattern stimuli, for three densities of micropatterns and three modulation depths, all having a right-oblique (+45 deg.) boundary orientation. Boundary segmentation typically becomes easier with increasing modulation depth and micropattern density. (c) Schematic illustration of the two psychophysical tasks used here (orientation identification: Experiments 1 and 3; orientation discrimination: Experiment 2), in both cases two-alternative forced-choice judgements of left- vs. right-oblique boundary orientation. Stimuli shown are representative of those used in Experiments 1 and 2. In the identification task boundaries are oriented at +/- 45 deg., and in the discrimination task boundaries are oriented slightly off vertical (+/- 6–7 deg.).


Fig 2.

Neural network model implementing a FRF (Filter-Rectify-Filter) arrangement for texture segmentation.

Parameters shown in blue are learned from data; those in black are fixed. Stimulus image I is filtered by a bank of first-stage energy filters, resembling V1 complex cells, whose downsampled responses provide a feature vector input x to two second-stage filters. An optimization algorithm adjusts connection weights wL, wR (blue lines), producing second-stage filters which are selective for left-oblique (L) or right-oblique (R) contrast-defined boundaries, to give responses consistent with human psychophysical data. The output of each second-stage filter is passed through a nonlinearity h(u) = |u|^α (blue curve) whose shape parameter α is also estimated from the psychophysical data. Fixed output weights vR = +1, vL = -1 lead to the decision variable u = sR - sL + v0, which is input to a sigmoid function to determine the probability of the observer classifying a given boundary as being right-oblique.
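
The decision stage described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and the particular values of α and v0 are ours, and sL, sR here denote the raw second-stage filter outputs before the nonlinearity.

```python
import numpy as np

def p_right(sL, sR, alpha=1.5, v0=0.0):
    """Probability that the model classifies a boundary as right-oblique.

    sL, sR : raw responses of the left- and right-oblique second-stage filters.
    alpha, v0 : illustrative values; in the paper both are estimated from
    each observer's psychophysical trial data.
    """
    h = lambda s: np.abs(s) ** alpha      # output nonlinearity h(u) = |u|^alpha
    u = h(sR) - h(sL) + v0                # fixed output weights vR = +1, vL = -1
    return 1.0 / (1.0 + np.exp(-u))      # sigmoid -> P("right-oblique")
```

With equal filter responses and v0 = 0 the model is at chance (P = 0.5); a stronger right-oblique response pushes the probability toward 1.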


Fig 3.

Estimated second-stage filter weights for two representative observers in Experiment 1 (orientation identification).

(a) Second-stage filter weights learned by the model for representative observer AMA in Experiment 1-VAR (varying modulation depth) for two different priors (left: ridge; right: ridge + smooth) with 16x16 AVG downsampling. Top panels show the 2-D filter weights (averaged over 4 training folds) and bottom panels show these 2-D weights collapsed into 1-D profiles (black dashed lines) by averaging along the matrix diagonals (left-oblique) or anti-diagonals (right-oblique). Thick lines (red: ridge; green: ridge + smooth) denote averages over 30 resampled bootstrapped training sets, and thin dashed lines show +/- 1 SEM. (b) Same as (a) but for observer JJF in Experiment 1-FIX (fixed, near-threshold modulation depth). (c) Results for ideal observer for Experiment 1-VAR. Organization as in (a), (b) except thick black lines denote averages over 4 training folds and thin dashed lines show fits to individual folds.
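The collapse of a 2-D weight map into a 1-D profile by averaging along diagonals, as described above, can be sketched as follows (a hedged illustration; the function name and sign convention for the anti-diagonal case are ours):

```python
import numpy as np

def collapse_profile(W, antidiagonal=False):
    """Collapse a square 2-D weight map into a 1-D profile by averaging
    along the matrix diagonals (left-oblique boundaries) or, if
    antidiagonal=True, along the anti-diagonals (right-oblique)."""
    W = np.asarray(W)
    if antidiagonal:
        W = np.fliplr(W)                  # anti-diagonals become diagonals
    n = W.shape[0]
    # one mean value per diagonal offset, from lower-left to upper-right
    return np.array([np.diagonal(W, offset=k).mean()
                     for k in range(-(n - 1), n)])
```

For a 16x16 map this yields a 31-point profile, one value per diagonal offset, which is what the bottom panels of the figure plot.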


Fig 4.

1-D profiles of the second-stage filter weights for all observers in Experiment 1.

(a) 1-D profiles for individual observers in Experiment 1-VAR (colored lines) plotted with the average across observers (thick black lines) for three sampling rules (AVG, MAX, SUB) and both priors. (b) Same as (a) but for Experiment 1-FIX.


Fig 5.

Normalized 1-D profiles of the average second-stage filter weights (30 bootstrapped re-samples).

Thick black lines are for the ideal observer, thin black dashed lines indicate confidence intervals (+/- 1.96 SEM). Colored lines indicate normalized profiles for individual human observers. All models used AVG pooling. (a) Experiment 1-VAR (top) and Experiment 1-FIX (bottom). (b) Experiment 1-HFC (higher density of smaller micropatterns).


Fig 6.

Psychometric functions for two versions of the model (DET, STO), for three representative human observers (AMA, SEL, VHB) in Experiment 1-VAR.

Blue lines denote human observer performance on the test set (stimuli not used for model estimation) as a function of modulation depth, together with 95% binomial proportion confidence intervals (+/- 1.96 SEM, blue dashed lines). Red and green lines denote model predictions for test stimuli for different Bayesian priors (red: ridge; green: ridge + smooth). Left column shows model operating deterministically (DET), right column shows model operating stochastically (STO).
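The 95% binomial proportion confidence intervals (+/- 1.96 SEM) used here and in the following figures follow the standard normal approximation; a minimal sketch (function name ours):

```python
import math

def binomial_ci(p_hat, n, z=1.96):
    """95% normal-approximation confidence interval for a binomial
    proportion p_hat estimated from n trials (SEM = sqrt(p(1-p)/n))."""
    sem = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * sem, p_hat + z * sem
```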


Fig 7.

Analysis of model accuracy for Experiment 1-VAR.

(a) Plots of model vs. observer performance (proportion correct), averaged across observers (left sub-panels, circles, N = 7) or test folds (right sub-panels, solid symbols, N = 28). Observer and model performance were compared on a set of novel test stimuli (N = 500) not used for model estimation or hyper-parameter optimization. Top: Deterministic model with AVG (average) downsampling (AVG-DET). Correlation coefficients (r values) are color coded for each choice of Bayesian prior (red: ridge; green: ridge + smooth). Middle: Stochastic model with AVG downsampling (AVG-STO). Bottom: Deterministic model with downsampling implemented with subsampling (SUB-DET). (b) Difference between observer and model performance for each individual test fold (4 folds per observer) for all models shown in (a). Lines show 95% confidence intervals of the difference (binomial proportion difference test). Colors indicate different choices of Bayesian prior (red: ridge; green: ridge + smooth).


Fig 8.

Double-pass analysis for 5 observers in Experiment 1-VAR.

(a) Models operating in deterministic (DET) mode for three different downsampling rules. Left panels: Observer-observer (horizontal axis) and observer-model (vertical axis) proportion agreement on the double-pass set. Error bars denote 95% binomial proportion confidence intervals. Right panels: Difference between observer-observer and observer-model proportion agreement for each of 5 observers. Error bars denote 95% binomial proportion difference confidence intervals. (b) Same as (a), but for model with AVG downsampling operating in stochastic mode (AVG-STO).
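The proportion-agreement statistic underlying this double-pass analysis is simply the fraction of repeated trials answered identically, with a binomial confidence interval around it. A rough sketch, under our own naming, of how such agreement values could be computed:

```python
import numpy as np

def proportion_agreement(pass1, pass2):
    """Proportion of double-pass trials with identical responses
    (observer pass 1 vs. pass 2, or observer vs. model on the same
    stimuli), plus a 95% binomial proportion confidence interval."""
    a, b = np.asarray(pass1), np.asarray(pass2)
    agree = float(np.mean(a == b))
    sem = np.sqrt(agree * (1.0 - agree) / a.size)   # binomial SEM
    return agree, (agree - 1.96 * sem, agree + 1.96 * sem)
```

A stochastic model (STO) is expected to yield observer-model agreement closer to the observer's own test-retest agreement than a deterministic one.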


Fig 9.

Decision variable correlation analysis.

Decision variable correlation (DVC) computed on 6 test folds (two measurements per fold) for three observers (CJD = blue, MAK = magenta, JJF = black) in Experiment 1-FIX for both priors. DVC values for each observer are sorted by magnitude for visual clarity. Thin lines denote +/-1 SEM (200 bootstrap re-samples).


Fig 10.

Simulated ground-truth observers (left columns) and filter shapes recovered (right columns) from simulated datasets generated by these observers.

(a) Ideal observers implementing various sub-optimal spatial filtering models for Experiment 1. Top row: Simulated observer monitors two adjacent “pizza slice” shaped regions (2-slice). Second row: Simulated observer monitors three adjacent regions (3-slice). Third row: Simulated observer only monitors one boundary (1-filter). Bottom row: Simulated observer randomly monitors 1 of 4 pairs of informative adjacent pizza slices (2-slice—random). (b) Ideal observer implementing a spatial filtering model comprised of two perceptual filters, each monitoring one potential boundary (2-filter). The recovered filters (right) are most similar to those observed in our results (Fig 3).


Fig 11.

Second-stage filter weights for one representative observer, JJF, from Experiment 2 (orientation discrimination, Fig 1c, right).

(a) Experiment 2-VAR. Top: Estimated 2-D weight maps averaged over 4 training folds. Bottom: 1-D profiles showing mean weight magnitude (thick lines), averaged over 30 bootstrapped re-samplings. Thin dashed lines show +/- 1 SEM. (b) Same as (a), but for Experiment 2-FIX.


Fig 12.

Model comparison for Experiment 3, boundary orientation identification with textures composed of two kinds of micropatterns.

(a) Two competing models of FRF architecture fit to psychophysical trial data. Top: Model 1 (“late summation”) assumes that each first-stage orientation channel is analyzed by its own pair of (L/R) second-stage filters and then pooled. Bottom: Model 2 (“early summation”) assumes that each (L/R) second-stage filter integrates over both first-stage channels. (b) Bayesian model comparison making use of all data reveals a strong consistent preference (as measured by the Bayes factor—see text) for Model 2 (“early summation”) for all three observers (blue line and symbols). Thick black line indicates the conventional criterion for a “very strong” preference for Model 2. (c) Bootstrapping analysis, in which model likelihood is evaluated on novel data not used for model training, also reveals a preference for Model 2. Plotted points indicate mean +/- SEM for 50 bootstrapped samples. (d) Difference between predicted and observed proportions where observer chooses “right oblique” for all observers and both models. Negative modulation depths indicate left-oblique stimuli.


Fig 13.

Hypothetical extension of our modeling framework with additional second-stage filters and a more complete implementation of the first-stage filters.

Such a model could potentially be fit to human observer performance on boundary perception tasks involving complex stimuli such as natural occlusion boundaries.
