A feedforward mechanism for human-like contour integration

doi:10.1371/journal.pcbi.1013391

Fig 1.

Visual grouping and contour integration in human perception.

(A) Demonstration of the phenomenon whereby observers group boundaries belonging to the same object despite occlusions. This panel was generated using OpenAI’s DALL·E. (B) Psychophysical stimuli used to examine the ‘continuity’ principle underlying contour integration (Field et al., 1993). The left panel displays a contour-present stimulus containing a subset of Gabor patches (i.e., contour elements) systematically aligned to suggest an extended, coherent contour amidst randomly oriented background patches. The right panel displays a contour-absent stimulus with identical Gabor patches, except that the contour patches are also randomly oriented, eliminating the percept of a continuous contour. Contour patches are highlighted for illustrative clarity.
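
For readers unfamiliar with the stimulus elements in panel B, the following is a minimal sketch of a single Gabor patch: a sinusoidal carrier multiplied by a Gaussian envelope. The function name `gabor_patch` and all parameter values (patch size, wavelength, envelope width) are illustrative assumptions, not the stimulus parameters used in the paper. It uses only the standard library so it runs without dependencies; an array library would be the more usual choice.

```python
import math


def gabor_patch(size=21, wavelength=6.0, sigma=3.0, theta_deg=0.0, phase=0.0):
    """Return a size x size Gabor patch as a list of rows, values in [-1, 1].

    All defaults are hypothetical illustration values, not the paper's.
    theta_deg is the direction of luminance modulation; the visible
    stripes run perpendicular to it.
    """
    theta = math.radians(theta_deg)
    half = size // 2
    patch = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Position along the modulation direction after rotation.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            # Isotropic Gaussian envelope centered on the patch.
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            # Sinusoidal carrier.
            carrier = math.cos(2.0 * math.pi * x_rot / wavelength + phase)
            line.append(envelope * carrier)
        patch.append(line)
    return patch


if __name__ == "__main__":
    patch = gabor_patch(theta_deg=45.0)
    print(len(patch), len(patch[0]))
```

In a contour-present display, the `theta_deg` of successive contour elements would follow the path of the contour, whereas in a contour-absent display the same elements would receive random orientations.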


Fig 2.

Contour integration capacity in feedforward CNNs.

(A) Accuracy on contour detection for the held-out test set across different readout layers: gray indicates randomly initialized models, orange indicates models pretrained on ImageNet for object recognition, and blue indicates fine-tuned models. Error bars denote 95% confidence intervals for readout accuracy. (B) Saliency maps from two fine-tuned models, highlighting pixel relevance for detecting the contour within an example image; the location of the contour is highlighted in red for illustrative purposes. (C) Example image pair with misaligned and aligned contour elements. Contour patches are highlighted for illustrative clarity. (D) The fine-tuned model’s sensitivity to the elements making up the contour (in the aligned display) or to elements at the same locations in the misaligned display. Each pair is connected by a gray line. Overall, the plot depicts the model’s sensitivity to the local alignment of contour elements.


Fig 3.

Impact of receptive field size and progression on contour integration in feedforward models.

(A) Left: receptive field progression over the layers (blue lines), relative to the standard AlexNet model (gray). Right: size of the receptive fields of units in the 5th convolutional block, the final stage of the backbone before the fully-connected layers. (B) Top-1 object recognition accuracy on the ImageNet validation set for PinholeNet models with varying receptive field sizes (blue bars), as well as the standard AlexNet model (gray bar). (C) Contour detection accuracy for readout from the 5th convolutional layer (left) and the 2nd fully-connected layer (right) in PinholeNets (blue bars) and the standard AlexNet model (gray bar) on the held-out test set. Error bars denote 95% confidence intervals for readout accuracy.
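
The receptive field progression in panel A can be reproduced in principle with the standard recurrence for stacked convolution and pooling layers: each layer grows the receptive field by (kernel - 1) times the cumulative stride of the layers before it. The sketch below applies that recurrence to an assumed AlexNet-style layer list (kernel sizes and strides as in the commonly used AlexNet definition). The PinholeNet configurations are not given in the caption, so they are not reproduced here, and the names `receptive_field_sizes` and `ALEXNET_LAYERS` are hypothetical.

```python
def receptive_field_sizes(layers):
    """Return [(name, receptive_field_in_pixels)] for a sequential stack.

    layers: iterable of (name, kernel_size, stride) tuples.
    Padding does not change receptive field size, so it is ignored.
    """
    rf = 1    # receptive field of an input pixel
    jump = 1  # cumulative stride: input pixels between adjacent units
    sizes = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append((name, rf))
    return sizes


# Assumed AlexNet-style backbone up to the 5th convolutional layer.
ALEXNET_LAYERS = [
    ("conv1", 11, 4),
    ("pool1", 3, 2),
    ("conv2", 5, 1),
    ("pool2", 3, 2),
    ("conv3", 3, 1),
    ("conv4", 3, 1),
    ("conv5", 3, 1),
]

if __name__ == "__main__":
    for name, rf in receptive_field_sizes(ALEXNET_LAYERS):
        print(f"{name}: {rf} px")
```

Under these assumptions the conv5 units see a 163 x 163 pixel region. Shrinking kernel sizes or strides in the layer list is one way to explore how a restricted receptive field progression would look.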


Fig 4.

Human sensitivity to global curvature.

(A) Variation in global curvature (β) across the contour stimuli used in the study, with β set to 15°, 30°, 45°, 60°, and 75°, ranging from straighter to more curved contours. (B) Sequence of a two-interval forced-choice (2-IFC) contour detection trial in which participants identify the display containing the contour. (C) Mean accuracy of participants for contour detection across β conditions, with error bars representing 95% confidence intervals of the mean accuracy, bootstrapped across participants. (D) Bar graph showing the variability in human performance across individual trials within each β condition.
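
The error bars in panel C are described as 95% confidence intervals bootstrapped across participants. A minimal sketch of a percentile bootstrap of that kind follows. The function name `bootstrap_ci`, the number of resamples, the seed, and the example accuracies are all assumptions made for illustration; the caption does not state which bootstrap variant or settings the authors used.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI of the mean, resampling participants.

    values: one accuracy per participant for a single condition.
    Returns (mean, lower, upper). n_boot and seed are illustrative.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample participants with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


if __name__ == "__main__":
    # Made-up per-participant accuracies for one curvature condition.
    accuracies = [0.90, 0.85, 0.80, 0.95, 0.70, 0.88]
    mean, lower, upper = bootstrap_ci(accuracies)
    print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

Running this once per β condition would give one interval per point on a curve like the one in panel C.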


Fig 5.

Model and human behavioral correspondence for contour integration.

(A) Contour stimuli containing global curvatures (β) spanning a broad range. (B) Scatter plot depicting the correlation between the broadly-tuned model’s contour signal strength and human percent correct across trials, showing weak correspondence (Pearson’s r = 0.1907). (C) Line plot comparing the broadly-tuned model’s performance with human performance at different global curvature levels, highlighting the model’s insensitivity to increasing curvature. The broadly-tuned model’s performance is shown in red and human performance in gray. (D) Line plot illustrating the correlation with human performance for models trained on curvatures within a specific narrow range (narrowly-tuned models); the correlation peaks at β = 20° and approaches the noise ceiling (r = 0.785). (E) Scatter plot depicting the correlation between the contour signal strength of the model narrowly tuned at 20° and human percent correct across trials, showing strong correspondence (Pearson’s r = 0.768). (F) Line plot comparing the performance of the model narrowly tuned at 20° with human performance at different global curvature levels, highlighting the model’s human-like sensitivity to curvature. The narrowly-tuned model’s performance is shown in green and human performance in gray.
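
Panels B and E report Pearson correlations between a per-trial model quantity and per-trial human percent correct. As a reminder of what that statistic computes, here is a minimal, dependency-free sketch. The function name `pearson_r` and the example numbers are made up for illustration and are not the paper's data or analysis code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical per-trial values: model signal vs. human percent correct.
    model_signal = [0.2, 0.5, 0.9, 1.4, 1.1]
    human_correct = [55.0, 62.0, 80.0, 95.0, 85.0]
    print(round(pearson_r(model_signal, human_correct), 3))
```

Each point in the scatter plots would contribute one (model, human) pair to such a computation.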


Fig 6.

Fine-tuning a purely feedforward network reveals a capacity for visual uncrowding.

(A) A schematic of the uncrowding phenomenon. Identifying the offset of a vernier target is easy when it is presented in isolation (baseline), becomes difficult when it is surrounded by a single flanker (crowding), and becomes easier again as more identical flankers are added to the configuration (uncrowding). (B) The out-of-distribution training and testing paradigm. Models were trained on non-overlapping stimuli, in which the vernier and flanker configurations appeared in the same image but were spatially separate, and tested on overlapping (crowded) stimuli, in which the vernier was centered within the flanker configuration. (C) Performance of a VGG19 architecture with a frozen, pretrained backbone. While the model can identify the vernier in isolation, its accuracy drops to chance level in all crowded conditions, regardless of the number of flankers. The model fails to exhibit uncrowding, consistent with prior reports on the limits of pretrained feedforward architectures. (D) Performance of a VGG19 network with a fine-tuned backbone. Accuracy is high for the isolated vernier, drops with a single flanker, and then increases systematically as more flankers are added. The results shown are from the model with the clearest emergent uncrowding.


Fig 7.

Computational mechanisms underlying human contour integration.

(A) Low-level computations amplify responses to local elements that are part of a contour in the retinal image. This is facilitated by lateral connections between units with collinear tunings, conceptualized as Association Fields. (B) Mid-level feedforward computations focus on identifying potential candidates for an extended contour in the retinal image. This is facilitated by units with progressively increasing receptive fields (RFs) that are tuned to small orientation differences, allowing local features to be integrated into coherent extended contours.
