Fig 1.

ShapeComp: a multidimensional perceptual shape similarity model.

We readily perceive how similar shape (A) is to others (numbered 1–5). (B) Outline of our model, which compares shapes across >100 shape descriptors (6 examples depicted). The distance between shapes on each descriptor was scaled from 0 to 1 based on the range of values in a database of 25,712 animal shapes. The scaled differences are then linearly combined to yield the ‘Full Model’ response. Applying MDS to >330 million shape pairs from the Full Model yields a multidimensional shape space for shape comparison (‘ShapeComp’). We reasoned that, because of their complementary nature, many descriptors together would yield a perceptually meaningful multidimensional shape space. (C) Some shape descriptors are highly sensitive to rotation (e.g., Major Axis Orientation), while (D) other descriptors are highly sensitive to bloating (e.g., Solidity). (E) Over 100 shape descriptors were evaluated in terms of how much they change when shapes are transformed (‘sensitivity’).
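
As an illustration of the scaling-and-combination step described above, a minimal Python sketch (descriptor names, database ranges, and values below are hypothetical placeholders; the real model combines >100 descriptors):

```python
# Sketch of the 'Full Model' distance: each descriptor's pairwise
# difference is scaled to [0, 1] by the database range, then the scaled
# differences are linearly combined (summed) into a single distance.

def scaled_difference(value_a, value_b, db_min, db_max):
    """Absolute difference on one descriptor, scaled by the database range."""
    return abs(value_a - value_b) / (db_max - db_min)

def full_model_distance(shape_a, shape_b, db_ranges):
    """Sum of scaled differences across all descriptors."""
    return sum(
        scaled_difference(shape_a[d], shape_b[d], lo, hi)
        for d, (lo, hi) in db_ranges.items()
    )

# Hypothetical descriptor values for two shapes:
db_ranges = {"solidity": (0.2, 1.0), "elongation": (1.0, 10.0)}
a = {"solidity": 0.9, "elongation": 2.0}
b = {"solidity": 0.5, "elongation": 6.5}
print(full_model_distance(a, b, db_ranges))  # ≈ 0.5 + 0.5 = 1.0
```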

Fig 2.

The high-dimensionality of real-world shapes.

(A) t-SNE visualization of 2000 animal silhouettes arranged by their similarities according to a combination of 109 shape descriptors. Colour indicates basic-level category. Insets highlight local structure: bloated shapes with tiny limbs (left); legged rectangular shapes (middle); small spiky shapes (right). To test whether human shape similarity is predicted in the high-dimensional animal space, we gathered human shape similarity judgments on horses (purple), rabbits (yellow), and other animals. (B) Human similarity arrangements of horse silhouettes, and (C) of silhouettes across multiple categories of animals (multidimensional scaling; dissimilarity: distances, criterion: metric stress). Similarity arrangements for (D) horse silhouettes and (E) multiple categories of animals in the full model based on 109 shape descriptors (multidimensional scaling; dissimilarity: distances, criterion: metric stress). Shapes shown in the same colour in B and D, or in C and E, are the same shapes. (F) Human arrangements correlate with the model for horse (purple), rabbit (yellow), and multiple animal silhouettes (gray) (r = 0.63, p < 0.01). (G) Across 25,712 animal shapes, 22 dimensions account for >95% of the variance (multidimensional scaling; dissimilarity: distances, criterion: metric stress). We call these 22 dimensions ShapeComp. (H) The space spanned by the ShapeComp dimensions emerges consistently across combinations of different animal sets (‘Animals’) and shape descriptors (‘Descriptors’). The pairwise distances across 200 test shapes are highly correlated across ShapeComp computed from 10 different sets of 500 randomly chosen animal shapes (‘Animals’), and also, though to a lesser degree, across 10 different sets of randomly selected shape descriptors (‘Descriptors’; 55 out of 109).
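
The "22 dimensions account for >95% of the variance" criterion can be sketched as follows, assuming the variance per MDS dimension is computed from the embedding coordinates; the toy coordinates below are made up:

```python
# Sketch: how many embedding dimensions are needed to reach a variance
# threshold. `coords` rows are shapes, columns are MDS dimensions.

def dims_for_variance(coords, threshold=0.95):
    n = len(coords)
    n_dims = len(coords[0])
    variances = []
    for d in range(n_dims):
        col = [row[d] for row in coords]
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    variances.sort(reverse=True)          # largest dimensions first
    total = sum(variances)
    cum = 0.0
    for k, v in enumerate(variances, start=1):
        cum += v
        if cum / total >= threshold:
            return k
    return n_dims

# Toy embedding where the first dimension dominates:
toy = [[3.0, 0.1, 0.0], [-3.0, -0.1, 0.0], [2.0, 0.2, 0.01], [-2.0, -0.2, -0.01]]
print(dims_for_variance(toy))  # 1: one dimension carries >95% of the variance
```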

Fig 3.

GANs produce novel naturalistic shapes.

(A) Cartoon depiction of a Generative Adversarial Network (GAN) that synthesizes novel shape silhouettes. GANs are unsupervised machine learning systems comprising two competing neural networks: the generator network synthesizes shapes, while the discriminator network distinguishes shapes produced by the generator from a database of over 25,000 animal silhouettes. With training, the generator learns to map a high-dimensional latent vector ‘z’ onto the natural animal shapes, producing novel shapes that the discriminator classifies as real rather than synthesized. Systematically moving along the high-dimensional latent vector z produces novel shape variations and interpolations across a shape space (B, C, and D). (E) A normalized histogram of the number of unique responses across 100 GAN shapes and 20 animal shapes shows that category responses to GAN shapes tend to be much more inconsistent across participants than those to animal shapes, confirming that GAN shapes appear more unfamiliar than animal shapes.
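
The interpolations in (B, C, and D) amount to feeding linearly interpolated latent vectors to the generator; a minimal sketch (the `generator` call at the end is a hypothetical stand-in for the trained network):

```python
# Sketch of latent-space interpolation between two GAN shapes: new
# shapes are synthesized from latent vectors lying on the straight line
# between z_a and z_b.

def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors at fraction t."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, n_steps):
    """n_steps latent vectors from z_a (t=0) to z_b (t=1)."""
    return [lerp(z_a, z_b, i / (n_steps - 1)) for i in range(n_steps)]

z_a, z_b = [0.0, 1.0], [1.0, -1.0]   # toy 2-D latents for illustration
path = interpolation_path(z_a, z_b, 5)
print(path[2])  # midpoint latent: [0.5, 0.0]
# shapes = [generator(z) for z in path]  # would yield a smooth shape morph
```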

Fig 4.

Interpreting ShapeComp dimensions.

Example GAN shapes that vary along the first 6 MDS dimensions. Two shapes (in black) are varied along one dimension (in different colours, dimensions 1–6) while the remaining dimensions are held roughly constant. GAN shapes varying in their MDS coordinates were optimized with a genetic algorithm from MATLAB’s Global Optimization Toolbox to minimize the RMS error between a GAN shape’s 22-D representation and a desired 22-D representation.
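
A sketch of the optimization objective described above; the genetic algorithm is replaced here by a toy random search, and `shapecomp_of_latent` is a hypothetical stand-in for GAN synthesis followed by ShapeComp coordinate extraction:

```python
import math
import random

def rms_error(coords, target):
    """RMS error between a shape's coordinates and the desired coordinates."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(coords, target)) / len(target))

def shapecomp_of_latent(z):
    # Stand-in: identity mapping for illustration; the real pipeline would
    # generate a shape from z and compute its 22-D ShapeComp coordinates.
    return z

def random_search(target, n_iters=2000, seed=0):
    """Toy replacement for the genetic algorithm: keep the best random z."""
    rng = random.Random(seed)
    best_z, best_err = None, float("inf")
    for _ in range(n_iters):
        z = [rng.uniform(-1, 1) for _ in target]
        err = rms_error(shapecomp_of_latent(z), target)
        if err < best_err:
            best_z, best_err = z, err
    return best_z, best_err

z, err = random_search([0.3, -0.2])   # toy 2-D target instead of 22-D
print(round(err, 3))                   # small residual RMS error
```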

Fig 5.

ShapeComp predicts human shape similarity across small sets of shapes.

(A) Example shape pairs that varied as a function of ShapeComp distance. (B) Shape similarity ratings averaged across 14 observers for 250 shape pairs correlate highly with distance in ShapeComp’s 22-dimensional space. Inset: the variance in the similarity ratings accounted for by the different ShapeComp dimensions; many ShapeComp dimensions on their own account for some of the variance in human shape similarity ratings. Shaded error bars are estimated via 1000 bootstrap samples across participant responses. (C) Pixel similarity was defined as the standard Intersection-over-Union (IoU; [37, 72]). (D) Observers viewed shape triads and judged which test appeared more similar to the sample. (E) ShapeComp distances between test and sample were parametrically varied while pixel similarity was held constant. (F) Mean probability across participants that the closer of two test stimuli was perceived as more similar to the sample, as a function of the relative proximity of the closer test shape. Blue: psychometric function fit; orange: prediction of the IoU model. (G) Results of an experiment in which distances from test to sample were equated for one ShapeComp dimension at a time. Mean psychometric function slopes were much steeper than predicted if observers relied only on the respective dimension. These results, together with the finding that many ShapeComp dimensions account for variance in the similarity ratings (inset in B), support the idea that human shape perception is based on a high-dimensional feature space.
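
The IoU pixel-similarity baseline in (C) can be sketched on binary silhouette masks (toy 3×3 masks for illustration):

```python
# Intersection-over-Union on binary masks (1 = pixel inside the shape):
# overlap divided by the total area covered by either shape.

def iou(mask_a, mask_b):
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b   # pixel in both shapes
            union += a | b   # pixel in either shape
    return inter / union if union else 0.0

a = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 0]]
b = [[0, 1, 1],
     [0, 1, 1],
     [0, 0, 0]]
print(iou(a, b))  # 2 overlapping pixels / 6 covered pixels ≈ 0.333
```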

Fig 6.

ShapeComp predicts perceptual distortions in human shape similarity across shape arrays.

Four example shape sets (A, B, C, D) sampled uniformly in GAN space (top row). To test whether subtle perceptual distortions in humans deviate systematically away from GAN space towards ShapeComp, these shape sets were selected such that the pairwise distances of shapes in ShapeComp varied slightly from those in GAN space (Pearson correlations of 0.5 < r < 0.75). The arrays are distorted by ShapeComp (second row) in similar ways to humans (third row; mean across 16 participants). Across arrangements, shapes shown in the same colour are the same shapes. (E) Non-uniformities for individual participants (dots) in 4 shape sets (A-D, colours). Squares show the average across subjects for a given set; error bars show ± 2 standard errors. ShapeComp accounted for perceptual distortions away from the original GAN coordinates better than a GAN+noise model. (F) Correlation of ShapeComp distortion with human distortion as a function of the diversity of shapes across the shape set (measured as the cumulative variance of the shape set across ShapeComp dimensions). Human distortions line up better with ShapeComp when there is more diversity across shape sets, as predicted by ShapeComp. Grey reference line shows y = x.
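
The selection criterion used here (pairwise-distance correlations between 0.5 and 0.75) reduces to a plain Pearson correlation between two distance vectors; the toy distance values below are made up:

```python
# Pearson correlation between a shape set's pairwise distances in GAN
# space and in ShapeComp, used to screen candidate shape sets.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

gan_dists = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical pairwise distances
shapecomp_dists = [3.0, 1.0, 4.0, 2.0, 6.0]  # hypothetical pairwise distances
r = pearson(gan_dists, shapecomp_dists)
print(round(r, 2), 0.5 < r < 0.75)  # moderately correlated: set would be kept
```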

Fig 7.

ShapeComp predicts perceptual uniformities in human shape similarity across shape arrays.

(A,B,C,D) The top row shows four example 2D shape arrays that are roughly uniform in ShapeComp and highly correlated with the GAN arrangement (r > 0.9). The bottom row shows the mean arrangement by 16 human observers. (E) In 3 out of 4 shape sets that are highly correlated in terms of GAN and ShapeComp arrangements, human responses are nearly indistinguishable from the predictions of ShapeComp (blue), given the inherent noise across observers measured as the lower noise ceiling (red; 95% confidence interval of the correlation of each participant’s data with the mean of the others). Error bars (in black) show the 95% confidence interval around the human-model correlation.
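
The lower noise ceiling in (E) is, in essence, a leave-one-out correlation; a sketch with hypothetical per-participant dissimilarity vectors:

```python
# Lower noise ceiling: correlate each participant's data with the mean
# of the remaining participants, then average the correlations.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def lower_noise_ceiling(participants):
    rs = []
    for i, p in enumerate(participants):
        others = [q for j, q in enumerate(participants) if j != i]
        mean_others = [sum(vals) / len(vals) for vals in zip(*others)]
        rs.append(pearson(p, mean_others))
    return sum(rs) / len(rs)

participants = [            # hypothetical pairwise-dissimilarity vectors
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [0.9, 1.8, 3.2, 3.9],
]
print(round(lower_noise_ceiling(participants), 2))  # close to 1: consistent observers
```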

Fig 8.

ShapeComp neural network for estimating a shape’s 22-Dimensional ShapeComp coordinates.

Neural networks in (A) MATLAB (MatNet) and (B) Python (KerNet1) were trained on 800,000 shapes to take a shape’s x,y coordinates as input and output its coordinates in the 22-D high-dimensional shape space. (C) KerNet2, also in Python, was trained to output the ShapeComp coordinates from 40×40 image patches. (D) The networks’ 22-dimensional distances across all pairwise comparisons of 1000 untrained shapes are highly correlated with the pattern of distances from the original ShapeComp solution.
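
A minimal sketch of the regression set-up in (A) and (B): flattened x,y contour coordinates in, 22-D ShapeComp coordinates out. A single random-weight linear layer stands in for the trained networks; the contour length and weights are placeholders:

```python
import random

N_POINTS = 100        # contour samples per shape (assumed for illustration)
N_IN = 2 * N_POINTS   # flattened x,y coordinates
N_OUT = 22            # ShapeComp dimensions

rng = random.Random(0)
# Random placeholder weights; the real networks learn these from 800,000 shapes.
weights = [[rng.gauss(0, 0.01) for _ in range(N_IN)] for _ in range(N_OUT)]
biases = [0.0] * N_OUT

def predict_shapecomp(xy_flat):
    """Map flattened contour coordinates to 22-D ShapeComp coordinates."""
    return [b + sum(w * x for w, x in zip(row, xy_flat))
            for row, b in zip(weights, biases)]

contour = [rng.uniform(-1, 1) for _ in range(N_IN)]  # toy input shape
coords = predict_shapecomp(contour)
print(len(coords))  # 22
```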

Fig 9.

Using ShapeComp to evaluate shape similarity in existing shape sets.

Even for novel shapes, such as the validated circular shape space set ((A) human data from [90]), ShapeComp’s predictions (B) show many similarities to human judgments. While ShapeComp’s arrangement is more compressed, ShapeComp correctly predicts (i) the large gaps between shapes 1 and 15, and 1 and 2, (ii) the circular nature of the data set, and (iii) that the subjective difference between 1 and 11 is smaller than between 14 and 8, yielding the elongated arrangement. (C) Correlation between ShapeComp and human similarity judgments for the distances between all possible shape pairs (105 pairs) (r = 0.78, p < 0.01). Given the noise across observers (which is unknown for the circular shape set), ShapeComp appears to be a good model of human behaviour. Note that, because some shapes in the circular shape set (e.g., 5 or 6) have multiple minimum x-values, we used KerNet2, which operates on images, to compute the ShapeComp solution.

Fig 10.

Synthesizing perceptually uniform shape spaces.

ShapeComp paired with the GAN can be used to create perceptually uniform shape spaces (A-C), along a triangular (A, C) or uniform (B) grid, or to select test shapes at matched similarity levels relative to the central sample shape (D; near, medium, or far in terms of ShapeComp distance).

Fig 11.

Model comparison.

ShapeComp is more predictive of human shape similarity than standard object recognition neural networks across pairs of novel GAN shapes and shape sets. In (A), models are compared to human shape similarity ratings across pairs of shapes (data from Fig 5B). In (B), models are compared to individual observers’ similarity arrangements (data from Fig 7). For any given shape set, each human observer’s similarity matrix was correlated with the mean of the other observers (y-axis) and with several models (ResNet101, GoogLeNet, or ShapeComp). The black line shows where an observer is equally correlated with the other observers and with the model. Only ShapeComp approaches this line, showing that it is a better model of human shape similarity across novel shape sets. Network shape similarity was defined as the Euclidean distance in the final fully connected layer (1000 units).
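
The network-similarity measure can be sketched as follows (hypothetical 4-unit activation vectors stand in for the 1000-unit fully connected layer):

```python
# Network shape dissimilarity: Euclidean distance between two shapes'
# activation vectors in the network's final fully connected layer.

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

features = {                      # hypothetical final-layer activations
    "shape_1": [0.1, 0.9, 0.0, 0.3],
    "shape_2": [0.2, 0.8, 0.1, 0.3],
    "shape_3": [0.9, 0.1, 0.7, 0.0],
}
d_near = euclidean(features["shape_1"], features["shape_2"])
d_far = euclidean(features["shape_1"], features["shape_3"])
print(d_near < d_far)  # True: shapes 1 and 2 count as more similar
```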
