Fig 1.
Demonstration of the importance of global shape in object recognition.
(a) Silhouette of a bear; (b) Scrambled natural image of a bear (see text). Image URLs are in S2 File.
Fig 2.
Sample stimuli used in Experiment 1.
The bounding shape of an object was combined with the texture of a different object to generate each image. a) Shape: Teapot | Texture: Golf ball; b) Shape: Vase | Texture: Gong; c) Shape: Airplane | Texture: Otter; d) Shape: Obelisk | Texture: Lobster; e) Shape: Cannon | Texture: Pineapple; f) Shape: Ram | Texture: Bison; g) Shape: Camel | Texture: Zebra; h) Shape: Orca | Texture: Kimono; i) Shape: Otter | Texture: Speedometer; j) Shape: Elephant | Texture: Sock. The full image set is displayed in Figs 3–6.
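The shape–texture combination described above can be sketched in a few lines, assuming each stimulus is built by filling one object's silhouette mask with pixels from another object's texture image. The mask, texture, and image size below are illustrative stand-ins, not the authors' actual materials:

```python
# Illustrative sketch: fill one object's silhouette (a binary mask) with
# pixels from another object's texture image. A disc stands in for a real
# object silhouette; random pixels stand in for a sampled texture.
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 64
yy, xx = np.mgrid[:h, :w]
shape_mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 < 20 ** 2  # stand-in silhouette
texture = rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)  # stand-in texture

stimulus = np.full((h, w, 3), 255, dtype=np.uint8)  # white background
stimulus[shape_mask] = texture[shape_mask]          # texture only inside the shape
```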
Fig 3.
Network classifications for the stimuli presented in Experiment 1 Part 1.
The leftmost column shows the image presented. The second column in each row names the object from which the shape was sampled. The third column names the object from which the texture was obtained. Probabilities assigned to the object names in columns 2 and 3 are shown as percentages below the object labels. The remaining five columns show the probabilities (as percentages) produced by the network for its top five classifications, ordered left to right from highest to lowest. Correct shape classifications in the top five are shaded in blue; correct texture classifications are shaded in orange.
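The top-five readout reported throughout these figures can be sketched as follows, assuming only a vector of class scores: random numbers stand in for VGG-19's 1000 ImageNet logits, and this is not the authors' code.

```python
# Hedged sketch of a top-five readout: convert class scores to
# probabilities with a softmax and rank the five largest.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)            # stand-in for the network's class scores

exp = np.exp(logits - logits.max())       # numerically stable softmax
probs = exp / exp.sum()

top5_idx = np.argsort(probs)[::-1][:5]    # indices of the five largest probabilities
top5 = probs[top5_idx]                    # reported in the figures as percentages
```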
Fig 4.
Network classifications for the stimuli presented in Experiment 1 Part 2.
Fig 5.
Network classifications for the stimuli presented in Experiment 1 Part 3.
Fig 6.
Network classifications for the stimuli presented in Experiment 1 Part 4.
Fig 7.
Comparison of probabilities assigned to image shapes and textures for animals.
On the x-axis, the shape and texture of each object are given as shape-texture. Filled black bars show the probability the network assigned to the correct shape label; outlined bars show the probability assigned to the correct texture label.
Fig 8.
Comparison of probabilities assigned to image shapes and textures for artifacts.
On the x-axis, the shape and texture of each object are given as shape-texture. Filled black bars show the probability the network assigned to the correct shape label; outlined bars show the probability assigned to the correct texture label.
Fig 9.
Sample stimuli used in Experiment 2.
Fig 10.
VGG-19 classifications for glass figurines Part 1.
The leftmost column shows the image presented to the VGG-19 DCNN. The second column shows the correct object label and the probability generated by the network for that label. The other five columns show probabilities for the network’s top five classifications, ordered left to right from highest to lowest. Correct classifications are shaded in blue.
Fig 11.
VGG-19 classifications for glass figurines Part 2.
Fig 12.
VGG-19 incorrectly classified each of these five images despite correctly classifying the glass piano shown in Fig 11.
Fig 13.
Sample outline stimuli used in Experiment 3.
Fig 14.
VGG-19 classifications for object outlines Part 1.
The leftmost column shows the image presented to the DCNN. The second column shows the correct object label and the classification probability produced for that label. The other five columns show probabilities for VGG-19’s top five classifications, ordered left to right from highest to lowest. Correct classifications are shaded in blue.
Fig 15.
VGG-19 classifications for object outlines Part 2.
Fig 16.
VGG-19 classifications for object outlines Part 3.
Fig 17.
VGG-19 classifications for object outlines Part 4.
Fig 18.
Sample stimuli used in Experiment 4.
Fig 19.
VGG-19 classifications for black object silhouettes Part 1.
The leftmost column shows the image presented to VGG-19. The second column shows the correct object label and the classification probability produced for that label. The other five columns show probabilities for the network’s top five classifications, ordered left to right from highest to lowest. Correct classifications are shaded in blue.
Fig 20.
VGG-19 classifications for black object silhouettes Part 2.
Fig 21.
VGG-19 classifications for black object silhouettes Part 3.
Fig 22.
VGG-19 classifications for black object silhouettes Part 4.
Fig 23.
Stimuli used in Experiment 5a.
Top row: the original silhouette images, all correctly classified by VGG-19 (the correct label appeared in its top five). Bottom row: the part-scrambled images on which the network was tested.
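A hedged sketch of scrambling: the experiment rearranges object parts, but the same idea can be illustrated by permuting image tiles, which preserves local content while destroying the global arrangement. The image and tile size below are stand-ins.

```python
# Illustrative tile scramble: cut a stand-in image into a 4 x 4 grid of
# tiles and reassemble them in a random order. Local patches survive;
# the global configuration does not.
import numpy as np

rng = np.random.default_rng(2)
img = np.arange(64 * 64).reshape(64, 64)     # stand-in image
t = 16                                       # tile size (stand-in for a "part")
tiles = [img[r:r + t, c:c + t] for r in range(0, 64, t) for c in range(0, 64, t)]
order = rng.permutation(len(tiles))          # random rearrangement of tiles

rows = [np.hstack([tiles[i] for i in order[k:k + 4]]) for k in range(0, 16, 4)]
scrambled = np.vstack(rows)                  # same pixels, new arrangement
```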
Fig 24.
VGG-19 classifications for part-scrambled silhouettes.
The leftmost column shows the image presented to the DCNN. The second column shows the correct object label and the classification probability produced by the network for that label. The other five columns show probabilities for the network’s top five classifications, ordered left to right from highest to lowest. Correct classifications are shaded in blue.
Fig 25.
VGG-19 performance for unscrambled and part-scrambled images.
Bars show probabilities for correct responses for each of the objects. Probability is plotted on a logarithmic scale to make small values visible.
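A minimal sketch of this kind of log-scale comparison, with invented object names and probabilities standing in for the paper's data:

```python
# Illustrative grouped bar chart on a logarithmic y-axis, so that very
# small correct-label probabilities remain visible. All values invented.
import matplotlib
matplotlib.use("Agg")                              # headless backend
import matplotlib.pyplot as plt
import numpy as np

objects = ["airplane", "bear", "teapot", "vase"]   # hypothetical items
unscrambled = np.array([0.92, 0.65, 0.80, 0.71])   # invented probabilities
part_scrambled = np.array([4e-3, 9e-4, 2e-2, 2e-4])

x = np.arange(len(objects))
fig, ax = plt.subplots()
ax.bar(x - 0.2, unscrambled, width=0.4, color="black", label="unscrambled")
ax.bar(x + 0.2, part_scrambled, width=0.4, edgecolor="black", fill=False,
       label="part-scrambled")
ax.set_yscale("log")                               # log scale keeps tiny values visible
ax.set_xticks(x)
ax.set_xticklabels(objects)
ax.set_ylabel("probability of correct label")
ax.legend()
```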
Table 1.
Human observers’ performance on individual items for part-scrambled objects.
Fig 26.
Stimuli used in Experiment 5b.
Top row: the original silhouette images, all correctly classified by the network. Bottom row: images with local contour features disrupted.
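The local contour disruption can be sketched as alternating inward and outward displacements along a smooth outline, which serrates the edge while leaving the global shape intact. A circle stands in for an object outline, and the jag amplitude is arbitrary:

```python
# Illustrative serration: perturb the radius of a smooth outline by
# alternating +/- offsets. Local edge features change; the overall
# (mean) shape does not.
import numpy as np

n = 200                                    # even, so the jags cancel in the mean
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
r = np.full(n, 100.0)                      # smooth circular outline (radius 100)
jags = 5.0 * np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # alternating +/- 5
r_serrated = r + jags

# Cartesian coordinates of the serrated outline
x, y = r_serrated * np.cos(theta), r_serrated * np.sin(theta)
```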
Fig 27.
VGG-19 classifications for serrated edge silhouettes.
The leftmost column shows the image presented to the DCNN. The second column shows the correct object label and the classification probability produced by the network for that label. The other five columns show probabilities for the network’s top five classifications, ordered left to right from highest to lowest. Correct classifications are shaded in blue.
Fig 28.
Comparison of VGG-19 performance on locally perturbed contours with its performance on unscrambled and part-scrambled images.
Bars show probabilities for correct responses for each of the objects. Probability is plotted on a logarithmic scale to make small values visible.
Table 2.
Human observers’ performance on individual items for perturbed-contour objects.