Human-like face pareidolia emerges in deep neural networks optimized for face and object recognition

doi:10.1371/journal.pcbi.1012751

Fig 1.

Neural correlates of pareidolia with the idealized models measured using RSA.

A Idealized model RDMs reflecting two hypotheses on the representations of face pareidolia: i) ‘Pareidolia ~ Faces’ hypothesis posits that pareidolia faces are represented similarly to real faces but differ from matched objects (left), and ii) ‘Pareidolia ~ Objects’ hypothesis suggests that pareidolia faces are represented similarly to objects but differ from faces (right). The icons used in this figure have been obtained from The Noun Project (https://thenounproject.com/) under a royalty-free license. B Partial correlations between all time points of the neural MEG data and each idealized model RDM, controlling for the other model RDM. Partial correlations involve measuring the unique variance in the neural data explained by each idealized model, independent of the variance explained by the other model. These correlations reveal that the ‘Pareidolia ~ Faces’ model uniquely explains variance in the neural data at early time points, indicating that pareidolia faces are initially represented similarly to real faces (peak at 165 ms after stimulus onset; blue line). However, the ‘Pareidolia ~ Objects’ model uniquely explains increasing variance at later time points, suggesting that pareidolia faces progressively resemble matched objects and become segregated from real faces (peak at 255 ms after stimulus onset; orange line). Throughout the MEG time course, the partial correlation values for the ‘Pareidolia ~ Objects’ model (orange line) are generally higher than those for the ‘Pareidolia ~ Faces’ model (blue line), indicating that pareidolia faces are represented more similarly to objects than to real faces overall. Shaded areas indicate SEMs. Horizontal lines (in blue and orange) indicate time points of significant partial correlations (p < 0.05) for each hypothesis, as determined by permutation clustering. The black line denotes time points of significant differences between the models.
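
The partial-correlation logic in panel B can be illustrated with a minimal sketch. It assumes binary idealized RDMs (0 within a merged group, 1 otherwise), an equal split of 32 images per category, and a rank-based partial correlation computed by residualizing ranks; none of these details are stated in the caption, and the function names are hypothetical. The `neural` vector is synthetic placeholder data, not MEG data.

```python
import numpy as np
from scipy.stats import rankdata

def idealized_rdm(labels, merged):
    """Binary model RDM: 0 if two stimuli fall in the same group, 1 otherwise.
    `merged` is the set of categories treated as a single group (assumption)."""
    groups = np.array(["merged" if lab in merged else lab for lab in labels])
    return (groups[:, None] != groups[None, :]).astype(float)

def upper(rdm):
    """Vectorize the upper triangle of an RDM, excluding the diagonal."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def partial_spearman(x, y, z):
    """Correlate rank(x) and rank(y) after regressing rank(z) out of both."""
    rx, ry, rz = (rankdata(v) for v in (x, y, z))
    design = np.column_stack([np.ones_like(rz), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Assumed stimulus layout: 32 faces, 32 pareidolia images, 32 matched objects.
labels = ["face"] * 32 + ["pareidolia"] * 32 + ["object"] * 32
faces_model = idealized_rdm(labels, {"face", "pareidolia"})      # 'Pareidolia ~ Faces'
objects_model = idealized_rdm(labels, {"pareidolia", "object"})  # 'Pareidolia ~ Objects'

# Synthetic stand-in for one MEG time point: the faces model plus small noise.
rng = np.random.default_rng(0)
neural = upper(faces_model) + 0.01 * rng.standard_normal(upper(faces_model).size)

r_faces = partial_spearman(neural, upper(faces_model), upper(objects_model))
r_objects = partial_spearman(neural, upper(objects_model), upper(faces_model))
```

Repeating the last two calls at every MEG time point would give the two curves described above; with this synthetic input, only the ‘Pareidolia ~ Faces’ model should explain unique variance.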

Fig 2.

Experimental methods and analyses.

A We compared the neural MEG responses with representations in CNNs using the 96 images from Wardle et al. [12]. The figure displays iconic placeholders representing the three different stimulus categories (i.e., faces, pareidolia, and matched objects), along with an example image used in the experiments. The icons used in this figure have been obtained from The Noun Project (https://thenounproject.com/) under a royalty-free license. Note that we do not have the right to display the human face images used in the experiments. The face image example shown in this figure is a similar photograph taken of one of the authors, who has granted permission for the publication of his identifiable image. The pareidolia and object images were sourced from Wardle et al. [12] (https://www.nature.com/articles/s41467-020-18325-8) and are used under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For legal compliance, parts of the images containing logos or brands have been covered with a white box. All stimuli used in the experiments are available at the Open Science Framework (https://osf.io/9g4rz). B We used five task-optimized CNNs based on the VGG16 architecture, each trained on different combinations of face and object tasks: 1. Dual-task CNN: trained on face identification and object categorization; 2. Face-identification CNN: trained solely on face identification; 3. Face-identification-and-Object-detection CNN: trained on face identification and object detection; 4. Object-categorization-and-Face-detection CNN: trained on object categorization and face detection; and 5. Object-categorization CNN: trained solely on object categorization. C To generate a representational dissimilarity matrix (RDM), we first passed the stimulus set through each CNN and extracted activations from specific layers of the network to obtain feature vectors. We then computed pairwise correlations between all feature vectors and subtracted them from 1, resulting in layer-wise RDMs. D We used multidimensional scaling (MDS) to visualize the obtained layer-wise RDMs. E Using representational similarity analysis (RSA), we measured the similarity (i.e., the Spearman correlation between the upper triangles of the RDMs) between the layer-wise RDMs and both the RDMs derived from different time steps of the neural MEG data and the idealized model RDMs.
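
Steps C to E can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the feature matrix is random placeholder data standing in for flattened layer activations, Pearson correlation is assumed for the pairwise step, and classical (eigendecomposition-based) MDS is used for brevity, which may differ from the MDS variant used for the figure. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def layer_rdm(features):
    """features: (n_stimuli, n_units) flattened activations from one layer.
    Returns 1 - Pearson correlation between every pair of feature vectors."""
    return 1.0 - np.corrcoef(features)

def rsa_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    i, j = np.triu_indices(rdm_a.shape[0], k=1)
    rho, _ = spearmanr(rdm_a[i, j], rdm_b[i, j])
    return float(rho)

def classical_mds(rdm, n_components=2):
    """Classical MDS: double-center the squared dissimilarities, then
    project onto the leading eigenvectors."""
    n = rdm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (rdm ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Placeholder activations for 96 stimuli and 512 units (random, for illustration).
rng = np.random.default_rng(0)
features = rng.standard_normal((96, 512))

rdm = layer_rdm(features)          # step C
coords = classical_mds(rdm)        # step D: 2-D coordinates for plotting
self_similarity = rsa_similarity(rdm, rdm)  # step E, here against itself
```

In practice `rsa_similarity` would be called once per layer and per MEG time point, with the second argument being the MEG-derived RDM or an idealized model RDM.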

Fig 3.

RSA between MEG responses and task-optimized CNNs across time.

A Correlations between all time points of the neural MEG responses and the penultimate layer (layer-15) of each task-optimized CNN. CNNs including optimization for object categorization (i.e., Dual-task CNN, Object-categorization CNN, and Object-categorization-and-Face-detection CNN) showed a higher correlation than CNNs lacking optimization for object categorization (i.e., Face-identification CNN and Face-identification-and-Object-detection CNN). We have included the same analysis for all CNNs and layers in the Supplementary Materials (see Supporting information). B Commonality index representing the unique variance shared between the MEG data, the task-optimized CNN, and the idealized model RDM representing the similarity between pareidolia and faces (Pareidolia ~ Faces model RDM), while controlling for the other model RDM (Pareidolia ~ Objects model RDM). The similarity between pareidolia and faces contributes most to the correlation between the Dual-task CNN and the MEG data, followed by the Object-categorization CNN and the Object-categorization-and-Face-detection CNN. C Commonality index representing the unique variance shared between the MEG data, the task-optimized CNN, and the idealized model RDM representing the similarity between pareidolia faces and objects (Pareidolia ~ Objects model RDM), while controlling for the other model RDM (Pareidolia ~ Faces model RDM). At early time points, the similarity between pareidolia and objects contributes most to the correlation between the Face-identification CNN and the MEG data, while at later time points it contributes most to the correlation between the Object-categorization CNN and the MEG data, followed by the Dual-task CNN and the Object-categorization-and-Face-detection CNN. Note the different scales used for B and C: the commonality index of the ‘Pareidolia ~ Faces’ model is substantially lower than that of the ‘Pareidolia ~ Objects’ model. Horizontal lines below each plot indicate periods of significant correlation (cluster-based permutation test, p < 0.05) for each CNN.
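
The caption does not give the formula behind the commonality index. One common formulation from three-predictor commonality analysis, shown here purely as an assumption, computes the variance in the MEG RDM that is shared by the CNN RDM and one idealized model RDM but not by the control model: C = R²(cnn, control) + R²(model, control) − R²(control) − R²(cnn, model, control). The sketch uses ordinary least squares on synthetic vectors; the authors may have used a rank-based or otherwise different variant.

```python
import numpy as np

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus intercept."""
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta
    return 1.0 - residual.var() / y.var()

def commonality(meg, cnn, model, control):
    """Variance in `meg` shared by `cnn` and `model`, excluding `control` (assumed formula)."""
    return (r_squared(meg, [cnn, control])
            + r_squared(meg, [model, control])
            - r_squared(meg, [control])
            - r_squared(meg, [cnn, model, control]))

# Synthetic vectorized RDMs (4560 = number of pairs among 96 stimuli).
rng = np.random.default_rng(0)
n_pairs = 96 * 95 // 2
shared = rng.standard_normal(n_pairs)
meg = shared + rng.standard_normal(n_pairs)
cnn = shared + rng.standard_normal(n_pairs)
model = shared + rng.standard_normal(n_pairs)
control = rng.standard_normal(n_pairs)
unrelated = rng.standard_normal(n_pairs)

c_shared = commonality(meg, cnn, model, control)        # clearly positive
c_unrelated = commonality(meg, cnn, unrelated, control)  # close to zero
```

Evaluating `commonality` at each time point, once per CNN and per idealized model, would yield curves of the kind plotted in panels B and C.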

Fig 4.

Effects of task optimization on pareidolia representations in CNNs.

We used representational similarity analysis (RSA) to investigate how task optimization influences the representation of pareidolia faces, real faces, and matched objects in CNNs. For each CNN, we computed layer-wise representational dissimilarity matrices (RDMs) and calculated partial correlations with the two idealized models: ‘Pareidolia ~ Faces’ (blue lines) and ‘Pareidolia ~ Objects’ (orange lines). Partial correlations involve correlating each CNN RDM with one idealized model while controlling for the other, isolating the unique contribution of each model. Panels A-E display the results for each CNN: A: Dual-task CNN (trained on face identification and object categorization), B: Face-identification CNN, C: Face-identification-and-Object-detection CNN, D: Object-categorization-and-Face-detection CNN, E: Object-categorization CNN. In CNNs that included object categorization training (Panels A, D, E), the representational similarity between pareidolia faces and real faces (‘Pareidolia ~ Faces’; blue lines) increased across layers, indicating that these networks progressively represent pareidolia faces more like real faces in higher layers. Conversely, in CNNs lacking object categorization training (Panels B, C), this similarity was weak, only reaching significance at early layers, and decreased across layers. The Dual-task CNN showed a unique representational pattern where pareidolia faces were represented closer to matched objects than to real faces (‘Pareidolia ~ Objects’; orange line), while still being more similar to real faces than objects (‘Pareidolia ~ Faces’; blue line) from mid-layers (layer 5) onward, matching pareidolia representations in MEG (cf. Fig 1). A multidimensional scaling (MDS) analysis of the Dual-task CNN is provided in the Supplementary Materials (see Supporting information). The shaded areas represent the standard error of the mean (SEM), bootstrapped across images. Colored circles indicate layers with significant correlations, determined using two-sided permutation tests with Bonferroni correction (p < 0.05).
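
A two-sided permutation test with Bonferroni correction of the kind mentioned above might look like the following sketch. It assumes that the null distribution is built by shuffling stimulus labels (permuting rows and columns of one RDM together) and, for brevity, uses a plain Spearman correlation instead of the partial correlation; the caption does not specify either detail, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

def permutation_pvalue(rdm_a, rdm_b, n_perm=1000, seed=0):
    """Two-sided p-value for the Spearman correlation between two RDMs,
    obtained by shuffling the stimulus order of rdm_a."""
    rng = np.random.default_rng(seed)
    n = rdm_a.shape[0]
    i, j = np.triu_indices(n, k=1)
    observed, _ = spearmanr(rdm_a[i, j], rdm_b[i, j])
    null = np.empty(n_perm)
    for k in range(n_perm):
        order = rng.permutation(n)
        shuffled = rdm_a[np.ix_(order, order)]  # permute rows and columns together
        null[k], _ = spearmanr(shuffled[i, j], rdm_b[i, j])
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p_value)

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that survive Bonferroni correction across all layers tested."""
    p_values = np.asarray(p_values, dtype=float)
    return p_values < alpha / len(p_values)

# Toy example: 12 stimuli in two groups of 6, with a noisy copy of the model RDM.
groups = np.array([0] * 6 + [1] * 6)
model_rdm = (groups[:, None] != groups[None, :]).astype(float)
rng = np.random.default_rng(1)
noise = rng.standard_normal((12, 12)) * 0.05
layer_rdm_toy = model_rdm + (noise + noise.T) / 2
np.fill_diagonal(layer_rdm_toy, 0.0)

rho, p = permutation_pvalue(layer_rdm_toy, model_rdm, n_perm=200)
flags = bonferroni_significant([0.001, 0.02, 0.5])
```

With one p-value per layer, `bonferroni_significant` divides the threshold by the number of layers, which is the conservative choice implied by "Bonferroni correction".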

Fig 5.

Visualization of critical features used by the Dual-task CNN to classify face pareidolia stimuli.

A Visualization of the critical features in sample pareidolia stimuli (top row) that the Dual-task CNN uses to classify these stimuli as either ‘face’ (middle row) or ‘object’ (bottom row). Green areas indicate positive class attribution, essential for classification, while red areas signify negative class attribution, detrimental for classification. The pareidolia images were sourced from Wardle et al. [12] (https://www.nature.com/articles/s41467-020-18325-8) and are used under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). Parts of the object and pareidolia images containing logos or brands have been covered with a white box for legal compliance. Example masks are used to quantify the use of face-like features in classifying pareidolia stimuli as ‘face’ or ‘object’. For each pareidolia stimulus, we generated masks corresponding to the entire face (‘Face’), specific face features like eyes (‘Eyes’) and mouth (‘Mouth’), and the area outside the face (‘Outside Face’). We calculated the mean pixel ratio within each mask (ranging from 0, indicating the lowest, to 1, the highest) for classifying the stimulus as ‘face’ (red bars) or as ‘object’ (yellow bars). B, C, D, E, F This mask-based ratio is calculated for all the CNNs. The analysis revealed that the Dual-task CNN primarily relies on facial features, especially the eye region, to classify the pareidolia stimulus as a face. Areas outside the face were similarly used in both classifications. Such clarity is not observed in other CNNs. Error bars denote SEM across stimuli.
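
The caption leaves the exact definition of the mask-based ratio open. One possible reading, sketched below as an assumption, is to keep the positive class attributions, scale them so the strongest pixel equals 1, and average the scaled values inside each mask. The attribution map and masks here are toy arrays, and `mask_ratio` is a hypothetical name.

```python
import numpy as np

def mask_ratio(attribution, mask):
    """Mean positive attribution inside `mask`, scaled to the range [0, 1].
    attribution: 2-D array of class attributions (positive = supports the class).
    mask: boolean 2-D array of the same shape (e.g. 'Eyes', 'Mouth', 'Outside Face')."""
    positive = np.clip(attribution, 0.0, None)  # drop negative attributions
    peak = positive.max()
    if peak == 0:
        return 0.0
    return float((positive / peak)[mask].mean())

# Toy 4x4 attribution map in which only the top-left 2x2 block supports the class.
attribution = np.zeros((4, 4))
attribution[:2, :2] = 3.0
eyes_mask = np.zeros((4, 4), dtype=bool)
eyes_mask[:2, :2] = True
outside_mask = ~eyes_mask

ratio_eyes = mask_ratio(attribution, eyes_mask)
ratio_outside = mask_ratio(attribution, outside_mask)
```

Computing this value per stimulus, once for the ‘face’ attribution map and once for the ‘object’ map, and then averaging across stimuli would give bars comparable in spirit to those in panels B to F.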
