The psychologists used two well-known pretrained CNN models (AlexNet and VGG-19), but these were trained on realistic images of real-world objects. The article does not seem to indicate that the researchers trained these two models on any images containing objects resembling modern art (I did not have time to read the entire 43-page article). I am actually amazed at how well these two models score these modern-art images. Take, for example, the image of an elephant overlaid with a sock pattern: the model scores it as a probable poncho, which seems reasonable. In fact, the first choices are still quite often correct. I think the authors have taken Gestalt psychology a bit too far in this paper.