Fig 1.
Example of contextual variations of a motif.
Artists have painted the skull in different epochs, genres, and techniques. It appears, for example, in Hans Holbein’s The Ambassadors, Caravaggio’s painting of Saint Jerome, and still lifes by Pieter Claesz or the modern artist Paul Cézanne. While it is always the same object, its symbolic function and the meaning intended by the artist vary. For example, in Christian iconography, the skull relates to Saint Jerome, but more often, it refers to the concept of ‘memento mori’, a reminder of human mortality. Finding and tracking such motifs through time and space helps art historians to identify relations between artworks, the meaning and popularity of motifs, or how they have been adapted and sometimes altered in form and content. Image (a) is a slight modification of [4] licensed under CC BY 2.0. The remaining image material is shared by Wikimedia Commons [5] as public domain.
Fig 2.
Overview of the interface architecture.
The application is divided into a front-end and back-end, where the front-end is served as a single-page application and represents the interface for the user. The back-end consists of a REST API and the search back-end, where the search back-end consists of multiple initialization and search workers. The image material is shared by Wikimedia Commons [5] as public domain.
Fig 3.
Visualization of the interface workflow.
First, the dataset to be searched is selected, or a new dataset is created and initialized (a). Then an image can be chosen, and one or multiple search boxes can be selected and searched across the dataset (b). The retrievals are displayed in an ordered list (c) or with a two-dimensional t-SNE embedding [69] (d). Image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.
Fig 4.
Overview of the multi-style feature aggregation.
The approach consists of three main parts. First, we extract the features of all images using a style classification network, cluster them in the feature space, and select the cluster centers as style templates. Second, an input image is stylized with respect to all style templates using a universal style transfer model. Then, the features of all stylized images are extracted and aggregated into the final multi-style feature representation. Third, given the feature map of this representation and a set of image patches, their feature descriptors are extracted using multi-scale ROI pooling. Please see the text for more details. Image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.
Fig 5.
General structure of the retrieval algorithm.
The retrieval algorithm consists of an offline preparation stage and an online search stage. During the offline preparation stage, features of local patches at multiple scales are extracted for all images (F1). Then, they are compressed and stored in the search index (F2). During the online search stage, descriptors of discriminative local patches within the query region are extracted (N1). Their k-nearest neighbors across the whole dataset are determined, and our voting procedure aggregates multiple local matches into retrieval bounding boxes (N2). Finally, the results are refined using local query expansion and re-voting (N3). Please see the text for more details. Image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.
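The split between offline indexing and online nearest-neighbor search can be illustrated with a minimal sketch. It is not the implementation used here: it assumes plain Python lists as descriptors, cosine similarity as the matching score, and an exhaustive search, and it skips the compression step (F2) entirely. The names `build_index` and `knn` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Descriptor = List[float]
# (image id, patch id, L2-normalized descriptor); layout is an assumption.
IndexEntry = Tuple[str, int, Descriptor]


def normalize(vec: Descriptor) -> Descriptor:
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(patches: Dict[str, List[Descriptor]]) -> List[IndexEntry]:
    """Offline stage: store one normalized descriptor per local patch of every image."""
    return [
        (image_id, patch_id, normalize(desc))
        for image_id, descs in patches.items()
        for patch_id, desc in enumerate(descs)
    ]


def knn(index: List[IndexEntry], query: Descriptor, k: int) -> List[Tuple[float, str, int]]:
    """Online stage: return the k most similar patches as (score, image id, patch id)."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, desc)), image_id, patch_id)
        for image_id, patch_id, desc in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


index = build_index({"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[1.0, 1.0]]})
neighbors = knn(index, [2.0, 0.1], k=2)
```

An exhaustive scan like this grows linearly with the number of stored patches, which is why a compressed search index is used in practice.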
Fig 6.
Overview of our voting based on local matches.
The voting consists of two main steps. First, for each local query patch, we search for its k-nearest neighbors across the dataset, order all images based on their sum of local matching scores, and focus on the images with the highest score. Second, each local match votes for a specific center and scale of the retrieval bounding box, which are aggregated to the final retrieval results.
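The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the voting procedure itself: patches are assumed to be square, each match is assumed to predict the box center by shifting the query offset with the patch size ratio, and votes are combined by a score-weighted average rather than by an accumulated voting map. `Match`, `rank_images`, `vote`, and `aggregate_votes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Match:
    query_center: Tuple[float, float]   # center of the local query patch
    query_size: float                   # side length of the query patch
    target_center: Tuple[float, float]  # center of the matched patch
    target_size: float                  # side length of the matched patch
    score: float                        # local matching score


def rank_images(matches: Dict[str, List[Match]], top_n: int) -> List[str]:
    """Step 1: order images by their summed local matching scores, keep the best."""
    totals = {image_id: sum(m.score for m in ms) for image_id, ms in matches.items()}
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


def vote(match: Match, query_box: Box) -> Tuple[float, float, float, float]:
    """Step 2a: one match votes for a box center (cx, cy) and scale, weighted by score."""
    qx = (query_box[0] + query_box[2]) / 2
    qy = (query_box[1] + query_box[3]) / 2
    scale = match.target_size / match.query_size
    cx = match.target_center[0] + (qx - match.query_center[0]) * scale
    cy = match.target_center[1] + (qy - match.query_center[1]) * scale
    return cx, cy, scale, match.score


def aggregate_votes(matches: List[Match], query_box: Box) -> Box:
    """Step 2b: combine all votes of one image into a single retrieval box."""
    votes = [vote(m, query_box) for m in matches]
    total = sum(v[3] for v in votes)
    if total <= 0:
        raise ValueError("at least one match with a positive score is required")
    cx = sum(v[0] * v[3] for v in votes) / total
    cy = sum(v[1] * v[3] for v in votes) / total
    scale = sum(v[2] * v[3] for v in votes) / total
    width = (query_box[2] - query_box[0]) * scale
    height = (query_box[3] - query_box[1]) * scale
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
```

A weighted average is sensitive to outlier matches; it is used here only to keep the sketch short.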
Fig 7.
Search results for Rubens’ horse motif.
Detail view of our interface showing the search for the motif of the horse in frontal view, a depiction popularized by Peter Paul Rubens. The selected query region is displayed on the left and retrievals marked as particularly interesting are displayed on the right. Favorites were selected from the first 40 results (first two retrieval pages). Additionally, we show the zoomed-in view given by our interface for some images and added the artist’s name. Image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.
Fig 8.
Search results for the horse and the group of lions.
The selected query regions are displayed on the left, and retrievals marked as particularly interesting are displayed on the right. Favorites were selected from the first 40 results (first two retrieval pages). Additionally, we added the artist’s name for better orientation. “Brueghel” was used as an umbrella term and includes members of the Brueghel family, their workshops, and successors who painted in their style. For these works, the attribution is uncertain. Image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.
Fig 9.
Examples of paintings in the Brueghel dataset.
The depicted images show the diversity of artworks in the Brueghel dataset. It includes artworks of various subject matters (e.g. landscapes, genre paintings), in different techniques (e.g. drawing, oil, print) and on different materials (e.g. paper, canvas). Due to copyright issues, we exchanged several images. We made sure that the replacements are as similar as possible to the originals. The image material is shared by Wikimedia Commons [5] as public domain.
Table 1.
Comparison of feature representations and fusion strategies.
Table 2.
Effect of the number of style templates.
Table 3.
Effect of the multi-scale region-of-interest pooling.
Table 4.
Effect of the iterative voting.
Fig 10.
Effect of the local patch selection.
Retrieval performance on different datasets was measured for different average numbers of patches per image (a), different patch sizes (b), and different covered scales (c, d). We successively reduced the number of scales, starting with the smallest (c) and with the largest (d) local patches. Our default configuration is marked with a dashed line in each plot.
Table 5.
Comparison of search speed and index size.
Table 6.
Comparison of retrieval performances on different benchmarks.
Fig 11.
Qualitative comparison of our method with Artminer.
Search examples of our approach (Ours) and Shen et al. [17] (Artminer) on the Brueghel dataset. Queries are shown in blue on the left and retrievals on the right. If the IoU is greater or smaller than 0.3, we draw green or red bounding boxes, respectively. For a better overview, we draw only the first and four additional retrievals with equidistant ranks. We set the distance between ranks to the number of ground truth matches of this query divided by four. Due to copyright issues, we replaced several images. We made sure that replacements are as similar as possible to the originals. The original figure is also available on our project website. The image material is shared by Wikimedia Commons [5] as public domain.
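The coloring rule and the choice of displayed ranks can be written down as a short sketch. Several details are assumptions because the caption does not fix them: boxes are given as (x0, y0, x1, y1), an IoU of exactly 0.3 is drawn red, ranks are 1-based, and the rank distance is rounded down with a minimum of one. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_color(retrieval: Box, ground_truth: Box, threshold: float = 0.3) -> str:
    """Green if the IoU exceeds the threshold, red otherwise (tie handling assumed)."""
    return "green" if iou(retrieval, ground_truth) > threshold else "red"


def ranks_to_show(num_ground_truth: int, extra: int = 4) -> List[int]:
    """First rank plus `extra` further ranks spaced by num_ground_truth / extra."""
    step = max(1, num_ground_truth // extra)  # rounding down is an assumption
    return [1 + i * step for i in range(extra + 1)]
```

For a query with 20 ground truth matches, `ranks_to_show(20)` yields the ranks 1, 6, 11, 16, and 21.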
Fig 12.
Retrieval examples on Brueghel, LTLL and Oxford5K.
Rows 1–2, 3–4 and 5–6 show search examples on the Brueghel, LTLL and Oxford5K datasets, respectively. To allow for a better comparison, the first row of each pair shows retrievals in the full image (Full) and the second row displays enlarged versions (Zoom). Queries are shown in blue on the left and retrievals on the right. We highlight correct and incorrect retrievals with green and red bounding boxes, respectively. For a better overview, we draw only the first and four additional search results with equidistant ranks. We set the distance between ranks to the number of ground truth matches of the query divided by four. Due to copyright issues, we replaced several images. We made sure that the replacements are as similar as possible to the originals. The original figure is also available on our project website. Images (L4), (L7), (O1), (O2), (O4), (O6) and (O7) are slight modifications of [82–88] licensed under CC BY 2.0. Images (L1), (L2), (L3), (L5), (L6), (O3) and (O5) are slight modifications of [89–95] licensed under CC BY 3.0. The remaining image material is shared by Yale University Art Gallery [96] or Wikimedia Commons [5] either as public domain or under a CC0 license.
Fig 13.
Qualitative comparison with TinEye, Google and Bing image search.
We show retrieval examples for a given query on a modified Brueghel dataset for our algorithm and TinEye [22], and on the web for Google [20] and Bing [21] image search, where we made sure that all algorithms can find all visualized retrievals. We show results for a holistic image search (Holistic) as well as for a search on the cropped (Cropped) and the marked region (Region). In addition, for our regional search results, we visualize retrievals in full view (first row) and in a zoomed-in version (second row). Due to copyright issues, we replaced several images. We made sure that the replacements are as similar as possible to the originals. The original figure is also available on our project website. Images (B1) and (B2) are slight modifications of [97, 98] licensed under CC BY 3.0. Image (B3) is a slight modification of [99] licensed under CC BY 2.0. The remaining image material is shared by Wikimedia Commons [5] either as public domain or under a CC0 license.