Fig 1.
An image collection comprises multiple views depicting the same object instance from different perspectives.
Table 1.
Overview of previous work utilizing multi-view classification.
Fig 2.
Considered multi-view fusion strategies: (a) general architecture of a deep multi-view CNN; (b) investigated fusion strategies; and (c) fusion strategies mapped onto the ResNet-50 architecture.
Vertical lines mark the insertion of a view-fusion layer.
Fig 3.
Example collections of the three multi-view datasets: (a) CompCars, (b) PlantCLEF, and (c) AntWeb.
Photographs of the ant specimen CASENT0281563 by Estella Ortega retrieved from www.AntWeb.org [32].
Table 2.
Top-1 accuracy refers to the best result reported in previous single-view studies using comparable evaluation protocols.
Fig 4.
Distance matrices for the three datasets.
Diagonal elements denote intra-class distances; off-diagonal elements denote inter-class distances. Classes are sorted from well separable to less separable according to their class-wise silhouette scores.
Table 3.
Multi-view classification results across the three datasets.
Fig 5.
Distribution of class-averaged top-1 classification accuracy for the single-view baseline and the multi-view classification strategies.
White dots indicate the median accuracy; black bars display the interquartile range. Thin black lines extend to the lower and upper adjacent values at 1.5× the interquartile range.
Table 4.
Top-5 accuracy for single-view and multi-view classifications.
Table 5.
Dataset demographics for the Flora Incognita dataset.
Table 6.
Multi-view classification results for the Flora Incognita dataset.