Fig 1.
Example of a rendered 3D skull image and specimen photograph.
(a) A rendered 3D skull image of a Canis lupus specimen. (b) A photograph of a different C. lupus specimen.
Fig 2.
Confusion matrices for the (a) baseline (synthetic only) and (b) supplemented (synthetic and photograph subset) models’ classifications of the Carnivora skulls housed at the Canadian Museum of Nature.
Rows correspond to the true species and columns to the species predicted by the model. Cells are shaded according to the proportion of each row's true labels that the model classified as each species (i.e., shading is normalised within each row).
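The row-wise shading described above can be reproduced with a short sketch. It assumes the true and predicted labels are available as parallel lists. The function names and the three-class toy labels (`species_A` and so on) are hypothetical and chosen only for illustration; this is not a claim about how the figure was actually generated.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def row_normalise(matrix):
    """Convert counts to the proportion of each true class assigned to each predicted class."""
    normalised = []
    for row in matrix:
        total = sum(row)
        # A row with no samples is left as zeros to avoid dividing by zero.
        normalised.append([count / total if total else 0.0 for count in row])
    return normalised


# Hypothetical toy labels, not data from the study.
classes = ["species_A", "species_B", "species_C"]
true = ["species_A", "species_A", "species_C", "species_B", "species_C"]
pred = ["species_A", "species_C", "species_C", "species_B", "species_A"]

counts = confusion_matrix(true, pred, classes)
shading = row_normalise(counts)  # each row sums to 1, matching "shaded by row"
print(shading)
```

Normalising by row makes the shading independent of how many test images each species has. A diagonal cell therefore reads directly as that species' recall.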
Table 1.
Skull species classification accuracy of each model on the synthetic image and photograph testing datasets. The synthetic image dataset consisted of renders of 3D skull assets, and the photograph dataset consisted of photographs taken directly of physical skulls. The highest accuracy for each dataset is underlined and italicised. The “Epochs” column gives the number of epochs for which each model was trained; for the MMD + Fine-tuning model, it is the sum of the epochs used to train the MMD model and the subsequent fine-tuning epochs.
Fig 3.
Visualisation of the feature space of six skull classification models using t-SNE.
Axis values are not shown because absolute t-SNE coordinates carry no meaningful information. Each t-SNE plot was generated from the activations of the model’s flattened post-convolution layer. Blue ‘x’ markers represent synthetic images, and red dots represent photographs. All images were drawn from the test dataset.
Table 2.
Silhouette scores of the clusters formed by each model’s t-SNE embeddings, with points labelled by the ground-truth labels from each dataset. The t-SNE embeddings were calculated from each model’s activations of the final convolutional layer. The combined score is the silhouette score computed after pooling the synthetic image and photograph testing datasets. The highest silhouette score for each dataset is underlined and italicised.
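The silhouette coefficient of a point compares two distances. Let a be the point's mean distance to the other points with the same label, and let b be its smallest mean distance to the points of any other label. The coefficient is then s = (b - a) / max(a, b), and the table value is the mean of s over all points. The sketch below assumes the 2-D t-SNE coordinates and ground-truth labels are already available as Python lists. The function name and toy points are hypothetical. The scores in the table were not necessarily computed this way; a library routine such as scikit-learn's `silhouette_score` is a common alternative. Singleton groups receive s = 0, which follows a common convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient of points grouped by their (ground-truth) labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two labelled groups")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for a group containing a single point
            continue
        # a: mean distance to the other members of the same group (p itself adds 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other group.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D embeddings: two well-separated groups give a score near 1,
# while labels that cut across the groups give a negative score.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = silhouette_score(points, ["x", "x", "y", "y"])
mixed = silhouette_score(points, ["x", "y", "x", "y"])
print(round(separated, 3), round(mixed, 3))
```

Under this sketch, the combined score would be obtained by concatenating the synthetic and photograph points and labels before making a single call.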
Fig 4.
Grad-CAM scores for each model.
The average score for each model is printed at the top of its bar. Higher Grad-CAM scores (i.e., closer to 3) indicate that the model more often focussed on skull morphology, rather than on background features, when making classifications. The exact Grad-CAM scoring criteria are given in the supporting information (S2 Table, S1 File).
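The heatmaps scored in this figure come from Grad-CAM. A minimal sketch of the standard Grad-CAM computation follows. Each channel of a convolutional layer's activations is weighted by the spatially averaged gradient of the class score with respect to that channel. The weighted channels are then summed, and negative values are clipped. The sketch assumes the activations and gradients have already been extracted, which in practice is usually done with framework hooks, and are given as nested lists shaped [channels][height][width]. The function name and toy values are hypothetical. The manual scoring of the resulting heatmaps (S2 Table) is not reproduced here.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from one layer's activations and the class-score gradients.

    Both arguments are nested lists shaped [channels][height][width].
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: the gradient averaged over all spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only regions whose activation increases the class score.
    return [[max(0.0, v) for v in row] for row in heatmap]


# Hypothetical two-channel, 2x2 example.
activations = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0
    [[0.0, 2.0], [2.0, 0.0]],  # channel 1
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # channel 0 supports the class (weight +1)
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 opposes it (weight -1)
]
cam = grad_cam(activations, gradients)
print(cam)
```

In practice the map is usually upsampled to the input image resolution before it is overlaid on the image for inspection.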