Table 1.
Mosquito species and number of male and female specimens imaged.
Fig 1.
Representative specimen images of Anopheles stephensi.
The image on the left shows a mosquito stored by freezing at -80 °C. The image on the right shows a mosquito desiccated at room temperature.
Fig 2.
Sample images from our image dataset.
For each member species of the An. gambiae complex, 3 randomly selected females and 3 randomly selected males are displayed.
Fig 3.
Dataset preparation for applying machine learning.
The original dataset of images was split into three partitions: training, validation, and test.
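The three-way split described in this caption can be sketched as follows. This is a minimal illustration, not the authors' procedure: the 70/15/15 fractions, the fixed seed, and the `img_0000`-style identifiers are hypothetical placeholders, and the actual partition sizes are those reported in Table 2.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and cut them into training, validation and test lists.

    The fractions and the seed are illustrative assumptions only.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out
    return train, val, test

# Hypothetical image identifiers standing in for the real files.
image_ids = [f"img_{i:04d}" for i in range(100)]
train, val, test = split_dataset(image_ids)
```

Each image lands in exactly one partition, so the test set stays unseen during training and hyperparameter selection.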
Fig 4.
Data augmentation applied to a single image.
48 images synthetically generated (augmented) from a single input image. The first two rows show 24 rotations of the original image, and the last two rows show the same 24 rotations applied to a horizontally flipped copy of the original image.
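The count of 48 follows from 24 rotations times 2 flip states. The sketch below enumerates those transformations as (flipped, angle) pairs. It assumes the rotations are evenly spaced (15° steps starting at 0°), which the caption does not state, and `augmentation_plan` is a hypothetical helper name.

```python
def augmentation_plan(n_rotations=24):
    """List the (flipped, angle_in_degrees) pairs for one input image.

    Evenly spaced angles are an assumption; the caption only gives the counts.
    """
    step = 360 / n_rotations
    plan = []
    for flipped in (False, True):      # original image, then its horizontal flip
        for k in range(n_rotations):   # 24 rotations for each flip state
            plan.append((flipped, k * step))
    return plan

plan = augmentation_plan()
```

Applying each pair to an image with any image library would yield the grid shown in the figure: the first 24 entries correspond to the unflipped rows and the last 24 to the flipped rows.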
Table 2.
Split of the original data into three different partitions.
Fig 5.
Dense convolutional network (DenseNet).
In this architecture, each layer within a dense block is connected to every other layer in that block in a feed-forward fashion. DenseNets improve gradient flow during training, strengthen feature propagation, and substantially reduce the number of weights. The input to the network is an image from the dataset, and the output is a vector of probabilities, one for each of the 17 classes.
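To illustrate the network's output, the sketch below turns a vector of 17 raw class scores into probabilities and picks the most likely class. It assumes a softmax output layer, which the caption does not state explicitly, and the scores are made-up values rather than outputs of the trained model.

```python
import math

NUM_CLASSES = 17  # number of classes named in the caption

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores: class 3 receives the largest raw score.
logits = [0.0] * NUM_CLASSES
logits[3] = 5.0

probs = softmax(logits)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

The largest entry of `probs` is the kind of per-image "highest predicted probability" value plotted for correctly classified images.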
Table 3.
Overall training and validation accuracies for different CNN architectures with variable numbers of layers.
Table 4.
Overall training and validation accuracies of a DenseNet with 201 layers, varying the selected optimizer.
For each optimizer, the accuracies obtained with the best learning rate are reported.
Table 5.
Confusion matrix for the 17 categories of the held-out set of images using the best hyperparameter configuration.
Rows represent predicted classes and columns represent actual classes.
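Because the row and column convention matters when reading such a table, here is a small sketch that builds a confusion matrix with predicted classes on rows and actual classes on columns, as in this caption. Note that some libraries use the transposed layout (scikit-learn's `confusion_matrix`, for instance, puts actual classes on rows). The three-class labels below are toy data, not results from the paper.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count predictions with rows = predicted class, columns = actual class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[p][a] += 1
    return matrix

# Toy labels for three classes (illustrative only).
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(actual, predicted, 3)
# Overall accuracy is the diagonal sum divided by the number of samples.
accuracy = sum(cm[i][i] for i in range(3)) / len(actual)
```

With this layout, reading down a column shows how the images of one actual class were distributed across predicted classes.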
Table 6.
Confusion matrix for the 17 categories of the held-out set of images using the best hyperparameter configuration.
Rows represent predicted classes and columns represent actual classes. Note that images of male specimens were excluded from this experiment.
Fig 6.
Highest predicted probabilities for true positives.
Blue dots represent the highest predicted probability for each correctly classified image in the test set. Red dots represent images from the member species of the Anopheles gambiae complex.
Fig 7.
Clustering feature representations.
Resulting dendrogram after applying Hierarchical Agglomerative Clustering. Colors indicate major branches.
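The merge sequence that a dendrogram draws can be sketched with a small bottom-up clustering routine. The caption does not name the linkage criterion or distance metric, so average linkage and Euclidean distance are assumptions here, and the four 2-D points are toy stand-ins for the network's feature vectors.

```python
import math

def average_linkage(points_a, points_b):
    """Mean Euclidean distance over all cross-cluster point pairs."""
    total = sum(math.dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return the merge history.

    Each history entry is (cluster_i, cluster_j, distance), where clusters
    are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage([points[k] for k in clusters[i]],
                                    [points[k] for k in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Toy feature vectors: two tight pairs that are far apart from each other.
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
merges = agglomerate(features)
```

Each entry in `merges` corresponds to one junction in a dendrogram, with the merge distance giving the junction's height.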
Fig 8.
Visualizing the feature space.
t-SNE projection applied to the features extracted from the last convolutional layer. Colors are used to denote different clusters.
Fig 9.
Applying Grad-Cam++ to test images.
Five images were selected from the test set to be analyzed with Grad-Cam++, a visualization method for identifying the regions of an image that explain the final classification made by the network.