Deep learning approaches to landmark detection in tsetse wing images

doi:10.1371/journal.pcbi.1011194

Fig 1.

Image of a tsetse wing containing 11 landmarks indicated by white numbered points.

The image also contains a scale that helps put the errors reported later into context. Based on images of a physical measure, each pixel corresponds to approximately 0.007 mm.

Fig 2.

Flow chart of the entire procedure used in this study, including the development and deployment of tiers 1 and 2.

The data were first recorded on paper and laminated together with the physical tsetse wings. These pages were grouped into volumes, as illustrated at the top left of the diagram. The biological lab records were then digitised into a CSV file, and the wings were captured as images stored in nested folders (i.e. Vol/page). This study used a subset of the full data set (Vols. 20 and 21) to establish a method for recording landmarks automatically. "Labeller" refers to the manual labelling stage used to train the machine learning models. Sample statistics were computed to estimate the proportion of each category of incomplete wing, which informed the choice of an appropriate classification model. Sample statistics were also computed on misaligned pages to estimate how many misaligned pages we expect to find. The tier 1 and tier 2 processes at the bottom of the figure show the deployment process: tier 1 decides whether a wing is complete and can be sent to tier 2, where landmarks are localised. The two-tier landmark detection system is deployed on the unannotated data set of all images in Volumes 20 and 21. The final misalignment analysis is described in full in Fig A of S1 Fig.
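
The two-tier deployment step can be sketched as a simple dispatch loop. This is a minimal illustration, not the study's code: `run_two_tier`, `WingResult`, `is_complete` and `locate_landmarks` are hypothetical names, and the two callables stand in for the tier 1 completeness classifier and the tier 2 landmark model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A landmark configuration: one (x, y) pixel coordinate per landmark.
Landmarks = List[Tuple[float, float]]


@dataclass
class WingResult:
    """Outcome for one wing image (hypothetical record type)."""
    image_id: str
    complete: bool
    landmarks: Optional[Landmarks] = None


def run_two_tier(
    images: Dict[str, object],
    is_complete: Callable[[object], bool],
    locate_landmarks: Callable[[object], Landmarks],
) -> List[WingResult]:
    """Tier 1 screens each wing; only wings judged complete reach tier 2."""
    results = []
    for image_id, image in images.items():
        if is_complete(image):
            # Tier 2: localise landmarks only on complete wings.
            results.append(WingResult(image_id, True, locate_landmarks(image)))
        else:
            # Incomplete wings are kept in the output but get no landmarks.
            results.append(WingResult(image_id, False))
    return results
```

Keeping incomplete wings in the output, rather than dropping them, means every image in a volume still has a record, which makes it straightforward to count how many wings tier 1 screened out.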

Table 1.

Biological data captured in lab dissection.

Fig 3.

ResNet50 is modified by removing its final two layers and replacing them with a randomly initialised convolutional layer followed by a fully connected layer of size 22, which forms the output.

The output corresponds to an x and y coordinate for each of the 11 landmarks.
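
Because the network emits a flat vector of 22 numbers, training targets and predictions have to be flattened and unflattened consistently. The sketch below assumes an interleaved layout [x1, y1, x2, y2, ..., x11, y11]; the caption does not state the ordering, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

N_LANDMARKS = 11  # landmarks per wing, as in the caption


def landmarks_to_target(landmarks: Sequence[Tuple[float, float]]) -> List[float]:
    """Flatten 11 (x, y) pairs into the 22-value regression target.

    Assumes the interleaved layout [x1, y1, x2, y2, ..., x11, y11].
    """
    if len(landmarks) != N_LANDMARKS:
        raise ValueError(f"expected {N_LANDMARKS} landmarks, got {len(landmarks)}")
    return [float(c) for point in landmarks for c in point]


def output_to_landmarks(output: Sequence[float]) -> List[Tuple[float, float]]:
    """Split a 22-value network output back into 11 (x, y) pairs (same layout)."""
    if len(output) != 2 * N_LANDMARKS:
        raise ValueError(f"expected {2 * N_LANDMARKS} values, got {len(output)}")
    return [(output[2 * i], output[2 * i + 1]) for i in range(N_LANDMARKS)]
```

Whichever layout is chosen, the same one must be used for both directions; mixing an interleaved target with a "all x, then all y" decoder would silently pair the wrong coordinates.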

Fig 4.

The network is composed entirely of convolutional layers.

It consists of downsampling blocks followed by symmetric upsampling blocks. The output has dimension 11 × 224 × 224, with one segmentation map per landmark; each map is a binary image containing a disk centred on that landmark.
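
The segmentation targets can be illustrated by building one disk map per landmark and then recovering a coordinate from a predicted map. This is a sketch under assumptions: the disk radius (`RADIUS = 5`), the 0.5 threshold and the centroid decoding rule are hypothetical choices not stated in the caption, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence, Tuple

SIZE = 224   # each output map is 224 x 224 pixels, as in the caption
RADIUS = 5   # hypothetical disk radius in pixels (not given in the caption)

Grid = List[List[float]]


def disk_map(cx: int, cy: int, size: int = SIZE, radius: int = RADIUS) -> Grid:
    """Binary map indexed [row][col] = [y][x], with 1s inside a disk centred at (cx, cy)."""
    r2 = radius * radius
    return [[1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= r2 else 0.0 for x in range(size)]
            for y in range(size)]


def encode_targets(landmarks: Sequence[Tuple[int, int]]) -> List[Grid]:
    """One disk map per landmark, so 11 landmarks give an 11 x 224 x 224 target."""
    return [disk_map(cx, cy) for cx, cy in landmarks]


def decode_map(prob_map: Grid, threshold: float = 0.5) -> Tuple[float, float]:
    """Recover a landmark as the centroid (x, y) of pixels above the threshold."""
    sx = sy = n = 0
    for y, row in enumerate(prob_map):
        for x, value in enumerate(row):
            if value > threshold:
                sx += x
                sy += y
                n += 1
    if n == 0:
        raise ValueError("no pixel above threshold; landmark not found")
    return sx / n, sy / n
```

Taking the centroid of the thresholded region is only one way to turn a map back into a point; the arg-max pixel would be a simpler alternative, at the cost of whole-pixel resolution.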

Fig 5.

Box-and-whisker plots for the baseline model and the regression and segmentation networks.

Both the regression and segmentation networks show a significant improvement over the baseline (5C). The regression network (5A) has a slightly higher mean pixel distance error and higher maxima than the segmentation network, but fewer egregious outliers. For the segmentation network (5B), four outliers ranging from 50 to 570 pixels are not displayed for clarity.
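
The pixel distance error summarised in these box plots can be computed as the Euclidean distance between each predicted landmark and its manual label. The sketch below assumes the per-image value is the mean over that image's landmarks, and reuses the approximate 0.007 mm per pixel scale from the Fig 1 caption to express errors in millimetres; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
MM_PER_PIXEL = 0.007  # approximate scale from the Fig 1 caption


def mean_pixel_distance_error(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Mean Euclidean distance, in pixels, between paired predicted and true landmarks."""
    if not pred or len(pred) != len(true):
        raise ValueError("pred and true must be non-empty and the same length")
    return sum(math.dist(p, t) for p, t in zip(pred, true)) / len(pred)


def pixels_to_mm(pixels: float) -> float:
    """Convert a pixel distance to millimetres using the approximate image scale."""
    return pixels * MM_PER_PIXEL
```

Taking the omitted outliers to be pixel distances, the largest one (570 pixels) would correspond to roughly 570 × 0.007 ≈ 4 mm on the wing.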

Table 2.

Effects of data augmentations on average landmark errors.

Fig 6.

Procrustes disparity of the predicted landmarks from the mean landmark shape versus the mean pixel distance error.

Predictions from the regression network (A) show a slightly higher correlation and pixel distance error than those from the segmentation network (B). The segmentation network also has a lower error interval. In both panels, the error interval is shown in light cyan, the 95% confidence interval in sky blue, and the line of best fit in royal blue.
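
The caption does not spell out how the disparity is computed. A minimal sketch, assuming the standard definition used by SciPy's `scipy.spatial.procrustes`: both shapes are centred and scaled to unit norm, the second is then optimally rotated, scaled and (if that helps) reflected onto the first, and the disparity is the remaining sum of squared differences. Points are treated as complex numbers so that a 2D rotation plus scaling is a single complex multiplication. The reference shape would be the mean landmark shape, for example the coordinate-wise average of the training labels; that choice, and the function names, are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _standardise(shape: Sequence[Point]) -> List[complex]:
    """Centre a 2D shape and scale it to unit Frobenius norm, as complex numbers."""
    z = [complex(x, y) for x, y in shape]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    norm = sum(abs(p) ** 2 for p in z) ** 0.5
    if norm == 0:
        raise ValueError("degenerate shape: all points coincide")
    return [p / norm for p in z]


def procrustes_disparity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Procrustes disparity between two 2D landmark configurations (0 = same shape)."""
    if len(a) != len(b):
        raise ValueError("shapes must have the same number of landmarks")
    za, zb = _standardise(a), _standardise(b)
    # Strength of the best rotation + scaling fit of b onto a.
    rot = abs(sum(p.conjugate() * q for p, q in zip(za, zb)))
    # Same, but after mirroring b (complex conjugation is a reflection).
    ref = abs(sum(p * q for p, q in zip(za, zb)))
    s = max(rot, ref)
    # Residual sum of squares after the optimal fit; clamp tiny negative rounding.
    return max(0.0, 1.0 - s * s)
```

For 2D landmarks this closed form (1 − s², with s the magnitude of the best complex inner product) avoids an SVD; for a real analysis, `scipy.spatial.procrustes` should return the same disparity as its third value.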
