Adaptive representations of sound for automatic insect recognition

doi:10.1371/journal.pcbi.1011541

Fig 1.

Two spectrograms of the same recording of Gryllus campestris.

Spectrogram A displays the frequency axis linearly in Hz. Spectrogram B uses the mel frequency scale, which compresses the frequency axis so that lower frequency bands are shown at higher resolution than higher bands, mimicking human perception of frequency. Both spectrograms cover the same frequency range. Because this recording contains mostly high-frequency information and its low frequencies are empty, the mel spectrogram B obscures a large amount of information compared to the linear spectrogram A.
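To make the compression of high frequencies concrete, the following minimal sketch uses the widely used HTK-style mel formula, m = 2595 · log10(1 + f / 700). The caption does not state which mel variant was used, so the formula choice, the helper names, and the 22050 Hz upper limit are assumptions for illustration only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mel (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_share(f_lo: float, f_hi: float, f_max: float) -> float:
    """Fraction of a mel axis spanning 0..f_max that the band [f_lo, f_hi] occupies."""
    return (hz_to_mel(f_hi) - hz_to_mel(f_lo)) / hz_to_mel(f_max)


# Hypothetical upper limit of the spectrogram, not taken from the article.
F_MAX = 22050.0

# The upper half of the linear axis takes up half of spectrogram A,
# but a much smaller share of a mel-scaled axis.
upper_half_share = mel_band_share(F_MAX / 2, F_MAX, F_MAX)
print(f"upper half of the band on the mel axis: {upper_half_share:.2f}")
```

Under these assumptions, the upper half of the frequency range occupies well under a quarter of the mel axis, which is why a recording dominated by high-frequency content loses detail in spectrogram B.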

Table 1.

InsectSet32: 335 files from 32 species with a total recording length of 57 minutes and four seconds were selected from two different source datasets (Orthoptera dataset by Baudewijn Odé and Cicadidae dataset by Ed Baker).

Number of files (n) and total length of recordings (min:s) per species.

Table 2.

InsectSet47: 1006 files from 47 species with a total recording length of 22 hours were selected mainly from xeno-canto.org, as well as two private collections (Orthoptera dataset by Baudewijn Odé and Cicadidae dataset by Ed Baker).

Number of files (n) and total length of recordings (min:s) per species.

Table 3.

InsectSet66: 1554 files from 66 species with a total recording length of 24 hours and 32 minutes were selected from five different source datasets (Orthoptera and Cicadidae datasets from iNaturalist, Orthoptera dataset from xeno-canto, Orthoptera dataset by Baudewijn Odé and Cicadidae dataset by Ed Baker).

Number of files (n) and total length of recordings (h:min:s) per species.

Fig 2.

Example of the data augmentation workflow used on the training set (InsectSet47 and InsectSet66).

Noise with a randomized frequency distribution is added at a randomized signal-to-noise ratio. An impulse response from an outdoor location is then applied at a randomized mix ratio.
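The two augmentation steps in this caption can be sketched as follows. This is a dependency-free illustration, not the authors' implementation: the function names, the SNR and mix ranges, and the dry/wet blending rule are assumptions, and the randomization of the noise's frequency distribution is left out. A real pipeline would use an array library and FFT-based convolution.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_noise_at_snr(signal, noise, snr_db):
    """Scale the noise so that signal RMS / noise RMS equals snr_db, then add it."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(signal, noise)]


def convolve(x, h):
    """Direct-form linear convolution, truncated to the length of x."""
    out = [0.0] * len(x)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if i + j < len(x):
                out[i + j] += xi * hj
    return out


def apply_impulse_response(signal, ir, mix):
    """Blend the dry signal with its convolved version (mix=0 dry, mix=1 wet)."""
    wet = convolve(signal, ir)
    return [(1.0 - mix) * d + mix * w for d, w in zip(signal, wet)]


def augment(signal, noise, ir, rng,
            snr_range_db=(0.0, 30.0),   # hypothetical range
            mix_range=(0.0, 1.0)):      # hypothetical range
    """Add noise at a random SNR, then apply the impulse response at a random mix."""
    snr_db = rng.uniform(*snr_range_db)
    mix = rng.uniform(*mix_range)
    noisy = add_noise_at_snr(signal, noise, snr_db)
    return apply_impulse_response(noisy, ir, mix)


# Toy usage with synthetic data: a sine tone, Gaussian noise, a short decaying IR.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
hiss = [rng.gauss(0.0, 1.0) for _ in range(200)]
decay = [0.6 ** k for k in range(8)]
augmented = augment(tone, hiss, decay, rng)
```

Drawing the parameters from a seeded `random.Random` instance keeps each augmented example reproducible, which helps when comparing runs trained with different seeds.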

Table 4.

Test and validation scores for all trained models with mel and LEAF frontends on insect sound datasets of three different sizes.

The median as well as the lower and upper limits are reported from training multiple runs of the same model with different randomization seeds and four convolutional layers (five runs each for InsectSet32, three runs each for InsectSet47 and InsectSet66). The best performing models were also trained with an additional convolutional layer, indicated by the number in the model name.
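As a small illustration of how such a summary could be computed, the sketch below reduces the scores of several seeded runs to a median with lower and upper limits. The helper name and the example scores are placeholders invented for this sketch and are not results from the table.

```python
from statistics import median


def summarize_runs(scores):
    """Return the median, lowest, and highest score across repeated runs."""
    return {
        "median": median(scores),
        "lower": min(scores),
        "upper": max(scores),
    }


# Placeholder accuracies for three runs with different seeds (not real results).
example_scores = [0.62, 0.67, 0.65]
summary = summarize_runs(example_scores)
print(summary)
```

With only three to five runs per model, reporting the median together with the full range is more informative than a mean and standard deviation, since each individual run remains visible in the summary.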

Fig 3.

Classification outcome for all 32 species in the test set using the best run of the mel frontend performing at 67% classification accuracy.

The vertical axis displays the true labels of the files, the horizontal axis shows the predicted labels, sorted alphabetically. Classifications within the two biggest genera Platypleura (green) and Myopsalta (red) are highlighted for comparison to the LEAF confusion matrix.
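The quantities read off such a confusion matrix can be sketched as below: cell counts indexed by (true label, predicted label), overall accuracy as the share of counts on the diagonal, and the number of misclassifications that stay within one genus. The species labels are placeholders, and treating the first word of a label as the genus is an assumption of this sketch.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(counts):
    """Share of files on the diagonal, i.e. predicted label equals true label."""
    total = sum(counts.values())
    correct = sum(n for (t, p), n in counts.items() if t == p)
    return correct / total


def within_genus_errors(counts, genus):
    """Misclassified files whose true and predicted species share the given genus."""
    return sum(
        n for (t, p), n in counts.items()
        if t != p and t.split()[0] == genus and p.split()[0] == genus
    )


# Placeholder labels; "sp1" and "sp2" are not real species names.
true_labels = ["Platypleura sp1", "Platypleura sp2", "Myopsalta sp1", "Myopsalta sp1"]
pred_labels = ["Platypleura sp1", "Platypleura sp1", "Myopsalta sp1", "Platypleura sp2"]

counts = confusion_counts(true_labels, pred_labels)
print(accuracy(counts), within_genus_errors(counts, "Platypleura"))
```

Counting within-genus errors separately mirrors the highlighted blocks in the figure: confusions between closely related species are a different kind of mistake than confusions across genera.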

Fig 4.

Classification outcome for all 32 species in the test set using the best run of the LEAF frontend performing at 78% classification accuracy.

The vertical axis displays the true labels of the files, the horizontal axis shows the predicted labels, sorted alphabetically. Classifications within the two biggest genera Platypleura (green) and Myopsalta (red) are highlighted for comparison to the mel confusion matrix.

Fig 5.

Center frequencies of all 64 filters used in the best performing LEAF run on InsectSet32.

Plots A and D show the initialization curve before training, which is based on the mel scale. Plots B and E show the deviation of each filter from its initialized position after training. Plots C and F show the filters sorted by center frequency; they demonstrate the overall coverage of the frequency range but do not represent the real ordering in the LEAF representations. Violin plots show the density of filters over the frequency spectrum; the orange line shows the initialization curve for comparison.
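The three views described in this caption (initialization, per-filter deviation, sorted coverage) can be sketched as follows. The sketch assumes that a mel-based initialization means center frequencies equally spaced on an HTK-style mel axis; the frequency limits, the function names, and the stand-in "trained" values are hypothetical and do not reproduce the LEAF implementation or the trained filters.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Hz-to-mel conversion (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies equally spaced on the mel axis between f_min and f_max."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters)]


def deviations(initial, trained):
    """Per-filter shift in Hz from the initialized position (as in plots B and E)."""
    return [t - i for i, t in zip(initial, trained)]


# Hypothetical frequency limits; the caption only states that 64 filters are used.
initial = mel_spaced_center_frequencies(64, 60.0, 22050.0)

# Stand-in for trained values: every filter shifted up by 10 Hz, for illustration.
trained = [f + 10.0 for f in initial]

shift = deviations(initial, trained)
coverage = sorted(trained)  # sorted view, as in plots C and F
```

Sorting discards the filter index, which is why the sorted plots show coverage of the frequency range but not the order in which filters appear in the learned representation.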

Table 5.

Test and validation scores for the trained models using the leafFB frontend.

The median as well as the lower and upper limits are reported from training three runs of the same model with different randomization seeds and four convolutional layers.
