The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics

doi:10.1371/journal.pone.0278522

Fig 1.

Most species of bird have little bioacoustic data.

X-axis shows, on a log scale, the number of songs available per species. Y-axis shows the number of species per song bin. Top panel: Xeno-Canto data. Bottom panel: Borror Laboratory of Bioacoustics (BLB) data; note that the x-axis has a gap to accommodate the exceedingly high number of species with no data, as BLB focuses on North America.

Fig 2.

Species in this study varied in their song types and phylogenetic relatedness.

On the left is a phylogeny describing the phylogenetic distances between ten focal species. Species are shown as artist’s renditions of birds at the tips of branches. Note that birds are not to scale. Numbers in circles at nodes show approximate time of divergence (in Mya; branches and nodes not to scale). On the right, spectrograms of exemplar songs per species are shown, with frequency on the y-axis (in kHz) and time on the x-axis (in seconds). Scientific names, common names, and recording identifiers are shown above each spectrogram, where BLB = Borror Laboratory of Bioacoustics and XC = Xeno-Canto. Red boxes show the annotations that were added manually for each taxon.

Fig 3.

Flowchart describing the procedure from acquiring song recordings to having segmented syllables for downstream analysis.

Boxes show general steps, arrows show the flow between steps, and diamonds show decision points for data quality.

Fig 4.

Frame rate accuracy and syllable error rate of models trained in TweetyNet across species.

Far left shows the phylogeny from Fig 2 with species at tips. Left: frame rate accuracy for each combination of training species (rows, top to bottom) and test species (columns, left to right). Species are given with the same symbols as in Fig 2, plus all nine species (circular cluster). For accuracy, darker values indicate higher accuracy. Right: segment error rate for each combination of training species and test species. For segment error rate, darker values indicate higher error (worse performance) on a log scale, where 0 = 0% error and 263 = 263% error.
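
The caption does not define the two metrics, so the following is only a minimal sketch under commonly used definitions, which are assumptions here: frame accuracy as the fraction of time bins whose predicted label matches the reference label, and segment error rate as the edit distance between predicted and reference label sequences divided by the length of the reference sequence. Under that second definition, a model that predicts many spurious segments can exceed 100% error, which is consistent with the 263% value on the color scale. The function names are hypothetical and are not taken from TweetyNet.

```python
def frame_accuracy(true_frames, pred_frames):
    """Fraction of time bins whose predicted label equals the reference label."""
    if len(true_frames) != len(pred_frames):
        raise ValueError("label sequences must cover the same number of frames")
    if not true_frames:
        raise ValueError("at least one frame is required")
    matches = sum(t == p for t, p in zip(true_frames, pred_frames))
    return matches / len(true_frames)


def edit_distance(reference, predicted):
    """Levenshtein distance between two label sequences."""
    previous = list(range(len(predicted) + 1))
    for i, ref_label in enumerate(reference, start=1):
        current = [i]
        for j, pred_label in enumerate(predicted, start=1):
            substitution = 0 if ref_label == pred_label else 1
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + substitution,  # match or substitution
            ))
        previous = current
    return previous[-1]


def segment_error_rate(reference, predicted):
    """Edit distance normalized by the number of reference segments (assumed definition)."""
    if not reference:
        raise ValueError("reference must contain at least one segment")
    return edit_distance(reference, predicted) / len(reference)
```

For example, a reference of two segments against a prediction of six segments (four of them spurious) gives an edit distance of 4 and therefore an error rate of 2.0, that is, 200%.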

Table 1.

Models trained in TweetyNet across multiple species vary in training time, dataset size, and type of model.

“9 Species” models were trained on all data from the “Single species” type models.

Fig 5.

Model performance by species drops with estimated divergence, but training on multiple species performs better than expected given average divergence time.

X-axis gives the estimated divergence time. Y-axis gives the accuracy (left) or segment error rate (right). Hollow black points show single-species models. Filled red points show multiple-species models. Blue square outlines show models where the test dataset was all species simultaneously. Red line shows the overall line of best fit (see S1 and S2 Figs in S1 File for individual species).
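
The caption does not state how the line of best fit was estimated. As a hedged illustration only, the sketch below assumes an ordinary least-squares fit of a performance metric against divergence time; the function name and the numbers in the usage note are hypothetical and are not data from the study.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and summed cross-deviations of x and y.
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("xs must not all be identical")
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With made-up points `xs = [0, 1, 2, 3]` and `ys = [1, 3, 5, 7]`, the function returns a slope of 2 and an intercept of 1. A negative slope for accuracy against divergence time would correspond to the drop in performance that the figure title describes.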

Fig 6.

Model performance on Melozone fusca drops with estimated divergence, but also shows recovery when trained on multiple species.

X-axis gives the estimated divergence time. Y-axis gives the accuracy (left) or segment error rate (right). Hollow black points show single-species models. Filled red points show multiple-species models. Red line shows the overall line of best fit. A solid line indicates a significant relationship, while a dotted line indicates a non-significant one (see S8 Table in S1 File).

Table 2.

Performance of models trained in TweetyNet for Melozone fusca.

Fig 7.

Example hypervolume diagram for one species, Passerina amoena.

Left: principal component analysis of SoundShape for 500 syllables (50 for each of 10 species), with PC1 on the x-axis and PC2 on the y-axis. Passerina amoena syllables are shown as red points, and a 2-dimensional convex polygon is drawn to display the 2-dimensional hypervolume. Hypervolume calculation was done 50 times with 50 syllables per species. Right: SoundShape diagram for the point indicated by the black box on the left plot, representing syllable 5 from BLB 7710. X-axis shows the frequency of syllables (kHz). Y-axis shows time (seconds); note this is on a log scale. Z-axis shows amplitude; amplitude is also color-coded on the plot, with warmer colors indicating higher amplitude. Inset: original spectrogram representation (see Fig 2).
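
To make the 2-dimensional convex polygon concrete, the sketch below computes a convex hull of points in the PC1 and PC2 plane and the area it encloses. This is an illustration of the polygon drawn in the left panel only; it is an assumption that hull area is a useful stand-in here, and the study's actual hypervolume calculation may use a different method or more dimensions. Function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))  # points must be (x, y) tuples
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2
```

Given each syllable's (PC1, PC2) coordinates for one species as tuples, `polygon_area(convex_hull(coords))` gives the area of the polygon outlining that species' points. Repeating this on resampled subsets of syllables would mirror the repeated calculation the caption mentions.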

Table 3.

Species vary with respect to song complexity, song variation, and hypervolume diversity.

Fig 8.

Performance of models on the same data varies between training species.

Spectrogram depicts a recording of Zonotrichia leucophrys, BLB 23876, with time (seconds) on the x-axis and frequency (kHz) on the y-axis. Top row: the seven manual annotations used as the “true” segmentation, in red; note that the third through fifth syllables are separated by very small gaps and that notes from other species are left unlabeled and treated as background noise. The following rows, from top to bottom, are predictions output by models trained on all species (9 Species Balanced), the same species (Zonotrichia leucophrys), another species from the same family (Melozone fusca), another species from the same suborder (Cardinalis sinuatus), another species from the same order (Myiarchus tuberculifer), and finally a species from a completely different order (Calypte anna). Predictions are given in blue with “true” values superimposed in red.
