Fig 1.
The spectrogram of the song of the bird “lny64”, used as an example throughout this paper.
This image was made by superposing the spectra of our 2818 aligned songs. Our example detection points, , are shown as red lines, with example recognition regions of 30 ms × 1–8 kHz marked as rectangles.
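A recognition region such as 30 ms × 1–8 kHz corresponds to a fixed block of spectrogram frames and frequency bins. The sketch below converts one into frame and bin counts. The sample rate (44.1 kHz) and FFT length (256) are hypothetical values chosen for illustration, not taken from this paper. The 1.5-ms frame interval is one of the intervals used later (Fig 5). The helper name `recognition_region` is also hypothetical.

```python
import math

def recognition_region(fs_hz, nfft, frame_interval_ms, duration_ms=30.0,
                       f_lo_hz=1000.0, f_hi_hz=8000.0):
    """Return (n_frames, first_bin, last_bin) for a time x frequency window.

    Hypothetical helper: assumes a one-sided FFT whose bin k is centred at
    k * fs_hz / nfft, and spectrogram frames spaced frame_interval_ms apart.
    """
    # Number of frames spanning the window, rounded to the nearest integer.
    n_frames = round(duration_ms / frame_interval_ms)
    # Frequency spacing between adjacent FFT bins.
    bin_hz = fs_hz / nfft
    # Lowest and highest bins whose centres lie inside [f_lo_hz, f_hi_hz].
    first_bin = math.ceil(f_lo_hz / bin_hz)
    last_bin = math.floor(f_hi_hz / bin_hz)
    return n_frames, first_bin, last_bin

n_frames, lo, hi = recognition_region(44100, 256, 1.5)
n_bins = hi - lo + 1
print(n_frames, n_bins, n_frames * n_bins)  # frames, bins, input size
```

Under these assumptions the region is 20 frames by 41 bins, or 820 spectrogram values per detection window.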
Table 1.
For each bird, detectors were trained for the specified timepoints, using 1000 songs and 1000 non-song samples of the same length.
The test set consisted of the remaining songs and an equal number of non-songs.
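The train/test split described above can be sketched as follows. Two points are assumptions that the caption does not state. First, the training songs are chosen at random. Second, the test non-songs are disjoint from the training non-songs. The function name `split_songs` is hypothetical.

```python
import random

def split_songs(songs, non_songs, n_train=1000, seed=0):
    """Hypothetical split: n_train songs + n_train non-songs for training;
    the remaining songs plus an equal number of unused non-songs for testing.
    Labels: 1 = song, 0 = non-song."""
    rng = random.Random(seed)
    songs = songs[:]          # copy so the caller's lists are not shuffled
    non_songs = non_songs[:]
    rng.shuffle(songs)        # assumption: random selection of training songs
    rng.shuffle(non_songs)
    train_songs, test_songs = songs[:n_train], songs[n_train:]
    train_non = non_songs[:n_train]
    # Assumption: test non-songs are not reused from the training set.
    test_non = non_songs[n_train:n_train + len(test_songs)]
    if len(test_non) < len(test_songs):
        raise ValueError("not enough non-song samples for a balanced test set")
    train = [(s, 1) for s in train_songs] + [(x, 0) for x in train_non]
    test = [(s, 1) for s in test_songs] + [(x, 0) for x in test_non]
    return train, test

# With lny64's 2818 songs, 1818 songs and 1818 non-songs remain for testing.
train, test = split_songs(list(range(2818)), list(range(10000, 15000)))
```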
Fig 2.
Each plot shows one network output unit’s responses to all 2818 presentations of lny64’s song shown in Fig 1.
We show only the syllables , , and , and we do not show the (absent) responses to non-song presentations. The horizontal axis is time relative to the beginning of the aligned song, and the vertical axis indexes the 2818 individual song presentations. The grey shading shows the audio amplitude of each song presentation over time. Detection events on training songs are shown in cyan, and detections on unseen test songs in red. To convey intra-song variability, songs have been stably sorted by the time of their detection events; thus each of the three detection graphs shows the songs in a different order.
Fig 3.
Accuracy variability over 100 different training runs for each of the test detection points.
Each dot shows the test-set accuracy of an independently trained detector. Horizontal positions have been randomised slightly so that equal values do not occlude one another; because this jitter blurs the grouping, the test syllable is also indicated by colour. The means are given in Table 2.
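The strip-plot layout and the per-syllable means can be sketched as follows. The jitter width (0.3 of a category spacing) and the accuracy values in the example are hypothetical. They are not the data behind Fig 3 or Table 2. The function name `strip_plot` is also hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def strip_plot(runs, width=0.3, seed=0):
    """runs: list of (syllable_index, accuracy) pairs, one per training run.

    Returns jittered (x, y) points, where x is the syllable index plus a small
    uniform random offset, and a dict of mean accuracy per syllable.
    """
    rng = random.Random(seed)
    points = [(idx + rng.uniform(-width / 2, width / 2), acc) for idx, acc in runs]
    by_syllable = defaultdict(list)
    for idx, acc in runs:
        by_syllable[idx].append(acc)
    means = {idx: mean(accs) for idx, accs in by_syllable.items()}
    return points, means

# Hypothetical accuracies for two syllables, two runs each.
runs = [(0, 0.98), (0, 0.99), (1, 0.95), (1, 0.97)]
points, means = strip_plot(runs)
```

Only the x coordinate is randomised, so every plotted accuracy remains exact.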
Table 2.
Mean values for the detection accuracies shown in Fig 3.
Fig 4.
Timing varies as the FFT frame interval changes.
Here we show results for the ideal detector and the LabVIEW and Swift+serial implementations, for the constructed δ-syllable and for trigger of lny64’s song. The lines show latency; error bars are standard deviation (jitter). Points have been shifted horizontally slightly for clarity; original positions are [0.5 1 1.5 2 4] ms.
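One back-of-envelope way to see why timing depends on the frame interval is to assume that a detector can only fire at frame boundaries. Assume also that the target event falls at a uniformly random position within a frame. Frame quantisation alone then adds a delay uniform on [0, Δ), which has mean Δ/2 and standard deviation Δ/√12. This is an illustrative sketch under those assumptions, not the analysis used for Fig 4. The function name `quantisation_timing` is hypothetical.

```python
import math

FRAME_INTERVALS_MS = [0.5, 1.0, 1.5, 2.0, 4.0]  # frame intervals from this caption

def quantisation_timing(frame_interval_ms):
    """Mean and standard deviation of a delay uniform on [0, frame_interval_ms).

    Assumes detection is reported only at frame boundaries and that the true
    event time is uniformly distributed within a frame.
    """
    return frame_interval_ms / 2, frame_interval_ms / math.sqrt(12)

for dt in FRAME_INTERVALS_MS:
    lat, jit = quantisation_timing(dt)
    print(f"{dt:.1f} ms frames: +{lat:.3f} ms latency, {jit:.3f} ms jitter")
```

Under this model, quantisation jitter grows linearly with the frame interval. Processing and transmission delays in a real implementation would add to it.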
Table 3.
Latency and jitter variability (95% confidence) for lny64’s six test syllables.
Fig 5.
Timing data for lny64’s six test syllables, for the ideal and the Swift+serial detectors, with an FFT frame interval of 1.5 ms.
Point centres show latency; error bars show jitter.
Table 4.
Latency and jitter for each of our detector implementations on the synthetic δ-syllable and on syllable from lny64.
Fig 6.
Timing of the different detectors for the constructed δ-syllable and for lny64’s song at .
Point centres show latency; error bars show jitter.
Fig 7.
Raw timing curves for all detectors measured during detection of lny64’s using 1.5-ms frames.
We extract the trigger events from each curve, from which we obtain their mean (latency) and standard deviation (jitter).
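The extraction of latency and jitter from a raw timing curve can be sketched in two steps. First, detect the rising edges of the trigger signal. Second, summarise the delays relative to the target times. The threshold rule and the use of the sample standard deviation are assumptions. The caption does not specify either. The function names `rising_edges` and `latency_and_jitter` are hypothetical.

```python
from statistics import mean, stdev

def rising_edges(times_ms, signal, threshold):
    """Times at which signal crosses threshold from below.

    Hypothetical trigger-extraction rule: an event is recorded at times_ms[i]
    when signal[i - 1] < threshold <= signal[i].
    """
    return [t for t, prev, cur in zip(times_ms[1:], signal, signal[1:])
            if prev < threshold <= cur]

def latency_and_jitter(trigger_times_ms, target_times_ms):
    """Latency = mean trigger delay; jitter = sample standard deviation of delay.

    Assumes exactly one trigger per song, paired with that song's target time.
    """
    delays = [trig - target for trig, target in zip(trigger_times_ms, target_times_ms)]
    return mean(delays), stdev(delays)

edges = rising_edges([0, 1, 2, 3, 4, 5], [0, 0, 1, 1, 0, 1], threshold=0.5)
lat, jit = latency_and_jitter([10.0, 12.0, 11.0], [0.0, 0.0, 0.0])
```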