Table 1.
Breakdown of how the data were aggregated to produce each temporal aggregation level.
Table 2.
Summary of BirdNET performance (F1 score) for each parameter, averaged across all combinations of the other parameters and aggregation levels. Values represent the range and mean ± standard deviation of observed F1 scores.
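For reference, an F1 score of this kind can be computed by comparing the species BirdNET reports with the species an expert identified. The sketch below assumes a presence/absence comparison of species sets, which may differ from the exact scoring used in the study; the function name `f1_score` and its arguments are illustrative and not taken from the source.

```python
def f1_score(predicted, expert):
    """F1 between predicted and expert species sets.

    Assumption: scoring is set-based (species present or absent),
    not per-detection. This is an illustrative sketch only.
    """
    predicted, expert = set(predicted), set(expert)
    tp = len(predicted & expert)   # species found by both
    fp = len(predicted - expert)   # species reported only by the model
    fn = len(expert - predicted)   # species found only by the expert
    if tp == 0:
        return 0.0                 # avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is the harmonic mean of precision and recall, so a configuration scores well only when it both avoids false positives and recovers most of the species the expert found.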
Table 3.
Best-performing BirdNET parameter settings resulting in the highest F1 scores within each temporal aggregation level, based on 1,944 model configurations compared to expert identifications per temporal aggregation level (7,776 comparisons total).
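The comparison count follows from testing 1,944 configurations at each of four aggregation levels. As a rough sketch of how the best setting per level could be selected, assuming the results are available as `(level, config, f1)` tuples (a hypothetical layout, as are the function and constant names):

```python
N_CONFIGS = 1944   # model configurations per aggregation level (from the caption)
N_LEVELS = 4       # temporal aggregation levels (from the caption)
TOTAL_COMPARISONS = N_CONFIGS * N_LEVELS  # 7,776 comparisons in total


def best_per_level(results):
    """Return {level: (config, f1)} holding the highest-F1 config per level.

    `results` is an iterable of (level, config, f1) tuples; this record
    layout is an assumption made for illustration. Ties keep the first
    configuration encountered.
    """
    best = {}
    for level, config, f1 in results:
        if level not in best or f1 > best[level][1]:
            best[level] = (config, f1)
    return best
```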
Fig 1.
F1 scores measuring BirdNET accuracy against expert identification with and without week of the year included, tested across four temporal aggregation levels and various parameter settings.
Fig 2.
Predicted F1 score across temporal aggregation levels and BirdNET parameters.
Predicted F1 score as a function of temporal aggregation level, overlap, sensitivity, and minimum confidence from a linear mixed-effects model based on 1,944 BirdNET parameter configurations, each tested against expert identifications within four aggregation levels (7,776 comparisons total). Each panel shows variation in predicted F1 scores across aggregation level, minimum confidence, and overlap (left) or sensitivity (right).
Fig 3.
Accuracy of BirdNET species predictions based on manual checking.
The proportion of BirdNET detections at the dataset level that were manually checked by an expert and determined to be correct or incorrect. A maximum of 50 randomly selected detections per species were checked. If a species was detected fewer than 50 times, all of its detections were checked. White numbers give the number of detections checked (a value below 50 means only that many were available). The red label denotes the species that BirdNET identified but that were missed in our expert identification.
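The per-species sampling rule described in this caption can be sketched as follows. This is a minimal illustration, assuming detections are grouped in a dictionary keyed by species; the function name, argument names, and the fixed random seed are hypothetical and not drawn from the study.

```python
import random


def sample_for_checking(detections_by_species, max_per_species=50, seed=0):
    """Pick detections for manual checking, at most `max_per_species` each.

    Species with `max_per_species` detections or fewer are checked in full;
    otherwise a random subset is drawn without replacement.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sampled = {}
    for species, detections in detections_by_species.items():
        detections = list(detections)
        if len(detections) <= max_per_species:
            sampled[species] = detections
        else:
            sampled[species] = rng.sample(detections, max_per_species)
    return sampled
```

The proportion correct for a species is then the number of checked detections the expert confirmed divided by the number checked.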
Table 4.
Recommended BirdNET parameter settings and optional validation/postprocessing based on recording characteristics and research objectives.