Tracking individual honeybees among wildflower clusters with computer vision-facilitated pollinator monitoring

Monitoring animals in their natural habitat is essential for the advancement of animal behavioural studies, especially in pollination research. Non-invasive techniques are preferred for these purposes as they reduce opportunities for research apparatus to interfere with behaviour. One potentially valuable approach is image-based tracking. However, tracking unmarked wild animals from video in uncontrolled outdoor environments is challenging. Out-of-the-box algorithms currently present several problems in this context that can compromise accuracy, especially in cases of occlusion in a 3D environment. To address these issues, we present a novel hybrid detection and tracking algorithm to monitor unmarked insects outdoors. Our software can detect an insect, identify when a tracked insect becomes occluded from view and when it re-emerges, determine when an insect exits the camera field of view, and assemble a series of insect locations into a coherent trajectory. The insect detection component of the software uses background subtraction and deep learning-based detection together to locate an insect accurately and efficiently among a cluster of wildflowers. We applied our method to track honeybees foraging outdoors, using a new dataset that includes complex background detail, wind-blown foliage, and insects moving into and out of occlusion beneath leaves and among three-dimensional plant structures. We evaluated our software against human observations and previous techniques. It tracked honeybees at a rate of 86.6% on our dataset, 43% higher than the computationally more expensive, stand-alone deep learning model YOLOv2. We illustrate the value of our approach for quantifying the fine-scale foraging of honeybees. The ability to track unmarked insect pollinators in this way will help researchers better understand pollination ecology. The increased efficiency of our hybrid approach paves the way for the application of deep learning-based techniques to animal tracking in real time on low-powered devices suitable for continuous monitoring.


Introduction
Studying animal behaviour helps address key questions in ecology and evolution; however, collecting behavioural data is difficult [1]. While direct observation by ethologists is useful, this approach has low sampling resolution [2] and creates bias due to attentional limitations [3], which makes it difficult to monitor fast-moving animals such as insects [4]. Additionally, the accuracy of data may later be questioned since visual records of incidents are not preserved [5]. Video recordings potentially help overcome some methodological limitations by preserving observations. Unfortunately, manually extracting animal behaviour from video remains time-consuming and error-prone due to the attentional limitations of human processing [3]. Recent advances in automated image-based tracking tackle these problems by extracting and identifying animal behaviours and trajectories without human intervention [5,6]. Whilst these techniques promise improved sampling of data, performance is still limited by environmental and animal behavioural complexity, and by computational resources.

One area in which accurate, fine-scale behavioural data is particularly valuable is the study of insect pollination. Pollination is an integral requirement of horticulture and ecosystem management: insect pollinators impact 35% of global agricultural land [7], supporting over 87 food crops [8]. However, because insect pollinators are small and fast-moving, and operate in cluttered 3D environments [4], monitoring and tracking them is challenging. Since pollination is an ongoing requirement of crops and wildflowers alike, it would be ideal to establish field stations that provide ongoing data on pollinator behaviour. To be practical, such a solution would need to be cheap to assemble and install, and to provide low-cost, reliable and continuous monitoring of pollinator behaviour. These requirements exclude many current approaches to insect tracking, but the challenge is suitable for innovations involving imaging and AI.

The hybrid detection algorithm has a modular architecture that allows state-of-the-art deep learning and background subtraction plug-ins to be incorporated as these tools advance. Details of the deep learning and background subtraction algorithms we use appear below.
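To make the plug-in idea concrete, the following is a minimal sketch of how such a modular hybrid detector could be wired together. It is an illustration, not the HyDaT implementation itself: the `Detector` interface, the use of OpenCV's MOG2 background subtractor, and the fall-back ordering are assumptions made for the example.

```python
import abc

import cv2
import numpy as np


class Detector(abc.ABC):
    """Plug-in interface: a detector maps a video frame to candidate (x, y) positions."""

    @abc.abstractmethod
    def detect(self, frame: np.ndarray) -> list:
        ...


class BackgroundSubtractionDetector(Detector):
    """Low-cost detector: foreground blobs from OpenCV's MOG2 background model."""

    def __init__(self, min_area: float = 50.0):
        self.subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
        self.min_area = min_area  # reject foreground blobs too small to be the insect

    def detect(self, frame):
        mask = self.subtractor.apply(frame)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        centres = []
        for contour in contours:
            if cv2.contourArea(contour) >= self.min_area:
                m = cv2.moments(contour)
                centres.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))  # blob centroid
        return centres


class HybridDetector(Detector):
    """Run the cheap detector first; fall back to the deep model when it finds nothing."""

    def __init__(self, cheap: Detector, deep: Detector):
        self.cheap, self.deep = cheap, deep

    def detect(self, frame):
        return self.cheap.detect(frame) or self.deep.detect(frame)
```

Because either component satisfies the same `detect` interface, a newer segmentation or detection model can be swapped in without altering the tracking logic.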

Deep learning-based detection

We use a convolutional neural network (CNN)-based YOLO (You Only Look Once) [49] object detection algorithm to detect insects in a video frame because it is well supported and convenient.
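As an illustration of how a YOLO plug-in could satisfy the interface sketched above, the following loads Darknet-format YOLOv2 weights through OpenCV's DNN module. The file names, input resolution and confidence threshold are placeholder assumptions, not values from our study.

```python
import cv2


class YoloDetector:
    """Deep learning plug-in: Darknet YOLO inference via OpenCV's DNN module."""

    def __init__(self, cfg="yolov2.cfg", weights="yolov2.weights", conf_threshold=0.5):
        self.net = cv2.dnn.readNetFromDarknet(cfg, weights)
        names = self.net.getLayerNames()
        self.out_layers = [names[i - 1] for i in self.net.getUnconnectedOutLayers().flatten()]
        self.conf_threshold = conf_threshold

    def detect(self, frame):
        height, width = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
        self.net.setInput(blob)
        centres = []
        for output in self.net.forward(self.out_layers):
            # Each row: [cx, cy, w, h, objectness, class scores...], normalised to [0, 1].
            for det in output:
                if float(det[4]) >= self.conf_threshold:
                    centres.append((float(det[0]) * width, float(det[1]) * height))
        return centres
```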

Background subtraction-based detection

We […]

Identifying occlusions

In the event that the focal insect is undetected, our algorithm analyses the variation in insect body area before its disappearance to identify a possible occlusion. Background subtraction is used to measure this change from the video. Variation of visible body area is modelled linearly using a least squares approach (Equation 1) to determine whether the insect is likely to have been occluded by moving under foliage:

$$ m = \frac{n \sum_{i=1}^{n} f_i a_i \, - \, \sum_{i=1}^{n} f_i \sum_{i=1}^{n} a_i}{n \sum_{i=1}^{n} f_i^{2} \, - \, \left( \sum_{i=1}^{n} f_i \right)^{2}} \qquad (1) $$

where $m$ is the gradient of the linear polynomial fit, $n$ is the number of frames considered, $f_i$ is frame $i$, and $a_i$ is the visible body area of the insect in frame $f_i$. A negative gradient, indicating that the visible body area was shrinking before the insect disappeared, suggests the insect moved under cover and is likely occluded rather than gone.

Identifying an insect exiting the field of view

To identify an insect's exit from view, we use Algorithm 1 to calculate an exit probability value $p_{\mathrm{exit}}$ when the insect has been undetected for a threshold number of consecutive frames. If $p_{\mathrm{exit}}$ is higher than a predefined threshold value, the algorithm pauses tracking the focal insect and begins to search for new insects to track. If an insect is re-detected near the point of disappearance of the original focal insect before a new insect appears, the algorithm resumes tracking it, assuming this to be the original focal insect (see Discussion on Identity Swap management). Otherwise, the algorithm terminates and stores the previous track.
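As a sketch of the occlusion test in Equation (1): fitting a least-squares line to the insect's visible body area over the frames before its disappearance reduces to a one-line `numpy.polyfit` call. The window length and gradient threshold below are illustrative assumptions, not tuned values from our study.

```python
import numpy as np


def likely_occluded(frames, areas, gradient_threshold=-1.0):
    """Least-squares linear fit of visible body area against frame number
    (Equation 1). A strongly negative gradient means the visible area was
    shrinking before the insect vanished, suggesting it slid under foliage
    rather than leaving the field of view."""
    m, _intercept = np.polyfit(frames, areas, deg=1)
    return m < gradient_threshold


# Hypothetical example: visible area (in pixels) over the five frames
# preceding a disappearance.
frames = [101, 102, 103, 104, 105]
areas = [420.0, 360.0, 280.0, 190.0, 90.0]
print(likely_occluded(frames, areas))  # True: the area shrinks steadily
```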
For the applications discussed above, our algorithm tracks one insect at a time from its first appearance until its exit from view, before it is re-applied to track subsequent insects in the footage. As a given frame may contain multiple insects simultaneously foraging in a region, a "predict and detect" approach is used to calculate the focal insect's track over successive frames. In a set of three successive frames, the predicted insect position in the third is calculated from the detected positions in the first two frames, assuming constant insect velocity over the three frames [35,52]. The predicted position of the insect in frame $t$ of the video is defined as:

$$ [\hat{x}_t, \hat{y}_t] = [x_{t-1}, y_{t-1}] + \left( [x_{t-1}, y_{t-1}] - [x_{t-2}, y_{t-2}] \right) \qquad (2) $$

In equation (2), $\hat{x}_t$ and $\hat{y}_t$ refer to the coordinates of the predicted position of the insect in frame $t$, and $[x_{t-1}, y_{t-1}]$ and $[x_{t-2}, y_{t-2}]$ are the detected positions of the insect in the two previous frames. When an insect is first detected, the predicted position for the next frame is assumed to be the same as its current position (as there are no preceding frames). In the case of occlusions or frames in which no insect is detected, the predicted position is carried forward until the insect is re-detected.
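A minimal sketch of this predict step, covering Equation (2) plus the first-detection and carry-forward rules described above (the variable names and the `None`-for-undetected convention are ours):

```python
def predict_position(history):
    """history: detected (x, y) positions, most recent last, with None on
    frames where the insect went undetected."""
    known = [p for p in history if p is not None]
    if not known:
        return None  # nothing observed yet
    if len(known) == 1:
        return known[-1]  # first detection: assume no movement (no preceding frame)
    (x1, y1), (x2, y2) = known[-1], known[-2]
    # Equation (2): constant velocity over three successive frames. While the
    # insect stays undetected, the last two known positions do not change, so
    # the prediction is carried forward unchanged until re-detection.
    return (x1 + (x1 - x2), y1 + (y1 - y2))


print(predict_position([(10.0, 5.0), (14.0, 8.0)]))        # (18.0, 11.0)
print(predict_position([(10.0, 5.0), (14.0, 8.0), None]))  # (18.0, 11.0), carried forward
```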

In cases where multiple insects are detected within a single video frame using the hybrid algorithm, it is necessary to assign the predicted position of the focal insect to an individual detection within the frame. This is done using a process derived from the Hungarian method [53], which minimises the total distance between predicted positions and candidate detections.
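This assignment can be written as a small linear assignment problem. Below is a sketch using SciPy's Hungarian-method solver; treating the Euclidean distance between predicted and detected positions as the assignment cost is our assumption for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_detections(predicted, detections):
    """predicted: (n, 2) array of predicted positions; detections: (m, 2) array
    of detections in the current frame. Returns (track, detection) index pairs
    minimising the total Euclidean distance."""
    cost = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # Hungarian method
    return list(zip(rows.tolist(), cols.tolist()))


predicted = np.array([[100.0, 40.0]])                  # focal insect's predicted position
detections = np.array([[180.0, 90.0], [103.0, 42.0]])  # two insects detected in the frame
print(assign_detections(predicted, detections))        # [(0, 1)]: the nearer detection wins
```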

In this section, we evaluate the performance of our method (HyDaT) on honeybees (Apis mellifera).

Honeybees are social insects that forage in wild, urban and agricultural environments. They are widespread, generalist pollinators of extremely high value to the global economy and food production [5], making them particularly suitable organisms for testing our tracking method.

We selected a patch of Scaevola (Scaevola hookeri) groundcover as the experimental site to evaluate our method […] in S1 Data.

Experiment 1: Detection rate and tracking time

We evaluated the detection rate and tracking time of HyDaT using a data set of seven video sequences of honeybees foraging in Scaevola. These videos were randomly selected from continuous footage of foraging honeybees. Each video was between 27 and 71 seconds long, totalling 6 minutes 11 seconds of footage in all. HyDaT was tuned to track the path of a honeybee from its first appearance in the video to its exit. All videos contained natural variation in background, lighting and bee movement. Fig. 6 illustrates the variability of each video sequence. One or more occlusions of the honeybee from the camera occurred in every video.

Detection rate is our measure of the proportion of frames in which the position of the insect is accurately recorded with respect to human observations. For the purpose of the experiment, frames where the honeybee was fully or partially hidden from view were considered to be occlusions. If the algorithm recorded the position of the honeybee in an area that was in fact covered by the body of the bee, this was considered a successful detection. The time taken by the algorithm to process the video was recorded as the tracking time.
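Read this way, the detection rate can be computed as in the sketch below. This is a hypothetical illustration: the per-frame bounding-box format for the human annotations is our assumption.

```python
def detection_rate(recorded, ground_truth):
    """recorded: per-frame (x, y) positions from the tracker (None if absent).
    ground_truth: per-frame (x_min, y_min, x_max, y_max) extents of the bee's
    body from human observation (None marks a fully/partially hidden bee,
    i.e. an occlusion). A recorded position inside the bee's extent counts
    as a successful detection."""
    hits = total = 0
    for position, box in zip(recorded, ground_truth):
        if box is None:
            continue  # occluded frames are not scored
        total += 1
        if position is not None:
            x, y = position
            x_min, y_min, x_max, y_max = box
            hits += int(x_min <= x <= x_max and y_min <= y <= y_max)
    return hits / total if total else 0.0
```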

We also compared the detection rate and tracking time of HyDaT to those of the stand-alone deep learning-based model YOLOv2 [49].

HyDaT processed the seven videos totalling 6 minutes 11 seconds (22260 frames at 60 fps) of footage in 3 hours 39 minutes 16 seconds, a reduction in tracking time of 52% compared to YOLOv2. This improvement in speed is possible because 91% of detections by HyDaT were made with background subtraction, which requires much lower computational resources than purely deep learning-based models. Ctrax, an existing animal tracking package we used for comparison, was completely unable to differentiate the movement of the honeybee from background movement. Its attempts to locate the honeybee were unusable, and it would be meaningless to compare its results in these instances. In addition, when the honeybee was occluded for an extended period, Ctrax assumed it had left the field of view and terminated its track.

Therefore, in these cases also it is meaningless to compare Ctrax's outputs with HyDaT.

Example data analysis

To demonstrate the value of our approach for extracting meaningful data from bee tracks, we studied the behaviour of honeybees foraging in Scaevola (Scaevola hookeri), as already discussed, and also in Lamb's-ear (Stachys byzantina) ground cover. We extracted spatiotemporal data of foraging insects and analysed their changes in position, speed and directionality. We tested our setup on both Scaevola and Lamb's-ear to assess the capability of our system to generalise, while simultaneously testing its ability to extend to tracking in three-dimensional ground cover, within the limits imposed by the use of a single camera.

We followed the methods presented in the Data collection for experiments section to collect study data. […]

Our algorithm was able to extract honeybee movement data in both two-dimensional (Scaevola) and three-dimensional (Lamb's-ear) ground covers. However, since our approach with a single camera is primarily suited to two-dimensional plant structures, the occlusion detection algorithm was unable to estimate the honeybee position in 36.5% of instances in the Lamb's-ear, compared to 8.8% of instances in Scaevola (Fig. 9c). We did not plot speed or turn-angle distributions for Lamb's-ear since a single camera setup cannot accurately measure these attributes for three-dimensional motion, a limitation we discuss below.

Discussion
To address concerns about insect pollination in agriculture and ecosystem management, it is valuable to track individual insects as they forage outdoors. In many cases, such a capacity to work in real-world scenarios necessarily requires handling data that includes movement of the background against which the insects are being observed, and movement of insects through long occlusions. We tackle this complexity using a novel approach that detects an insect in a complex dynamic scene, identifies when it is occluded from view, identifies when it exits the view, and associates its sequence of recorded positions with a trajectory. Our algorithm achieved higher detection rates in much less processing time than existing techniques.

Although we illustrated our method's generalisability in two differently structured ground covers, several limitations of our method remain that are suited to further research. Our algorithm tracks one insect in sequence and must be restarted to track subsequent insects within a video. Future work could address this by considering models of multi-element attention [59]; however, this is unnecessary for the applications to which the software is currently being applied and was beyond our scope.

Regarding species other than honeybees: although we trained and tested our algorithm with honeybees, as this is our research focus in the current study, tracking other species is feasible after retraining the YOLOv2 model and adjusting the parameters for the area an insect occupies in the video frame and the maximum detection threshold. Another potential subject for future study relates to identity swaps during occlusions, in which a single track is generated by two insects. This is likely to be a