Deep inference of seabird dives from GPS-only records: Performance and generalization properties

At-sea behaviour of seabirds has received significant attention in ecology over the last decades, as it is a key process in the ecology and fate of these populations. Because these species often occupy the position of top predator, it is also a relevant and integrative indicator of the dynamics of the marine ecosystems they rely on. Seabird trajectories are recorded through the deployment of GPS loggers, and a variety of statistical approaches have been tested to infer probable behaviours from these location data. Recently, deep learning tools have shown promising results for the segmentation and classification of animal behaviour from trajectory data. Yet these approaches have not been widely used, and investigation is still needed to identify optimal network architectures and to demonstrate their generalization properties. Using a database of about 300 foraging trajectories derived from GPS loggers deployed simultaneously with pressure sensors for the identification of dives, this work benchmarks deep neural network architectures trained in a supervised manner for the prediction of dives from trajectory data. It first confirms that deep learning allows better dive prediction than usual methods such as Hidden Markov Models. It also demonstrates the generalization properties of the trained networks for inferring dive distributions for seabirds from other colonies and ecosystems. In particular, convolutional networks trained on Peruvian boobies from a specific colony show great ability to predict dives of boobies from other colonies and from distinct ecosystems. We further investigate across-species generalization using a transfer learning strategy known as ‘fine-tuning’. Starting from a convolutional network pre-trained on Guanay cormorant data halved the size of the dataset needed to accurately predict dives in a tropical booby from Brazil.
We believe that the networks trained in this study will provide a relevant starting point for future fine-tuning work on seabird trajectory segmentation.

AR: As suggested we removed the sentence describing good indicators.
L. 32-34 RC: I certainly do not agree with the statement of these 2 sentences. At-sea predation does exist, and so does bycatch; there are many papers showing how fishing vessels can be tracked by seabirds, which can also be negatively impacted (by-catch and easy food). In addition, since they are breeding, their foraging is done to secure prey for their brood, and this is clearly a constraint on their movements; for instance, most penguins make long-range movements for their chicks and short, local movements at sea for their own needs. Please correct the paragraph accordingly.
AR: We agree that it was a shortcut on our side. For this reason we removed these sentences and focused instead on the importance of dive distribution in seabird ecology (L. 44).
L. 87 RC: The authors mentioned that they interpolated missing data. How many were missing? Testing the impact of interpolation on the final analyses is advised.
AR: We refer to missing data as 'gaps', and the amount of gaps is detailed in Table 3. We changed the description of this point for more clarity. We also added a sentence in the text to refer to the table (L. 104). Concerning the linear interpolation, it may indeed have an impact on the quality of dive estimation, since within linearly-interpolated gaps the turning angle is zero and the step speed is constant. As suggested, we discussed the robustness of the benchmarked approaches with respect to the linearly-interpolated gaps. In particular, our results suggest that neural networks are more robust than HMMs in interpreting these gaps in the data (L. 269).
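To illustrate why linearly-interpolated gaps carry no movement signal, here is a minimal sketch, not the code used in the study. It assumes fixes already projected to planar coordinates in metres and a regular sampling interval; the function names `interpolate_gap`, `step_speeds` and `turning_angles` are hypothetical.

```python
import math

def interpolate_gap(p0, p1, n_missing):
    """Linearly interpolate n_missing fixes between two known planar fixes (x, y)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = n_missing + 1
    return [(x0 + (x1 - x0) * k / steps, y0 + (y1 - y0) * k / steps)
            for k in range(1, steps)]

def step_speeds(track, dt):
    """Speed of each step, assuming a constant time interval dt between fixes."""
    return [math.hypot(x1 - x0, y1 - y0) / dt
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def turning_angles(track):
    """Change of heading between consecutive steps, wrapped to [-pi, pi)."""
    headings = [math.atan2(y1 - y0, x1 - x0)
                for (x0, y0), (x1, y1) in zip(track, track[1:])]
    return [(h1 - h0 + math.pi) % (2 * math.pi) - math.pi
            for h0, h1 in zip(headings, headings[1:])]

# Illustrative gap of 4 missing fixes between two known positions (assumed values).
start, end = (0.0, 0.0), (30.0, 40.0)
filled = [start] + interpolate_gap(start, end, 4) + [end]
speeds = step_speeds(filled, dt=1.0)
angles = turning_angles(filled)
```

Within the filled gap every step has the same speed and every turning angle is zero, so any method that relies on these two features sees a perfectly straight, constant-speed segment regardless of what the bird actually did.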

L.93
RC: Although the splitting % is fine and quite standard for deep learning, it does not give many replicates, which is critical for these data-hungry approaches. Seventy % of 234 foraging trips is 163.8, 20% is 46.8 and 10% is 23.4. Since the sample size limit is a function of the complexity of your model (and yours is certainly a complex one), it would be appropriate to quantify the performance of your DL algorithm in response to the amount of data (many models with similar complexity require > 1000 replicates and, to me, the foraging trips are the sample size, as the location points within them are not independent). Not only will this help show the readers that your approach is robust, but it will also serve as a benchmark for others in the future who may have a different number of foraging bouts. Often in ecological studies, researchers do not have the means to equip that number of individuals, so this can help others use your approach in the future with their own (limited) dataset.
AR: This is a very interesting remark. Supervised learning techniques are known to require a lot of data, whereas ecological datasets are often relatively small. However, for seabird dive prediction, the deep network we used does not require hundreds of trajectories to converge. We quantify the performance of the deep learning algorithm in response to the number of trips in an additional figure (Figure 7). We show that, for training a deep network from scratch, a training dataset of about 30 trajectories is enough (i.e. in our case 20k to 50k GPS positions). For this reason, the splitting that we used provides enough replicates in the training dataset. However, one could argue that the test datasets might not capture all data variability when using only 10% of the trips. Thus we changed the splitting and now use 50%, 30% and 20% of the trips for the training, validation and test datasets, respectively.
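A trip-level split of this kind can be sketched as follows. This is an illustration under assumptions, not the procedure used in the study: the letter does not state whether trips were shuffled randomly or stratified (for example by colony), and the helper `split_trips`, the seed and the integer trip identifiers are hypothetical. Splitting by trip rather than by GPS position keeps the non-independent positions of a given trip within a single subset.

```python
import random

def split_trips(trip_ids, fractions=(0.5, 0.3, 0.2), seed=0):
    """Shuffle trip identifiers and split them into train/validation/test subsets."""
    ids = list(trip_ids)
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]      # remainder goes to the test subset
    return train, val, test

# 234 is the number of foraging trips quoted in the reviewer's comment.
train, val, test = split_trips(range(234))
sizes = (len(train), len(val), len(test))
```

With 234 trips this gives 117 training, 70 validation and 47 test trips, so the test subset roughly doubles compared with a 10% share.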

Discussion
RC: A dedicated paragraph on the ecological aspects of the datasets and the consequences and potential applications for other types of data is warranted.
AR: We added a paragraph at the beginning of the discussion where we discuss the different foraging strategies of the seabirds that we studied (L. 249). Besides illustrating how fine-tuning techniques take advantage of pre-trained models, we discuss the potential application of these deep networks for segmenting other seabird trajectories (L. 316).
Editorial issues
RC: The numbering of the references is all wrong; they are not cited in order; e.g. you start by citing ref
AR: This has been fixed.

Reviewer #2
RC: The input of UNet (Fig.3) is time-series of longitude, latitude, and coverage. However, the longitude and latitude are meaningless to detect diving events. To recognize events of moving objects, speed and bearing (angle) are usually used. I consider that when the authors simply use time-series of speed, bearing, and coverage as the input of UNet, the method can achieve good performance comparable to DME-UNet.
AR: As detailed in the first part of this letter, further analysis led us to conclude that you are right. For this reason, we removed all results and discussion concerning this UNet, and we now focus the analysis on models that take as input the time-series of speed, bearing, and coverage. Thanks a lot for this crucial comment.
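For readers unfamiliar with these inputs, step speed and bearing can be derived from consecutive GPS fixes with the standard haversine and initial-bearing formulas. The sketch below is an assumption-laden illustration rather than the preprocessing used in the study: the function names, the spherical Earth radius and the `(t_seconds, lon, lat)` tuple layout are hypothetical, and the coverage input is not computed here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two fixes given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lon1, lat1, lon2, lat2):
    """Initial bearing from fix 1 to fix 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0

def motion_features(fixes):
    """Return one (speed in m/s, bearing in degrees) pair per step.

    fixes: list of (t_seconds, lon, lat) tuples sorted by time.
    """
    features = []
    for (t0, lon0, lat0), (t1, lon1, lat1) in zip(fixes, fixes[1:]):
        speed = haversine_m(lon0, lat0, lon1, lat1) / (t1 - t0)
        features.append((speed, bearing_deg(lon0, lat0, lon1, lat1)))
    return features

# Illustrative fixes (assumed values): one step due north, then one step due east.
fixes = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.001), (20.0, 0.001, 0.001)]
features = motion_features(fixes)
```

Unlike raw longitude and latitude, these features do not depend on where the trip takes place, which is one reason they are commonly preferred for recognising movement events.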
RC: I'm also afraid that the contribution of DME is limited because, as shown in the right graph of Fig.3, the performances of DME-UNet and UNet are similar. Can you make this graph using the cormorant data?

1.
RC: It is good to investigate the contribution of DME deeply. As mentioned above, please use speed and bearing speed (radian per time unit) as additional inputs of UNet. Please also make a graph like Fig 3 using the other test data sets.

2.
RC: The authors try to detect diving events using only GPS data (without using water depth sensor and accelerometer). However, the motivation is not described in the introduction section.
AR: When external data are available, there is no need to use our approach. However, there is a substantial amount of tracking datasets that consist of GPS tracks only. The development of tools dedicated to animal trajectories