Figures
Abstract
Ecologists often use a hidden Markov model to decode a latent process, such as a sequence of an animal’s behaviours, from an observed biologging time series. Modern technological devices such as video recorders and drones now allow researchers to directly observe an animal’s behaviour. Using these observations as labels of the latent process can improve a hidden Markov model’s accuracy when decoding the latent process. However, many wild animals are observed infrequently. Including such rare labels often has a negligible influence on parameter estimates, which in turn does not meaningfully improve the accuracy of the decoded latent process. We introduce a weighted likelihood approach that increases the relative influence of labelled observations. We use this approach to develop hidden Markov models to decode the foraging behaviour of killer whales (Orcinus orca) off the coast of British Columbia, Canada. Using cross-validated evaluation metrics and a detailed simulation study, we show that our weighted likelihood approach produces more accurate and understandable decoded latent processes compared to existing hidden Markov models and single-frame machine learning methods. Thus, our method effectively leverages sparse labels to enhance researchers’ ability to accurately decode hidden processes across various fields.
Citation: Sidrow E, Heckman N, McRae TM, Volpov BL, Trites AW, Fortune SM, et al. (2025) Incorporating sparse labels into hidden Markov models using weighted likelihoods improves accuracy and interpretability in biologging studies. PLoS One 20(6): e0325321. https://doi.org/10.1371/journal.pone.0325321
Editor: Vitor Hugo Rodrigues Paiva, MARE – Marine and Environmental Sciences Centre, PORTUGAL
Received: October 17, 2024; Accepted: May 9, 2025; Published: June 18, 2025
Copyright: © 2025 Sidrow et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code and data are available at https://github.com/evsi8432/PHMM/.
Funding: ES thanks the University of British Columbia and the Four-Year Doctoral Fellowship program for its support. MAM thanks the BC Knowledge Development Fund and the Canada Foundation for Innovation’s John R. Evans Leaders Fund under grant 37715. MAM acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery grant RGPIN-2017-03867. MAM also thanks the Canadian Research Chairs program for Statistical Ecology. NH acknowledges the support of the Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery grant RGPIN-2020-04629. We thank the Canadian Statistical Sciences Institute (CANSSI) for its support. We also acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) as well as the support of Fisheries and Oceans Canada (DFO). This project was supported in part by a financial contribution from the DFO and NSERC (Whale Science for Tomorrow). These sponsors did not play any role in the study design, data collection and analysis, decision to publish, or preparation of this manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The hidden Markov model, or HMM, is a common statistical model that is increasingly being used to understand the movements and behaviours of animals [1–3]. An HMM is a generalization of a mixture model that is used to decode a latent process of interest (e.g., a sequence of animal behaviours) from an observed time series (e.g., biologging data from tags attached to the animal). They have been used to uncover a wide variety of animal behaviours, including foraging activity [4, 5] and habitat selection [6].
Many ecological studies employ unsupervised HMMs, meaning that the true behaviours of the study animals are never directly observed and instead are predicted entirely from biologging data [7–10]. However, ecologists are often interested in predicting complicated animal behaviours (e.g., successful prey captures) that are difficult to identify from movement data alone [11]. For these behaviours, the relationship between an animal’s behaviour and its movement is so complex that it is rarely fully characterized by a statistical model.
Foraging behaviour can be especially rare and difficult to identify, but it is often of prime interest in ecology [12, 13]. For example, understanding foraging behaviour is vital for the conservation of northern and southern resident killer whales (Orcinus orca) off the coast of British Columbia [4, 14, 15]. Although both sub-populations have dietary and spatial overlap, northern residents (threatened) have a positive growth trajectory compared to southern residents (endangered) [15, 16]. Studies have shown that various factors contribute to these population trends, including prey availability, pollutants, and vessel disturbances, but the exact causal mechanisms are not fully understood [4, 14, 17]. Each of these factors affects foraging ecology differently, so understanding how often and how successfully these sub-populations hunt may help explain differences in their population trajectories [15, 18].
One solution to better identify animal behaviour is to fully observe and incorporate the animal’s behaviours into the underlying model, in which case the HMM is fully supervised. Krogh et al. [19] showed that fully supervised HMMs can exhibit better predictive performance than unsupervised HMMs for gene finding, and fully supervised HMMs are used in fields ranging from speech recognition to medicine [20, 21]. To our knowledge fully supervised HMMs are rare in ecology, but some animal behaviour studies use other fully supervised machine learning techniques [22, 23]. However, these studies often focus on captive animals that are much easier to continuously observe compared to wild animals.
While fully observing an animal’s behaviour in the wild can be prohibitively difficult or expensive, many ecological studies have behavioural information for a small subset of time. Occasional observations of an animal’s behaviour can be incorporated into a semi-supervised HMM, and some notable ecological studies have used semi-supervised HMMs. For example, McClintock et al. [24] labelled a subset of hidden behavioural states of a grey seal (Halichoerus grypus) using its proximity to known “haul-out" and foraging sites. Alternatively, Pirotta et al. [9] assumed that northern fulmars (Fulmarus glacialis) begin every journey in some known behavioural state. Other studies that used semi-supervised HMMs include McRae et al. [25], who used drone footage to directly label the behaviour of killer whales for a subset of observation times, and Saldanha et al. [13], who used a multi-sensor approach to derive behavioural labels for red-billed tropicbirds (Phaethon aethereus). All of these studies demonstrate that incorporating partial labels into an ecological HMM can significantly improve its performance.
While behavioural labels can improve an HMM’s prediction accuracy, many ecological studies only have access to labels for a small proportion of observations (e.g., <10%) [13, 25]. In these cases, the labelled data often do not meaningfully affect the parameter estimates of an HMM because the likelihood is dominated by the unlabelled data [26, 27]. A study by Ji et al. [28] used a weighted likelihood approach to increase the influence of labelled examples, but it assumed that labels correspond to independent time series. This approach does not apply when labels occur within a time series, which is often the case for ecological studies [13, 25].
We introduce a novel weighted semi-supervised learning approach for hidden Markov models that allows practitioners to adjust the influence of sparse labels within a time series. We first review the definition of an HMM and current semi-supervised learning techniques for mixture models that lack time dependence. We then formalize a partially hidden Markov model, or PHMM, which is designed to account for time series that are partially labelled, before introducing a weighted likelihood approach to balance the influence of labelled and unlabelled data within the model. Next, we present two case studies that use labels derived from video data and our weighted likelihood approach to achieve higher cross-validated accuracy compared to traditional HMMs and single-frame machine learning methods. Finally, we conduct a simulation study to investigate how different data regimes affect PHMM performance and guide practitioners when deciding how heavily to weight the likelihood of a PHMM.
Background
Hidden Markov models
Hidden Markov models describe time series that exhibit state-switching behaviour. They model an observed time series of length T, , by assuming that each observation Yt is generated from an unobserved hidden state
. The hidden states Xt are discrete and the observations Yt are usually (but not always) continuous. The sequence of all hidden states
is modelled as a Markov chain. The unconditional distribution of X1 is denoted by the row-vector
where . Further, the distribution of Xt given Xt−1 for
is denoted by the N-by-N transition probability matrix
where . For simplicity, we assume that
does not change over time (i.e.
for all t) unless stated otherwise.
Each observation Yt is a random variable, where Yt given all other observations and hidden states
depends only on Xt. If Xt = i, then the conditional density or probability mass function of Yt is
, where
are the parameters describing the state-dependent distribution of Yt. The collection of all state-dependent parameters is
. The probability density of
, following an HMM with initial distribution
, transition matrix
, and state-dependent parameters
, evaluated at
, is
where is an N-dimensional row vector of ones and
is an
diagonal matrix with entry (i,i) equal to
. Parameter estimation for HMMs often involves maximizing Eq 3 with respect to
,
, and
. Fig 1 shows an HMM as a graphical model. For a more complete introduction to HMMs, see Zucchini et al. [29].
Xt corresponds to an unobserved latent state at time t whose distribution is described by a Markov chain. Yt corresponds to an observation at time t, where Yt given all other observations and hidden states
depends only on Xt.
Semi-supervised mixture models
Semi-supervised learning is a paradigm in machine learning that harnesses both labelled and unlabelled data to enhance model performance [26]. There is a large taxonomy of semi-supervised learning techniques, but here we focus on generative mixture models because an HMM is a generalization of a mixture model that includes serial dependence between its hidden states [30, 31]. Unfortunately, many semi-supervised learning techniques for mixture models do not account for the time dependence of HMMs. As such, we build off of current approaches for mixture models and develop a novel semi-supervised learning technique for HMMs.
A mixture model is a simpler version of a hidden Markov model where the hidden states are modelled as independent categorical random variables instead of a Markov chain. The distribution of Xt is denoted by the row-vector
where . A sequence of observations
then has the probability density function
Now, suppose that a subset of time indices have corresponding labels
. Labels are often observed at random times (e.g., aerial drones observe whale behaviours at random times), but we assume that
is fixed, as is common for many semi-supervised learning techniques [26, 32]. Like Yt, each label Zt is a random variable generated from its corresponding hidden state Xt. The state space of Zt is general, but for simplicity we assume that
. Given all other labels (
{Zt}), observations (
), and hidden states (
), we assume that Zt depends only on Xt for each
. If Xt = i, then the label Zt has probability mass function
, with parameters
. Denote a fixed realization of labels
as
. Then, the joint probability density of
and
for semi-supervised mixture models is
where . To write Eq 5 in a simpler form, we define
for all unlabelled observations (i.e., for all
) and set
. This abuse of notation results in a relatively simple probability density function for semi-supervised mixture models:
Eq 6 can be used to construct a likelihood and maximized with respect to ,
, and
to perform semi-supervised inference on mixture models [26]. In some scenarios, subject matter experts can identify the labels
with certainty. In this case,
for all
, the parameters
do not need to be inferred, and
takes the form
This formulation of implies that, for all labelled observations, if the hidden state Xt is equal to i, then the corresponding label Zt also equals i with probability 1. We define
as in Eq 7 in our case studies. However, if subject matter experts are not confident in their labels, or if a hidden state Xt could generate one of multiple labels Zt, we recommend parameterizing
and inferring the parameters
.
Weighted likelihood for semi-supervised mixture models
One issue in semi-supervised learning occurs when the number of observations T is much larger than the number of labels . In this case, the labelled data do not meaningfully affect maximum likelihood parameter estimates [26]. As a solution, Chapelle et al. [26] introduce a parameter
which represents the relative weight given to unlabelled observations. In particular, they define a weighted likelihood
based on Eq 6 with weights
as follows:
Using this formulation, setting removes all unlabelled data, setting
removes all labelled data, and setting
returns a likelihood that corresponds to the joint density from Eq 6. It is unlikely that a practitioner would prefer setting
, as this weights unlabelled observations more heavily than labelled observations. In practice, researchers often select
by performing cross validation with an appropriate model evaluation metric [26].
The weighted likelihood is a specific instance of a much more general class of relevance-weighted likelihoods that has been studied extensively. Hu et al. [32] provide a comprehensive review of weighted likelihoods. Under their paradigm, the probability density of the labelled data
is given by Eq 5, but the probability density of the unlabelled data
is some unknown density that “resembles" the density of the labelled data in some sense. In particular, Hu et al. [32] formally define the notion of ‘resemblance’ using Boltzman’s entropy, and the weight
corresponds to how much the density of the unlabelled data resembles the density of the labelled data under this definition. Hu et al. [33] prove the consistency and asymptotic normality of maximum weighted likelihood estimators under certain regularity conditions. The relevance-weighted likelihood literature thus gives useful theoretical guarantees related to the weighted likelihood for mixture models. Unfortunately, these guarantees usually assume that the observations
are independent, which is not true for HMMs.
Weighted likelihood for semi-supervised learning in hidden Markov models
Our weighted likelihood approach for semi-supervised learning in HMMs begins by writing down the probability density associated with a partially observed HMM. We use the same notation as described above, namely random labels , where Zt is generated from hidden state Xt and, conditioned on
,
, and
{Zt}, Zt depends only on Xt. As before, a fixed realization of
is denoted as
, and we abuse notation by setting
for all unlabelled observations (i.e.,
) and
. The joint density of the observations
and labels
for an HMM is thus
where is an
diagonal matrix where entry (i,i) is
. We refer to this model as a partially hidden Markov model, or PHMM.
Incorporating partial labels in an HMM to define a PHMM is relatively straightforward, but defining a weighted likelihood for PHMMs is more complicated. Recall that each term in Eq 9 is a scalar value raised to the power of some weight. However, each term in Eq 10 is a matrix, so it is not straightforward to raise each term to the power of a (possibly fractional) weight. While it is possible to calculate fractional powers of matrices, doing so can be computationally expensive and the result can be difficult to interpret [34]. Alternatively, Hu et al. [35] derive a relevance-weighted likelihood for dependent data using the same paradigm as Hu et al. [32]. Although their method is broadly applicable, it does not apply to HMMs. Namely, they adopt a paradigm where each observation Yt has a corresponding set of parameters , and they assume that Yt depends only on its corresponding parameters and the previous observations
. However, this assumption is violated for an HMM because Yt depends on Xt, which in turn depends upon the previous parameters
. See Eq 3 of Hu et al. [35] for more details.
A weighted likelihood for PHMMs should have three desired properties. First, the weighted likelihood should reduce to Eq 10 for some “natural" weight, just as does for the weighted likelihood in Eq 9. Second, some weight should correspond to ignoring all unlabelled data, just as
does for the weighted likelihood in Eq 9. These two properties allow practitioners to intuitively select a weight that balances a natural weighting scheme with one that completely ignores all unlabelled data. Finally, the weighted likelihood should be relatively simple and intuitive compared to the standard likelihood from Eq 10. We thus propose the weighting parameter
and the following weighted likelihood for partially hidden Markov models:
This formulation satisfies the three desired properties listed above. First, if , then the term corresponding to an unlabelled observation t is
. Therefore, the likelihood of a PHMM with
is identical to the likelihood of an HMM that treats all unlabelled observations as totally missing [29]. Next, if
, then the term corresponding to an unlabelled observation t is
. In this case, the weighted PHMM likelihood in Eq 12 corresponds to the standard PHMM density in Eq 10. Finally, we argue that this formulation is intuitive, as it weights unlabelled observations using some power of
and leaves labelled observations unaltered. Fig 2 shows graphical representations of PHMMs for several values of
with only Z1, Zt−1, and Zt + 1 observed for some fixed t.
(a) PHMM with that gives equal weight to all observations in the likelihood function. (b) PHMM with
that down-weights unlabelled observations without ignoring them altogether. (c) PHMM with
that ignores all observations that do not have associated label information. The colour (white, light grey, and dark grey) indicates how much a given observation affects the weighted likelihood. White corresponds to treating the random variable as unobserved, dark grey corresponds to treating the variable as fully observed, and light grey corresponds to treating the random variable as observed, but weighting it in the likelihood. The latent states are denoted as
, the observations are denoted as
, and labels are denoted as
. In this example, Z1, Zt−1, and Zt + 1 are observed, while all other labels are unobserved (i.e.,
for
).
Thus far we have examined the likelihood of a single time series, but in many practical applications multiple independent time series are observed from the same process (e.g., multiple killer whale tag deployments). In this case, we denote the total number of time series by S and index them with s. Then,
and
make up time series s with length Ts. The total likelihood for the model with weight
and parameters
and
is thus
, where
is defined in Eq 12.
Cross validation for selecting 
To determine the optimal value of , we recommend the following cross validation procedure. First, define a set of candidate values for
. At a minimum, we suggest evaluating
, as the choice
approximately balances the contributions of labelled and unlabelled observations in the likelihood function. Although it is technically possible to consider values of
, we advise against this, as it assigns greater weight to unlabelled observations relative to labelled ones. Another complication occurs if there exists some hidden state
with no corresponding labels (i.e.,
for all
). In this case we advise against setting
because no labelled data can be used to estimate the state-dependent parameters
, so the PHMM is unidentifiable.
Next, partition the time series data set into multiple folds for cross validation. If possible, the folds should be the time series themselves to ensure independence between training and test sets. For each fold, train the PHMM using the entire data set minus the selected fold. Apply the forward-backward algorithm to the held-out fold (excluding its associated labels) to estimate the hidden state probabilities
. Repeat this procedure across all folds, yielding cross-validated hidden state probability estimates for the entire data set. The estimated probabilities, together with the true labels, can then be used with standard model validation techniques to assess PHMM performance and determine the most appropriate choice of
. Fig 3 displays a diagram of this cross validation process.
Dive profiles from two killer whales are divided into four subprofiles, treated as independent time series, and used as folds in cross validation. Each fold is held out, a PHMM is trained on the remaining data set, and the forward/backward algorithm is run on the held out data using the fitted PHMM. The labels of the held out time series are ignored when running the forward-backward algorithm. The estimated labels of the held-out time series are shown in the far right column and used to calculate accuracy measures such as sensitivity, specificity, area under the ROC curve, etc.
Due to its computational complexity, this procedure can be computationally expensive, particularly for large data sets. However, computational efficiency can be improved by training PHMMs on different folds in parallel. In our case studies, we leveraged the Cedar Compute Canada cluster to execute cross validation in parallel, significantly reducing computation time.
Case studies
We conducted two case studies that use PHMMs to model the behaviour and foraging success of 11 resident killer whales (nine northern, two southern) off the coast of British Columbia, Canada. These case studies were primarily intended to test the predictive performance of the PHMM and demonstrate the process of applying it to ecological data. However, we also performed these case studies for their ecological significance. As mentioned earlier, understanding killer whale foraging and successful prey capture has been a research focus for years, as differences in foraging success may explain why the southern resident killer whale population is doing poorly compared to the northern residents [15, 18]. Thus, the results from these case studies can help ecologists correctly predict foraging behaviours that are meaningful for conservation.
The data for these case studies were collected with written approval under the University of British Columbia Animal Care Permit no. A19-0053, Fisheries and Oceans Canada Marine Mammal Scientific License for Whale Research no. XMMS 6 2020, and United States Department of Commerce, NOAA, National Marine Fisheries Service Permit No. 23220. Collection took place in August and September 2020 off the coast of British Columbia in Queen Charlotte Sound, Queen Charlotte Strait, Johnstone Strait, and Juan de Fuca Strait. All resident killer whales were tagged with suction-cup attached Customized Animal Tracking Solutions (www.cats.is) as described by McRae et al. [25]. These tags were deployed using an adjustable 6–8 metre carbon fibre pole and detached using galvanic releases. Post-deployment, the instruments were retrieved utilizing a combination of a Wildlife Computers 363C SPOT tag (providing Argos satellite positions), a goniometer, an ultra high frequency receiver, and a yagi antenna. The tags were equipped with an array of instruments, including 3D kinematic sensors (accelerometer, magnetometer, gyroscope), a time-depth recorder, a 96 kHz HTI hydrophone, and a camera. The time-depth recorder (TDR) and inertial sensors were set to sample at a frequency of 50 Hz. Depth readings were calibrated using a MATLAB package developed by Cade et al. [36], which allowed for the extraction of heading, pitch, and roll, as well as three-dimensional dynamic acceleration within the reference frame of the killer whale.
Using these data, we developed two PHMMs to identify killer whale foraging behaviours at different scales. The first PHMM estimated killer whale dive types using individual dives as observations, with some dives labelled as resting, travelling, or foraging using aerial drone recordings of the tagged whales. The second PHMM estimated prey capture events using high-frequency biologging data as observations, with some observations labelled using underwater video and audio recordings of prey capture. For both case studies, we modelled each killer whale as independent, but with shared parameters for their associated PHMMs.
Case study 1: behavioural classification of killer whale dives
For our first case study, we assigned a latent behaviour to every killer whale dive. To this end, we modelled the data from each killer whale as a sequence of dives and modelled each sequence with a PHMM. The hidden Markov chain was a sequence of dive types and the observations were summary statistics of each dive. We used three well-known killer whale behaviours as possible dive types: resting, travelling, and foraging [37, 38]. We also used aerial and underwater recordings to label a small subset of dives with one of these dive types [25].
Data processing.
We defined a dive as any period in which the killer whale was below a depth of 0.5 metres for at least 30 seconds, which includes only biologically meaningful dives and excludes surface behaviours. In line with previous studies [7, 25], we summarized each dive with its maximum depth and total duration. Formally, the observation associated with dive profile s and dive t was denoted as , where ms,t corresponds to maximum depth in metres and ds,t corresponds to dive duration in seconds.
Using drone videos, we visually identified three diving behaviours: resting, travelling, and foraging (classification criteria are given in McRae et al. [25]). Formally, the label associated with whale s and dive t was denoted as , where each value of zs,t corresponds to either no label (if
), resting (if zs,t = 1), travelling (if zs,t = 2), or foraging (if zs,t = 3). Scatter plots of the dive duration and dive depth of all 11 whales are shown in Appendix S2.
There were a total of 11 killer whales, but the distribution of labels was uneven between individuals (e.g., killer whale A113 had 71 labelled travelling dives, while all other whales combined had 12). To spread the labels between dive profiles more evenly, we randomly divided each killer whale dive profile into two ‘subprofiles’ so that the two subprofiles had an equal number of labels. Although the two subprofiles are dependent in time, we treated them as independent for computational simplicity. See Fig 3 for an example of splitting two profiles into four subprofiles for cross validation. This process resulted in S = 22 killer whale subprofiles, a total of T = 2169 dives, and labels of dive types.
Model formulation.
We used a PHMM with N = 3 dive types to match the drone-identified labels of resting, travelling, and foraging dives. For whale s and dive t, we denoted the hidden dive type as . Histograms and scatter plots revealed that dive duration and maximum depth looked to be distributed approximately as mixtures of log-normal distributions, and the two features were also highly correlated (see Appendix S2 for scatter plot). Therefore, we set the joint, state-dependent distribution of the observations to be a bivariate log-normal distribution.
Recall that we used thresholds of and
to define biologically relevant dives. However, we modelled these observations using a two-dimensional log-normal distribution whose sample space is
, so our model is misspecified. We could use a truncated multivariate log-normal distribution instead, but many HMM software packages do not incorporate truncated log-normal distributions by default [39, 40], and several ecological studies make this modelling choice as well [7, 11, 41]. We therefore use a non-truncated log-normal distribution for simplicity and reproducibility. Labels were identified with high confidence, so we defined
according to Eq 7.
Model evaluation.
We fit five different candidate PHMMs corresponding to five different values of . We tested
, which ignores all unlabelled data;
, which gives equal weight to all observations; and
−
, which approximately balances the contribution of labelled and unlabelled observations. For completeness, we also tested
, which averages
and
; and
, which averages
and
. All PHMMs, including those within the cross validation procedure, were fit using 10 random restarts and a custom version of the momentuHMM package in R [39, 42].
We used the 22 subprofiles as folds in cross validation to estimate the probability of each dive’s type conditioned on the observations (i.e., for
;
; and
). See Fig 3 for a diagram of this cross validation procedure. Next, we calculated the sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) associated with each dive type. We calculated sensitivity for dive type i as the average estimated probability that a dive has type i, averaged across all dives that were confirmed via drone to have type i. Likewise, we calculated specificity for dive type i as the average estimated probability that a dive does not have type i, averaged across all dives that were confirmed via drone to not have type i. AUC balances sensitivity and specificity and takes values between 0 and 1, where higher is better [43]. We also ran the Viterbi algorithm on each fold within the cross validation procedure and plotted the resulting dive profiles as a visual model evaluation tool [44].
As a baseline, we implemented three machine learning methods to predict dive types: multinomial logistic regression using the nnet package in R (MLR), random forests using the randomForest package in R (RF), and support vector machines with a radial kernel using the e1071 package in R (SVM) [45, 46]. Each model was trained using default package settings, and performance was evaluated using the cross validation procedure described earlier to estimate sensitivity, specificity, and AUC. These fully-supervised, single-frame methods used only labelled data and did not account for temporal structure. While we report the accuracy of these baselines as a comparison, PHMMs offer several key advantages. Most notably, PHMMs model both the hidden states and the underlying generative temporal process, unlike the baseline methods which focus solely on prediction.
Results.
Except for when , the PHMMs are either better than or comparable to the single frame methods across all criteria and behaviours (see Fig 4). The random forest and multinomial logistic regression models had worse AUC values for all behaviours compared to all PHMMs. The SVM model’s AUCs for resting (0.89) and travelling (0.91) were comparable to the PHMM with
(0.87 and 0.93, respectively), but the SVM model’s AUC for foraging (0.38) was much worse than all of the PHMMs (>0.9 for all
). This result is especially striking because foraging is a behaviour of particular ecological significance.
True values are determined via the drone-detected dive types. All models
The PHMMs with strictly between 0 and 1 tended to obtain better results for cross-validated sensitivity, specificity, and AUC, demonstrating the effectiveness of the weighted likelihood approach (Fig 4). Compared to the PHMMs with
and
, the PHMM with
had the best sensitivity for foraging dives and travelling dives, and it had the second-best sensitivity for resting dives. It had the best specificity for resting dives and travelling dives, and its specificity for foraging dives was comparable to the other models. In addition, the PHMM with
had an AUC for resting that was comparable or better than the other PHMMs (0.866), but it had the best AUC for foraging (0.955) and the best AUC for travelling (0.926). Results for the PHMMs with
and
are similar to the PHMM with
(see Appendix S1).
The PHMMs with were also more biologically interpretable compared to those with
and
. Namely, for the PHMMs with
, the time series of decoded dive types contained long sequences of a single dive type (Fig 5). This behaviour matches prior studies that model killer whale behaviours as lasting from tens of minutes to hours [25]. In comparison, the PHMM with
resulted in a dive profile that switched relatively frequently between foraging and resting dives (e.g., Fig 5D). The PHMM with
produced better results than the PHMM with
, but it still estimated some rapidly switching dive types (e.g., Fig 5B). Appendix S2 also displays scatter plots of maximum depth and dive duration data from all whales, coloured by cross-validated, Viterbi-decoded dive type.
Each PHMM was fit to the data set with the subprofile held out. Then the Viterbi algorithm was used on the held-out data set with its labels removed in order to test the predictive performance of each PHMM.
The PHMM that completely ignored unlabelled dives () obtained better results than the PHMM that fully included the unlabelled dives (
). While one may expect that more data should improve accuracy, including unlabelled data is known to degrade the performance of a classifier in certain situations [48]. It is thus natural that the optimal approach for this case study neither fully included nor totally ignored the labelled dive types.
Finally, we used the PHMM with to estimate how often the northern and southern resident killer whales engaged in foraging behaviour. In particular, we fit the PHMM to the entire data set, including all labels, and then ran the forward-backward algorithm on all subprofiles and dives to calculate
for
and
. We then labelled dive t of whale s as foraging if its decoded probability of foraging was above 50%. Using this procedure, we estimated that southern resident killer whales foraged for 5.47 hours, or 32.0% of the time, and that northern residents foraged for 17.90 hours, or 26.8% of the time. This sample size was very small (2 southern residents and 9 northern residents), and the sample of southern residents in particular was made up entirely of adult males killer whales, which are expected to forage more than other age-sex groups [15]. Nonetheless, our results are consistent with Tennessen et al. [15], who found that southern residents spent less time travelling and resting compared to northern resident killer whales.
Case study 2: identification of killer whale foraging
The second case study focused on identifying successful foraging events within the killer whale dives. Therefore, we divided all dives deeper than 30 metres into sequences of two-second windows and modelled each sequence with a PHMM. The hidden Markov chain was a sequence of unobserved two-second subdive states, and the observations were summary statistics calculated from each two-second window. We defined six possible subdive state behaviours: descent, bottom, prey chase, prey capture, ascent with a fish, and ascent without a fish.
Data processing.
We defined a killer whale dive identically to the previous case study, but only included dives deeper than 30 metres because Wright et al. [37] found that killer whale prey captures usually occur below that depth. Adult Chinook have also been found throughout the water column to depths exceeding 100 metres, particularly in areas where the risk of predation appears high [49, 50]. Dives shallower than 30 metres likely follow a substantially different distribution than foraging dives deeper than 30 metres, and we solely focus on modelling the deeper foraging dives in this case study.
We divided each dive into two-second windows and calculated summary statistics that were identified by Tennessen et al. [51] to be indicative of foraging behaviour: change in depth (ds,t for dive s and window t), heading total variation (hs,t), and jerk peak (js,t). Change in depth was defined as the last depth reading of the window minus the first depth reading of the window in metres. Heading total variation was determined by calculating the difference between heading readings every 1/50 of a second (in radians), taking the absolute value of that difference, and summing up all of the differences over the course of the two-second window. Jerk peak was calculated by taking the difference between acceleration vectors every 1/50 of a second (in metres per second squared), taking the magnitude of that difference, and then calculating the maximum over each two-second interval. To adjust for variation between dives and tags, we also divided the jerk peak of each two-second window by the median jerk peak for the bottom 70% of its corresponding dive [51]. In summary, the observation associated with dive s and window t was denoted as . This process resulted in a total of S = 130 dives and T = 15821 two-second windows.
To label windows associated with prey capture, we used a process inspired by previous work on killer whale foraging [37, 51]. First, crunching sounds associated with prey handling were identified from the hydrophone using the behavioural analysis software BORIS [52]. Crunching sounds associated with foraging were corroborated using video evidence of prey handling as well as audio observations of echolocation clicks. Next, Wright et al. [37] found that killer whales catch prey immediately before ascending, so we ignored crunches that occurred more than 30 seconds before a dive’s ‘ascent’ phase as defined by Tennessen et al. [51]. Namely, the ‘ascent’ phase began the moment after the killer whale achieved a depth at least 70% of its maximum dive depth. If the first non-ignored crunch occurred before ‘ascent’, we labelled its window as ‘prey capture’. If the first non-ignored crunch was heard during ‘ascent’, then the exact moment of prey capture was ambiguous, so we labelled the final window of the dive as ‘ascent with a fish’.
We were also able to obtain some window labels not associated with prey capture. First, if the video recorder was on for the entire dive, but there was no audible crunch or visual indication of foraging (e.g., scales), then we labelled the final window of the dive as ‘ascent without a fish’. Second, we labelled the first window of every dive as ‘descent’. Formally, the label associated with window t of dive s was denoted as , where each possible value of zs,t corresponds to no label (
), descent (zs,t = 1), bottom (zs,t = 2), chase (zs,t = 3), capture (zs,t = 4), ascent without a fish (zs,t = 5), or ascent with a fish (zs,t = 6). In total, we labelled 130 windows as ‘descent, five windows as “capture", two windows as ‘ascent with a fish’, and 19 windows as ‘ascent without a fish’. This resulted in
labels, which make up less than
of the T = 15821 windows in total. Scatter plots showing the observed data for all two-second windows are shown in Appendix S2.
Model formulation.
As mentioned before, we defined a PHMM with N = 6 subdive states: descent, bottom, chase, capture, ascent without a fish, and ascent with a fish. We assumed that all dives shared an initial distribution and transition probability matrix
as shown in Eq 13.
For an intuitive visualization of how the Markov chain can evolve, see Fig 6. In short, this transition probability matrix reflects that marine mammal dives have distinct descent, bottom, and ascent phases [51], and that killer whales begin their ascent phase immediately after prey capture [37]. We also divided the bottom phase to include a low activity state (bottom) and a high activity state (chase), which is in line with results from Sidrow et al. [53].
Arrows correspond to non-zero entries in , where arrows point from row number to column number. Within a dive, a killer whale can proceed from descent, to bottom, to chase, to capture, to ascent with fish. It can also ascend without a fish at any time before capture, and it can switch between the bottom and chase states freely.
Given that Xs,t = i, we modelled change in depth with a normal distribution, heading total variation with a gamma distribution, and normalized jerk peak with a gamma distribution. To enforce that the killer whale does not ascend or descend on average during the bottom of its dive, we set the state-dependent distributions of change in depth for the bottom, chase, and capture states to have a mean of zero. Further, we wanted to ensure that the model distinguished foraging based primarily on the ‘prey capture’ subdive state rather than differences between the ‘ascent with a fish’ and ‘ascent without a fish’ states. As such, we set the two state-dependent distributions associated with ascent to be identical. Given the hidden state of a window (Xs,t), we also assumed that all summary statistics were independent. Subdive state labels were identified with high confidence, so we defined according to Eq 7.
Model evaluation.
We fit five different PHMMs corresponding to α∈{0.0001,0.001,0.01,0.1,1} using 10 random restarts and a custom version of the momentuHMM package in R [39, 42]. We did not include in this case study because we did not have labels associated with subdive states 2 and 3 (bottom and chase). Thus, if
, there were no observations that could be used to estimate the parameters of the state-dependent distribution parameters
and
, and the model would not be identifiable.
In addition to the PHMMs, we also implemented several single-frame baseline methods to identify prey capture using dive-level summary statistics. First, Tennessen et al. calculated three summary statistics for each dive: (1) the maximum of jerk during the bottom 70% of the dive, divided by the median jerk during the same period, (2) the absolute value of the killer whale’s roll at the moment of jerk peak, and (3) the circular variance of heading during the bottom 70% of the dive. Then, Tennessen et al. took the minimum value of every summary statistic over all confirmed prey capture dives to obtain thresholds for each of the three dive-level summary statistics. Finally, a dive was labelled as a successful foraging dive if every dive-level summary statistic surpassed that threshold. Using these three dive-level summary statistics, we also implemented Firth’s bias-reduced logistic regression [54, 55], random forests [46], and support vector machines using a radial kernel [47]. Appendix S2 shows scatter plots of normalized jerk peak, average bottom heading total variation, and roll at jerk peak for all dives deeper than 30 metres.
We evaluated each model based on how well it predicted successful foraging dives. Note that the probability that dive s is a successful foraging dive is equal to the probability that it ends in either the ‘capture’ or ‘ascent with a fish’ state. In other words, it is the probability that window Ts of dive s has subdive state 4 (capture) or subdive state 6 (ascent with a fish), i.e. . To estimate this value, we randomly and equally split the seven labelled successful foraging dives and 19 labelled dives without successful foraging into four folds and performed cross validation with the forward-backward algorithm. We calculated AUC values within each fold and reported their averages. Finally, we also fit each PHMM to the entire data set to visualize their state-dependent distributions.
Results.
The PHMM with had an average AUC of 0.90, the PHMM with
had an AUC of 0.96, and the PHMM with
had a perfect AUC of 1 across all four folds (Fig 7). Thus, using
(namely
here) improved predictive performance over
and
.
Horizontal jitters have been added because dots occasionally fall on top of one another. Averages are shown as stars and connected with a line for the PHMM approaches. The ‘Baseline’ approach refers to Tennessen et al. [51].
Our method outperformed the baseline of Tennessen et al. [51], which had an average AUC of 0.79. The PHMM with had the same AUC value as two single-frame methods (logistic regression and support vector machine), but the PHMM method has the additional benefit of labelling dive phases in addition to determining successful foraging dives. One possible reason that several methods achieve a perfect AUC score is that this data set contains only 7 positive examples of successful foraging. Thus, determining a granular estimate of accuracy is difficult.
The biological interpretation of each PHMM heavily depended on the weight . For example, the ‘bottom’ and ‘chase’ states looked very similar for the PHMMs with
, but the two states were better separated within the PHMM with
(Fig 8). However, the ‘bottom’ subdive state had higher mean jerk peak and heading total variation than the ‘chase’ subdive state for the PHMM with
. These results are the opposite of what ecologists expect biologically, indicating that a large separation between state-dependent distributions is not always more biologically interpretable. We conjecture that the ‘bottom’ and ‘chase’ subdive states are particularly unintuitive partially because there are no labels associated with either of these subdive states (i.e.
for any s or t). As a result, there is little information for the model to differentiate the two subdive states.
Observations include change in depth (top panels), heading total variation (middle panels), and normalized jerk peak (bottom panels) for PHMMs with (top left),
(top right), and
(bottom). Densities are coloured according to their corresponding subdive state. Parameters were estimated using the entire killer whale data set (i.e. no cross validation was performed). For a given observation Ys,t, all features were assumed to be independent after conditioning on the subdive state Xs,t. Note that the ‘ascent with a fish’ and ‘ascent without a fish’ states were assumed to have identical distributions, so both are listed simply as ‘ascent’.
Differences between each model’s distribution for the capture state appeared to impact predictive performance. For example, the PHMM with estimated a relatively high mean and low standard deviation for heading total variation (middle panel of Fig 8a). As such, it failed to identify a prey capture event in which heading total variance was highly variable (Fig 9). Alternatively, the PHMM with
estimated a relatively low mean for heading variation (middle panel of Fig 8c). As a result, it was often too sensitive and decoded some dives that lacked successful foraging as containing a prey capture state (Fig 10). The PHMM with
estimated a high mean and a high standard deviation for heading total variation (middle panel of Fig 8b). It correctly identified a wide variety of labelled foraging dives, but it was not overly sensitive and correctly identified low-activity dives as lacking successful foraging.
PHMMs with (top left),
(top right) and
(bottom) are shown. Each subplot displays change in depth (top panel), raw depth (second panel), heading total variation (third panel), normalized jerk peak (fourth panel), and probability of ‘capture’ (bottom panel). Observations are coloured according to the most-likely sequence of hidden states as determined by the Viterbi algorithm. The estimated probability of successful foraging,
, is 0.069 for
, 1 for
, and 1 for
.
PHMMs with (top left),
(top right) and
(bottom) are shown. Each subplot displays change in depth (top panel), raw depth (second panel), heading total variation (third panel), normalized jerk peak (fourth panel), and probability of ‘capture’ (bottom panel). Observations are coloured according to the most-likely sequence of hidden states as determined by the Viterbi algorithm. The estimated probability of successful foraging,
, is 1 for
, 0.414 for
, and 1 for
.
Finally, we used the PHMM with to estimate the total number of successful foraging dives from southern and northern resident killer whales in this data set. In particular, we fit the full model to the entire data set, including labels, and then ran the forward-backward algorithm on all unlabelled dives to estimate the probability that each unlabelled dive was a successful foraging dive,
for
. Then, we labelled dive s as a successful foraging dive if the estimate of this probability was above 50%. This process resulted in 6 estimated successful foraging dives from southern resident killer whales and 37 estimated successful foraging dives from northern resident killer whales. After combining these results with those from the first case study, we found that southern resident killer whales caught an average of 1.03 fish per hour of foraging effort, while northern resident killer whales caught an average of 2.00 fish per hour of foraging effort. These results support the finding that northern resident killer whales have more foraging success compared to southern residents [15]. However, our sample size is small (2 southern resident killer whales and 9 northern resident killer whales), and the tag attachments were relatively short and thus provided only a snapshot in time. For example, some killer whales were tagged right after foraging, so a tag attachment of several hours recorded no foraging events. Future studies can use the methods outlined here with larger sample sizes and longer tag attachments to further investigate the differences between northern and resident killer whale foraging success.
Simulation study
We undertook a simulation study to investigate how different data-generating conditions influence the accuracy and optimal choice of within a PHMM. These simulations allow us to systematically assess the impact of key parameters on model performance and guide the selection of
in practical applications.
We ran a total of 11 experiments. Each experiment consisted of simulating 100 data sets, fitting PHMMs with multiple values of to those data sets, and evaluating the performance of the PHMMs. We first describe the process used to generate synthetic data in a series of controlled experiments. Then, we describe how we fit the PHMMs to each simulated data set and evaluated their accuracy.
Data simulation
As a control experiment, we simulated 100 data sets drawn from an HMM to obtain hidden states and observations
. Each data set was a single time series (i.e., S = 1) made up of T = 2000 observations from a PHMM with N = 2 hidden states. The probability transition matrix was set to
and the initial distribution was set to the stationary distribution of ,
. For the control experiment we set
. The state-dependent distributions were shifted t-distributions with
degrees of freedom, implying a standard deviation of
. We separated the two state-dependent distributions by
standard deviation, so
After simulating and
, we constructed the subset of labelled time indices
by randomly selecting a proportion of
of all time indices t. The time indices were selected such that at least two labels were selected from both hidden states. Then, we set
for all
and
for all
. Appendix S2 displays a time series generated using this process.
For each of the other 10 experiments, we carried out the same data generation procedure as the control experiment, but we picked exactly one of five settings (T, ,
,
, or
) and changed it to be either larger or smaller than the control. Namely, we altered the length of the time series from T = 2000 to
; the bottom row of the transition matrix in Eq 14 from
to
; the degrees of freedom from
to
; the density of labels from
to
; and the separation between state-dependent distributions from
to
.
Model fitting and evaluation
We modelled the simulated data sets using PHMMs with normal state-dependent distributions. This deliberately introduced model misspecification as the simulated observations were actually drawn from PHMMs with t-distributed state-dependent distributions. Each fitted PHMM had N = 2 hidden states, which matched the generating process.
For each of the 100 replicated data sets in each of the 11 experimental conditions, we fit six PHMMs corresponding to different weights . For evaluation, we also generated a separate test set using the same generative mechanism as the training set. We then ignored the test set’s latent state labels Z and applied the forward-backward algorithm using the estimated parameters from each of the six PHMMs. This yielded the estimated hidden state probabilities for all observations in the test set for each of the six PHMMs.
Finally, we evaluated each PHMM’s classification performance by comparing its estimated hidden-state probabilities to the true hidden states from the test set. Specifically, we computed the AUC for each combination of experiment, data set, and PHMM using the same calculation from the first case study.
Results
Fig 11 shows boxplots of the 100 AUC values from each experiment and each value of . For the control experiment (middle column), the value of
that maximizes the median AUC is
. As the degrees of freedom
grows large (e.g.,
), the optimal value of
approaches 1, indicating that it is more useful to ignore unlabelled data when the model is misspecified. As the separation of the state-dependent distributions increases, the AUCs of all PHMMs increase, but the AUCs for the PHMMs with
improve the most drastically. This indicates that it is more useful to down-weight unlabelled data when the hidden states are less separated. Next, as
increases from
to
, the classes become more balanced and the optimal value of
increases, indicating that down-weighting unlabelled data is useful when there is more class imbalance in the data set. In addition, as the proportion of labels
increases, the optimal value of
also increases, but when
, all PHMMs perform very well. This result is consistent with our conjecture that
approximately balances the contribution of labelled and unlabelled data in the likelihood. Finally, as T increases, the median AUC increases for all PHMMs except for those with
and
. This implies that even as the amount of data grows large, model misspecification can still lead to erroneous hidden state estimates when using an HMM that equally weights labelled and unlabelled data.
The middle column gives results from five runs of the control experiment. Namely, there are degrees of freedom, the separation between state-dependent distributions is
standard deviations, the bottom-right entry of the transition matrix
from Eq 14 is
, the proportion of labelled data is
, and the length of the time series is T = 2000. The left and right columns give results from the other ten experiments, each corresponding to changing the value of one setting as indicated by the subplot titles. We denote the degrees of freedom
as ‘dof’ in the figure. Each experiment was repeated on 100 independently generated time series, and the AUCs for all 100 test sets are plotted with boxplots.
Although we explore ways in which model misspecification can affect the optimal value of in a PHMM, our experiments are by no means exhaustive (see, for example, [56]). Since models on real data sets are always mispecified, we recommend that practitioners perform cross validation to determine the optimal value of
.
Discussion
In this work, we incorporated sparse labels into an HMM in a natural way that changes the influence of unlabelled observations and demonstrably improves predictive performance. In particular, we weighted observations without associated labels within the likelihood of the HMM using a parameter . On the extremes,
corresponds to an HMM which totally ignores unlabelled data and
corresponds to a traditional, unweighted HMM. We used cross-validated accuracy metrics to find an optimal value of
between these extremes.
We also conducted a simulation study to investigate how the observed data set affects the optimal weight . In particular, down-weighting unlabelled data is beneficial under challenging modelling conditions such as model misspecification, poor hidden state separation, sparse labelling, and extreme class imbalance. In such cases, choosing a value of
close to the proportion of labelled data (
) often yields good predictive performance, suggesting that this heuristic may serve as a useful starting point. However, the relationship between the observed data and the optimal value of
is complicated and unpredictable, so we recommend empirical validation (e.g., cross validation over
) in practice.
We used our weighted likelihood approach to effectively leverage underwater video and audio data and generate a more detailed description of killer whale foraging behaviour. Using cross validation, we showed that our weighted approach matches or (more commonly) outperforms the accuracy of previous baselines and single-frame machine learning methods. In addition to better performance, our method has a significant benefit over the baselines in that it infers an underlying temporal process that generates the time series itself. In this way, PHMMs allow researchers to understand latent behaviours in addition to categorizing them. The flexible structure of an HMM can also be extended in many ways, including to infer behaviours on multiple scales [7] or incorporate habitat covariates [57]. For example, in our second case study we used the structure of the PHMM to simultaneously infer fine-scale dive phases as well as successful foraging dives. We applied this approach to killer whales, but future work can focus on applying our methodology to other marine animals. We use a dive depth threshold of 30 metres to isolate killer whale foraging dives, but other marine animals may require different criteria to isolate foraging behaviour.
As is common in movement ecology, our case studies had a small number of labels, which can negatively impact the reliability of cross validation. Therefore, future work can apply this weighted likelihood approach to case studies with more labels, potentially from other fields. We also performed case studies in which we were confident in our labels. Future work can focus on how including labels affects the performance of the PHMM when researchers are less confident in their labels and infer as a parameter of the PHMM.
Although we did not have a large enough sample size or long enough tag deployments to draw definitive conclusions, the findings from our method are in line with those from Tennessen et al. [15], who conducted a comprehensive study of foraging behaviour in northern and southern resident killer whales. In particular, our results support their findings that northern resident killer whales tend to spend more time travelling and resting compared to southern residents, and that the northern residents tend to catch more fish per unit of effort compared to the southern residents.
Our weighted likelihood approach improves the estimates of unobserved labels from a time series, but it cannot ensure the interpretability and reliability of the labels themselves. For example, recent studies have found adult Chinook salmon migrating in the upper 30 metres of the water column in addition to the deep depths that we explored in this paper [58]. This suggests that killer whales likely employ two foraging strategies to exploit the dichotomy of Chinook swimming tactics. The first strategy that we and others have documented [15, 59] involves repeatedly diving to depths exceeding 30 m to locate and pursue Chinook in areas where salmon are holding or evading predators. The other may involve killer whales repeatedly searching medium water depths (10–30 m) while travelling using echolocation to identify and capture salmon near the surface or by pursuing them to depth. As such, the foraging labels from the first case study likely correspond only to deep foraging dives associated with this first foraging strategy. While we are confident that the PHMM accurately identified resting and foraging dives, we recognize that the killer whales may have been searching for Chinook during ‘travelling’ dives. In fact, we occasionally found echolocation clicks that coincided with dives labelled as travelling in our case study. It may thus be possible to further refine and validate the PHMM using echolocation data recorded by the tags to better categorize the behavioural states of killer whale travelling dives based on their depths and durations relative to the recent information on adult Chinook migratory behaviour.
We primarily focused on identifying the behaviour of killer whales, but incorporating sparse labels into complex HMMs is a common modelling problem across a variety of use cases and disciplines. In addition, complicated time series data are increasingly common as sensing technology continues to improve [8]. As such, the modelling approach developed here can help researchers effectively model complicated, sparsely labelled time series to optimize prediction accuracy and model fit.
Supporting information
S1 Appendix. Additional results from case study 1.
Figures displaying results from PHMMs fit using all five values of
https://doi.org/10.1371/journal.pone.0325321.s001
(PDF)
S2 Appendix. Plots of data used in case and simulation studies.
Figures displaying scatter plots of data used in the case studies and the simulation study.
https://doi.org/10.1371/journal.pone.0325321.s002
(PDF)
Acknowledgments
We thank Mike deRoos and Chris Hall for assistance in the field with tag deployments, Taryn Scarff for assistance with drone deployments, the M/V Gikumi captain and crew, and Keith Holmes for piloting the drone, filming the killer whales, and assisting in synchronizing time stamps. Drone footage was collected in partnership with Hakai Institute. This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada (www.computecanada.ca).
References
- 1. Sutherland WJ. The importance of behavioural studies in conservation biology. Anim Behav. 1998;56(4):801–9. pmid:9790690
- 2. Ogburn MB, Harrison A-L, Whoriskey FG, Cooke SJ, Mills Flemming JE, Torres LG. Addressing challenges in the application of animal movement ecology to aquatic conservation and management. Front Mar Sci. 2017;4.
- 3. McClintock BT, Langrock R, Gimenez O, Cam E, Borchers DL, Glennie R, et al. Uncovering ecological state dynamics with hidden Markov models. Ecol Lett. 2020;23(12):1878–903. pmid:33073921
- 4. Lusseau D, Bain D, Williams R, Smith J. Vessel traffic disrupts the foraging behavior of southern resident killer whales Orcinus orca. Endang Species Res. 2009;6:211–21.
- 5. Ylitalo A, Heikkinen J, Kojola I. Analysis of central place foraging behaviour of wolves using hidden Markov models. Ethology. 2020;127(2):145–57.
- 6. Klappstein NJ, Thomas L, Michelot T. Flexible hidden Markov models for behaviour-dependent habitat selection. Mov Ecol. 2023;11(1):30. pmid:37270509
- 7. Leos-Barajas V, Gangloff EJ, Adam T, Langrock R, van Beest FM, Nabe-Nielsen J, et al. Multi-scale modeling of animal movement and general behavior data using hidden Markov models with hierarchical structures. JABES. 2017;22(3):232–48.
- 8. Patterson TA, Parton A, Langrock R, Blackwell PG, Thomas L, King R. Statistical modelling of individual animal movement: an overview of key methods and a discussion of practical challenges. AStA Adv Stat Anal. 2017;101(4):399–438.
- 9. Pirotta E, Edwards EWJ, New L, Thompson PM. Central place foragers and moving stimuli: a hidden-state model to discriminate the processes affecting movement. J Anim Ecol. 2018;87(4):1116–25. pmid:29577275
- 10. Adam T, Griffiths CA, Leos-Barajas V, Meese EN, Lowe CG, Blackwell PG, et al. Joint modelling of multi-scale animal movement data using hierarchical hidden Markov models. Methods Ecol Evol. 2019;10(9):1536–50.
- 11. Tennessen JB, Holt MM, Ward EJ, Hanson MB, Emmons CK, Giles DA, et al. Hidden Markov models reveal temporal patterns and sex differences in killer whale behavior. Sci Rep. 2019;9(1):14951. pmid:31628371
- 12.
Stephens DW, Brown JS, Ydenberg RC. Foraging: behavior and ecology. University of Chicago Press; 2008.
- 13. Saldanha S, Cox SL, Militão T, González-Solís J. Animal behaviour on the move: the use of auxiliary information and semi-supervision to improve behavioural inferences from Hidden Markov Models applied to GPS tracking datasets. Mov Ecol. 2023;11(1):41. pmid:37488611
- 14. Joy R, Tollit D, Wood J, MacGillivray A, Li Z, Trounce K, et al. Potential benefits of vessel slowdowns on endangered southern resident killer whales. Front Mar Sci. 2019;6.
- 15. Tennessen JB, Holt MM, Wright BM, Hanson MB, Emmons CK, Giles DA, et al. Divergent foraging strategies between populations of sympatric matrilineal killer whales. Behav Ecol. 2023;34(3):373–86. pmid:37192928
- 16.
Fisheries and Oceans Canada. Amended recovery strategy for the northern and southern resident killer whales (Orcinus orca) in Canada. Species at Risk Act Recovery Strategy Series. Canada: Department of Fisheries and Oceans; 2018.
- 17. Murray CC, Hannah LC, Doniol-Valcroze T, Wright BM, Stredulinsky EH, Nelson JC, et al. A cumulative effects model for population trajectories of resident killer whales in the Northeast Pacific. Biol Conserv. 2021;257:109124.
- 18. Noren DP. Estimated field metabolic rates and prey requirements of resident killer whales. Marine Mamm Sci. 2010;27(1):60–77.
- 19. Krogh A. Two methods for improving performance of an HMM and their application for gene finding. Proc Int Conf Intell Syst Mol Biol. 1997;5:179–86. pmid:9322033
- 20.
Bagos PG, Liakopoulos TD, Hamodrakas SJ. Maximum likelihood and conditional maximum likelihood learning algorithms for hidden Markov models with labeled data: application to transmembrane protein topology prediction. In: Proceedings of the International Conference on Computational Methods in Sciences and Engineering, Rhodes, Greece, May 1–5, 2019, 2003, pp. 47–55.
- 21. Tamposis IA, Tsirigos KD, Theodoropoulou MC, Kontou PI, Bagos PG. Semi-supervised learning of Hidden Markov models for biological sequence analysis. Bioinformatics. 2019;35(13):2208–15. pmid:30445435
- 22. Carroll G, Slip D, Jonsen I, Harcourt R. Supervised accelerometry analysis can identify prey capture by penguins at sea. J Exp Biol. 2014;217(Pt 24):4295–302. pmid:25394635
- 23. Allen AN, Goldbogen JA, Friedlaender AS, Calambokidis J. Development of an automated method of detecting stereotyped feeding events in multisensor data from tagged rorqual whales. Ecol Evol. 2016;6(20):7522–35. pmid:28725418
- 24. McClintock BT, King R, Thomas L, Matthiopoulos J, McConnell BJ, Morales JM. A general discrete-time modeling framework for animal movement using multistate random walks. Ecol Monogr. 2012;82(3):335–49.
- 25. McRae TM, Volpov BL, Sidrow E, Fortune SME, Auger-Méthé M, Heckman N, et al. Killer whale respiration rates. PLoS One. 2024;19(5):e0302758.
- 26.
Chapelle O, Schölkopf B, Zien A. Semi-supervised learning. MIT Press; 2006.
- 27.
Ren Z, Yeh R, Schwing A. Not all unlabeled data are equal: learning to weight data in semi-supervised learning. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc.; 2020, pp. 21786–97. Available from: https://proceedings.neurips.cc/paper/2020/file/f7ac67a9aa8d255282de7d11391e1b69-Paper.pdf
- 28. Ji S, Watson LT, Carin L. Semisupervised learning of hidden Markov models via a homotopy method. IEEE Trans Pattern Anal Mach Intell. 2009;31(2):275–87. pmid:19110493
- 29.
Zucchini W, Macdonald IL, Langrock R. Hidden Markov models for time series—an introduction using R. CRC Press; 2016.
- 30. Nigam K, Mccallum AK, Thrun S, Mitchell T. Text classification from labeled and unlabeled documents using EM. Mach Learning. 2000;39:103–34.
- 31. van Engelen JE, Hoos HH. A survey on semi-supervised learning. Mach Learn. 2019;109(2):373–440.
- 32. Hu F, Zidek JV. The weighted likelihood. Can J Stat. 2002;30(3):347–71.
- 33. Hu F. The asymptotic properties of the maximum-relevance weighted likelihood estimators. Can J Stat. 1997;25(1):45–59.
- 34. Higham NJ, Lin L. A Schur–Padé algorithm for fractional powers of a matrix. SIAM J Matrix Anal Appl. 2011;32(3):1056–78.
- 35. Hu F, Rosenberger WF, Zidek JV. Relevance weighted likelihood for dependent data. Metrika. 2000;51(3):223–43.
- 36. Cade DE, Gough WT, Czapanskiy MF, Fahlbusch JA, Kahane-Rapport SR, Linsky JMJ, et al. Tools for integrating inertial sensor data with video bio-loggers, including estimation of animal orientation, motion, and position. Anim Biotelemetry. 2021;9(1).
- 37. Wright BM, Ford JKB, Ellis GM, Deecke VB, Shapiro AD, Battaile BC, et al. Fine-scale foraging movements by fish-eating killer whales (Orcinus orca) relate to the vertical distributions and escape responses of salmonid prey (Oncorhynchus spp.). Mov Ecol. 2017;5:3. pmid:28239473
- 38. McInnes JD, Lester KM, Dill LM, Mathieson CR, West-Stap PJ, Marcos SL, et al. Foraging behaviour and ecology of transient killer whales within a deep submarine canyon system. PLoS One. 2024;19(3):e0299291. pmid:38507673
- 39. McClintock BT, Michelot T. momentuHMM: R package for generalized hidden Markov models of animal movement. Methods Ecol Evol. 2018;9(6):1518–30.
- 40. Visser I, Speekenbrink M. depmixS4: AnRPackage for hidden Markov models. J Stat Soft. 2010;36(7).
- 41. Quick NJ, Isojunno S, Sadykova D, Bowers M, Nowacek DP, Read AJ. Hidden Markov models reveal complexity in the diving behaviour of short-finned pilot whales. Sci Rep. 2017;7:45765. pmid:28361954
- 42.
R Core Team. R: a language and environment for statistical computing; 2023. Available from: https://www.R-project.org/
- 43. Bradley AP. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997;30(7):1145–59.
- 44. Viterbi A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inform Theory. 1967;13(2):260–9.
- 45.
Venables WN, Ripley BD. Modern applied statistics with S, 4th edn. New York: Springer; 2002.
- 46. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.
- 47.
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. e1071: misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071). TU Wien; 2023. Available from: https://CRAN.R-project.org/package=e1071
- 48.
Singh A, Nowak R, Zhu J. Unlabeled data: now it helps, now it doesn’t. In: Koller D, Schuurmans D, Bengio Y, Bottou L, editors. Advances in Neural Information Processing Systems, vol. 21. Curran Associates, Inc.; 2008, pp. 1–8. Available from: https://proceedings.neurips.cc/paper_files/paper/2008/file/07871915a8107172b3b5dc15a6574ad3-Paper.pdf
- 49. Sato M, Trites AW, Gauthier S. Southern resident killer whales encounter higher prey densities than northern resident killer whales during summer. Can J Fish Aquat Sci. 2021;78(11):1732–43.
- 50. Saygili B, Trites AW. Prevalence of Chinook salmon is higher for southern than for northern resident killer whales in summer hot-spot feeding areas. PLoS One. 2024;19(10):e0311388. pmid:39388449
- 51. Tennessen JB, Holt MM, Hanson MB, Emmons CK, Giles DA, Hogan JT. Kinematic signatures of prey capture from archival tags reveal sex differences in killer whale foraging activity. J Exp Biol. 2019;222(Pt 3):jeb191874. pmid:30718292
- 52. Friard O, Gamba M. BORIS: a free, versatile open-source event-logging software for video/audio coding and live observations. Methods Ecol Evol. 2016;7(11):1325–30.
- 53. Sidrow E, Heckman N, Fortune SME, Trites AW, Murphy I, Auger‐Méthé M. Modelling multi-scale, state-switching functional data with hidden Markov models. Can J Statistics. 2021;50(1):327–56.
- 54. Heinze G, Schemper M. A solution to the problem of separation in logistic regression. Stat Med. 2002;21(16):2409–19. pmid:12210625
- 55.
Heinze G, Ploner M, Jiricka L, Steiner G. logistf: Firth’s bias-reduced logistic regression; 2023. Available from: https://CRAN.R-project.org/package=logistf
- 56. Pohle J, Langrock R, van Beest FM, Schmidt NM. Selecting the number of states in hidden Markov models: pragmatic solutions illustrated using animal movement. JABES. 2017;22(3):270–93.
- 57.
Florko KRN, Togunov RR, Gryba R, Sidrow E, Ferguson SH, Yurkowski DJ. An introduction to statistical models used to characterize species-habitat associations with animal movement data. 2025. https://arxiv.org/abs/2401.17389
- 58.
Hendriks BJL. Behaviour and movement of return migrating adult Chinook salmon (Oncorhynchus tshawytscha) through the Salish Sea. 2024. Available from: https://open.library.ubc.ca/collections/ubctheses/24/items/1.0444843
- 59. Wright BM, Deecke VB, Ellis GM, Trites AW, Ford JKB. Behavioral context of echolocation and prey-handling sounds produced by killer whales (Orcinus orca) during pursuit and capture of Pacific salmon (Oncorhynchus spp.). Mar Mamm Sci. 2021;37(4):1428–53. pmid:34690418