Satellite Tagging and Biopsy Sampling of Killer Whales at Subantarctic Marion Island: Effectiveness, Immediate Reactions and Long-Term Responses

Remote tissue biopsy sampling and satellite tagging are becoming widely used in large marine vertebrate studies because they allow the collection of a diverse suite of otherwise difficult-to-obtain data which are critical in understanding the ecology of these species and to their conservation and management. Researchers must carefully consider their methods not only from an animal welfare perspective, but also to ensure the scientific rigour and validity of their results. We report methods for shore-based, remote biopsy sampling and satellite tagging of killer whales Orcinus orca at Subantarctic Marion Island. The performance of these methods is critically assessed using 1) the attachment duration of low-impact minimally percutaneous satellite tags; 2) the immediate behavioural reactions of animals to biopsy sampling and satellite tagging; 3) the effect of researcher experience on biopsy sampling and satellite tagging; and 4) the mid- (1 month) and long- (24 month) term behavioural consequences. To study mid- and long-term behavioural changes we used multievent capture-recapture models that accommodate imperfect detection and individual heterogeneity. We made 72 biopsy sampling attempts (resulting in 32 tissue samples) and 37 satellite tagging attempts (deploying 19 tags). Biopsy sampling success rates were low (43%), but tagging rates were high with improved tag designs (86%). The improved tags remained attached for 26±14 days (mean ± SD). Individuals most often showed no reaction when attempts missed (66%) and a slight reaction (defined as a slight flinch, slight shake, short acceleration, or immediate dive) when hit (54%). Severe immediate reactions were never observed. Hit or miss and age-sex class were important predictors of the reaction, but the method (tag or biopsy) was unimportant.
Multievent trap-dependence modelling revealed considerable variation in individual sighting patterns; however, there were no significant mid- or long-term changes following biopsy sampling or tagging.


Introduction
Cetaceans spend the vast majority of their lives under water and are highly mobile and often wide-ranging, which makes them a challenging taxon to study. Two field methods, tissue biopsy sampling and satellite-linked telemetry (or satellite tagging), are becoming widely used in cetacean studies because they allow the collection of data which are difficult or impossible to obtain by other means. Tissues obtained by biopsy sampling can be used for a range of analyses including genetics, stable isotopes, fatty acids, contaminants, hormones and trace elements (see [1] for a review) and can thus address aspects such as population structure, diet and animal health (e.g., [2-5]). Satellite tagging can elucidate the movement, distribution, behaviour and habitat use of cetaceans in relation to their physical environment (e.g., [6-8]). Such data are critical to understanding the ecology of a species and its environmental role and, consequently, are vital to conservation or management efforts (e.g., [9,10]). The need for such information is particularly acute given the anthropogenic pressures many such populations and species face [10-12].
However, researchers must carefully consider their methods not only from an animal welfare perspective, but also to ensure the scientific rigour and validity of their results. The latter point is critical where methods may affect the subsequent behaviour or performance of individuals, thereby biasing the results obtained (e.g., [13-15]). From an ethical perspective researchers have an onus to assess the tradeoffs between the 'importance' of research, its likely benefit and its effect on animals before conducting work [16,17]; from a scientific perspective the responsibility is to design robust and valid studies [18]. Researchers should further evaluate animal effects and research methods post-hoc, refine these where needed and, importantly, publish such results [19,20].
Small cetaceans may be captured and restrained for satellite tagging and biopsy sampling (e.g., [21,22]) but this is impractical for most species and therefore remote techniques, which employ pole-mounted or projectile systems (typically fired from pneumatic rifles or crossbows) to biopsy sample or tag unrestrained animals, are most common. Remote biopsy sampling is an effective, mostly benign method of collecting fresh tissue samples from free-ranging cetaceans [1]. While cetaceans usually show some behavioural reaction to biopsy sampling, the reactions are typically mild and short-term (0.5-3 min) and the wounds made by the biopsy dart or punch heal quickly with no apparent adverse effects. Few studies, however, report on the behavioural and physiological impacts of remote biopsy sampling; this is important as different species and populations may react differently. No studies have shown long-term effects of biopsy sampling such as avoidance of the sampling area (e.g., [23]) or negative effects on reproduction and calf survival [24]; however, such effects are likely difficult to examine and only a small number of studies have attempted to do so [1].
Satellite tags are attached to animals using some form of subdermal retaining dart (e.g., [7,25]). As with biopsy sampling, relatively few remote satellite (and earlier radio) tagging studies describe the behavioural reactions of animals to tagging (where they do, the description is largely qualitative) and mid- to long-term follow-up studies are rare. The majority of immediate reactions to tagging seem to be unnoticeable or mild and short-term [25-29]. Best and Mate [30] found no major effect of satellite tagging on the reproductive success of adult female southern right whales Eubalaena australis or the survival of their calves. Tagging also does not appear to affect the survival or reproductive success of humpback whales Megaptera novaeangliae [29,31].
One of the main challenges in remote satellite tagging systems is maximising the attachment durations of tags while minimising their invasiveness. Attachment durations have improved greatly (often hundreds of days currently compared to only a few days for the first attempts, see [25]) and tags have become smaller due to technological advances, but attachment duration remains highly variable. Remote satellite tagging studies were previously limited to large cetacean species, but the development of tags such as the 'Low Impact Minimally Percutaneous External-electronics Transmitter' configuration (LIMPET, [7]) has allowed tagging of smaller species such as killer whales Orcinus orca, Blainville's beaked whales Mesoplodon densirostris, false killer whales Pseudorca crassidens and pygmy killer whales Feresa attenuata [7,8,32-34].

Marion Island killer whales
Marion Island (46°54′S, 37°45′E), which lies in the Polar Frontal Zone in the Indian sector of the Southern Ocean, has a population of 58 identified killer whales which may occur at the island year round, but are most abundant between September and December [35,36]. This population has been observed preying on southern elephant seals Mirounga leonina, sub-Antarctic fur seals Arctocephalus tropicalis and three penguin species, and the peak killer whale abundance coincides with the breeding seasons of these seals and penguins [35]. It is entirely unknown what proportion of the whales' diet each species comprises and whether or not other prey (e.g., fishes, cephalopods) are taken, particularly when the whales are not observed at the island. Killer whales in the region depredate Patagonian toothfish Dissostichus eleginoides from longline fishing vessels [37], but it is unknown whether these individuals are from the Marion Island population or if toothfish are natural prey. When animals are not observed at the island their whereabouts and movements are unknown, although eight individuals have been photographically identified at both Marion Island and the Crozet Islands, located approximately 950 km east of Marion Island [36,38]. The role of killer whales as drivers of seal and penguin population dynamics at Marion Island is important, but quantitatively uncertain [39]. The remoteness of Marion Island makes geographically wide-scale observations to elucidate diet and movement unfeasible and thus satellite tagging and biopsy sampling are vital methods to investigate the ecology of this population of killer whales.

Aims
In this paper we, firstly, report our methods for shore-based, remote biopsy sampling and satellite tagging of killer whales, the success of these methods and particularly the attachment duration and performance of LIMPET satellite tags. Secondly, we describe the immediate behavioural reactions of animals to biopsy sampling and satellite tagging and test for differences in the reactions to each. Thirdly, we test whether researcher experience influences biopsy sampling and satellite tagging. Lastly, using multievent capture-recapture analysis, we evaluate whether biopsy sampling and satellite tagging changed the behaviour of individuals, altering mid- (1 month) and long-term (up to 24 months) sighting patterns.

Ethics statement
Biopsy sampling and tagging were approved by the University of Pretoria's Animal Use and Care Committee (EC023-10) and the Prince Edward Islands Management Committee research and collection permits: 17/12; 1/2013; 1/2014.

Field methods
All killer whale studies at Marion Island are shore-based as boat-based work is not logistically possible or permitted [40]. Shore-based photographic identification (photo ID) has been successful at Marion Island as killer whales frequently approach within a few metres of the shore (Figure 1; [41]). This also allows work in weather conditions unsuitable for boat-based operations and importantly, in this study, allowed us to assess the reactions of animals to biopsy sampling and satellite tagging without any confounding reactions to boats.
We use 'sampled' and 'sampling' to refer to both biopsy sampling and satellite tagging; biopsy sampling is named explicitly where the distinction matters. We biopsy sampled and satellite tagged killer whales at two locations (Rockhopper Bay and Transvaal Cove) on the island's leeward east coast, near (<1.0 km) a long-term observation/photo ID site [41]. Both locations are low rock ledges, 1.0-2.0 m above the water surface. Sampling attempts were made primarily during 'dedicated observation sessions', in which the marksman would wait for killer whales for a predetermined length of time (typically 3-10 hours). We used a 68 kg draw weight recurve crossbow (Barnett Panzer V; Barnett Outdoors, LLC, Tarpon Springs, Florida, United States of America) equipped with a red dot sight for biopsy sampling and satellite tagging. Bolts were tethered with line and a fishing reel mounted on the crossbow (Methods S1, [42]). Biopsy and tagging attempts were made by two arbalesters during the study and reactions (described in Table 1) were scored by the arbalester. After October 2011 the arbalester usually wore a high-definition video camera (GoPro HD Hero and GoPro HD Hero 2; Woodman Labs, Inc., Half Moon Bay, California, United States of America) to record biopsy and tagging attempts (Figure 1).
Biopsy sampling. We obtained tissue samples using stainless steel biopsy tips (25 mm × 7 mm) attached to the bolts; a steel flange prevented penetration beyond 25 mm. Tips were sterilized before use and stored in clean plastic bags (Methods S1, [42]). The tissue samples obtained were stored for genetic, stable isotope and fatty acid analyses (Methods S1).
Satellite tagging. We deployed three models of satellite-linked telemetry devices: Sirtrack Kiwisat 202 (Sirtrack Ltd., Havelock North, New Zealand), Wildlife Computers SPOT5 and Wildlife Computers Mk10-A (Wildlife Computers, Redmond, Washington, United States of America). All three tag models allow estimation of geographic position via satellite using the Argos System (Collecte Localisation Satellites, Toulouse, France); the Mk10-A tag additionally includes a pressure (depth) sensor and a fast-response thermistor. Position estimates are classed by Collecte Localisation Satellites based on the estimated accuracy of the position, as follows: Classes A and B: no estimate; Class 0: >1 500 m; Class 1: 500-1 500 m; Class 2: 250-500 m; Class 3: <250 m (Table S2) [43]. To extend tag battery life while maintaining biologically sensible data capture, tags were programmed with various transmission schedules or 'duty cycles' (Table S2).
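As an illustration only, the location classes listed above can be held in a small lookup table and used to screen position estimates by their stated accuracy. The record layout (a dictionary with an "lc" key), the function name and the example threshold are hypothetical and are not taken from the study; this is a sketch, not the authors' processing pipeline.

```python
# Argos location classes and their estimated error bounds in metres,
# as listed in the text. None means no accuracy estimate (classes A, B)
# or no upper bound (class 0).
ARGOS_CLASS_ERROR_M = {
    "3": (0, 250),
    "2": (250, 500),
    "1": (500, 1500),
    "0": (1500, None),
    "A": None,
    "B": None,
}


def filter_by_max_error(positions, max_error_m):
    """Keep positions whose class has an upper error bound <= max_error_m.

    `positions` is a list of dicts with a hypothetical "lc" key holding
    the location class as a string.
    """
    kept = []
    for position in positions:
        bounds = ARGOS_CLASS_ERROR_M[position["lc"]]
        if bounds is None or bounds[1] is None:
            continue  # no usable upper bound on the error
        if bounds[1] <= max_error_m:
            kept.append(position)
    return kept


# Hypothetical fixes: only classes 3 and 1 carry an upper bound <= 1 500 m.
fixes = [{"lc": "3"}, {"lc": "1"}, {"lc": "B"}, {"lc": "0"}]
print([p["lc"] for p in filter_by_max_error(fixes, 1500)])
```

Which classes count as sufficiently accurate depends on the scale of the movement question, so the threshold is left as a parameter rather than fixed.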
The tags were all in the LIMPET configuration where the tag is externally attached to the animal by sub-dermal darts which typically do not penetrate past the blubber layer (Figure 2; [7]). Penetration deeper than the length of the darts is prevented by the tag itself. This is in contrast to a typical 'fully implantable' tag where the transmitter is largely sub-dermal and the attachment darts (or anchors) may often penetrate through the blubber into the muscle (e.g., [25,29]).
Kiwisat 202 tags were attached using 65 mm medical-grade stainless steel darts designed by RRR following [7]. Following an initial deployment with two darts (PTT 67764 in Table S2) we had difficulty attaching the tags and changed to a single dart design for these tags. SPOT5 and Mk10-A tags were attached using two 65 mm titanium darts designed and manufactured by RDA and Wildlife Computers (described in [7]). Tags (including darts) weighed 114 g (Kiwisat 202), 59 g (SPOT5) and 75 g (Mk10-A).
For deployment, tags were held on the crossbow bolt using urethane cups which fitted over the tag body (Figure 2). On impact with the animal, the sudden deceleration causes the tag to separate from the tag cup and bolt, which are retrieved using the tether (Figure 1; as for biopsy sampling). To prevent losing the tag if a shot was missed, Kiwisat 202 tags were additionally secured using two small screws which sheared the tag cup on impact with the animal and Wildlife Computers tags were secured using water

Reactions to biopsy sampling and satellite tagging
We evaluated behavioural responses to tagging and biopsy by fitting generalized linear mixed models (GLMMs) using package lme4 in R [44,45]. We treated reactions as binomial; i.e., no response vs. response. The reaction observations (n = 103) were not independent because we resampled some individuals and we therefore included individual as a random effect. Our candidate models included combinations of three variables which potentially affected response: biopsy/tag (whether a biopsy sampling or tagging attempt), hit/miss (whether the tag or biopsy arrow hit or missed the animal), and class (adult male, adult female or juvenile) (Table 2). Interactions between explanatory variables were not considered. Models were compared using Akaike's Information Criterion corrected for small sample sizes (AICc). The model with the lowest AICc is the most parsimonious model in the model set [46].
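The AICc comparison described above can be sketched in a few lines. The example below is illustrative only: the log-likelihoods and parameter counts are invented, the model labels are hypothetical, and the use of the number of reaction observations (n = 103) as the sample size for a mixed model is an assumption. The fitting itself was done in lme4 and is not reproduced here.

```python
import math


def aicc(log_lik, k, n):
    """AICc = -2 ln(L) + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert a dict of AICc scores into Akaike weights that sum to 1."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


def variable_importance(weights, variable):
    """Sum the weights of all models that contain `variable`."""
    return sum(w for model, w in weights.items() if variable in model)


N_OBS = 103  # reaction observations reported in the text

# Hypothetical fits: model (tuple of predictors) -> (log-likelihood, k).
fits = {
    (): (-68.0, 2),
    ("biopsy_tag",): (-67.8, 3),
    ("hit_miss",): (-61.0, 3),
    ("hit_miss", "class"): (-57.5, 5),
}

scores = {model: aicc(ll, k, N_OBS) for model, (ll, k) in fits.items()}
weights = akaike_weights(scores)
for model in sorted(scores, key=scores.get):
    print(model, round(scores[model], 2), round(weights[model], 3))
print("importance of hit_miss:", round(variable_importance(weights, "hit_miss"), 3))
```

Summing Akaike weights over the models that contain a variable is one common way to express the relative support for that variable across a candidate set.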
To test the validity of using binomial reactions rather than the reaction scores as defined in Table 1, we also compared the reaction scores using Kruskal-Wallis rank sum tests (kruskal.test in R) followed by multiple comparison tests where applicable (kruskalmc in package pgirmess in R; [47]).

Effect of arbalester experience
To test whether the experience of an arbalester influenced the probability of hitting the target individual in a sampling event (hit/miss, as above), we fitted generalized linear models (GLMs) with a binomial error distribution in R. Both arbalesters were proficient marksmen and underwent training before fieldwork; however, neither had field experience of remote biopsy sampling or satellite tagging prior to this study. We therefore used the cumulative number of sampling attempts by the arbalester as a proxy for their experience level at each sampling attempt. Candidate models included all combinations of the following predictor variables: experience, biopsy/tag (as above), arbalester (the identity of the arbalester) and range (estimated range of the shot, in metres) (Table 3). As for the GLMMs, interactions between variables were not considered and AICc was used to compare models.
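A minimal sketch of how such an experience proxy might be derived from a chronological log of attempts follows. The record fields and arbalester labels are hypothetical, and whether the authors counted the current attempt as part of the cumulative total is not stated; the sketch assumes it is included.

```python
from collections import defaultdict


def add_experience(attempts):
    """Return copies of the attempt records with an 'experience' field.

    `attempts` must be in chronological order. Experience is the running
    count of attempts by the same arbalester, including the current one
    (an assumption; the source does not specify this).
    """
    counts = defaultdict(int)
    with_experience = []
    for attempt in attempts:
        counts[attempt["arbalester"]] += 1
        record = dict(attempt)
        record["experience"] = counts[attempt["arbalester"]]
        with_experience.append(record)
    return with_experience


# Hypothetical log: two arbalesters, ranges in metres.
log = [
    {"arbalester": "A", "range_m": 6, "hit": True},
    {"arbalester": "B", "range_m": 12, "hit": False},
    {"arbalester": "A", "range_m": 8, "hit": True},
]
for row in add_experience(log):
    print(row["arbalester"], row["experience"], row["range_m"], row["hit"])
```

The resulting column can then enter a binomial GLM alongside range, method and arbalester identity.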

Sighting patterns
We used two approaches to detect changes in the sighting patterns of individuals after sampling using photographic identification sighting histories from 2006/04-2013/05 (sighting proportion) and 2008/05-2013/05 (mark-recapture). Briefly, dorsal fin photographs were taken during opportunistic (2006-2013) and dedicated (2008-2013) survey sightings and individuals were identified based on characteristic features such as scarring, mutilation and pigmentation. We stringently scored photographs based on their quality and used only good quality photographs to create a sighting history for each individual. All individuals were considered equally identifiable from good quality photographs, irrespective of the uniqueness of their characteristic features. Thus, individual variation in 'recognisability' should not affect the detection process (see [41] for methods). Sighting histories were restricted to sightings near (<1.0 km) the biopsy/tagging sites.
Sighting proportion. Firstly, following [23], we compared an individual's 'sighting proportion' before and after sampling. For a given period, the sighting proportion was simply the number of photographic sightings of a given individual in that period divided by the number of photographic sightings of all individuals in that period. Sighting proportions were calculated for all sampled individuals before and after each sampling attempt and compared with a paired Wilcoxon signed-rank test (wilcox.test in R).
Mark-recapture analysis. Secondly, we used multievent mark-recapture models [48] to determine whether sampling reduced future detection probabilities. Typically, when individuals are physically captured, they may seek (trap-happy) or avoid (trap-shy) the sampling area (the 'trap') on future occasions [49]. We considered two possible responses to sampling. Firstly, sampling may result in temporary avoidance of the sampling area, affecting detection only at the time-step following the one when the animal was sampled ('trap-dependence' in capture-recapture parlance [49,50]). Alternatively, sampling may permanently alter individuals' behaviour, resulting in a permanent state change with reduced detection following sampling, i.e., long-term trap-dependence. In this long-term trap-dependence model, instead of automatically returning to their initial state one time interval after being sampled [49,50], individuals permanently remained in a 'sampled' state. For the purpose of our study, 'normal' trap-dependence corresponded to the mid-term (1 month) effect of sampling (Data S1), while long-term trap-dependence corresponded to the long-term (up to 24 months) effect of sampling (Data S2). Thus, in the model where response to sampling was temporary, animals reverted back to the naïve state after one month. Where sampling was assumed to permanently influence behaviour, the state change was permanent.
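The difference between the temporary and permanent response structures can be made concrete with a short sketch. It takes one individual's monthly event history and returns, for each month, whether the animal enters that month in the 'sampled' state under each assumption. The integer event codes and function name are hypothetical; this illustrates the state logic only and is not the E-SURGE model specification.

```python
# Hypothetical event codes for one monthly capture occasion.
NOT_OBSERVED = 0
SEEN_NOT_SAMPLED = 1
SEEN_SAMPLED = 2


def sampled_state(events, permanent):
    """Return a list of booleans: True if the animal is in the 'sampled'
    state at the start of each month.

    permanent=False: the state lasts one time-step only (mid-term model).
    permanent=True:  once sampled, the animal stays 'sampled' (long-term model).
    """
    states = []
    sampled = False
    for event in events:
        states.append(sampled)  # state entering this month
        if event == SEEN_SAMPLED:
            sampled = True
        elif not permanent:
            sampled = False  # revert to the naive state after one month
    return states


history = [SEEN_NOT_SAMPLED, SEEN_SAMPLED, NOT_OBSERVED,
           SEEN_NOT_SAMPLED, SEEN_SAMPLED, SEEN_NOT_SAMPLED]
print(sampled_state(history, permanent=False))
print(sampled_state(history, permanent=True))
```

In the temporary structure only the month immediately after a sampling event is flagged, whereas in the permanent structure every month after the first sampling event is flagged.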
Before trying to estimate the effect of sampling on individuals' behaviour, we had to account for intrinsic individual heterogeneity in detection, as failure to do so may lead to flawed inference [51]. One-sided directional test statistics (the signed square roots of the χ²-statistics) for Test3.SR (a test for transience) and Test2.CT (a test for trap-dependence) in U-CARE [52] suggested significant heterogeneity in detection (Table S3, [53] and references therein). We used capture-recapture mixture models [54,55] that model heterogeneity using discrete 'classes' of individuals with low or high detection probability. Transience was accommodated by separately estimating the survival probability over the interval immediately following the first observation of the individual at Marion Island and survival during following intervals [56]. Mixture models specified the existence of two hidden states, representing individuals with distinct probabilities of detection. Our specification of two classes of individuals should not strictly be interpreted as evidence of the existence of two such classes; rather, these classes introduce heterogeneity in detection, improving model selection and reducing bias in parameter estimates [54].
Individual capture histories (n = 48) were based on photographic resightings between 2008 and 2013 (Data S1, Data S2). The full set of resightings for each individual was reduced to monthly 'capture occasions' (i.e., an individual was considered resighted or 'captured' in a month if it was photographed at least once in the month). At each occasion resighted individuals were known with certainty to be 'sampled' or 'not sampled'. We thus defined three events: 'not observed', 'resighted; not sampled' and 'resighted; sampled'. Depending on which of the above-described model structures we used, we defined up to nine states (Figure S2). Individuals moved in a Markovian way between the states. In the most complex model the states were thus: 'Seen t-1; sampled', 'Not seen t-1; sampled', 'Seen t-1; not sampled' and 'Not seen t-1; not sampled'. Assigning the four states to two hidden groups with different detectability increased the number of states to eight. Finally, 'death' was explicitly included as a state. Transitions between states were decomposed as: 1) survival, 2) detection conditional on survival, and 3) sampling, given survival and detection (Figure S2). Models were fitted using program E-SURGE 1.9.0 [57].
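The reduction of dated resightings to monthly capture occasions with three events can be sketched as follows. The dates, function names and integer event codes are hypothetical. The sketch also assumes that a sampling attempt counts as a resighting in that month; how a sampling attempt without a good-quality photograph was handled is not stated in the text.

```python
from datetime import date

NOT_OBSERVED = 0
SEEN_NOT_SAMPLED = 1
SEEN_SAMPLED = 2


def month_index(day, start):
    """Number of whole calendar months between `start` and `day`."""
    return (day.year - start.year) * 12 + (day.month - start.month)


def monthly_events(sightings, sampling_dates, start, n_months):
    """Collapse dated records for one individual into monthly events."""
    events = [NOT_OBSERVED] * n_months
    for day in sightings:
        i = month_index(day, start)
        if 0 <= i < n_months:
            # One or many photographs in a month give the same event.
            events[i] = max(events[i], SEEN_NOT_SAMPLED)
    for day in sampling_dates:
        i = month_index(day, start)
        if 0 <= i < n_months:
            events[i] = SEEN_SAMPLED  # assumption: attempt implies resighting
    return events


# Hypothetical individual: seen twice in September, sampled in November.
start = date(2011, 9, 1)
sightings = [date(2011, 9, 3), date(2011, 9, 20), date(2011, 11, 11)]
sampled_on = [date(2011, 11, 11)]
print(monthly_events(sightings, sampled_on, start, 4))
```

Records falling outside the study window are silently ignored here; a stricter version could raise an error instead.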
Seasonality was introduced by separating the peak in killer whale abundance (September-December) from the rest of the year. Two periods of varying observer intensity (2008-2011 and 2011-2013) were also considered. Sampling was only possible when animals were seen, and sampling probabilities were constrained to the sampling period (2011-2013).
For both mid-term and long-term response to sampling, the same four initial candidate models were ranked using QAICc (sample size corrected, quasi-likelihood Akaike's Information Criterion [46]). This initial set of four models was designed to help us decide on the best model structure for seasonality (winter/summer) among the following four options: 1) no seasonality; 2) same seasonality effect for all individuals; 3) seasonality applying to all individuals but in different strength for two hidden groups (suggesting variation in seasonal attendance between individuals); 4) seasonality applying only to one of the hidden groups (suggesting 'residents' and 'migrants'). All models included two age classes for survival (transience model) and two periods of different field effort. They all included the effect of sampling (either long-term or mid-term). Having selected a seasonal model based on QAICc, we removed the sampling effect from the model and evaluated the change in QAICc.

Results
Overall, 109 biopsy and satellite tagging attempts were made, resulting in 71 hits (Table 4; Data S3). Of these, 101 attempts were made in 236 'dedicated observation sessions' (on 231 days) totalling 1,645 hours; an attempt was therefore made every 16 h 17 min, overall. Biopsy hit rate was lower than tagging hit rates and biopsy sampling rate was low (43%). Tagging rate for Kiwisat 202 tags was very low (30%), reflecting, together with the short attachment durations (below and Figure 3), the greater size and weight of these tags and the unsuccessful design of the attachment darts used with the tags. Tagging rate for the SPOT5 and Mk10-A tags was high (86%). Biopsy attempts were made at ranges from 3-20 m (average 8 m) and tagging attempts were made at ranges from 3-9 m (average 6 m).
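The effort figure above follows directly from the reported totals, as the short check below shows. The overall hit percentage it prints is derived here from the 71 hits in 109 attempts and is not a figure quoted in the text; the function names are illustrative.

```python
def hours_minutes_per_attempt(total_hours, attempts):
    """Mean observation effort per attempt as (hours, minutes)."""
    total_minutes = round(total_hours * 60 / attempts)
    return divmod(total_minutes, 60)


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# 1,645 h of dedicated observation and 101 attempts made in those sessions.
print(hours_minutes_per_attempt(1645, 101))  # (16, 17)

# 71 hits from 109 attempts overall.
print(round(percent(71, 109)))  # 65
```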

Reactions to biopsy and satellite tagging attempts
All responses corresponded to 'no response' and 'low response' in [1]. Several animals turned on their sides; they seemed to be looking at the arbalester, but may have been looking at the impact site (as described by [58]). Some animals rolled a number of times when tagged. Both such reactions were scored as 2 (Table 1); where the rolls were combined with an extended dive or flight the reactions were scored as 3. The most frequent reaction to a miss was 0 (no reaction), while the most frequent reaction to a hit was 1 (Figure 4). This was typically a slight acceleration, immediate submergence and/or a shake of the body (cf. [58]) (Table 1). Such responses were often so slight that they were difficult to see, even when reviewing video footage.
In the GLMMs, the variance of the individual random effect was effectively zero, indicating either low individual variability in behavioural response, or that we were unable to detect individual variation with this limited data set. The model with the most support included hit/miss and class (adult male, adult female or juvenile) as predictor variables (Table 2). Hit/miss was the most important predictor variable (ω_i = 1), followed by class (ω_i = 0.86) (Table 2). Biopsy/tag had essentially no support, ranking lower than the null model when included as the only predictor variable. Adult females were most likely to respond, followed by juveniles and lastly males. Although the probability of response was highest when hit, behavioural responses were often present when missed (Figure 5).
Results of the Kruskal-Wallis tests support those of the GLMMs. Overall, there were significant response differences in the various categories (Kruskal-Wallis χ² = 18.48, df = 3, p < 0.01). Reactions to tag and biopsy were not significantly different (χ² = 0.58, df = 1, p = 0.45) while reactions to hit and miss were (χ² = 13.812, df = 1, p < 0.01). Post-hoc multiple comparisons showed significant differences between reactions to tag-hit and tag-miss, biopsy-hit and tag-miss, and biopsy-miss and tag-hit (Table S4).

Effect of arbalester experience
The most supported model included only range as a predictor variable (β = -0.13 ± 0.06, p = 0.038). Models including experience and biopsy/tag in addition to range had ΔAICc < 2, but only range was a significant or near-significant predictor in these models (Table 3).

Sighting patterns
Changes in sighting proportion were typically small, and mean changes ranged from -0.02 to 0.68 percentage points (Figure S1). We found no significant differences when comparing sighting proportions before and after tagging/biopsy attempts; there was also no difference if we considered hits only (Table S1). The most frequently observed individual showed very large, positive changes in sighting proportion, but results remained the same if we repeated the comparison without this individual.

Multievent mark-recapture
Models not accounting for heterogeneity performed poorly. The most parsimonious seasonality model allowed detection of both hidden groups to fluctuate independently with season. Removing seasonality from one mixture group (thus creating a 'resident' group with constant detection throughout the year) increased the QAICc score.
When sampling was modelled as a permanent state change, QAICc favoured removal of the sampling variable (Table 5). When sampling was modelled as a temporary state change, the sampling variable explained enough variation in detection probability to remain in the top ranked model, although the difference in QAICc was only 0.09, indicating that the effect of sampling on detection was weakly supported (Table 6). In that model, individuals seen and sampled during month t-1 had a higher probability of being detected in month t than individuals that were only seen (and not sampled) during month t-1 (Figure 6). Since we corrected for among-individual variation in sighting probability via the mixture model structure, this 'trap-happy' response suggests a possible bias towards sampling (and repeat-sampling) of 'tamer' individuals. Indeed, upon removing the individual that was most often seen and also repeatedly sampled and repeating the analysis, the model including sampling ranked lower than the model without the sampling effect (ΔQAICc = 1.13). Finally, the probability of sampling, given detection, was 0.18 (95% confidence interval: 0.14-0.25).

Discussion
Our results suggest that land-based remote biopsy sampling and satellite tagging of killer whales at Marion Island are an effective means of collecting otherwise elusive data and that the methods elicit only mild, short-term behavioural responses. We show the potential of multievent trap-dependence models (compared to simpler approaches such as [29-31]) to assess responses to sampling while controlling for intrinsic heterogeneity and other covariates. We found no mid- (1 month) or long-term (up to 24 months) avoidance of the study site following biopsy or tagging and conclude that there is no evidence of behavioural changes due to sampling.

Biopsy sampling
Our successful biopsy sampling rate was low compared to biopsy sampling rates of odontocetes in other studies using bows (crossbows and compound bows) (mean ± SD = 68% ± 19 percentage points in [1] compared to our 44%). Biopsy sampling rates of odontocetes with bows are typically lower than for mysticetes or using guns and poles [1], but we further attribute our low biopsy sampling rate to the tether line, which worsens the crossbow's already poor performance in wind (of which there is a great deal at Marion Island), and to taking less than ideal shot opportunities as necessitated by the shore-based study. Although biopsy sampling opportunities were rare and required many hours of dedicated observations, shore-based work proved viable and we managed to biopsy sample nearly half of all identified whales in our population in the first two years of biopsy sampling. Biopsy sampling rates were lower than tagging rates mainly because tagging was only attempted at much closer ranges (3-9 m, mean = 6 m, compared with 3-20 m, mean = 8 m).

Satellite tagging
Low tagging rates and short attachment durations meant that the Kiwisat 202 tags were not worth deploying (in a cost-benefit sense); this was due largely to poor attachment darts as the tags themselves performed well. The greater size and weight of that configuration probably contributed to their short attachment times: larger tags are subject to greater drag in the water and heavier tags slow the bolt's speed when fired, which may mean that darts do not consistently penetrate to their full depth. This also affected the trajectory of the shot: the heavier tags did not always strike at an appropriate angle, necessitating a single-dart design which further reduced attachment duration. This underlines the importance of using proven techniques and technologies in biopsy and tagging studies. When these are not available, methods and equipment should be developed with the input of those with relevant expertise and experience (e.g., field biologists, engineers, veterinarians) and tested in as realistic a way as possible (e.g., using cetacean carcasses to test tagging and biopsy techniques [59]). When species or populations of special conservation concern are involved, methods and equipment may need to be tested on other species or populations first [12].
Attachment durations were longer but highly variable (as other studies report) for SPOT5 and Mk10-A tags and still short compared to fully implantable tags (e.g., [60,61]). This represents the compromise of a minimally invasive, external tag attachment which can be deployed on smaller species compared to configurations where the tag itself is fully implanted, as used on large whales. Our average SPOT5 and Mk10-A deployment durations were shorter than, but as variable as, other studies using the same tag setup (mean ± SD = 24 ± 24 d in [7]; 43 ± 23 d in [32], 32 ± 22 d in [8] and 46 ± 41 d in [34]). At Marion Island killer whales frequently hunt and patrol in dense bull kelp Durvillaea antarctica and giant kelp Macrocystis pyrifera forests which circle the island inshore, and we suggest that this may shorten attachment durations as tags may become ensnared. We obtained a greater number of accurate position estimates per day than large whale studies using fully implantable tags (e.g., 1.5 ± 1 in [6], 2 ± 1.6 in [61]), but we anticipated shorter deployments than those studies and our tags were programmed to transmit more frequently. Killer whales also have shorter dive durations than large whales. The LIMPET setup is thus currently more useful for finer scale movement studies.

Reactions
Reactions to tagging were similar to the few responses described in other tagging studies [7,25,26,29,62] and to reactions in other biopsy studies (reviewed by [1]), although there were no 'strong' (sensu [1]) reactions in our study. Some authors have attributed responses largely to the research boat rather than the actual tagging or biopsy, but we show that killer whales do respond to shore-based tagging and biopsy (as in [7]).
Although slightly stronger reactions were more frequent in response to tagging, the type of sampling (biopsy sampling or tagging) was not important in determining whether an animal would respond. Similarly, Reeb and Best [63] noted that southern right whales' reactions do not differ when biopsied with deep (11-20.5 cm) darts compared to more superficial darts used in a previous study [24]. This might suggest that, in general, responses to biopsy sampling and tagging are primarily startle, and not pain, responses. However, in our study hit vs. miss did influence reactions, indicating that an object hitting the animal's body has a different effect from one hitting the water. We cannot say whether hitting the animal's body is simply more startling to the animal or if, and how much, pain plays a role.
Some individual variation in behavioural reactions may be expected, but this was not evident in our study. It is possible that our data were too few to detect consistent individual variation. Sex and age, however, did influence reactions. Adult males were less likely to react than juveniles and adult females. Other studies report that group composition influences reactions, but very few report sex differences: Brown et al. [64] reported that female humpback whales responded more often to biopsy sampling, and Gauthier and Sears [65] report the same for female fin whales Balaenoptera physalus.

Figure 5. Predicted probability of an immediate behavioural response of killer whales to biopsy and tagging. Response probabilities as predicted by our best generalized linear mixed effects model, which included class (adult male, adult female or juvenile) and method (biopsy or tag); see Table 2. doi:10.1371/journal.pone.0111835.g005

Table 5. Selection criteria for multievent capture recapture models of sighting histories of killer whales at Marion Island: long-term (up to 24 months) responses following sampling (tagging or biopsy) attempts.

Noren and Mocklin [1] name research team experience as an important factor influencing the success of collecting biopsy samples from cetaceans (although only [58] provides any qualitative support for the statement). We found almost no support for an effect of arbalester experience on sampling success; however, such an effect may be obscured by the baseline proficiency of the arbalesters (both had undergone training prior to fieldwork), may only become apparent after even more experience (e.g., hundreds of sampling attempts compared to fewer than one hundred in this study), or may be stronger in vessel-based studies, where the vessel driver's experience is also relevant (e.g., [58]). Regardless, research team experience remains an important consideration in terms of animal welfare.
Consequences of inaccurate shooting may include: hitting non-target animals; hitting target animals at the wrong body location (an important concern for satellite tags, which need to be above water to transmit, and for biopsy samples, where tissue characteristics may vary, affecting subsequent analyses [3]); and the loss of equipment. Hitting a non-target animal or the wrong place on the body may result in serious injury to the animal.

Table 6. Selection criteria for multievent capture recapture models of sighting histories of killer whales at Marion Island: mid-term (1 month) responses following sampling (tagging or biopsy) attempts.

Sighting rates
Multievent models provided a flexible framework to model the response of individuals to sampling while accounting for demographic processes of the population. The sighting ratio method assumed that 'all animals are equal' with regard to seasonal movement and thus availability for detection; heterogeneity among individuals in these respects could confound the results of such a simple analysis. In this study the results of the two approaches were not fundamentally different: neither demonstrated a negative response to tagging or biopsy. However, the multievent approach showed the important effect of seasonal occurrence and different residence patterns, which influenced sighting probabilities. The weak mid-term (~1 month) positive response to sampling seemed to be caused by a single individual, which underlines the importance of taking individual variation in sighting rates into account. This also highlights potential sampling biases (e.g., sex-biased biopsy sampling [66]), which we could fortunately detect by photographic identification of all sampled individuals. Individuals that centre their home ranges in the study area and have higher sighting rates are more likely to be sampled due to their general availability. Field effort will need to continue in order to generate enough chances to sample animals that only occasionally visit the sampling area.
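The sighting ratio calculation referred to above can be illustrated with a minimal sketch. It assumes the definition given in the supporting information captions (an individual's sightings in a period divided by the sightings of all individuals in that period, with change expressed in percentage points); the function names, individual IDs and sighting records below are hypothetical and are not taken from the study data.

```python
from collections import Counter


def sighting_proportions(sightings):
    """Return each individual's share (%) of all sightings in one period.

    `sightings` is a list of individual IDs, one entry per photographic sighting.
    """
    counts = Counter(sightings)
    total = sum(counts.values())
    return {ind: 100.0 * n / total for ind, n in counts.items()}


def change_in_proportion(before, after, individual):
    """Change in sighting proportion (percentage points), after minus before.

    An individual not seen in a period has a proportion of 0 for that period.
    """
    p_before = sighting_proportions(before).get(individual, 0.0)
    p_after = sighting_proportions(after).get(individual, 0.0)
    return p_after - p_before


# Hypothetical sighting records before and after a sampling event.
before = ["M001", "M001", "M002", "M003"]
after = ["M001", "M002", "M002", "M003", "M003"]

delta = change_in_proportion(before, after, "M001")
print(round(delta, 1))  # negative: M001 made up a smaller share of sightings afterwards
```

Because each individual's proportion depends on how often all other individuals are seen, a change in this ratio can reflect seasonal movements of other animals rather than a response to sampling, which is the confounding that the multievent models are intended to address.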
Can sampling lead to mid- or long-term behavioural changes?
Whether or not biopsy sampling and satellite tagging can lead to mid- or long-term changes in behaviour depends on several factors. Firstly, an individual must be aware of the sampling attempt. We have shown that individuals do react to sampling attempts (58% of attempts), and are thus often aware of them. However, the absence of a visible behavioural response to a sampling attempt does not necessarily imply that the animal is unaware of the attempt. Several studies have shown physiological responses to human disturbance where there was little or no behavioural response (e.g., [67-69]). This underlines the utility of measuring physiological stress indicators such as glucocorticoid hormones or heart rate; however, in many cases such measurement will itself result in stress, confounding the measurements [70,71]. Secondly, the sampling attempt must be perceived negatively by the individual. We assume that the immediate behavioural reactions sometimes associated with biopsy sampling, such as defecation, tail slapping, breaching and flight from the area (see Table 3 in [1]), indicate a negative stimulus, be it fright or pain. Thirdly, in our case, where sampling attempts were land-based at two locations, the individual must be able to associate its experience (the sampling attempt) with a spatial location or other cue (seeing the arbalester, for example), and this memory must persist for some length of time. This would seem well within the capabilities of many animals (e.g., [72-74]) and certainly of killer whales, which range widely but show strong interannual site fidelity (at Marion Island [41]) and are cognitively complex [75]. Lastly, given the above, the strength of the negative experience must be sufficient to alter behaviour.
Animals may not show a mid-term behavioural response because the motivation to perform an activity (e.g., foraging), or to remain at a high quality site, may exceed the motivation to avoid sampling; individuals may also lack suitable habitat to disperse to in order to avoid sampling. This can be framed as a cost-benefit tradeoff if the disturbance stimulus (in this case sampling) is equated to predation or injury risk [76,77]. This raises the question of whether killer whales, which do not have significant natural predators, are less sensitive to disturbance stimuli.
Our two sampling locations, ~1 km apart, represent a short section of the ~50 km stretch of Marion Island coastline patrolled by killer whales [41,78,79]. Breeding colonies of killer whale prey (seals and penguins) at these sites represent a small proportion of the total breeding populations of these species at Marion Island (Table S5). We consider it plausible that an individual killer whale could alter its path by a few hundred meters to avoid the sampling sites, and that this would not represent a considerable energy cost or loss of foraging opportunity. Social bonds may possibly prevent sampling site avoidance, particularly when only some group members have been sampled, but our analyses of the social structure of Marion Island killer whales over 7 years (RRR and PJNdB, in preparation) indicate considerable flexibility in social groups. Half Weight Association Index values, an estimate of the proportion of time two animals spend together, range from 0.21 to 0.66 (average ± SD = 0.48 ± 0.18) within defined social units, clearly indicating that animals are not constantly associated. Further, 370 (13%) of 2,821 sightings recorded in that study were of single (lone) individuals. This suggests that social bonds between killer whales will not necessarily prevent individuals from avoiding the sampling sites.
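For readers unfamiliar with the Half Weight Association Index, the following is a minimal sketch of how such an index is commonly computed for a pair of animals. It assumes the simple form x / (x + 0.5(ya + yb)), where x is the number of sightings with both animals together and ya and yb are the numbers of sightings with only one of them; the exact formulation used in the unpublished social structure analysis is not stated here, and the counts for the pair are hypothetical. The last lines simply check the proportion of lone sightings quoted above.

```python
def half_weight_index(together, only_a, only_b):
    """Simple half-weight index for a pair: x / (x + 0.5 * (ya + yb)).

    together: sightings in which both animals were recorded together (x)
    only_a:   sightings in which only animal A was recorded (ya)
    only_b:   sightings in which only animal B was recorded (yb)
    """
    denom = together + 0.5 * (only_a + only_b)
    if denom == 0:
        raise ValueError("no sightings of either animal")
    return together / denom


# Hypothetical counts for one pair of animals (not study data).
hwi = half_weight_index(together=12, only_a=10, only_b=16)
print(hwi)  # 12 / (12 + 13)

# Proportion of sightings that were of lone individuals, from the figures in the text.
lone_share = 370 / 2821
print(round(100 * lone_share))  # percentage, rounded to the nearest whole number
```

An index of 1 would mean the pair was always recorded together and 0 that it never was, so values well below 1 within social units are consistent with animals that are not constantly associated.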
The factors we have mentioned which may prevent short-term disturbance (sampling) from causing mid-term behavioural changes are intractable in this study, but could stimulate further research in different species or settings. There is debate as to how well behavioural changes signal the sensitivity of animals to disturbance [80]. In cetaceans, documented disturbance is likely largely due to direct or associated noise (e.g., [81] for killer whales). The mid- to long-term sensitivity of cetaceans to satellite tagging and biopsy sampling is unknown, but seems negligible. Best et al. [24] show sensitization to biopsy sampling up to 65 days in female southern right whales with calves, but such cases seem rare [1].
Importantly, we found no significant long-term (~24 months) changes in the sighting probability of tagged or biopsied killer whales. In the only study using a comparable method to ours, Tezanos-Pinto and Baker [23] found no difference in the long-term sighting probabilities between biopsied and non-biopsied bottlenose dolphins Tursiops truncatus. Our study supports the idea that cetaceans do not change their long-term behaviour in response to being sampled. However, if such responses are subtle, they may require considerable data and time to detect. We have not tested for physiological responses (e.g., stress) on any temporal scale, nor for an impact on hunting behaviour and demographic performance.
However, one of our stated aims was to 'evaluate whether biopsy sampling and satellite tagging changed the behaviour of individuals, altering mid- (1 month) and long-term (~24 months) sighting patterns.' We wished to evaluate any behavioural changes in response to our tagging and biopsy sampling protocol, rather than determine the mechanisms affecting such behavioural changes (or the lack thereof, as we found). Our results are therefore meaningful independent of any evaluation of intermediate factors; however, we recommend longer term monitoring to assess whether satellite tagging and biopsy sampling have any effect on demographic parameters (e.g., [82]).

Conclusions
Remote biopsy sampling and satellite tagging of killer whales from shore is successful at Marion Island, and these methods can provide insights into the ecology of this population, which is difficult to access at sea. We found that reactions to biopsy sampling and satellite tagging were mild or unnoticeable, and we found no significant mid- or long-term changes in the occurrence of killer whales at the study site. However, long-term monitoring of individuals after biopsy sampling and tagging should continue in order to provide continuous assessment of potential impacts on the study animals. Such monitoring should be implemented in other studies where animals are biopsied or tagged, especially considering the increased use of these methods.

Supporting Information
Figure S1 Changes (percentage points) in the sighting proportion of killer whales at Marion Island following various sampling events. a) tag or biopsy - first attempt; b) biopsy - first attempt; c) biopsy - first hit; d) tag - first attempt; e) tag - first hit. Sighting proportion (%) was calculated as the number of sightings of an individual during a given period, divided by the number of sightings of all individuals in the same period. Negative change thus indicates an individual was seen less following a sampling event. (TIF)

Figure S2 A multinomial tree diagram with arrows denoting the possible transitions between states (solid boxes) from t to t+1. States occupied are not directly observed, but events (dashed boxes) represent observations following initial capture ('Encounter'). Individuals belong to one of two hidden classes with distinct probabilities of detection; movement between detection groups over time is not allowed. Entry to the population conditions on the first encounter ('Seen') and all individuals are seen once or more prior to sampling ('Initial state' step). Subsequent state transition probabilities are decomposed in three steps as the product of the probabilities of 'Survival', 'Detection' and 'Sampling'. Only individuals that are detected ('Seen') can be sampled. Once sampled, individuals either remain in the sampled state (permanent state change scenario; solid arrows) or may move back to the 'Not Sampled' state at the next occasion (mid-term sampling effect scenario; dashed arrows). (TIF)

Table S1 Comparisons of sighting proportions before and after tagging and biopsy attempts on killer whales at Marion Island (paired Wilcoxon rank sum test). The sighting proportion is the number of photographic sightings of an individual in a given period, divided by the number of photographic sightings of all individuals in that period (following [1]). Notes: (a) N is the number of sampling attempts included for each comparison. (b) W is the test statistic. (c) Tag or biopsy - first attempt includes only the first attempt (regardless of whether it was a tag or biopsy attempt), hence it is not the sum of Tag - first attempt and Biopsy - first attempt.

Data S3 Satellite tagging and biopsy sampling of killer whales at Marion Island. Satellite tagging and biopsy sampling attempts are shown, with associated data. Class: AM - adult male; AF - adult female; J - juvenile. Success: Y - yes (hit and sample for biopsy sampling attempts, hit and attach for satellite tagging attempts); N - no. Reaction: see Table S1 in text. Range - range of the attempt (in meters). Attempt - cumulative attempts by the arbalester. (CSV)

Methods S1 Further information about field methods used. (DOCX)