Non-Invasive Genetic Mark-Recapture as a Means to Study Population Sizes and Marking Behaviour of the Elusive Eurasian Otter (Lutra lutra)

Quantifying population status is a key objective in many ecological studies, but is often difficult to achieve for cryptic or elusive species. Here, non-invasive genetic capture-mark-recapture (CMR) methods have become a very important tool to estimate population parameters, such as population size and sex ratio. The Eurasian otter (Lutra lutra) is such an elusive species of management concern and is increasingly studied using faecal-based genetic sampling. For unbiased sex ratios or population size estimates, the marking behaviour of otters has to be taken into account. Using 2132 otter faeces of a wild otter population in Upper Lusatia (Saxony, Germany) collected over six years (2006–2012), we studied the marking behaviour and applied closed population CMR models accounting for genetic misidentification to estimate population sizes and sex ratios. We detected a sex difference in the marking behaviour of otters with jelly samples being more often defecated by males and placed actively exposed on frequently used marking sites. Since jelly samples are of higher DNA quality, it is important to not only concentrate on this kind of samples or marking sites and to invest in sufficiently high numbers of repetitions of non-jelly samples to ensure an unbiased sex ratio. Furthermore, otters seemed to increase marking intensity due to the handling of their spraints, hence accounting for this behavioural response could be important. We provided the first precise population size estimate with confidence intervals for Upper Lusatia (for 2012: N^ = 20 ± 2.1, 95% CI = 16–25) and showed that spraint densities are not a reliable index for abundances. We further demonstrated that when minks live in sympatry with otters and have comparably high densities, a non-negligible number of supposed otter samples are actually of mink origin. This could severely bias results of otter monitoring if samples are not genetically identified.


Introduction
Elusive species play an important role in conservation. Reliable information on population status and trends is crucial for improving conservation practices and management and for addressing conservation challenges, such as antagonistic interactions between protection and conflict mitigation for species involved in human-wildlife conflicts. However, elusive species are difficult to study and therefore we often lack important demographic information. The Eurasian otter (Lutra lutra) is such an elusive, conflict-laden species that has suffered dramatic declines in Europe due to hunting and man-made changes to its aquatic habitats (e.g. canalisation, water pollution, prey decline) [1,2]. Nowadays, otters benefit from protective legislation throughout Europe, and since the 1990s otter densities have increased, including recolonisations of areas from which they were extirpated. Since their main prey is fish, the species' recovery inevitably resulted in conflicts with fishermen [2]. Yet, important information for reconciling species conservation and human interests, such as accurate estimates of population sizes, is still lacking in most areas of Europe [1].
Otters are elusive, mainly nocturnal, and difficult to (live-)trap [1]. They are typically monitored either by direct but mostly occasional visual counts [3,4] or indirect records through dens [5], tracks [4,6,7], or faeces [8,9]. Otter faeces (spraints) are particularly suitable to study the species, because they are used for intraspecific communication. Otters produce up to 30 spraints daily and usually defecate on frequently visited conspicuous terrestrial sites at specific locations throughout their home range (e.g. rocks, trunks, under bridges, at junctions of water channels). These marking sites and spraints can be easily detected by collectors and therefore became the "standard survey method" [8] for mapping otter distributions but also for obtaining rough estimates of population sizes (see [4] for a review).
There are contrasting opinions whether spraint counts can be used as an index of abundance. Lanszki et al. [10] found a positive correlation between relative spraint density and relative numbers of otter genotypes in an area and concluded that spraint counts are suitable as such an index. Similarly, Guter et al. [11] found a positive correlation between number of spraints and number of otter visits in latrines. However, Calzada et al. [12] criticised their study because they were not able to distinguish between individuals and could hence not tell whether all visits and spraint samples were deposited by a single individual. Other researchers have also advised against the use of spraint density as an index of population sizes because of temporal, spatial, and individual sprainting variations [4,13,14].
In recent studies, researchers used otter spraints for non-invasive genetic capture-mark-recapture (CMR) analyses to estimate population size [15][16][17]. With genetic techniques, such as microsatellite genotyping, it is possible to individually identify the originator and to use this information in CMR models. Non-invasive genetic CMR has become a very powerful tool since its first application in the 1990s [18,19] to study rare and elusive species without direct handling [20,21]. The basic principle of this approach is that non-invasively collected samples (e.g. faeces) are genotyped at multiple molecular loci (e.g. microsatellites). This multilocus genotype is then treated as a molecular individual mark. Matching genotypes are considered to belong to the same individual and are classified as recaptures. Non-matching genotypes indicate newly captured animals. Hence, for each sampling occasion, all individuals are determined to be either captured (coded as 1) or not captured (coded as 0), resulting in individual capture histories that are used for CMR analyses. Non-invasive genetic CMR opens up the possibility to obtain estimates of population size, sex ratio, survival, migration, fecundity, or population growth [21].
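The coding of capture histories described above can be illustrated with a short sketch. It is a minimal Python example under stated assumptions, not the workflow used in this study: the function name, the tuple representation of a multilocus genotype, and the allele values are all hypothetical.

```python
from collections import defaultdict

def build_capture_histories(samples, n_occasions):
    """Turn genotyped samples into 0/1 capture histories.

    samples: iterable of (genotype, occasion) pairs, where genotype is any
    hashable multilocus genotype and occasion is a 0-based sampling day.
    Matching genotypes are treated as the same individual (a recapture).
    """
    histories = defaultdict(lambda: [0] * n_occasions)
    for genotype, occasion in samples:
        histories[genotype][occasion] = 1  # captured on this occasion
    return dict(histories)

# Hypothetical genotypes at two loci, sampled over five days.
otter_a = ("120/124", "201/201")
otter_b = ("118/124", "201/205")
samples = [(otter_a, 0), (otter_a, 2), (otter_b, 1), (otter_a, 2)]

histories = build_capture_histories(samples, n_occasions=5)
```

Several samples of one individual on the same day collapse into a single capture, so the history records presence per occasion rather than the number of spraints.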
However, there are several difficulties that must be overcome, such as low success rates and genotyping errors [22,23]. Genotyping errors can either result in erroneously assigning a sample to a wrong individual, because they appear to have the same genotype, or can create new, previously unknown "false individuals" (ghost individuals) when only a single locus is mistyped. The latter is more likely and can lead to overestimated population sizes [23,24]. Several methods are available to reduce genotyping errors. We have reviewed these methods in a previous publication [23] and demonstrated that the use of an error-incorporating population size estimator is crucial to obtain reliable estimates. For unbiased estimates it is also required that all individuals have a reasonable chance of being collected [23]. Hence, when using faeces as DNA source, the marking behaviour of the target species has to be understood well to avoid biased results through marking differences in e.g. sex, age, social, or reproductive status [20].
Otters are assumed to defecate at nearly equal rates regardless of their sex, reproductive status, or age [2]. However, most studies employing non-invasive genetic sampling found a male bias in their sampling [15][16][17][25][26][27][28], although otter populations are likely to be slightly female-biased [1,29,30]. Therefore, Bonesi et al. [16] queried whether non-invasive sampling is appropriate to estimate population size and sex ratio of otters. They suggested differences in marking behaviour according to sex, social, or reproductive status as possible reasons and encouraged further research on these issues. Furthermore, Brzezinski and Romanowski [31] found that the sprainting intensity increases when spraints are previously removed. This raises the question of whether non-invasive genetic CMR is also affected by such a reaction. Both differences in marking behaviour and changes in sprainting intensity could violate fundamental assumptions of CMR analyses.
Here, we first wanted to investigate the marking behaviour of otters by testing whether there are sex differences and to understand in which way this may influence the results of non-invasive CMR studies. Otter spraints can either be food remains or a jelly-like substance from the intestine, both with or without anal gland secretions [1,32,33]. Since jelly samples have higher genotyping success rates [34,35], it should be tested whether they are defecated at equal rates by both sexes, in order to assess potential sources of bias in population size estimates. Moreover, the spraint size, level of exposure, and the marking site use might also differ between sexes, affecting spraint visibility and thus detection rate. Hence, we investigated the characteristics and intensity of spraint deposition and effects of sex-specific marking behaviour using the results of a faecal-based non-invasive genetic CMR study on a wild otter population in Eastern Saxony, Germany, over a period of six years. We estimated yearly population sizes and sex ratios, demonstrating that the use of an estimator incorporating both genotyping errors and differences in marking behaviour is crucial to obtain reliable results. Using these population size estimates, we examined whether spraint densities could serve as an index for otter abundances by testing whether spraint densities are correlated with number of genotypes or estimated population sizes. Otter monitoring based on otter spraint counts (without genetic identification) can be hampered by samples of other sympatric species with similar-looking faeces, such as the invasive American mink (Neovison vison). Since the mink inhabited the same area, we tested whether all samples collected as otter spraints actually derive from otters.

Ethics Statement
The field sampling did not involve capturing or handling of the protected otters. Therefore, we did not require permits or approvals. The accessed land is private and required permission from the fish farmers, although the pond areas are commonly used by the local population for walks or as passage.

Study Area
The study area is located in the Upper Lusatian heath and pond landscape in the eastern part of Saxony, Germany (51°20′N, 14°19′E). Upper Lusatia covers about 5000 ha of ponds [36]. The tradition of building ponds and using them for fish farming started in the 13th century [37]. Fish are harvested each autumn and ponds are drained. Three-year-old fish are sold, whereas spawning and young fish (1-2 years) are reinserted to smaller and deeper wintering ponds. In spring, summer ponds are filled with water again and stocked with fish. Besides the commercial function, the ponds offer an important habitat for many endangered species, such as the Eurasian otter. Due to fish production, Upper Lusatia is believed to host one of the biggest and most viable otter populations in Central Europe [29,38,39].
The study area included 64 ponds (505 ha total water surface) that are clustered in seven pond areas, each comprising 8-13 ponds of varying size (0.36-39.6 ha) (Fig 1). All ponds are connected by a complex system of ditches and streams and framed by naturally vegetated embankments that are partly used as agricultural roads. Islands, extensive reed belts, and heavily   ) are considered to be off-peak seasons for otter reproduction in Eastern Germany [40].

Sampling and Microsatellite Genotyping
In each year, all ponds filled with water were included in the sampling. The number of ponds varied over years, due to the seasonally and yearly differing water regime management (Table 1). Each annual faecal collection started with a pre-sampling day on which we recorded active otter marking sites and marked already deposited spraints to facilitate recognition of fresh spraints the next day. In the morning of the following five days, all freshly deposited samples were collected from known or newly discovered marking sites. For each sample, we recorded location of marking site, size category of sample (small [≤ 1.5 cm], medium [≤ 3 cm], large [> 3 cm]), its degree of sliminess (spraint, spraint plus mucus, jelly), its exposure level (actively exposed (e.g. scratch piles), passively exposed (e.g. stones, roots, sticks, grass tussock), or non-exposed), and total number of old/fresh samples found on the marking site (1-2, 3-4, > 4). For each fresh spraint, the external layer containing sloughed gut cells was wiped off with a cotton stick. Cotton sticks were placed in a separate sterile 10 ml cryovial (Biozym Scientific, Hessisch Oldendorf, Germany) and either extracted on the day of collection (year 2006) or stored at -80°C until extraction in 1.8 ml buffer ASL (Qiagen, Hilden, Germany) (years 2007-2012). DNA was extracted from all samples employing the QIAamp DNA Stool Mini Kit (Qiagen) and stored afterwards at -20°C (for details see S1 Supporting Information). We followed all precautions recommended by Lampa et al. [23] to rigorously prevent cross-contamination during extraction and amplification. Extracted samples were genotyped using seven microsatellite markers (Lut435, Lut457, Lut604, Lut615, Lut701, Lut733, Lut914) [41][42][43] and sexed with markers Lut-SRY [43] and DBY7Ggu [44]. The latter was designed for wolverines (Gulo gulo) but also amplifies in male otters [44,45].
The nine loci were multiplexed in three primer sets (M1: Lut 457, 615, 733; M2: Lut 435, 604, 701; M3: Lut 914, SRY, DBY7Ggu) (see S1 Supporting Information for details on PCR conditions).
Because otter faecal samples from our study area have fairly high genotyping error rates and low genotyping success rates [23,34], the genotypes after one PCR per locus contained too many errors. Hence, it was crucial to repeat amplifications and thereby generate a consensus genotype. To minimise costs and efforts, we followed a screening approach that consists of five amplification steps, after which low-quality samples were removed according to certain thresholds (see [23] for more details). The first amplification step was also used to screen the dataset for non-target species (e.g. mink). After the fifth amplification step, all samples that generated a genotype at all but one or two loci were repeated until a reliable genotype could be assigned to the missing markers (up to 27 repeats).
The generated consensus genotypes were compared to each other; equal genotypes were scored as belonging to the same individual. Similar genotypes that mismatched at one or two alleles were re-amplified three times at the locus in question to ensure that this was not due to genotyping errors. All successfully genotyped samples were then amplified with the primer set M3 to identify sex. Individuals were identified as males after three sightings of the targeted peak. If all samples of an individual showed no PCR signal after three amplifications, we sexed this individual as a female. Individuals with less than three samples were six times amplified if no targeted peak was recorded to ensure that these samples derived from a female otter.
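The comparison rule for consensus genotypes can be sketched as follows. This is a minimal Python illustration rather than the authors' procedure: the function names, the dictionary layout, the returned labels, and the allele sizes are hypothetical, and it assumes complete genotypes at the same loci.

```python
from collections import Counter

def allele_mismatches(genotype_a, genotype_b):
    """Count alleles in genotype_a that have no partner in genotype_b.

    Each genotype maps a locus name to a pair of allele sizes (bp).
    """
    mismatches = 0
    for locus, alleles_a in genotype_a.items():
        # Multiset difference, so (120, 120) vs (120, 124) counts as one mismatch.
        unmatched = Counter(alleles_a) - Counter(genotype_b[locus])
        mismatches += sum(unmatched.values())
    return mismatches

def compare_genotypes(genotype_a, genotype_b):
    """Classify a pair of consensus genotypes by their allele mismatches."""
    n = allele_mismatches(genotype_a, genotype_b)
    if n == 0:
        return "same individual"
    if n <= 2:
        return "re-amplify mismatching loci"  # possible genotyping error
    return "different individuals"

# Hypothetical allele sizes at two of the loci named in the text.
first = {"Lut435": (120, 124), "Lut457": (160, 160)}
second = {"Lut435": (124, 128), "Lut457": (160, 160)}
decision = compare_genotypes(first, second)
```

Counting unmatched alleles as a multiset difference avoids overcounting when two genotypes share one allele at a locus.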
The six datasets of each year were subsequently checked for remaining genotyping errors with the program DROPOUT [46], which determines probably erroneous samples (EB-test) or loci (DCH-test). Actual genotyping error rates were calculated following Broquet and Petit [47] by comparing scored genotypes with the consensus genotype (see also [23]). Amplification success rates were calculated by dividing the number of PCRs showing at least one expected allele by the number of conducted PCRs, while genotyping success rates depict the number of successfully genotyped samples relative to the number of extracted otter samples. Mean expected and observed heterozygosities (H_e, H_o) and sample size corrected probabilities of identity (PI), as well as PIs for siblings (PI_sib) were computed over all six loci using software GIMLET 1.3.3 [48]. All calculations were done for each year separately and an overall mean is provided.
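The success and error rates defined above are simple proportions, as the following Python sketch shows. It is only an illustration: the function names and example values are hypothetical, and the error rate is a simplified per-replicate comparison against the consensus, not necessarily the exact calculation of Broquet and Petit [47].

```python
def proportion(numerator, denominator):
    """Generic rate helper with a guard against empty denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator

def amplification_success_rate(pcrs_with_expected_allele, pcrs_conducted):
    """PCRs showing at least one expected allele / all conducted PCRs."""
    return proportion(pcrs_with_expected_allele, pcrs_conducted)

def genotyping_success_rate(genotyped_samples, extracted_otter_samples):
    """Successfully genotyped samples / extracted otter samples."""
    return proportion(genotyped_samples, extracted_otter_samples)

def genotyping_error_rate(replicates, consensus):
    """Share of positive single-locus replicates that differ from the consensus.

    replicates: list of allele pairs, or None for a failed PCR (ignored here).
    """
    positive = [sorted(g) for g in replicates if g is not None]
    errors = sum(1 for g in positive if g != sorted(consensus))
    return proportion(errors, len(positive))

# Hypothetical replicates for one sample at one locus.
replicates = [(120, 124), (120, 120), (124, 120), None]
error_rate = genotyping_error_rate(replicates, consensus=(120, 124))
```

In this example the second replicate shows an allelic dropout, so one of three positive PCRs disagrees with the consensus.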

Marking Behaviour
For a better understanding of the otter marking behaviour, we first assessed whether spraint sliminess, amount, and exposure, as well as marking site utilisation were different for males and females. For this purpose, we pooled all successfully genotyped samples from all years and conducted a Pearson's chi-squared test for each of the four spraint characteristics. To correct for the multiple testing problem, p-values were adjusted following the Bonferroni-Holm correction [49].
To see if males and females defecate at similar rates, we first compared the number of deposited spraints per individual over all years, taking the mean number of samples per individual over the six years. Since the means were not normally distributed, we compared males and females applying the non-parametric Mann-Whitney U-test. We further tested each year separately for sex differences by taking the actual deposited number of scats per individual, using two-sample permutation tests for integer valued observations implemented in the R package exactRankTests [50]. To account for alpha error accumulation, p-values were adjusted according to the Bonferroni-Holm procedure.
Furthermore, we were interested in whether the three different spraint types are more or less often placed exposed and on frequently used marking sites and whether the latter have more or less often exposed samples (e.g. to see if higher quality jelly samples are easier to find for collectors). Using Kendall rank correlation coefficients, we tested for correlations between the sliminess, exposure level, and number of spraints found on the respective marking site, respectively. For this, we pooled all samples that showed at least one expected otter allele (sure otter samples). P-values were adjusted for the three correlations following the Bonferroni-Holm procedure.
The statistics performed in this chapter are done in the R environment [51].
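The Bonferroni-Holm adjustment applied to the tests above is a step-down procedure that can be written in a few lines. The sketch below is a Python illustration only (the analyses in this study were run in R); the function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from three tests.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

The running maximum ensures that a test with a larger raw p-value never receives a smaller adjusted p-value than a test ranked before it.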

Population Size Estimation
We estimated population sizes for each year using closed population CMR models [52], because these models require fewer estimated parameters, allow more flexible assumptions, and provide most precise and unbiased estimates. Closed models require that birth, death, or migration between sampling occasions is negligible. Because our study area was large and we sampled on five consecutive days outside the main reproductive period, these assumptions are very likely met, which was supported by a test in a previous publication [23].
Since it is unlikely that all genotyping errors were completely eliminated from the datasets [23], we employed the error-incorporating misidentification model from Lukacs and Burnham [53] (hereafter L&B estimator) implemented in Program MARK [54]. The L&B estimator adds to each closed population model available in MARK the misidentification parameter α, the probability of a correct classification. An α close to 1 indicates a low probability of remaining genotyping errors.
We estimated separately for each year the population size (N for males, females, and total), conditional capture (p) and recapture (c) probability, probability of a correct classification (α), and number of genotypes never captured (f_0). We fitted a variety of models to the data that incorporated no capture variation (M_0), individual (M_h), behavioural (M_b), or daily varying (M_t) catchability and combinations thereof (M_bh, M_th, M_tb). Since we observed a daily increase in the number of collected samples that peaked on the third or fourth sampling day and mostly decreased on the fifth day, we tested if this pattern was introduced by already sampled otters that displayed a daily changing recapture rate (c_1, c_2, c_3, c_4), while the probability to be newly captured (p) remained constant. Each model was fitted with and without a sex difference.
Individual heterogeneity (p_i) is difficult to separate from misidentification (α), and incorporating both can lead to inconclusive results [54]. Whenever p_i and α were only poorly estimable, we dropped this model from the candidate model set. Models were adjusted for correct parameter counts where confounding or boundary estimates required it.
We ranked models employing the corrected Akaike's Information Criterion (AICc) that accounts for small sample sizes [55,56]. Using normalised AICc weights, reflecting the likelihood of a model [57], we calculated a weighted average for all parameter estimates (N_males, N_females, N_total, p, c, α, f_0). If supported models had unidentifiable parameters, a weighted average estimate for the unidentifiable parameter was calculated by dropping the respective model, but not for estimates of identifiable parameters. The model weighted average capture and recapture probabilities were weighted once more by the respective weighted average p_i-value (heterogeneity parameter) and summarised for each day to obtain a daily re/capture probability. With the obtained weighted average population sizes of males and females, we calculated yearly sex ratios (male to female mean) and used the total population size to compute population densities per water area (in ha), per km shoreline, and for the total area studied.
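The ranking and averaging steps can be made concrete with the standard formulas AICc = -2 ln L + 2K + 2K(K + 1)/(n - K - 1) and w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2). The Python sketch below is an illustration under stated assumptions, not output from Program MARK: the function names and the example AICc values and estimates are hypothetical, and n stands for whatever effective sample size the software uses.

```python
import math

def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a model with k parameters."""
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(aicc_values):
    """Normalised AICc weights; they sum to 1 over the candidate set."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (value - best)) for value in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]

def model_average(estimates, weights):
    """Weighted average of one parameter across the candidate models."""
    return sum(w * e for w, e in zip(weights, estimates))

# Hypothetical candidate set: two models and their population size estimates.
weights = akaike_weights([100.0, 102.0])
n_total = model_average([20.0, 22.0], weights)

# Sex ratio as males per female from hypothetical averaged estimates.
sex_ratio = 9.0 / 11.0
```

A model that is two AICc units worse than the best one still receives roughly a quarter of the weight here, which is why averaging over the candidate set can differ noticeably from reporting the top model alone.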
Finally, we wanted to test the hypothesis that spraint densities are good indicators for otter densities. Similar to Lanszki et al. [10], we used a linear regression to check whether yearly numbers of genotyped scats per ha explain yearly numbers of genotyped individuals per ha or yearly numbers of estimated individuals per ha.
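A simple linear regression of this kind reduces to a slope, an intercept, and the coefficient of determination R². The following Python sketch shows the calculation with ordinary least squares; the function name and the example values are hypothetical and are not the data of this study.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical yearly spraint densities (x) and otter densities (y).
slope, intercept, r_squared = simple_linear_regression(
    [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
)
```

With six yearly data points, as in this study, such a regression has four residual degrees of freedom, so even a fairly high R² may not reach significance.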

Sampling and Microsatellite Genotyping
Out of 2132 collected faecal samples, 2001 were suitable for DNA extraction (Table 1). After the first three amplifications with the multiplex trio M1, 179 samples could be identified as being either from minks (Neovison vison) or other unknown species (Table 1). All mink samples showed much shorter PCR products than the expected otter alleles and were compared to reference mink samples from an animal park in Leipzig, Germany (mink alleles: 120 bp for Lut457; 95 bp for Lut615; 142/146/150/154 bp for Lut733). While in years 2006-2008 we only found seven mink samples each, the number increased tremendously from 2010 on (Table 1). We were able to obtain numbers of harvested minks (minks per trap night, MPT) for one of our pond areas (100 ha) that clearly demonstrated an increase in minks for this period. Since a further 549 samples did not produce any PCR product at all and may also belong to other species, the number of samples for which we recorded at least one expected otter allele decreased to 1273 (Table 1). We were able to obtain complete multilocus genotypes for 778 samples (Table 1).
Of the four spraint characteristics, only sliminess showed a significant sex difference (Pearson's chi-squared test: χ² = 9.6, df = 2, p-adjusted = 0.0082). Males significantly defecated more often jelly samples and less often spraints than females (Fig 2).
The maximum number of scats deposited by one individual within a yearly sampling period amounted to 26. Within one night, individuals defecated on average 1.76 spraints with a maximum of 11. Both maxima were generated by males. However, taken over all years sex had no significant effect on the number of deposited scats (U-test: W = 971.5, p = 0.4189; mean males = 4.9, median males = 4, mean females = 4.7, median females = 4.5). Hence, there were also no signifi-
The correlations between sliminess and exposedness showed that the more slime a sample consists of the more often it is placed exposed (more often actively than passively), whereas less slimy spraints are more often deposited in a non-exposed way (Kendall's tau = 0.087, z = 3.47, p-adjusted = 0.0011). On marking sites that were not used the days before, we found jelly samples less often than on marking sites with at least five old/fresh spraints (Kendall's tau = 0.063, z = 2.52, p = 0.012). When correlating the exposedness with the number of samples on a marking site, the results showed that the more samples are deposited on a marking site the more likely they are actively exposed (Kendall's tau = 0.16, z = 6.42, p-adjusted = 4.23 × 10^-10).

Population Size Estimation
In four years (2008-2012) we had to drop individual heterogeneity models from the candidate model set because heterogeneity was confounded with misidentification (Table 2). All models with sex-dependent parameters (p_i, p, c, α, f_0) showed no significant difference in a likelihood-ratio test compared to the respective model without the sex effect and were always ranked lower with ΔAICc between 3.9 and 29 (mean = 12.1). Thus, these models were dropped from the candidate model set. The model and p_i (within year capture heterogeneity) weighted average capture probabilities (p) were relatively high for each year (0.48-0.75; mean = 0.57 ± 0.07), whereas the model and p_i weighted average recapture probabilities (c) were even higher (0.54-0.79; mean = 0.65 ± 0.07) (Table 2). Except for year 2010, where we found equal but very high capture and recapture rates, the recapture probability was always higher than the capture probability, with differences between 0.011-0.23 (averaged difference = 0.08). The average misidentification parameter α ranged between 0.73 and 0.95 (mean = 0.85 ± 0.04), indicating that some samples were misidentified in each year with a probability of 5-27%. Hence, each year's dataset still harboured ghost individuals and thus genotyping errors. The derived population size estimates (N) of all models for a particular year were very similar, even for those having AICc weights < 0.01.
Fig 2. Sex differences in otter marking behaviour. Frequency of genotyped otter samples regarding (A) their sliminess (spraint, spraint plus mucus, jelly samples), (B) their size (small, medium, large), (C) their level of exposedness (non-exposed, passively exposed, actively exposed), and (D) the number of otter faeces at the specific marking site (1-2, 3-4, >4 samples); all four separated by sex. Only sliminess showed a significant sex difference in a Pearson's chi-squared test (χ² = 9.6, df = 2, p-adjusted = 0.0082).
The model weighted average population size using AICc weights for each year ranged between 15 (2010) and 26 (2011) individuals (mean = 21) (Table 2). In four years (2007/10/11/12), sex ratios ranged between 0.67 and 0.88. In 2008 the sex ratio equalled 1, and only in 2006 did we find more males than females with a sex ratio of 1.2.
Using average population sizes, otter densities in our study area ranged from 0.048 (2008)
The linear regressions to test whether spraint densities are good indicators for otter densities revealed a near-significant relationship between yearly spraint densities (numbers of genotyped samples per ha) and yearly numbers of distinct genotypes per ha (R² = 0.62, df = 4, p = 0.063, Fig 3), whereas there was no relationship between yearly spraint densities and yearly estimated individuals per ha (R² = 0.24, df = 4, p = 0.33, Fig 3).

Microsatellite Genotyping and Population Size Estimation
The genotyping error rate (GER) was quite stable over the six sampling years (range: 0.44 (2012) to 0.51 (2006)), but fairly high compared to other otter studies that used the same way of calculation ([15]: GER = 20.9%, [16]: GER = 18.1%, [45]: GER = 17.3%, [58]: GER = 31.9%). While the GER represents errors that are already removed from the data, the two tests in DROPOUT and the misidentification parameter α indicated that errors might still be present in the yearly datasets. Therefore, it is crucial to use population size estimators that account for genotyping errors if they cannot be entirely removed [20,23,24].
One reason for these high GERs might be the comparably high number of repetitions (up to 26 times) used to increase genotyping success rates. Because of high error rates and low genotyping success rates, we followed a rigorous protocol including various contamination preventions during extraction and amplification, a screening approach to exclude low quality samples, and the generation of consensus genotypes via high numbers of repetitions [23]. Although those steps minimised errors, they could not rule out undetected errors remaining in the consensus genotypes.
Although error-incorporation is crucial for estimating population sizes and sex ratios, the statistical tests implemented for understanding the marking behaviour were less sensitive to ghost individuals. That is either because individual identification was irrelevant or because we only compared males with females. Since both sexes showed no significant difference in the number of single samples (which are potential ghost individuals) and re/capture probabilities were also equal between males and females, the number of ghost individuals should be evenly distributed among sexes. Thus, we regard the results of the tests for marking behaviour as trustworthy.
Despite acceptably low PIs [23], we had five dyads that had identical autosomal genotypes but different sexes. In two cases both individuals of the dyad were either found dead subsequently, were collected in several years, or were represented by a high number of samples within a year ( 9); they are hence likely to exist and to be closely related (e.g. siblings). For the remaining three dyads, one sex (2 ♂, 1 ♀) was only represented by a single sample in a given year and could thus be an erroneously sexed sample. Since further repetitions could not prove this and since it applied to both sexes, we treated the genotypes found as real ones.
In each year, we had one to six more genotyped than estimated individuals. If the actual number was not underestimated, we captured most resident individuals, which can be explained by the high sampling intensity. Most studies estimating otter densities were conducted at rivers, streams, or ditches [10,30,58,59,60], some at lakes or coasts [1,25,61], but only a few in fish pond landscapes [15,27] (Table 3). While densities seem to be lower at rivers and lakes than in fish pond landscapes (Table 3), one needs to bear in mind that comparability is limited because of different methods and water body shapes. Two studies that also investigated fish pond landscapes employing non-invasive genetic methods obtained higher estimates per total area [15] or per km pondside [27] (Table 3). Besides differences in pond sizes and overall landscape structures, methodological reasons could also account for this difference, because neither Hajkova et al. [15] nor Lanszki et al. [27] accounted for genotyping errors. The former used an estimation method, CAPWIRE [62], that does not account for genotyping errors. The latter counted the number of genotypes without employing population size estimators. Had we used the same approaches, our densities would have been larger and comparable to both studies (0.006-0.009 otter per ha area using CAPWIRE; 0.35-0.56 otter per km pond shore using number of genotypes).

Marking Behaviour and Impacts on CMR Analyses
We showed that although the number of markings did not significantly vary between sexes, jelly samples (with higher success rates) were more frequently defecated by males and placed exposed on previously used marking sites with several old/fresh scats. Hence, male biases could be introduced by preferring those kinds of samples or these "hot spots" that are usually larger and more prominent, thus easier to find (e.g. marking sites under bridges). Therefore, we agree with Bonesi et al. [16] that non-invasive genetic sampling on otters has to account for their marking behaviour to gain information about sex ratios. Our results indicate that it could be crucial to not drop too many low quality samples, but to invest in replications increasing the overall genotyping success and the numbers of females successfully genotyped, and to include all kinds of marking sites in a study design, also less frequently used sites, to minimise the risk of collecting only a fraction of a population. We found an even sex ratio in 2008 and more females in 2007/10/11/12. Years 2011/12 even showed non-overlapping 95% confidence intervals (CI) between the number of estimated females and males (Table 2) and thus a female-bias. Although we found slightly more males in 2006, the CIs of males and females broadly overlapped, indicating no male-bias. The true sex ratio of otter populations is so far unknown, but a female-bias is to be expected because of an almost equal ratio for newborn (♂/♀ = 1.125) or three-month-old cubs (♂/♀ = 1.09) [30] but lower female mortalities [64]. A female-biased sex ratio was also observed by Kruuk [1] (♂/♀ = 0.83). However, most studies employing non-invasive genetic sampling [15][16][17][25][26][27][28] found more males both in number of samples and individuals. Therefore, Bonesi et al. [16] questioned the usefulness of non-invasive sampling to estimate population size and sex ratios of otters.
Our balanced or female-biased sex ratios might be explained by persistently repeating the analysis of lower-quality samples, dropping only samples with no chance of yielding a complete multilocus genotype, and by including all kinds of marking sites in the sampling.
The fact that jelly samples are more often defecated by males also indicates that jelly samples in particular play a special role, either in sexual communication or in another sex-dependent function, such as signalling social status as found for river otters [65]. Whereas Kruuk [1,66] postulated that spraints probably have no function in territory defence or sexual communication but rather in resource partitioning, a function in sexual communication was postulated by Remonti et al. [67] and Kean et al. [33]. The latter demonstrated that volatile compounds from anal gland secretions differed with age and, for adults, also with sex and reproductive status.

Behavioural Response of Sampled Otters
Recapture rates were higher than capture rates in almost all sampling years except 2010, when both rates were comparably high. This could be due to a changed sampling protocol in 2010: larger faeces were first sampled with a cotton swab for genetic analyses and then removed entirely for hormone analyses. In all other years faecal samples were not removed. As otters reuse their marking sites for many years and also daily [1], higher recapture rates could be collector-induced if collectors searched more intensely at known marking sites or found more samples after a settling-in period (e.g. the first 1–2 days). However, 71.1% of the individuals either never reused marking sites (45.9%) or reused one marking site at most twice within the five sampling days (25.2%). We also found no difference in the sampling patterns (e.g. settling-in period) between expert collectors and students. Another possibility is that otters that had already been sampled reacted to the repeated handling of their spraints with an increased marking intensity. Such a behavioural response is known as "trap-happiness". Otters are known to use spraints for intraspecific communication [1,66], so they may well notice if somebody has handled and thus altered their markings. This could put them on the alert, resulting in a higher marking intensity. Such behaviour was also reported by Brzezinski and Romanowski [31], who found experimentally that sprainting intensity was higher at sites where spraints had previously been removed. Removing spraints in 2010 may have disturbed intraspecific communication such that unsampled individuals also increased their marking intensity, or at least used marking sites that seemed unused because faeces had previously been removed. This is plausible, as the same marking site was used by up to six different individuals within five sampling days [64].
Regardless of whether the behavioural effect is collector- or otter-induced, it is important to account for it when estimating the population size of otters (i.e. by including model Mb); otherwise the results can be severely biased.
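The direction of this bias can be illustrated with a deliberately simple calculation. The sketch below is not one of the models fitted in this study: it uses the expected values of a two-occasion Lincoln-Petersen estimator, and the population size, the first-capture probability `p_first` and the recapture probability `c_recap` are hypothetical numbers chosen only for illustration. It shows that when recapture is more likely than first capture ("trap-happiness"), an estimator that ignores the behavioural response underestimates population size.

```python
def lincoln_petersen_expected(n_true, p_first, c_recap):
    """Expected two-occasion Lincoln-Petersen estimate under a behavioural response.

    n_true  : true population size (hypothetical)
    p_first : probability of first capture on any occasion
    c_recap : probability of recapture for an already sampled animal
    """
    n1 = n_true * p_first               # expected animals sampled on occasion 1
    m2 = n1 * c_recap                   # expected recaptures on occasion 2
    n2 = m2 + (n_true - n1) * p_first   # expected total sampled on occasion 2
    return n1 * n2 / m2                 # Lincoln-Petersen estimate of N


# Hypothetical values: 20 animals, first-capture probability 0.3.
no_response = lincoln_petersen_expected(20, 0.3, 0.3)  # recapture = first capture
trap_happy = lincoln_petersen_expected(20, 0.3, 0.5)   # recapture more likely

print(round(no_response, 1), round(trap_happy, 1))  # prints: 20.0 14.4
```

With equal capture and recapture probabilities the expected estimate equals the true size, whereas the trap-happy scenario returns about 14 animals instead of 20.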
According to summed AICc weights for each effect (constant, heterogeneity, behavioural, time) following Burnham and Anderson [57], a behavioural effect (incorporated in models Mb, Mbh, Mtb, and Mtb_constrained) was the most important when averaging over years (0.4), followed by time (0.36), heterogeneity (0.32), and constant (0.22).
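As a reminder of how such summed weights are obtained, the following minimal sketch converts AICc values into Akaike weights and then sums, for each effect, the weights of all models containing that effect. The AICc values and the candidate model set are hypothetical and are not taken from this study. Because a model such as Mbh counts towards both the behavioural and the heterogeneity effect, the summed weights need not add up to 1, which is consistent with the values reported above (0.4 + 0.36 + 0.32 + 0.22 = 1.30).

```python
import math


def akaike_weights(aicc):
    """Convert a dict {model: AICc} into Akaike weights that sum to 1."""
    best = min(aicc.values())
    rel = {m: math.exp(-0.5 * (v - best)) for m, v in aicc.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def summed_weights(weights, effects):
    """For each effect, sum the weights of all models that include it."""
    out = {}
    for model, w in weights.items():
        for effect in effects[model]:
            out[effect] = out.get(effect, 0.0) + w
    return out


# Hypothetical AICc values for one sampling year (illustration only).
aicc = {"M0": 102.0, "Mt": 101.0, "Mb": 100.0,
        "Mh": 101.5, "Mbh": 101.2, "Mtb": 102.5}

# Which effects each candidate model contains.
effects = {
    "M0": ["constant"],
    "Mt": ["time"],
    "Mb": ["behavioural"],
    "Mh": ["heterogeneity"],
    "Mbh": ["behavioural", "heterogeneity"],
    "Mtb": ["time", "behavioural"],
}

weights = akaike_weights(aicc)
importance = summed_weights(weights, effects)
for effect, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{effect}: {value:.2f}")
```

Averaging such per-year values over all sampling years would give one importance value per effect.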

Otter Monitoring by Spraint Densities
It has been argued that spraint density can be used as an index of abundance for comparing populations in time or in space [8], and it has been applied in several studies (see [68] for a review). A non-invasive genetic study even found a significant positive relation between spraint density and number of genotypes per area [10]. In our study this relationship was also close to significance. However, spraint density was not correlated with the estimated number of individuals. Even when comparing only the four sampling years (2006, 2010–2012) in which we always sampled at the end of March, there was no relationship between the number of individuals and the number of samples (R² = 0.02, df = 2, p = 0.87). This can be explained by the removal of ghost individuals, which was not done in the study by Lanszki et al. [10], who used the number of sampled genotypes instead of a population size estimate. It is natural that the more samples one collects in an area or period, the more ghost individuals will be in the dataset and thus the more genotypes one will have. Hence, in line with other authors [13,14], we caution against the extrapolation of otter spraint densities to relative abundances.
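The regression statistics for the four-year comparison can be checked for internal consistency without the raw data. The sketch below is an illustrative check rather than the original analysis, and the function name is hypothetical. It assumes a simple linear regression with four data points (df = 2) and uses the closed-form Student t distribution for two degrees of freedom, for which the two-sided p-value reduces to 1 − |r|.

```python
import math


def p_value_df2(r_squared):
    """Two-sided p-value of a simple regression with 2 degrees of freedom (n = 4)."""
    r = math.sqrt(r_squared)
    t = r * math.sqrt(2.0 / (1.0 - r_squared))       # t statistic with df = 2
    cdf = 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))   # Student t CDF for df = 2
    return 2.0 * (1.0 - cdf)


p = p_value_df2(0.02)
print(round(p, 2))  # prints: 0.86
```

An R² of 0.02 thus corresponds to p ≈ 0.86, which agrees with the reported p = 0.87 once rounding of R² to two decimals is allowed for.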
Furthermore, although we did not change our sampling design or the way of sampling, the number of collected mink scats increased tremendously in 2010–2012 compared to 2006–2008 and was about two- to six-fold higher. This increase was accompanied by an increased mink density from 2010 onwards, as shown by the numbers of harvested minks, with an MPT of 0.091 in 2010. For comparison, a saturated mink population in ca. 120 ha of the river Thames amounted to MPT = 0.04 using live-traps and including recaptures [69]. This implies that, contrary to most studies stating that high otter densities are likely to entail a decline in mink densities [70–72], the mink proliferated quite well in our study area despite high otter densities. Similarly, Harrington et al. [73] found that mink abundances remained relatively high while otter densities rose.
Bonesi and Macdonald [74] stated that mink may persist in the presence of otters when terrestrial prey is abundant. The Upper Lusatian pond landscape is known for a high diversity of amphibians, reptiles, water birds, and small mammals [37]. However, most of the mink scats were collected because they contained fish remains, making them more similar to otter spraints. Bueno [75] found that where minks coexist with otters, they prey on smaller fish than otters do, which might well be the case in our study area. Besides mink scats containing fish remains, we also unintentionally collected mink scats that looked like otter jelly samples. Dunstone [76] already pointed out that mink can produce a jelly-like secretion. The mink samples were collected not only by students but also by expert collectors. The same difficulty was already noted by Harrington et al. [77]. In their study not a single supposed mink sample collected by experts was of mink origin; rather, the samples belonged to pine martens (47%), foxes (41%), otters (6%), polecats (3%), or stoats (3%). In our study, fresh mink samples were found on typical otter marking sites, sometimes next to fresh otter samples from the same night. This implies that otter monitoring that relies solely on spraints without genetically determining the species runs the risk of overestimating abundance or occupancy if minks are present.
An extrapolation from spraint densities to otter densities is even more precarious given that a) the number of samples varies seasonally [78], b) the sampling rate (collector-induced) or marking intensity (otter-induced) can increase during sampling periods lasting several days (see discussion above), and c) one marking site is used by up to six individuals [64].

Conclusion
Faeces are a valuable source of information about population sizes and sex ratios via genetic mark-recapture when potential error sources are carefully addressed and the marking behaviour of the target species is taken into account. We illustrated how sex differences in marking behaviour can influence non-invasive genetic CMR, because jelly samples with high DNA quality were more often defecated by males than by females and placed exposed on frequently used marking sites that are easier for collectors to find. Hence, it is crucial not to concentrate only on jelly samples or on prominent marking sites. Furthermore, we recommend investing in high genotyping success rates through sufficient numbers of repetitions to ensure unbiased sex ratios and decreased genotyping error rates. Because of either collector-induced variation in sampling intensity or a behavioural response of otters to spraint handling and removal, researchers should employ models that can account for a behavioural effect to obtain unbiased estimates. Even when using high-quality samples, researchers should use CMR models that incorporate genotyping errors to avoid overestimates, since it is difficult to exclude genotyping errors completely [23]. Our study further shows that faecal densities are not a reliable index of otter abundance because of variability in marking behaviour and because of the risk of confusion with mink faeces, even by experts. Similar problems may exist for other elusive species. Therefore, we strongly recommend testing the reliability of faecal densities as an index of abundance with genetic CMR methods before using them for monitoring elusive species.