The challenge of monitoring elusive large carnivores: An accurate and cost-effective tool to identify and sex pumas (Puma concolor) from footprints

Acquiring reliable data on large felid populations is crucial for effective conservation and management. However, large felids, typically solitary, elusive and nocturnal, are difficult to survey. Tagging and following individuals with VHF or GPS technology is the standard approach, but costs are high and these methodologies can compromise animal welfare. Such limitations can restrict the use of these techniques at population or landscape levels. In this paper we describe a robust technique to identify and sex individual pumas from footprints. We used a standardized image collection protocol to collect a reference database of 535 footprints from 35 captive pumas over 10 facilities; 19 females (300 footprints) and 16 males (235 footprints), ranging in age from 1–20 yrs. Images were processed in JMP data visualization software, generating one hundred and twenty three measurements from each footprint. Data were analyzed using a customized model based on a pairwise trail comparison using robust cross-validated discriminant analysis with a Ward’s clustering method. Classification accuracy was consistently > 90% for individuals, and for the correct classification of footprints within trails, and > 99% for sex classification. The technique has the potential to greatly augment the methods available for studying puma and other elusive felids, and is amenable to both citizen-science and opportunistic/local community data collection efforts, particularly as the data collection protocol is inexpensive and intuitive.


Terminology
Trail: An unbroken series of footprints made by one animal Footprint: A single impression made by a foot Track: Commonly used in the literature to describe both an individual footprint and a trail. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

The challenge and necessity of studying puma populations
The puma (Puma concolor; also commonly known as mountain lion, cougar, panther, catamount) is the most widely distributed free-ranging land mammal in the Americas, ranging from Northern Canada to the Southern Andes. Like other cryptic carnivores with large territories, puma populations are notoriously difficult to study [1]. Despite the challenges, agencies responsible for managing pumas are often tasked with estimating their populations [2]. The need for reliable data on puma populations has led to the development of a range of research approaches including: capture-recapture [3,4], extrapolation of hunter harvest data [2], comparison of harvest and demographic data [5], camera trap surveys [6,7], aerial snow trail counts [8], and several variations of track surveys [9][10][11]. However, there are shortcomings with each of these methods that prevent wide-scale adoption. While estimates of absolute abundance (the total number of animals in a population) are generally preferred over indices of relative abundance, the former usually require the identification of individual animals. A common method for identifying individual pumas in a freeranging population, capture-mark-recapture, is prohibitively expensive for wide-scale practical use [12]. Consequently, researchers often use indices of relative abundance (footprint or trail counts, hunter harvest, etc.) which, while more affordable can be less reliable, and rarely define the true relationship between the index and the actual population [4].

The risks and penalties of invasive monitoring techniques
Invasive survey methods such as capturing pumas with dogs or snares are often used when estimates of absolute population size are needed, but can result in direct physical injury or death [13,14].
There are also less obvious consequences of invasive survey methods. For example, the repeated stimulation of the mammalian adrenocortical axis by external stressors in standard capture-mark-release procedures can result in profoundly negative effects on a range of physiological systems, including the immune and reproductive systems [15][16][17]. The process of immobilization, and particularly repeated immobilization, can also have unexpected effects on behaviour, including reduced ranges in black bears [18], reduced body condition in Polar bears [19], and changes in sex ratios of offspring in water voles [20]. In addition, radio and GPS collars can cause injuries or even death in various species, including: African wild dog [21], kit foxes, [22], mule deer [23] and black rhino [24]. A thorough review is provided by Murray & Fuller [25].
Conservationists are also increasingly aware of welfare and ethical issues in monitoring [26] and how invasive approaches might also compromise the validity of data they gather.

The use of indirect signs for monitoring
Indirect signs (footprints, scat, nests, etc.) can be the most effective and least expensive way to detect many animals [12,27]. Animal footprints are much more frequently encountered in the field than the animals themselves, and have served as the basis for population indices and estimators [12,28,29]. Footprint surveys (also called track surveys) are also non-invasive; the animal need not be seen, captured, or handled. underestimate carnivore numbers [7]. Because pumas lack distinguishing marks, accurate identification by camera-trap images with large sample sizes has proven difficult [6,33]. Genetic identification of individuals from scats and hair has also been used [34-36], but puma scats can be difficult to find in the field. Some researchers have successfully used scat detection dogs to improve detection rates [37] however, scat dogs require considerable training and care, factors that may be prohibitive for some managers.

Previous attempts to classify footprints by individual and sex
There are many accounts in the literature of efforts to solve this challenge by identifying individuals of a wide range of species from their footprints, including: fisher,

Previous puma footprint research
Researchers have attempted to use footprints to identify and sex individual pumas [57][58][59][60]. Smallwood and Fitzhugh [60] were the first, to our knowledge, to publish an objective mathematical method for discriminating individual puma footprints using measurements. Their method was based on measurements taken from footprint tracings made in the field and was successfully tested with nine free-ranging pumas. Grigione et al. [57] refined the technique developed by Smallwood and Fitzhugh [60] and successfully tested it with a known population of 10 pumas. Lewison et al. [58] validated the methodology presented by Smallwood and Fitzhugh with footprints made from plaster casts taken from the feet of 13 pumas. However, the rigidity of plaster casts differs from the flexible feet of living animals, and may not be an ideal substitute for natural field conditions. While these projects showed initial success, their small sample sizes and complex methodologies continue to limit broad scale field application.

The Footprint Identification Technique (FIT)
The Footprint Identification Technique (FIT) software enables the identification of pumas by sex and individual using a classification algorithm based on measurements of distance, angle and area taken between anatomically derived points on the footprint. The software was developed for black rhino monitoring [28,43] but has subsequently been adapted for a wide range of species [61].
To the best of our knowledge, the FIT presented here is the first system for identifying individual puma that is based on a large data training-set (535 footprints from 35 unique animals), and the first to have all analytical processes encapsulated in a software package with an integrated graphical user interface.
The FIT puma algorithm has built upon previous puma footprint classification approaches in the following respects: 1.8.1 Used a large training-set to develop a best-fit algorithm. The FIT software has derived a best-fit algorithm from a large training-set of footprints, in this case from 35 known individual puma (known individual puma used in previous studies: Smallwood and Fitzhugh [60] n = 9, Grigione [57] n = 3, Lewison [58] n = 13). The FIT algorithm specifies which footprint measurements (variables) are able to discriminate between individuals using a robust cross-validated discriminant analysis that feeds into a Ward's clustering model. 1.8.2 Extracted more data from each footprint. The FIT extracted more data from each footprint, thereby increasing the potential resolution and accuracy for individual classification. The software generated 123 morphometric variables from each footprint, including areas, lengths, and angles (variables analyzed in previous studies: Smallwood and Fitzhugh [60] n = 11, Grigione [57] n = 9, Lewison [58] n = 17) The large number of variables were used to develop a more robust training dataset than was previously possible.
1.8.3 Analysed more footprints per animal. To build the initial algorithm training-set database, the FIT used a mean of between 14 and 16 left hind footprints from each animal in order to adjust for individual footprint variability.
1.8.4 Developed an integrated software interface. The FIT employs a user-friendly integrated software interface to a new customized statistical model, providing minimal risk of subjective interpretation.
1.8.5 Integrated algorithm validation. The FIT software provides integrated algorithm validation in the form of sequential data holdout testing, by randomly apportioning the data into training and test sets.
1.8.6 Provided a standardized protocol for data collection. Data collection uses a simple, standardized digital protocol for photographing footprints [28].
The overall aim of this research was to demonstrate the potential utility of a new monitoring tool for widespread application in large felid populations, using the puma as a model. This was successfully met.

Ethics statement
No invasive techniques were used in this research. Footprint images were collected from captive puma in zoos, animal rescue facilities and animal sanctuaries, all on private land. The owners of the captive puma facilities gave permission to study on their sites. No further formal or informal approvals or permits were needed from an animal ethics committee because all procedures carried out were non-invasive..

Safety statement
When working with captive habituated pumas, two handlers were present at all times, safety gates were employed to ensure that the animals were kept out of areas when footprints were being photographed.

The methodological basis of the Footprint Identification Technique (FIT)
The basis of the individual classification methodology has been described in full by Jewell et al. [62] and Alibhai et al. [28]. Jewell et al. (62) provides an additional video description (S2 Video) of the entire methodology and analytical processes in FIT, using the cheetah (Acinonyx jubatus) as an example.
The FIT is able to identify individuals from their footprints on the basis of morphometric variation, in the form of footprint measurements, between them. The initial classification algorithm is extracted from a training set of footprints from known individuals. The algorithm specifies those footprint measurements that are able to discriminate between individuals using a robust cross-validated discriminant analysis that feeds into a Ward's clustering model [63].
The footprint identification technique is species specific because each species has a unique foot morphology. Thus, each species has a unique algorithm, extracted from a training dataset of footprints from known individuals, in this case puma. Once the species algorithm has been field-validated, it can be applied to identify and monitor unknown free-ranging individuals of that species.
Classification algorithms in FIT are based on Fisher's canonical variates [64] that generate centroid plots [43]. A forward stepwise regression selects the footprint measurements that have the highest discriminatory power, according to their F-ratios. Stepwise regression is an approach to selecting a sub-set of effects for a regression model, in which a model is generated by progressively adding or removing variables based on the t-statistics of their estimated coefficients.
The first two canonical variates are then constructed, from the stepwise selected measurements, to map the trails in a two dimensional space. The centroid values (multivariate leastsquare means) and 95% confidence interval ellipses are plotted for each trail. This combination provides the classifier for individual identification. If the 95% ellipses overlap, then these two testing trails belong to the same individual. If they don't overlap, they belong to two individuals (Fig 1). However, we found that the distance between the centroids also depends on the matrix of within-group variations and the relative-position vector of the centroids, so that any changes to a testing set (adding or removing individuals) could change the positions of the centroid values and their ellipses. To eliminate this potential problem and ensure that the distance between centroids was an effective classifier, we [43] proposed two modifications. First, we limited the centroid plot comparison to pairs of trails, comparing only two trails at one time. Second, we developed a "reference centroid value (RCV)" entity, built from all the other known individuals in the library. This functioned as a reference point in the canonical space, to stabilize the location of any test groups with respect to each other [28,62]. We used a standard linear discriminant analysis, coded within the FIT software, to classify by sex.
Three elements of the FIT model must be adjusted to optimize the output algorithm from a training set:

The number of footprint measurements (variables).
Too many footprint measurements may lead to increased separation of trails that are known to be from the same animal (self-trails) and too few footprint measurements may lead to overlap of ellipses of trails known to be from different animals (non-self trails), resulting in an estimate of too few individuals. The puma algorithm was optimized with 14-20 measurements.
2.3.2 The contour probability. This is used to determine the size of the ellipse (the confidence interval around the centroid value), which in turn determines if there will be overlap or not with the ellipse from the other test set in the pairwise comparison.

The threshold value (Ward distance).
Using the above elements, we generated a cluster dendrogram in JMP using the Ward distance. The threshold value that gave the highest level of classification accuracy was used to generate a predicted value for the number of puma in the dataset of known individuals.
The above elements, once optimized, describe the best-fit algorithm for the footprints taken from captive puma, which, after validation using comprehensive sequential holdback trails, can be used to predict the number of puma from unknown populations.
The FIT software runs as an add-in to JMP data visualization software, using the JMP Scripting Language (JSL), that can be customized for any species [65].

Collecting footprint images
The protocol for collection of footprint images of large carnivores is described in manuscript and video in Jewell et al. [62] (Supplementary material S2) To summarize, the first step in collecting reference images from captive puma is to lay a sand path for them to walk over. Common builder's sand is acceptable, laid to a depth of about 1cm, and width of 2-3m giving the animal plenty of space to walk comfortably. Captive animals are often most comfortable walking along a perimeter fence, or familiar path. Garden tools are used to rake and flatten the sand, and sprinkle with water if too dry.
There are several critical steps necessary for the success of the protocol. First, sand paths must be prepared correctly and the animal led over the sand at a normal relaxed walking pace. When photographing the footprints, the photographer must be directly overhead of the center of the print. Often it is useful to have a second observer to check this. Lastly, it is essential that the photographer (or an assistant, who might be an expert tracker) be able to identify a left hind puma footprint on the ground, and have the skill to follow the trail of footprints forwards or backwards along the line of travel. The purpose of collecting footprints from captive animals was to build the reference dataset for the creation of the FIT algorithm. The FIT has been designed to provide optimal accuracy without the need to collect prints from all four feet. Specifically, we collected left hind footprints only. Hind footprints were preferred because they frequently step on top of (register with) front prints and the left side was chosen arbitrarily. We were able to identify left hind footprints by observing clear morphological characteristics in the footprints (i.e., asymmetry in toe position). Puma feet are asymmetrical in the same manner as the human hand, where the middle finger (toe 3) is longer than the rest. The thumb (toe 1) does not make contact with the ground or register in the footprint. The toes also have a tendency to be slightly skewed to the inside of the trail. Hind footprints tend to be longer and narrower than the fronts. The heel pad is usually narrower and smaller. Hind prints are also more symmetrical than the fronts.
Only footprints with clearly visible toe and heel edges were used. Footprints exhibiting distortion were discarded (Fig 2) We photographed footprints according to the FIT protocol [62] that requires images be taken from directly above the footprint, a ruler should be placed along the X and Y-axes, and a photo ID slip included in the image (Fig 3) which should include (if known) footprint number, trail number, date, time and location point. Since a scale (ruler) was included in each image, there was no specification for the height at which an image should be taken, only that the image, photo ID slip and scale should fill the camera frame. No level was used, but we recommend inexperienced photographers engage a second person to observe that the camera lens is angled parallel to the ground. Natural lighting conditions varied and so footprints were shaded before photography to avoid partial-shadows or challenges with contrast. Camera pixel resolution and lens settings must be such that mm increments on the ruler and text on the photo ID slip are clearly visible.

Free-ranging pumas.
We tested the FIT with footprints from free-ranging pumas to validate the FIT for monitoring unknown free-ranging animals and to confirm that substrates in situ would provide sufficient quality of prints.
Following the same photo protocols, footprint images were obtained from free-ranging pumas in Texas and California. To ensure that each trail came from a unique individual, we did not collect more than 1 trail from each geographic region.

Extracting features from footprint images.
The following process has been described in detail both in manuscript and video formats for cheetah [62] and will be summarized here.
Footprint images are labeled according to the photo ID slip in the image, and filed on the desktop. Statistical analyses are performed in the FIT software, an add-in to JMP data visualization software. The FIT software add-in is opened in JMP and the first footprint image prepared for processing in the 'Image Feature Extraction' window. The footprint is rotated and adjusted for depth if necessary and two scale points are identified. Twenty-five landmark points are then placed sequentially on the footprint guided by the feature extraction template on the left of the window. The script then generates a further fifteen 'derived' points to augment the number of measurements taken (Fig 4). All data from the Photo ID slip are entered into the image feature extraction window. The x.y coordinates for each landmark and derived

Development of the footprint identification technique algorithm for the puma.
Each footprint generates a single row of 128 measurements of distances, angles and areas (Table 1). S1 Table provides the full dataset of footprint measurements. From the total data training-set of all the footprints we generate an algorithm, based on those measurements that is able to discriminate between individuals [62].
A pairwise robust cross-validated discriminant analysis is then performed. From the main FIT window, the 'robust cross-validated pairwise analysis' window is selected. The model scripted into the analysis uses a classifier to determine the likelihood of a pair of trails belonging to the same individual or two different individuals. We then conduct a pairwise comparison of trails. This is detailed in Jewell et al. [28,62], using the training database of known individuals, and summarized below.
The FIT system produces two outputs, an assigned self/non-self table to describe the classification distance between each validation pair, and a classification matrices window showing the different trails selected for comparison. The software calculates the distances between any two trails, and produces a cluster dendrogram to classify them. The user can test the accuracy of classification by adjusting the number of variables and the contour probability. Adjusting the threshold value identifies the algorithm that consistently gives the highest accuracy to be selected. The numbers 01-25 refer to the landmark points and 26-40 to the derived points (see Fig 5).
Variables 81-88 were generated at the intersection of two vectors e.g. V81 refers to vector from points 01 to 05 and 09 to 13.
A full holdback trial for validation is then performed To validate the algorithm, we divide the dataset into training and test sets, then conduct comprehensive sequential 'holdback' trials. Holdback is a validation method that randomly divides the original data into 'training' and 'test' data sets so that different compositions and sizes of subsets of 'test' trails can be held back as 'unknown' and classified against the remainder of the dataset (the training set).
Individual puma in the dataset are apportioned randomly to either the test or training sets. Holdback trials are then conducted to assess the accuracy of the algorithm for the expected number of individuals and the clustering classification.

FIT classification by sex
A standard discriminant analysis in FIT is used to classify footprints by the sex of the animal that made them.
We excluded the possibility of sex determination being linked to age by using a wide range of puma ages, and testing for age effects. For sex identification, we used 535 footprints from 35 captive pumas, 19 females (300 footprints) and 16 males (235 footprints). Animal ages ranged from 1 year to 20 years for both sexes without significant age differences between groups (Female mean age 9.13 yrs, Male mean age 12.63 yrs (Table 2).
We used linear discriminant analysis in JMP to identify a linear combination of variables that characterize or separate different classes of objects, in this case sex. We used the JMP The varying test set size was plotted against itself (red), the predicted value for each test size iteration (black) and the mean predicted value for each test size (blue). Optimal classification accuracy was obtained when the test set size was smallest relative to the training set. However, the robustness of the model was demonstrated by the predicted value being close to the expected value, even when the test set was considerably larger than the training set (24:11). Stepwise Variable Selection function to select the variables that provided best discriminating power based on their F ratios. This function also has the added advantage of excluding highly correlated variables. By plotting the number of variables included in the analysis against the predicted level of accuracy of sex identification, we were able to establish the number of variables that would provide an effective predictive model.
To validate the level of accuracy of sex discrimination using linear discriminant analysis, we used 2 holdout methods, Jackknife and partitioning the data into training and test sets. Jackknifing tests sequential subsets of data, excluding every footprint in the data set in sequence, where partitioning splits the data to enable testing of one set against an algorithm derived from the other.
An algorithm derived from all the variables, consisting of the 10 best discriminating variables generated by the captive puma footprint data was used as a predictive model to determine sex for unknown free-ranging puma footprint data.

Classification by individual
We collected 535 footprints from 35 individuals (19 females and 16 males). The number of puma individuals, the number of footprint images collected, the range of footprints per puma, the number of trails, the range of trails per puma and the mean and standard error of age per group are shown in Table 2.
The accuracy of individual identification was assessed using two different methods: 3.1.1 Data validation model accuracy. Using the holdback method, we conducted 10 random iterations for each training/test subset size (Fig 5). The varying test set size was plotted against itself (red), the predicted value for each test size iteration (black) and the mean predicted value for each test size (blue). We demonstrate that even when the footprint test set size was increased considerably in relation to the training set size, the mean predicted value proved similar to the expected value.
Optimal accuracy in classifying individuals was obtained when the test set consisted of 4 animals relative to the training set of 31 animals, but even when the test set was 28 animals and the training set 7 animals, classification accuracy was 93.2%.
3.1.2 Clustering accuracy. This is an assessment of how often individual footprint trails were assigned to the correct individual. A dendrogram (Fig 6) shows three trails out of 77 were misclassified, giving an overall individual accuracy for correct trail placement of 96.11%. The FIT model also allows for an assessment of relative likelihood of estimated puma numbers using a slider (Figs 7 & 8). The slider position is determined from a hypothesis test where the null hypothesis is the number of individuals set by the slider. Footprints collected from two additional free-ranging animals were then included in the test sets, and accurately separated out as two individuals (Fig 9). The number of individual puma from whom footprint images were collected, their sex, their mean age (range), the total number of footprint images collected, the mean number of footprints per individual (range), the number of trails, and the mean number of trails per individual (range).

Classification by sex
The variables with the highest discriminating power were selected using the stepwise selection feature in JMP. To avoid using too many variables and over-fitting we used linear discriminant analysis to plot the number of variables against the sex-prediction for all the footprints. The optimal number of variables for individual classification was 20 (Fig 10). The classification by sex was 99% accurate from a single footprint. In addition, two freeranging animals were classified; the first as a male (from Camino Cielo, Los Padres National Forest in California) and the second as a female (from Big Bend National Park in Texas) using the algorithm developed from the captive animals (Fig 11).

Potential applications
The FIT is an adaptable non-invasive tool. Footprints can be collected systematically in a small closed population to obtain an absolute count, or opportunistically throughout a larger or Identifying individual pumas from footprints unfenced range area to obtain a minimum known alive (MKA) population estimate. Footprints can also be used as a 'mark' in a capture-recapture (CR) population estimate. Work is underway to validate the FIT for puma in snow substrates, as has already been achieved for Amur tiger [56], giant panda and Polar bear (Jewell, pers.comm.).
Footprints are generally the most ubiquitous animal sign, and data collection is simple, inexpensive and accessible to a range of data collecting groups from expert trackers and professional conservation biologists, to citizen scientists. The FIT can also be used synergistically with camera-traps. While trail cameras are of limited value for identifying individual pumas [66], the FIT could be combined with trail cameras to improve confidence. For example, the substrate on the ground covered by a camera trap can be prepared to allow footprints to be collected as the animal walks through. If traps are checked frequently (at least every 24 hours) then whole animal images could be matched to footprints.

Potential limitations
This methodology requires good quality footprints, which require suitable impression-holding substrate. In some environments it may not be possible to find enough high quality footprints for this methodology to be easily applied. However, it may still be possible to create "footprint traps" by placing sand in travel corridors or by raking and smoothing the existing substrate.
While limited tracking skills are needed to collect photos for the FIT, it is still necessary for trackers to be able to accurately identify puma left hind footprints. Even experienced biologists can struggle with footprint identification [27]. However, in our experience, most observers were able accurately identify left hind puma footprints after a day of training. The benefit of using digital images to collect the footprints is that they can be independently verified for accuracy at a later time by a skilled observer.

Future development
This paper has demonstrated the development of the FIT puma algorithm from captive puma, and a potential application to monitor free-ranging puma and other large felids. The next step is to undertake wider field validation of free-ranging puma on various substrates.
Supporting information S1 Table. WildTrack mountain lion footprint dataset (electronic file). The full set of measurements taken from each footprint of each individual. (XLSX) S1 Video. Spotting cheetahs: Identifying individuals by their footprints. A step-by-step guide to using the Footprint Identification Technique. (MP4)