The Spatial Distribution of Mustelidae in France

We estimated the spatial distribution of six Mustelidae species in France using data collected by the French national hunting and wildlife agency under the "small carnivorous species logbooks" program. The 1500 national wildlife protection officers working for this agency spend 80% of their working time traveling in the spatial area in which they have authority. During their travels, they occasionally detect dead or living small and medium-sized carnivores. Between 2002 and 2005, each car operated by this agency was equipped with a logbook in which officers recorded information about the detected animals (species, location, dead or alive, date). In total, more than 30,000 dead or living animals were detected during the study period. Because a large number of detected animals in a region could simply reflect a high sampling pressure there, we modeled the number of detected animals as a function of the sampling effort to allow unbiased estimation of the species density. For dead animals, mostly roadkill, we supposed that the effort in a given region was proportional to the distance traveled by the officers. For living animals, we had no way to measure the sampling effort. We demonstrated that it was possible to use the whole dataset (dead and living animals) to estimate the following: (i) the relative density, i.e. the density multiplied by an unknown constant, of each species of interest across the different French agricultural regions, (ii) the sampling effort for living animals in each region, and (iii) the relative detection probability for the various species of interest.


Introduction: Aim of this document
In this document, we detail the calculations carried out to estimate the relative density of six species of Mustelidae in every small agricultural region (SAR) of France. We provide the data, the C code, and the R code needed to fully reproduce the analyses carried out for the paper, as well as additional elements related to the model-fitting process. This appendix was written with knitr (XIE, 2013), a system that allows R code to be included in a LaTeX document (LAMPORT, 1994).
More precisely, in this appendix:

• We import, organize, and format the data for the fit;

• We describe the three available fitting algorithms;

• We fit a first model, using a measure of the sampling effort for dead animals supposed to be constant within a department (i.e., the measure described in the paper), and use a cross-validation approach to select the optimal value of the penalty parameter;

• We fit another model, using an alternative measure of the sampling effort for dead animals that takes the officers' behaviour into account, and again select the optimal value of the penalty parameter by cross-validation;

• We use another cross-validation approach to compare the two possible measures of sampling effort, and we show that the simplest one, i.e. the measure supposed to be constant within a department, is the best choice;

• We use the same type of cross-validation approach to compare the model based on a spatial and environmental regularization with a model based on a purely spatial regularization, and we show that accounting for both the spatial and the environmental autocorrelation significantly reduces the prediction error;

• We study the residuals and the goodness of fit of the final model;

• We estimate the coefficients of variation associated with the estimated relative densities using a bootstrap approach;

• We illustrate the use of this modelling approach for exploratory purposes (i.e., to identify the main distribution patterns in space), as pointed out at the end of the discussion of the paper.
We also provide an R package named scsl containing the datasets used in the paper as well as the R functions implementing the models. Because the calculations carried out in this paper are computationally demanding, we programmed the relevant algorithms in C; the R functions available in the package simply interface with these C procedures. The model used in the paper is described on the help page of the function penalizedmodel(). The function crossvalidation() can be used to identify the optimal value of the penalty, and the function bootstrapcisstat() can be used to estimate the standard errors of the estimates with a bootstrap approach.
The data collected under the small carnivorous species logbooks program are available in the following datasets: (i) the data.frame scsldetect contains the data related to the detected animals, (ii) the data.frame carsscsl contains the data related to the cars operating in this program, (iii) the dataset department contains a map of the French departments, and (iv) the dataset SAR contains a map of the French small agricultural regions.
We first describe how to install the package scsl provided with this appendix. This package contains C source code, which needs to be compiled at install time; the reader's operating system therefore needs a C compiler. Such compilers are usually available on Linux. On Mac OS X, the developer tools should be installed (they come on the Mac OS X installation DVD). However, no C compiler is available by default on Windows: Windows users will need to install the Rtools to reproduce the calculations of this document. These tools can be downloaded at the following URL: http://cran.r-project.org/bin/windows/Rtools/. Select the version corresponding to the version of the R software that you use, install the program, and check the box "add to the variable PATH" during the installation process.
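Once a compiler is available, the package can be installed from its source archive. The file name used below is an assumption (use the name of the archive actually provided with this appendix):

## Install the scsl package from its source archive; the file
## name is an assumption
install.packages("scsl_1.0.tar.gz", repos = NULL, type = "source")
library(scsl)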

The data
We first load the data required for the analyses. The datasets SAR and department store the maps of small agricultural regions and departments respectively (objects of class "SpatialPolygonsDataFrame"):

data(department)
data(SAR)
We can plot these spatial data, and thereby reproduce fig. 1B of the paper:

plot(SAR, border = "grey")
plot(department, add = TRUE, lwd = 2)

We also load the dataset scsldetect, which contains the information on the detected animals, and the dataset carsscsl, which contains the information on the cars:

data(scsldetect)
data(carsscsl)

Additional information on these datasets is available on their help pages. For example, for information about the variable names in the data.frame scsldetect, type:

help("scsldetect")

Formatting the data
First, we prepare the data for the model fit. We will fit the model with the function penalizedmodel. The format required for this fit is described on the help page of this function:

help("penalizedmodel")
We build the data.frame required for the analysis below. This data.frame contains the number of detections for every combination of the factors species, small agricultural region (SAR) and status. It also contains the area of the SARs:

## The area is added to the data.frame
dfdata <- data.frame(dfdata, Sj = area)

We also calculated the environmental and spatial proximities $\pi_{jm}$ for every pair of SARs $j$ and $m$. Recall that we defined these proximities in the following way: (i) $\pi_{jm} = 1$ when the SARs $j$ and $m$ are spatially adjacent and belong to the same large agricultural region, (ii) $\pi_{jm} = 0.5$ when the SARs $j$ and $m$ belong to the same large agricultural region but are not spatially adjacent, or are spatially adjacent but do not belong to the same large agricultural region, and (iii) $\pi_{jm} = 0$ otherwise. We calculate the $J \times J$ proximity matrix below:

## First step: matrix storing the neighbouring relationships.
## Makes use of functions from the packages spdep and ade4
library(spdep)
library(ade4)
pg <- neig2mat(nb2neig(poly2nb(SAR)))

## Second step: identify the large agricultural region to which
## each SAR belongs. The SAR code stores this information; this
## code is: department (two digits) + large agric. region (three
## digits). We get the three last digits
larcode <- substr(SAR@data$codeSAR, 3, 5)

## Calculate a matrix storing, for each SAR, whether it belongs
## to the same large agric. region as the other SARs
aprgr <- sapply(1:nrow(SAR), function(i) as.numeric(larcode == larcode[i]))

## The diagonal is set to 0
diag(aprgr) <- 0

## Calculate the proximity matrix
prx <- (pg + aprgr)/2

Two possible measures of effort
As indicated in the paper, we considered two alternative measures of observational effort. Because we know the number of kilometers traveled by each car (stored in the data.frame carsscsl), we know the observational effort at the scale of the department. However, we need a way to allocate this effort at a smaller scale, i.e. the SAR scale.
2.3.1 A sampling effort uniformly distributed within the department: measure $V_j^{(1)}$

The first possible measure that we considered is the measure described in the main paper. We supposed that the sampling effort was uniformly distributed within a department. In other words, the sampling effort of a given SAR $j$ was calculated with:

$$V_j^{(1)} = \frac{B_d}{R_d} \times \frac{\sum_c k_c}{\sum_{m \in D} S_m}$$

where $B_d$ is the total number of logbooks available in the department $d$ during the study period, $R_d$ is the number of logbooks available in this department for which the yearly distance traveled was recorded, $k_c$ is the number of kilometers traveled by the car $c$ (the sum being taken over the cars of the department $d$), $S_j$ is the area of the SAR $j$, and $D$ is the set of SARs belonging to the department $d$ (see the main paper). We calculated this effort for each SAR below:

## Calculation of the observational effort
Vj1 <- sapply(SAR@data$codeSAR, function(y) {
    ## Identify the department code from the SAR code (the first
    ## two characters of the SAR code are the department code)
    x <- substr(as.character(y), 1, 2)
    ## Departments of the south of the Parisian region are
    ## pooled together
    if (any(c("91", "95") == x)) x <- "78"
    ## (the rest of this calculation was truncated in this extract)
})

This sampling effort is stored in the vector Vj1.
2.3.2 Another measure taking into account the officers' behaviour: measure $V_j^{(2)}$

We also considered an alternative measure $V_j^{(2)}$ of effort. This alternative measure accounted for the fact that a given team of officers generally works on a restricted area within the department; for a given car, the activity is therefore not homogeneously distributed within the department. We measured the sampling effort for each SAR using the following approach (a sketch of the first steps is given after this list):

• For each car with at least 5 detected animals, we calculated the center of gravity of the observations (all years pooled), as well as the distances between the observations and this center of gravity;

• We associated a center of activity with each car. When the car had detected at least 5 animals, the center of activity corresponded to the center of gravity of these observations. Otherwise, this center of activity corresponded to the center of gravity of another car randomly sampled within the same department;

• We supposed that the distribution of the distances between the animals detected by a car and its center of gravity was lognormal, with a mean and a standard deviation depending on the department. Under this hypothesis, and knowing the center of activity of every car, we could estimate a two-dimensional function giving the probability density of presence of each car at each point of every department;

• From this two-dimensional function, we estimated the proportion of the time spent by each car in every SAR of the department. Moreover, we generally knew the total number of kilometers traveled by a given car in a given year (when this number was missing, we randomly sampled this value from the set of available mileages in the department). We multiplied this number of kilometers by the proportion of the activity time spent by this car in a given SAR to obtain the number of kilometers traveled in the SAR;

• Finally, we divided this number of kilometers by the area of the SAR. This standardized number of kilometers is our measure of effort $V_j^{(2)}$.
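As an illustration of the first steps of this procedure, the sketch below computes the center of gravity and the lognormal distance distribution for a single hypothetical car; the data.frame obs and its coordinates x and y are simulated stand-ins, not the actual variables of the scsl datasets.

## Hedged sketch: center of activity of one (simulated) car and
## lognormal fit of the distances between its detections and
## this center
set.seed(1)
obs <- data.frame(x = rnorm(20, 500, 10), y = rnorm(20, 2300, 10))
## Center of gravity of the observations
cg <- c(mean(obs$x), mean(obs$y))
## Distances between the observations and the center of gravity
d <- sqrt((obs$x - cg[1])^2 + (obs$y - cg[2])^2)
## Estimates of the lognormal parameters (on the log scale)
c(meanlog = mean(log(d)), sdlog = sd(log(d)))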

The algorithms used for model fitting
Because our modeling approach is computationally demanding, we chose to program it in the C language after prototyping it in R. We considered three different ways to program this model: because we wanted to use this approach intensively (cross-validation, bootstrap), a speed comparison allowed us to choose the fastest program. Moreover, the fact that the three programs returned the same results indicated that we had not made any programming error.
The speed comparison favoured the BFGS approach (see below), which we used in the rest of this document. We nevertheless also describe the two other available algorithms in this section; all three optimization approaches are implemented in the function penalizedmodel and are therefore available to the reader. As noted on the help page of this function, the type of algorithm can be chosen by setting the option typealgo in the list passed to the parameter options of this function:

typealgo  The type of algorithm used for the fit: 1 = steepest descent; 2 = mixed approach; 3 = BFGS approach; 4 = steepest descent until convergence, as evaluated with a threshold equal to 10^-6, followed by BFGS until convergence for the threshold defined by stopCrit. Defaults to 3.

Steepest descent algorithm (typealgo = 1)
Recall from the main text the expression of minus the log-likelihood of the model, $-\log L$ (we use the same notations as in the main text; the full expression is given in the paper). Adding the regularization constraint $P$, we obtain the penalized negative log-likelihood. Let $\theta$ be the vector containing the unknown parameters $a_{ij}$, $e_{j2}$ and $p_{i2}$, and let $C(\theta) = -\log L + P$ be the criterion minimized by the fit. The gradient vector $\nabla C(\theta)$ associated with this penalized negative log-likelihood is obtained by differentiating this criterion with respect to each parameter. We develop the criterion $C(\theta)$ with a second-order Taylor expansion in the neighborhood of $\theta$:

$$C(\theta - h\nabla C(\theta)) \approx C(\theta) - h\,\nabla C(\theta)^\top \nabla C(\theta) + \frac{h^2}{2}\,\nabla C(\theta)^\top \nabla^2 C(\theta)\,\nabla C(\theta)$$

which can also be written:

$$C(\theta) - C(\theta - h\nabla C(\theta)) \approx h\,\|\nabla C(\theta)\|^2 - \frac{h^2}{2}\,\nabla C(\theta)^\top \nabla^2 C(\theta)\,\nabla C(\theta)$$

Because this criterion is convex in $\theta$, the quadratic term is nonnegative; however, for a sufficiently small $h$, the linear term dominates, so that the right-hand side of this equation is greater than 0. In other words, subtracting $h\nabla C(\theta)$ from $\theta$ reduces the value of this criterion when $h$ is small. This suggests a means to find a value of $\theta$ that minimizes this criterion:

1. We select an initial value for the vector $\theta_1$, a value for $h$ (we have chosen $h = 10$), and a threshold $s$ below which we consider that the algorithm has converged ($s = 10^{-8}$).

2. We calculate the criterion associated with this initial vector: $C_1 = C(\theta_1)$.

3. We calculate the updated vector $\theta_2 = \theta_1 - h\,\nabla C(\theta_1)$.

4. We calculate the criterion associated with this updated vector: $C_2 = C(\theta_2)$.
5. If $(C_1 - C_2)/|C_1| > s$, the algorithm has not converged yet: we set $\theta_1 = \theta_2$ and $C_1 = C_2$, and we start the procedure again at step 3.
6. If $C_2 > C_1$, the value of $h$ is too large: we set $h = h/2$ and we start the procedure again at step 3.
This algorithm, the steepest descent algorithm, is very commonly used to find a solution for problems of this type (cf. NOCEDAL and WRIGHT, 2006, p. 21).
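As an illustration, the following R sketch implements steps 1 to 6 on a toy convex criterion; the quadratic criterion C and its gradient stand in for the penalized negative log-likelihood and its gradient.

## Toy criterion standing in for the penalized negative
## log-likelihood (convex, minimum at (2, -1))
C <- function(th) sum((th - c(2, -1))^2) + 1
gradC <- function(th) 2 * (th - c(2, -1))
theta1 <- c(0, 0); h <- 10; s <- 1e-8
C1 <- C(theta1)
repeat {
    theta2 <- theta1 - h * gradC(theta1)   # steepest descent step
    C2 <- C(theta2)
    if (C2 > C1) {
        h <- h/2                           # step too large: halve h
    } else if ((C1 - C2)/abs(C1) > s) {
        theta1 <- theta2; C1 <- C2         # not converged: iterate
    } else break                           # converged
}
theta1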

"Mixed" approach (typealgo = 2)
We have implemented another approach, which consists of updating the parameters $e_{j2}$, $p_{i2}$ and $a_{ij}$ in turn. Indeed, we developed the equation of the gradient vector associated with the penalized negative log-likelihood in the previous section. The criterion to be minimized being convex, this gradient vector is equal to zero at the solution $\theta$ of the problem. We can therefore find the solution using the following algorithm:

1. Initialization: definition of $\theta_1$ containing the starting values for the parameters $a_{ij}$, $e_{j2}$ and $p_{i2}$. We set them equal to the estimates of these parameters under classical non-penalized maximum likelihood. We define a value for $h$ ($h = 10$) and a threshold $s$ below which we consider that the algorithm has converged ($s = 10^{-8}$).
2. We calculate the criterion for the current values of the parameters: $C_1 = C(\theta_1)$.

3. We update the parameters $e_{j2}$. From equation (2), we can update these parameters conditionally on the values of $a_{ij}$ and $p_{i2}$: the value of $e_{j2}$ that minimizes the penalized negative log-likelihood conditionally on these parameters can be calculated exactly, in closed form, by noting that the first derivative of the log-likelihood is equal to zero at the solution.

4. We update the parameters $p_{i2}$. Similarly, the value of $p_{i2}$ minimizing the criterion conditionally on the values of $a_{ij}$ and $e_{j2}$ can be calculated exactly, by noting that the first derivative of the log-likelihood is equal to zero at the solution. We then define $p_{12} = 0$.

5. We update the parameters $a_{ij}$. Unlike the two previous sets of parameters, we cannot find an exact analytical solution to this minimization problem. We therefore use a steepest descent step (similar to the algorithm described in the previous section): for each $a_{ij}$, we calculate the value $d$ of the first derivative of the penalized log-likelihood, given in equation (1), and we update $a_{ij}$ by setting its new value equal to $a_{ij} - hd$.
6. The updated values are stored in the vector $\theta_2$. We calculate the penalized negative log-likelihood associated with this updated vector: $C_2 = C(\theta_2)$.
7. If $(C_1 - C_2)/|C_1| > s$, the algorithm has not converged yet: we set $\theta_1 = \theta_2$ and $C_1 = C_2$, and we start the procedure again at step 3.
8. If $C_2 > C_1$, the value of $h$ is too large: we set $\theta_1 = \theta_2$ and $h = h/2$, and we start the procedure again at step 3.
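The control flow is the same as for the steepest descent algorithm. The toy R sketch below illustrates the alternation between exact conditional updates (two parameters with closed-form minimizers) and a gradient step on the remaining parameter; the criterion Cmix is a stand-in for the model's penalized likelihood, not the actual expressions of the paper.

## Toy illustration of the mixed scheme: e and p admit exact
## conditional minimizers, a is updated by a gradient step
Cmix <- function(e, p, a) (e - 1)^2 + (p + 2)^2 + (a - 3)^2 + 1
h <- 10; s <- 1e-8
e <- 0; p <- 0; a <- 0
C1 <- Cmix(e, p, a)
repeat {
    e <- 1                        # exact update of e given (p, a)
    p <- -2                       # exact update of p given (e, a)
    d <- 2 * (a - 3)              # first derivative in a
    a2 <- a - h * d               # gradient step on a
    C2 <- Cmix(e, p, a2)
    if (C2 > C1) {
        a2 <- a; h <- h/2         # step too large: halve h
    } else if ((C1 - C2)/abs(C1) < s) break   # converged
    a <- a2
    C1 <- Cmix(e, p, a)
}
c(e = e, p = p, a = a)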

Quasi-Newton algorithm - BFGS (typealgo = 3)
The quasi-Newton algorithm is a common approach to identify the solution of such optimization problems (NOCEDAL and WRIGHT, 2006, p. 22 and following). This algorithm can be understood by considering the Taylor expansion of the penalized negative log-likelihood described in the previous sections:

$$C(\theta_k + p) \approx m_k(p) = C(\theta_k) + \nabla C(\theta_k)^\top p + \frac{1}{2}\, p^\top \nabla^2 C(\theta_k)\, p$$

If we suppose that $\nabla^2 C(\theta)$ is positive definite (which is actually the case, given the convexity of the criterion to be minimized), then the solution is found by searching for the value of $p$ that minimizes this function. We can find this solution by identifying the value of $p$ that cancels the gradient of $m_k(p)$. This solution is:

$$p_k^N = -\left[\nabla^2 C(\theta_k)\right]^{-1} \nabla C(\theta_k)$$

The solution direction $p_k^N$ is named the Newton direction, and is generally different from the steepest descent direction. This direction finds the solution immediately when the difference between the criterion and its quadratic model (i.e., the Taylor expansion) is equal to zero. When this difference is small, a small number of iterations of this algorithm suffices to find the solution. One great advantage of the Newton direction over the steepest descent direction is that there is no need to set a step size.
The main drawback of this approach is that it requires the calculation of the Hessian matrix $\nabla^2 C(\theta)$, which can be cumbersome, in particular when the number of parameters increases. For this reason, we use another direction, the quasi-Newton direction, which relies on an approximation $B_k$ of the Hessian matrix. This approximation is updated after each iteration $k$ of the algorithm. The most common formula used to update this matrix is the BFGS formula (named after its inventors Broyden, Fletcher, Goldfarb, and Shanno):

$$B_{k+1} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k}$$

where $s_k = \theta_{k+1} - \theta_k$ and $y_k = \nabla C(\theta_{k+1}) - \nabla C(\theta_k)$. For additional details, see NOCEDAL and WRIGHT (2006, p. 22).
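As an illustration, the BFGS method of the R function optim (the routine on which penalizedmodel relies, see below) can be applied to a toy convex criterion:

## BFGS minimization of a toy convex criterion with optim;
## fn and gr play the roles of C(theta) and its gradient
fn <- function(th) sum((th - c(1, 2))^2)
gr <- function(th) 2 * (th - c(1, 2))
optim(par = c(0, 0), fn = fn, gr = gr, method = "BFGS")$par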
This approach is implemented in the R function optim. This function relies on a C procedure available to C programmers through the API of the R software (see http://cran.r-project.org/doc/manuals/R-exts.html, section 6.8). Our function penalizedmodel actually relies on this C procedure.

Selection of the penalty parameter ν by cross-validation

In this section, we consider the measure of effort $V_j^{(1)}$. We first used the function crossvalidation to identify the most suitable value of the penalty parameter ν.
Although we present only one criterion to measure the prediction error in the paper, we considered four possible criteria to select the best value of the penalty parameter ν in our calculations. We presented in the paper the only theoretically founded and stable measure of the prediction error, but we describe the other criteria in this appendix. We denote the four possible criteria by $Q_g(\nu)$, with $g = 1, \ldots, 4$:

• $g = 1$: this criterion corresponds to the predicted log-likelihood of the validation dataset (this is the criterion presented in the paper; again, we use here the same notations as in the paper).
• $g = 2$: the sum of the squared differences between observations and predictions, $Q_2(\nu) = \sum_{ijk} (N_{ijk} - \hat{N}_{ijk})^2$, writing $N_{ijk}$ for the observed number of detections of species $i$ in SAR $j$ with status $k$, and $\hat{N}_{ijk}$ for the corresponding prediction;

• $g = 3$: the chi-squared statistic, $Q_3(\nu) = \sum_{ijk} (N_{ijk} - \hat{N}_{ijk})^2 / \hat{N}_{ijk}$;

• $g = 4$: the sum of the squared differences between observations and predictions divided by the predictions + 1, $Q_4(\nu) = \sum_{ijk} (N_{ijk} - \hat{N}_{ijk})^2 / (\hat{N}_{ijk} + 1)$.

As noted above, the implementation of the cross-validation revealed that the criterion $Q_1$ was the best measure of the prediction error, so that we presented only the results for this criterion in the paper, for the sake of concision.
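For reference, the last three criteria are straightforward to compute from vectors of observed and predicted counts; a minimal sketch (the names obs and pred are assumptions; $Q_1$ would additionally require the predicted log-likelihood of the model):

## Hedged sketch of the criteria g = 2, 3, 4 for vectors of
## observed (obs) and predicted (pred) counts
Q2 <- function(obs, pred) sum((obs - pred)^2)
Q3 <- function(obs, pred) sum((obs - pred)^2/pred)
Q4 <- function(obs, pred) sum((obs - pred)^2/(pred + 1))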

Implementation of the cross-validation approach
We implemented the cross-validation approach below (warning: these calculations are very slow! They took about 4 hours on a PC with an Intel Core i5 processor):

## Computation of the areas of the SARs (stored in the vector vv)
vv <- sapply(slot(SAR, "polygons"), slot, "area")

## We build the data.frame dfVj required by crossvalidation
dfVj <- data.frame(nom = factor(SAR@data$codeSAR), Vj = Vj1, Sj = vv)

## Then we make sure that the variable SAR in the data.frame
## scsldetect has the exact same set of levels as the first
## column of dfVj
scsldetect$SAR <- factor(scsldetect$SAR, levels = levels(dfVj[, 1]))

## And we make sure that the other vectors are factors
scsldetect$Species <- factor(scsldetect$Species)
scsldetect$Status <- factor(scsldetect$Status)
scsldetect$Year <- factor(scsldetect$Year)

cvpra <- crossvalidation(scsldetect$Species, scsldetect$SAR,
    scsldetect$Status, scsldetect$Year, dfVj,
    vecnu = c(seq(0, 0.5, by = 0.1), seq(0.6, 2, by = 0.2)),
    proximities = prx,
    control = list(verbose = TRUE, stopCrit = 1e-07))

The object cvpra contains all the information required for the calculation of the four criteria, as well as their decomposition per species. The function global.crossvalCisstat is used to derive the global criteria $Q_g(\nu)$ from this information:

glo <- global.crossvalCisstat(cvpra)
plot(glo)

For the other criteria, we obtain results quite similar to those obtained for the criterion $Q_1$: for most species, the value ν = 0.2 is optimal.
Given the previous results, the prediction error seems to be reasonably small for all species and all criteria with ν = 0.3. This is the value that we have chosen to fit our model.
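The code of this final fit was not preserved in this extract. A minimal sketch, assuming that the arguments of penalizedmodel parallel those of crossvalidation with a single penalty value nu (see help("penalizedmodel") for the actual interface), would be:

## Hedged sketch: fit of the final model (stored in modVj1 and
## used in the rest of this appendix); the argument names nu and
## options are assumptions based on the help page
modVj1 <- penalizedmodel(scsldetect$Species, scsldetect$SAR,
    scsldetect$Status, scsldetect$Year, dfVj, nu = 0.3,
    proximities = prx, options = list(typealgo = 3, stopCrit = 1e-07))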

We can map the estimated relative densities for each species:
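The mapping code was likewise not preserved in this extract. A minimal sketch, assuming that the element aij of the fitted model stores the log relative densities as a species-by-SAR matrix, could be:

## Hedged sketch: map of the relative density of the first
## species (the structure of modVj1$aij is an assumption)
SAR@data$dens1 <- exp(modVj1$aij[1, ])
spplot(SAR, "dens1")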

Selection of the penalty parameter ν by cross-validation
We have fitted our model conditional on the measure of effort $V_j^{(2)}$, i.e. under the more complicated model of the distribution of the effort within a department. We used an approach identical to the one described in the previous section:

## We recalculate the data.frame dfVj with this alternative
## measure of effort
dfVj <- data.frame(nom = factor(SAR@data$codeSAR), Vj = Vj2, Sj = vv2)

## Then we make sure that the variable SAR in the data.frame
## scsldetect has the exact same set of levels as the first
## column of dfVj
scsldetect$SAR <- factor(scsldetect$SAR, levels = levels(dfVj[, 1]))

## And we make sure that the other vectors are factors
scsldetect$Species <- factor(scsldetect$Species)
scsldetect$Status <- factor(scsldetect$Status)
scsldetect$Year <- factor(scsldetect$Year)

## Cross-validation with this alternative measure of effort
cvpra2 <- crossvalidation(scsldetect$Species, scsldetect$SAR,
    scsldetect$Status, scsldetect$Year, dfVj,
    vecnu = c(seq(0, 0.5, by = 0.1), seq(0.6, 2, by = 0.2)),
    proximities = prx,
    control = list(verbose = TRUE, stopCrit = 1e-07))

glo2 <- global.crossvalCisstat(cvpra2)
We can plot these criteria for the different values of ν:

plot(glo2)

For the weasel, this criterion suggests that a stronger regularization is required. On the other hand, the criterion $Q_4$ suggests that a penalty roughly equal to 0.2-0.3 is a good choice for this species:

plot(cvpra2, withgroups = FALSE, xlim = c(0.1, 1), criterion = 4)

For the sake of simplicity, we have considered the penalty parameter ν = 0.3 for the model based on this measure of effort, which corresponds to a small prediction error for all species.

6 Comparison of the two possible measures of sampling effort

Implementation of a cross-validation approach
To identify the most suitable measure of the sampling effort for dead animals, we implemented a cross-validation approach. We considered the even years (2002, 2004) and the odd years (2003, 2005) separately: we fitted a model using the data collected during the even years and measured the prediction error, using our four criteria, on the odd years, and conversely (this split is sketched below). As indicated in the introduction of this appendix, this comparison showed that the simplest measure, $V_j^{(1)}$, supposed to be constant within a department, is the best choice.
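A minimal sketch of this even/odd split (assuming Year has levels "2002" to "2005", as in the code above):

## Even/odd year split used for this cross-validation
even <- scsldetect$Year %in% c("2002", "2004")
scsleven <- scsldetect[even, ]   # model fitted on even years
scslodd <- scsldetect[!even, ]   # prediction error measured on odd years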
7 Is a purely spatial regularization a better choice to fit our model?

Implementation of a cross-validation approach
As noted in the paper, we maximize a regularized log-likelihood to account for the spatial and environmental autocorrelation. In this section, we compare this approach to an approach where a purely spatial regularization is implemented. More precisely, we maximized the same penalized log-likelihood, but with the proximities $\pi_{jm}$ replaced by a measure $\sigma_{jm}$ of purely spatial proximity between the SARs $j$ and $m$ (the other parameters are described in the paper). We defined $\sigma_{jm} = 1$ when the SARs $j$ and $m$ are spatially adjacent, and $\sigma_{jm} = 0$ otherwise.
First, we implemented a cross-validation approach as in the previous sections to select the best value of ν, i.e. the value that maximizes the criteria defined in the section 4.1.1:

## The neighbouring matrix
pg <- neig2mat(nb2neig(poly2nb(SAR)))

## Cross-validation with the purely spatial proximities
cvpraspa <- crossvalidation(scsldetect$Species, scsldetect$SAR,
    scsldetect$Status, scsldetect$Year, dfVj,
    vecnu = c(seq(0, 0.5, by = 0.1), seq(0.6, 2, by = 0.2)),
    proximities = pg,
    control = list(verbose = TRUE, stopCrit = 1e-07))

We now examine the global results:

glospa <- global.crossvalCisstat(cvpraspa)
plot(glospa)

In this case, the value ν = 0.2 was characterized by a small prediction error. We considered this value to fit our purely spatial model.
This comparison showed that accounting for both the spatial and the environmental autocorrelation significantly reduces the prediction error. We therefore considered, in the rest of this document, the model fitted in the section 4.2 (i.e. the model fitted using both the spatial and environmental regularization, with the effort measure $V_j^{(1)}$). We study this model in detail in the next section.
8 Model examination

Model residuals and overdispersion
In this section, we study the residuals of the model:

res <- residuals(modVj1)
pre <- predict(modVj1)

## We present only the strictly positive predicted values
## (residuals associated with predicted values of 0 are all
## equal to 0)
plot(pre[pre > -20], res[pre > -20],
     xlab = "predicted log-densities",
     ylab = "standardized residuals")

These residuals do not present any problematic pattern, except a very small number of standardized residuals greater than 6. This indicates a very minor overdispersion in the data. We study this overdispersion more precisely below.
A first check of the model consists in plotting the absolute value of the Pearson residuals as a function of the predicted values. Indeed, the fact that a variable $Y$ follows a Poisson distribution implies that:

$$\text{Var}(Y) = \text{E}(Y)$$

We calculated the standardized Pearson residuals (see p. 141 in CAMERON and TRIVEDI, 1998), which are expected to have a constant variance when plotted as a function of the predicted values. Therefore, the absolute values of the standardized residuals are expected to have a constant mean when plotted as a function of the predicted values. We show this plot below, which allows us to identify possible relationships of the form:

$$\text{Var}(Y) = \text{E}(Y)^h$$

with $h > 1$. We also added a lowess smoothing to this plot, to enhance the visualization of the relationship between these two quantities:

plot(exp(pre[pre > -20]), abs(res[pre > -20]),
     xlab = "predicted densities",
     ylab = "|standardized residuals|", xlim = c(0, 110))
lines(lowess(abs(res[pre > -20]) ~ exp(pre[pre > -20])),
      col = "red", lwd = 2)

The standardized residuals clearly have a constant variance. However, the constant variance of the residuals on this plot could also have been obtained if:

$$\text{Var}(Y) = h \times \text{E}(Y)$$

with $h > 1$. We therefore calculated the dispersion parameter $h$, estimated by the Pearson statistic (the sum of the squared Pearson residuals) divided by the number of degrees of freedom:

chi2pearson <- sum(res^2)
numberDegreesFreedom <- length(res) - length(c(modVj1$aij)) -
    length(modVj1$ejk[, 1]) - length(modVj1$pi2)
resultsover <- round(c(Chi2Pearson = chi2pearson,
                       NDegreesFreedom = numberDegreesFreedom))
resultsover

## Estimated dispersion parameter
chi2pearson/numberDegreesFreedom

Thus, if we accounted for the overdispersion in our model, the standard errors of the estimated parameters would be inflated by $100 \times (\sqrt{1.2} - 1) \approx 10\%$, which is negligible given the other sources of error in our model. (Note that LINDSEY (1999) proposes, as a rule of thumb, that the overdispersion should be accounted for only when the estimated dispersion parameter is greater than 2: in our case, the dispersion parameter is much lower than 2.) Moreover, the bootstrap approach that we used to estimate the precision of our estimates (see section 8.3) would account for the overdispersion in the estimation of the precision if it were present in the data.
It is clear that the Poisson model is a reasonable model for the process studied here.