Bioacoustics for in situ validation of species distribution modelling: An example with bats in Brazil

Species distribution modelling (SDM) has gained importance in biodiversity distribution and conservation studies worldwide, including the prioritization of areas for public policies and international treaties. SDMs are useful for large-scale approaches and species distribution estimates, which is valuable given that only a small fraction of the planet is adequately sampled. However, minimizing model errors is challenging but essential, considering the uses and consequences of such models. In situ validation of SDM outputs should be a key step, and in some cases an urgent one. Bioacoustics can be used to validate and refine those outputs, especially if the focal species' vocalizations are conspicuous and species-specific, as is the case for echolocating bats. Here, we used extensive acoustic monitoring (>120 validation points over an area of >758,000 km², producing >300,000 sound files) to validate MaxEnt outputs for six neotropical bat species in a poorly sampled region of Brazil. Based on this in situ validation, we evaluated the ability of four threshold-dependent theoretical evaluation metrics to predict model performance. We also assessed the performance of three widely used thresholds for converting continuous SDMs into presence/absence maps. We demonstrated that MaxEnt can produce very different outputs, requiring a careful choice of thresholds and modelling parameters. Although all theoretical evaluation metrics studied were positively correlated with accuracy, we empirically demonstrated that metrics based on specificity-sensitivity and sensitivity-precision are better for testing models, considering that most SDMs are based on unbalanced data. We also found that, without independent field validation, using an arbitrary threshold can be a precarious approach with many possible outcomes, even when models obtain good evaluation scores.
Bioacoustics proved to be important for validating SDMs for the six bat species analyzed, allowing better refinement of SDMs in large and under-sampled regions with relatively low sampling effort. Regardless of the method used to assess species occurrence, our research highlights the vital need for in situ validation of SDMs.
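The four threshold-dependent metrics mentioned above are all computed from the same 2×2 confusion matrix of predicted versus observed presence/absence. The sketch below is illustrative only and assumes standard textbook definitions: the kappa shown is ordinary Cohen's kappa (the manuscript's P-kappa variant is not defined in this section and may differ), the function name is hypothetical, and the cell counts in the usage lines are made up.

```python
import math


def theoretical_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Threshold-dependent evaluation metrics from a 2x2 confusion matrix.

    tp: predicted present, observed present; fp: predicted present, observed absent;
    fn: predicted absent, observed present; tn: predicted absent, observed absent.
    Kappa here is standard Cohen's kappa (an assumption; P-kappa may differ).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # hit rate H
    specificity = tn / (tn + fp)
    false_alarm = fp / (fp + tn)  # F = 1 - specificity

    oacc = (tp + tn) / n
    # Agreement expected by chance, from the row and column margins.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oacc - p_e) / (1 - p_e)
    tss = sensitivity + specificity - 1

    # SEDI is undefined when H or F is exactly 0 or 1 (math.log raises ValueError).
    h, f = sensitivity, false_alarm
    num = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    den = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    sedi = num / den

    return {"OAcc": oacc, "kappa": kappa, "TSS": tss, "SEDI": sedi}


# Illustrative counts only, not data from this study.
scores = theoretical_metrics(tp=40, fp=10, fn=5, tn=45)
print(scores)
```

With these made-up counts, OAcc is 0.85, kappa is 0.7, TSS is 70/99 (about 0.71) and SEDI is about 0.85; unlike OAcc, TSS and SEDI do not reward a model simply for predicting the more common class well.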

In the Introduction, we added the following sentence to the text: "(…) aiming to evaluate the SDMs' performance using different thresholds, given the validation dataset obtained with bioacoustics".

"Methods"
5. "Perhaps a flowchart with the steps and setting used to build models could be provided to have a clearer picture of the distinct models."
Answer: We added a new figure (Fig. 2) with a flowchart summarizing the methodology used in this study.
6. L. 131-132: "could you provide, in few words, how does the regularization multiplier parameter work?"
Answer: We added the following sentence to the text: "The regularization multiplier is a tuning parameter used to smooth the model's distribution prediction, making it more regular and less overfitted."

7. L. 167: "The validation points seem to be more concentrated toward east, not randomly distributed across the 1000 x 1000 km area. This pattern is similar to the historical records, which may likely be associated with accessibility."
Answer: Yes. In fact, road accessibility was a precondition for point selection, and we failed to make that clear. Therefore, we changed the text to: "Considering that those species' potential distributions covered extensive areas, we focused our field validation on 129 randomly selected sampling points across an area of 758,193 km² in northeastern Brazil (Fig 1 and S3 Table in the supplementary material). Since some areas of northeastern Brazil are not easily accessible, we restricted point selection to locations near roads or paths accessible at least by off-road vehicle."

8. L. 181: "Did the acoustic monitoring took place in all months from march 2014 -January 2020? Can you provide some information about the seasonal activity of these species?"
Answer: No. The acoustic monitoring took place within that period, not in every month of it. Seasonality in the Neotropics is defined by differences in precipitation, and the timing of the seasonal precipitation also differs between regions. Therefore, given both the size of the sampling area and the climatic specificities of each region/biome, it would have been impractical to collect acoustic data at all points simultaneously. To avoid bias due to the sampling scheme, we sampled every point for at least two nights during its dry season, avoiding nights with rain, strong winds, or temperatures below 15 °C, as stated in the manuscript.
There is little information about the seasonal activity of neotropical bats, and most available studies were conducted in the Amazonia biome and/or used mist nets. Since we did not conduct our study in the Amazonia biome, and mist-net studies are biased towards Phyllostomidae bats, the available information on seasonal activity probably does not apply to the Caatinga, Cerrado, or Atlantic Forest populations. Therefore, to avoid confusion and to avoid further lengthening the already long Methods section, we prefer not to add seasonal activity information here.

9. L. 182: "In line 71 you mention 300,000 files. Are these files subsets within the total continuous files? If so, please, describe how subset files were selected."
Answer: Yes, those files are subsets. We changed the text to: "The acoustic sampling performed in this study resulted in more than 1.5 TB of raw sound files. Those raw sound files were divided into 15-s files, and we subsequently found that more than 300,000 of those 15-s sound files contained bat calls."

"Results"
10. "You could add a paragraph summarizing the main results found in SDM with the different settings used in the models, before validation."

Answer:
We added a new figure showing a model generated for the species Promops nasutus and the differences in predicted occurrence after applying the three studied thresholds (Fig. 4).
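Applying a threshold turns the continuous MaxEnt suitability surface into a presence/absence map, and different rules can move that cut-off considerably. The three thresholds compared in the study are not named in this section, so the sketch below uses three rules that are common in the SDM literature as assumed examples: minimum training presence (MTP), 10th-percentile training presence (P10), and maximum sensitivity plus specificity (MaxSSS). The function names are hypothetical and the suitability values are made up.

```python
import statistics


def minimum_training_presence(presence_scores):
    """Lowest predicted suitability at any training presence record."""
    return min(presence_scores)


def tenth_percentile_training_presence(presence_scores):
    """Suitability below which the lowest 10% of training presences fall."""
    # 'inclusive' interpolates linearly between the sorted values.
    return statistics.quantiles(presence_scores, n=10, method="inclusive")[0]


def max_sensitivity_plus_specificity(presence_scores, background_scores):
    """Candidate cut-off that maximizes sensitivity + specificity."""
    best_t, best_sum = None, float("-inf")
    for t in sorted(set(presence_scores) | set(background_scores)):
        sens = sum(s >= t for s in presence_scores) / len(presence_scores)
        spec = sum(s < t for s in background_scores) / len(background_scores)
        if sens + spec > best_sum:
            best_t, best_sum = t, sens + spec
    return best_t


def binarize(suitability, threshold):
    """1 = predicted presence, 0 = predicted absence."""
    return [1 if s >= threshold else 0 for s in suitability]


# Made-up suitability values at training presences, background points and map cells.
presence = [0.2, 0.4, 0.6, 0.8, 0.9]
background = [0.1, 0.15, 0.3, 0.5]
cells = [0.1, 0.25, 0.5, 0.7]

for name, t in [
    ("MTP", minimum_training_presence(presence)),
    ("P10", tenth_percentile_training_presence(presence)),
    ("MaxSSS", max_sensitivity_plus_specificity(presence, background)),
]:
    print(name, round(t, 3), binarize(cells, t))
```

On these toy values the three rules give cut-offs of 0.2, 0.28 and 0.6, and the same four cells are mapped as [0, 1, 1, 1], [0, 0, 1, 1] and [0, 0, 0, 1], respectively: a small-scale version of the kind of divergence between thresholded maps that Fig. 4 illustrates.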

"What are the differences in performance found for different regularization multiplier values?"
Answer: Using different regularization multipliers is essential to avoid model overfitting.
Therefore, we employed them as part of the modelling process rather than as a feature to be validated. We did not systematically test the performance of different regularization multiplier values, and we believe that the best model may be obtained with any of them. For example, as shown in S4 Table, the best-performing model for Noctilio leporinus (in terms of G-mean) was built with a regularization multiplier of 3.0 (Reg 3), but the second-best with 1.0 (Reg 1). For Promops centralis, the best-performing model (in terms of G-mean) was built with a regularization multiplier of 1.0 (Reg 1), but the second-best with 2.0 (Reg 2). There is no single 'most suitable' regularization multiplier, but it is crucial to build models with different regularization multipliers to find the best one.
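The role of the regularization multiplier can be made concrete with the objective MaxEnt optimizes. As a hedged sketch, assuming the L1-regularized formulation usually given for MaxEnt (the default feature-specific penalties $\beta_j^{(0)}$, which depend on feature class and sample size, are software details not reproduced here), the multiplier $\mathrm{RM}$ rescales every feature's penalty:

```latex
$$
\hat{\boldsymbol\lambda}
  = \arg\min_{\boldsymbol\lambda}\;
    -\frac{1}{m}\sum_{i=1}^{m} \ln q_{\boldsymbol\lambda}(x_i)
    + \sum_{j} \beta_j \,\lvert \lambda_j \rvert,
\qquad
q_{\boldsymbol\lambda}(x) = \frac{\exp\!\big(\boldsymbol\lambda \cdot \mathbf{f}(x)\big)}{Z_{\boldsymbol\lambda}},
\qquad
\beta_j = \mathrm{RM} \times \beta_j^{(0)}
$$
```

Here $x_1,\dots,x_m$ are the presence records, $\mathbf{f}$ is the feature vector, and $Z_{\boldsymbol\lambda}$ normalizes $q_{\boldsymbol\lambda}$ over the background points. A larger $\mathrm{RM}$ (e.g. Reg 3 rather than Reg 1) penalizes the weights more heavily and drives more of them to exactly zero, giving a smoother, less overfitted prediction; a smaller $\mathrm{RM}$ lets the model follow the training records more closely.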

Answer:
We changed the first paragraph to make our main findings clearer. The main differences between the three tested thresholds are explained in Materials and Methods, we present the validation results for each in the "Thresholds vs. validation" section of the Results, and we discuss their performance at length in the Discussion.

"(…) table 1 are not strong, with most <0.5. This finding should be highlighted and discussed."

Answer:
We followed Reviewer 2's suggestion regarding the problem of using Spearman rank correlations in our case. Therefore, to assess the performance of the theoretical evaluation metrics (OAcc, P-kappa, TSS, and SEDI), we now fitted linear mixed-effects models to evaluate the relationships between those scores and the post-validation performance metrics obtained using acoustic monitoring (accuracy, precision, sensitivity, specificity, G-mean, and F-score), with species treated as a random effect.
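A minimal sketch of this kind of model, assuming Python's statsmodels (the software actually used for the analysis is not stated in this section): one post-validation metric (accuracy) is regressed on one theoretical metric (TSS), with a random intercept for each species so that the 24 models per species are not treated as independent. The data frame is simulated only to make the example run; it is not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: 6 species x 24 models each = 144 rows, mirroring the design above.
rng = np.random.default_rng(42)
n_species, n_models = 6, 24
species = np.repeat([f"species_{i}" for i in range(n_species)], n_models)
tss = rng.uniform(0.2, 0.9, size=species.size)
# Each species gets its own baseline shift, so rows within a species are correlated.
species_shift = np.repeat(rng.normal(0.0, 0.1, size=n_species), n_models)
accuracy = 0.3 + 0.5 * tss + species_shift + rng.normal(0.0, 0.05, size=species.size)
df = pd.DataFrame({"species": species, "tss": tss, "accuracy": accuracy})

# Fixed effect: TSS; random intercept: species (the grouping variable).
model = smf.mixedlm("accuracy ~ tss", data=df, groups=df["species"])
result = model.fit()
print(result.summary())
print("estimated TSS slope:", result.params["tss"])
```

In practice one such model would be fitted per pair of theoretical and post-validation metrics (e.g. SEDI vs. G-mean); a random slope could be added with `re_formula="~tss"` if the relationship were expected to differ among species.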

"Also, it would be interesting to know if the performance metrics are similar to studies that rely upon non-acoustic methods for validation."
Answer: We agree that this would be interesting. However, we think it would need to be addressed in a new study, since it would require independent datasets (with and without acoustic methods).
17. "L. 374-375: This is not entirely true for all species/thresholds evaluated and could be acknowledged here. In this first paragraph, you could be more specific in the take-home message. For instance, you could include information on how different thresholds may be better than others and discuss the influence of sample size and unbalanced data."

Answer:
We had stated: "Bioacoustics proved to be a very effective method for the in situ validation of SDM for six neotropical bat species in a large and poorly-sampled area in Brazil." We still think that bioacoustics proved to be a very effective method for validating SDMs, since it allowed us to easily flag SDM omission and commission errors in a vast study area with relatively low sampling effort.
Following your remarks and those of Reviewer 2, we linked our statements to our results and thoroughly revised this first paragraph. We discuss the influence of threshold choice and sample size in the "Theoretical model evaluation metrics and thresholds vs. validation" section of the Discussion.
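Flagging omission and commission errors amounts to cross-tabulating the thresholded SDM against acoustic detections at each validation point, from which the post-validation metrics follow. The sketch below is illustrative: the function name and the ten points are hypothetical, and it treats an acoustic non-detection as an absence, which is a simplifying assumption (a bat may be present but not recorded).

```python
import math


def validation_metrics(predicted, detected):
    """Compare a binary SDM with acoustic detections at the same validation points.

    predicted: 1 if the thresholded SDM predicts presence at the point, else 0.
    detected: 1 if the species was recorded acoustically at the point, else 0
              (non-detection is treated as absence, an assumption).
    """
    if len(predicted) != len(detected):
        raise ValueError("predicted and detected must have the same length")
    pairs = list(zip(predicted, detected))
    tp = sum(1 for p, d in pairs if p == 1 and d == 1)
    fp = sum(1 for p, d in pairs if p == 1 and d == 0)  # commission error
    fn = sum(1 for p, d in pairs if p == 0 and d == 1)  # omission error
    tn = sum(1 for p, d in pairs if p == 0 and d == 0)

    def ratio(a, b):
        return a / b if b else 0.0  # report 0.0 when a metric is undefined

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
        "omission_errors": fn,
        "commission_errors": fp,
    }


# Ten hypothetical validation points (not study data).
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
detected = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
metrics = validation_metrics(predicted, detected)
print(metrics)
```

Here the toy map has one omission and two commission errors; accuracy is 0.7, while G-mean (about 0.71) and F-score (about 0.67) weigh the two error types separately, which matters when presences and absences are unbalanced.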

Reviewer 2's comments
This study examines the contribution of bioacoustic tools (passive acoustic monitoring) to validating the predictions of Species Distribution Models (SDM) in six tropical bat species. Based on the comparison between theoretical evaluation metrics and post-validation performance parameters obtained from field sampling, the authors highlight the need for in situ validation of SDM and argue for the use of novel acoustic techniques as rapid validation methods. The study has been properly conducted, using sound methods and a large data set, which enabled the authors to successfully address the proposed goals. Overall, the manuscript is clear, well written and presents results effectively. Nevertheless, a series of issues should still be carefully revised before publication. First, I strongly recommend an English revision of the whole text by a native speaker or language service, if this has not been done yet. I am not a native English speaker, but I feel this is needed to significantly improve spelling, grammar, and the general flow of the text.
Throughout the manuscript, I included suggestions and minor questions (directly on the PDF; see attachment) aimed at increasing the clarity and precision of the document. My major points are listed below.

"Statistical analysis"
1. "My main concern is related to the statistical analyses, since some of them may fall into pseudoreplication. As shown in Table 1, Spearman correlation tests were calculated using 144 observations (except for SEDI) that came from models of the six study species (24 models per species). Thus, these observations (validation metrics) are grouped by species and must not be considered independent replicates. Predictions obtained by models of the same species are likely to be related. As a consequence, the statistical analysis applied to examine the correlation between theoretical evaluation and in situ validation should be redesigned to take into account the non-independence of the observations and performed again. I recommend the use of generalized linear mixed-effects models (GLMM), for instance, with species as a random factor. Probably, the same might apply to the Kruskal-Wallis tests, which the authors should carefully review in light of this comment about potential pseudoreplication."

Answer:
We believe that our design is free from pseudoreplication, since sampling points are at least 2 km apart from each other; therefore, it is not possible to record the same bat at the same time at different sampling points. We replaced the Spearman rank correlations with linear mixed-effects models to evaluate the relationships between the theoretical evaluation metric scores and the post-validation performance metrics obtained using acoustic monitoring (accuracy, precision, sensitivity, specificity, G-mean, and F-score), with species treated as a random effect. However, we kept the Kruskal-Wallis tests, since we believe our data are free from pseudoreplication.
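For readers less familiar with the test: Kruskal-Wallis compares rank distributions across groups, for instance a post-validation metric such as G-mean obtained under three threshold rules. The sketch assumes Python's SciPy (`scipy.stats.kruskal`); the group labels and values are placeholders, not results from the study. The test treats every observation as an independent replicate.

```python
from scipy.stats import kruskal

# Placeholder G-mean scores for models binarized with three hypothetical thresholds.
g_mean_by_threshold = {
    "threshold_A": [0.61, 0.55, 0.70],
    "threshold_B": [0.72, 0.80, 0.77],
    "threshold_C": [0.40, 0.45, 0.52],
}

# H statistic on pooled ranks; p-value from the chi-squared approximation (k - 1 df).
result = kruskal(*g_mean_by_threshold.values())
print(f"H = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

With these placeholder values there are no ties, so H = 7.2 and, with two degrees of freedom, p ≈ 0.027.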

"Results interpretation, conclusions and goals"
2. "Despite the fact that the manuscript is generally well written and structured, I find that the link between results and discussion is still unclear, especially for a general audience with less experience of SDM. In the results, the reader can find extensive details and analyses, but there are no clear explanations of the implications of these findings. In the discussion, the interpretation of the results is often presented in a general manner, hindering the general understanding of the origin of such conclusions. I recommend that the authors make an effort to clarify (in the results or discussion, or even better, in both) which specific result in each case enables them to draw a particular conclusion, so that the text gains in clarity and becomes accessible to a broader audience. Which specific result helps us to understand that validation is key for SDM? Which one indicates that bioacoustics is 'a very effective method for the in situ validation of SDM'? Which one that 'we empirically demonstrated that independent field surveys are the best approach to corroborate the predictions made by modelling'?"
Answer: We supported the highlighted conclusions with examples drawn from our results, as suggested by the reviewer.
3. "Moreover, in my opinion, the text would also benefit from a clearer link between the goals presented in the last paragraph of the introduction and the main conclusions presented in the first paragraph of the discussion. The paragraph presenting the study goals lacks an explicit mention of the general aim of the study. The authors did not clearly refer to a key aspect of the study: the assessment of the role of validation methods in SDM and their proposal of using bioacoustic tools as a rapid validation method."
Answer: We agree and have corrected this by adding two sentences to the paragraph presenting the study goals: "(…) aiming to evaluate the SDMs' performance using different thresholds, given the validation dataset obtained with bioacoustics" and "This methodological procedure enabled us to assess the role of validation methods in SDM outputs and of acoustic sampling as a rapid validation method." We also revised the Discussion to add a clear link between each of the goals and the conclusions.