Can species distribution models really predict the expansion of invasive species?

Predictive studies are of paramount importance for biological invasions, one of the biggest threats to biodiversity. To help prioritize management strategies, species distribution models (SDMs) are often used to predict the potential invasive range of introduced species. Yet SDMs have been regularly criticized for several strong limitations, such as violating the equilibrium assumption during the invasion process. Unfortunately, validation studies with independent data are too scarce to assess the predictive accuracy of SDMs in invasion biology. Biological invasions nevertheless make it possible to test the usefulness of SDMs, by retrospectively assessing whether they would have accurately predicted the latest ranges of invasion. Here, we assess the predictive accuracy of SDMs in predicting the expansion of invasive species. We used temporal occurrence data for the Asian hornet Vespa velutina nigrithorax, a species native to China that is invading Europe at a very fast rate. Specifically, we compared occurrence data from the last stage of invasion (independent validation points) to the climate suitability distribution predicted from models calibrated with data from the early stage of invasion. Although the invasive species is not yet at equilibrium, the predicted climate suitability of validation points was high. SDMs can thus adequately predict the spread of V. v. nigrithorax, which appears to be, at least partially, climatically driven. In the case of V. v. nigrithorax, SDM predictive accuracy was slightly but significantly better when models were calibrated with invasive data only, excluding native data. Although more validation studies for other invasion cases are needed to generalize our results, our findings are an important step towards validating the use of SDMs in invasion biology.


Introduction
In the recent past, globalization has led to an increase in invasive species, a pattern likely to continue [1]. Besides being one of the biggest threats to biodiversity and ecosystems [2], biological invasions are also very costly to the global economy [3]. This increase in invasive species and their consequences for biodiversity and ecosystems raise numerous management and control issues [4,5]. Preventing an invasive species' establishment and further spread is recognized as a more efficient and less costly management strategy than the eradication, containment and control that may be required once the invasive species has fully established [6]. To that end, species distribution models (SDMs) are increasingly being used in invasion biology, especially to predict invasion risk [7-12] and optimize control strategies [13,14]. SDMs are also widely used in conservation biology, e.g. to predict the potential impact of climate change on genetic diversity [15,16], on species diversity [17-20] and on functional diversity [21,22], or to help reserve planning [23-25]. However, SDMs have been criticized for lacking mechanisms and independent validation, among other things [26]. Besides, two key assumptions of SDMs are often violated in invasion biology. First, niche conservatism is an assumption required for model transferability, whereby climate niches modeled with information from the native area are often projected onto new geographical spaces to estimate the likelihood of successful invasions there. In the context of invasion biology, niche conservatism differs from evolutionary niche conservatism (fundamental niche conserved over evolutionary time), as the question is whether the species' realized niche is conserved over space. Yet the assumption of niche conservatism over space is not always met, as the naturalized climatic niches of invasive species can differ from their native climatic niches [27-29].
Second, until the latest stage of invasion, an invasive species is not yet at equilibrium with its environment [30], so its climatic niche is likely underestimated. Despite such criticisms, the need for predictive models is such that SDMs are still often used in invasion biology. Indeed, because validation studies (i.e., where SDM predictive accuracy is estimated with independent data and not through cross-validation only) are so scarce, SDMs have still not been fully proven to be either inaccurate or accurate. A few pioneering studies assessed the accuracy of SDMs in predicting species distribution changes, with mixed results: one study showed that models had a good accuracy in predicting the range change of the Eurasian otter in Spain observed between two nation-wide surveys carried out ten years apart [31], whereas a study of deer species in Great Britain showed that SDM predictions of range changes were no better than dispersal models [32]. Invasive species represent a good opportunity to evaluate SDM predictive accuracy with independent data, as their invasive range can expand quickly. Indeed, invasive species whose invasion was closely monitored can be used to test whether records from the later stage of the invasion could have been predicted by a model calibrated with records only from the early stages of the invasion. Only very few studies have taken advantage of this opportunity to carry out validation studies with independent data [33-38], and they were carried out with simplistic envelope models or at small spatial scales. Besides, because invasive species are unlikely to be at equilibrium, common evaluation metrics (such as AUC, TSS, etc.) are not appropriate, as observed absences can result either from an unsuitable environment or from the species not having dispersed there yet [ [49] and finally the Netherlands where it was first recorded in 2017 (http://frelonasiatique.mnhn.fr/le-frelon-asiatique-detecte-aux-pays-bas/).
The spread of this invasive species has been closely monitored since the species was introduced, which makes it a well-suited case for a validation study. In this study, after investigating whether V. v. nigrithorax is at equilibrium in its invasion range, we used more than 10,000 European invasion occurrences recorded between 2004 and 2015 to test whether occurrences from the later stage of invasion could have been predicted by models calibrated using occurrences from the earlier stage of invasion. We also took advantage of having independent validation data to test whether SDMs would have a better predictive accuracy of the ongoing invasive range if native data were accounted for, thereby addressing another important question regarding the use of SDMs in invasion biology.

As one of the authors works for the INPN and is the head of this monitoring program (QR), the authors had special access to the data. In addition to the restrictions explained above, the authors cannot provide these data freely due to French regulations. However, the authors confirm that interested, qualified researchers will be able to access the data without restriction after filling in the request form, specifying that the data will not be used for commercial purposes, agreeing to provide attribution in any published papers, and agreeing that they will not distribute these data to third parties.

Presence data of V. v. nigrithorax in its native and invaded ranges
Presence data from the native Asian range was obtained by gathering information on museum specimens, published records and hornet sampling performed in China [50] (see S1

Climate data
We used the same eight climatic variables as in previous studies for the niche modelling of V. v. nigrithorax [50,52]. We considered: (1) annual mean temperature, (2) mean temperature of the warmest month, (3) mean temperature of the coldest month, (4) temperature seasonality, (5) annual precipitation, (6) precipitation of the wettest month, (7) precipitation of the driest month and (8) precipitation seasonality. The seasonality is the coefficient of variation of the monthly means. Current data were downloaded from the WorldClim database [53] (http://www.worldclim.org/) as 2.5 arc-min grids (subset of the 19 bioclim variables). These data are interpolations from observed data representative of current climatic conditions.
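
The seasonality definition above (coefficient of variation of the monthly means) can be illustrated with a minimal sketch. The function name, the use of the population standard deviation, the percentage scaling and the monthly values are assumptions made for illustration only; they are not taken from the WorldClim processing chain.

```python
from statistics import mean, pstdev


def seasonality(monthly_means):
    """Coefficient of variation (in %) of twelve monthly means.

    Assumption: population standard deviation, expressed as a percentage.
    """
    if len(monthly_means) != 12:
        raise ValueError("expected 12 monthly values")
    mu = mean(monthly_means)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * pstdev(monthly_means) / mu


# Hypothetical monthly precipitation totals (mm), January to December.
precipitation = [60, 50, 55, 65, 70, 45, 40, 45, 70, 85, 80, 75]
print(f"precipitation seasonality: {seasonality(precipitation):.1f}%")
```

Note that a coefficient of variation becomes unstable when the mean is close to zero, which can matter for temperatures expressed in degrees Celsius.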

Is V. v. nigrithorax at equilibrium in its invaded range?
SDMs are often criticized when used in invasion biology because the equilibrium assumption is often violated [30]. Therefore, we investigated whether V. v. nigrithorax is at equilibrium in its invaded range by comparing the climatic niche occupied by the species during the first span of its invasion (2004-2010) to the one occupied now (2011-2015 invasion data). After being first observed in 2004, V. v. nigrithorax was observed twice in 2005 before really starting its spread in 2006 (more than 100 records over seven departments). Thus, splitting the occurrence data into earlier and later stages of invasion in 2010 gives an equal 5-year length of geographical spread for each invasion stage. Visualization of the climatic niche and tests of niche equivalency and niche similarity were performed following the methods described by Broennimann et al. [54,55]. The first step consists of calculating the density of occurrences along the first two axes of a climate PCA (with the same variables used for the SDMs). The niche overlap was then calculated and niche equivalency and similarity were statistically tested [54].

data from both the native and the invaded range. In order to evaluate whether the ongoing invasion in Europe can be predicted by climate suitability modeling, we used only presence data from the earlier stage of the invasion (2004-2010), so that records from the later stage of the invasion could be used as evaluation data. As no absence data were available for the species, pseudo-absences were randomly drawn [58]. For models using presence data from the invaded range only, pseudo-absences were chosen in Europe, whereas for models using presence data from both the invaded and the native range, pseudo-absences were chosen in the south-eastern part of Asia and in Europe. In both cases, we used 10,000 random pseudo-absences, with the total weight of presences being equal to the total weight of pseudo-absences [58].
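
The pseudo-absence scheme described above (random draws, with presences and pseudo-absences carrying equal total weight) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the candidate-cell list, the seed and the number of presences are hypothetical, and the sketch does not reproduce the modeling software actually used.

```python
import random


def draw_pseudo_absences(candidate_cells, n, seed=0):
    """Randomly draw n distinct pseudo-absence cells from the study region."""
    rng = random.Random(seed)
    return rng.sample(candidate_cells, n)


def balanced_weights(n_presences, n_pseudo_absences):
    """Per-point weights giving both classes the same total weight.

    Each presence keeps weight 1; each pseudo-absence is down- or
    up-weighted so that the two class totals match.
    """
    presence_weight = 1.0
    pseudo_absence_weight = n_presences / n_pseudo_absences
    return presence_weight, pseudo_absence_weight


# Hypothetical study region of 200,000 grid cells and 2,500 presences.
cells = list(range(200_000))
pseudo_absences = draw_pseudo_absences(cells, 10_000)
w_pres, w_abs = balanced_weights(2_500, len(pseudo_absences))
print(w_pres, w_abs)
```

Repeating the draw with different seeds gives the kind of pseudo-absence replicates used to check that results do not hinge on one particular random selection.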
As results might depend on the choice of pseudo-absences, models were replicated three times (with a different pseudo-absence selection each time) [58]. To obtain a consensus distribution, we used an ensemble forecast technique [59]: the consensus distribution was calculated as the average of all distributions across modeling techniques and pseudo-absence replicates.
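
The consensus step described above is an unweighted mean over all model runs. A minimal sketch is given below; the technique labels ("GLM", "RF") and the suitability values are placeholders chosen for illustration, not the techniques or outputs of the study.

```python
def consensus(predictions):
    """Average suitability per grid cell across all model runs.

    predictions maps (technique, replicate) to a list of suitability
    values, one per grid cell, in the same cell order for every run.
    """
    runs = list(predictions.values())
    if not runs:
        raise ValueError("no model runs supplied")
    n_cells = len(runs[0])
    if any(len(run) != n_cells for run in runs):
        raise ValueError("all runs must cover the same grid cells")
    return [sum(run[i] for run in runs) / len(runs) for i in range(n_cells)]


# Hypothetical runs: two techniques x two pseudo-absence replicates, three cells.
runs = {
    ("GLM", 1): [0.20, 0.80, 0.50],
    ("GLM", 2): [0.30, 0.70, 0.50],
    ("RF", 1): [0.10, 0.90, 0.40],
    ("RF", 2): [0.20, 0.80, 0.60],
}
print(consensus(runs))
```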
Model predictive accuracy was evaluated by assessing how well data recorded during the later stage of the invasion (2011-2015, hereafter called evaluation data) were predicted by the models that were calibrated with data from the early stage of the invasion only (2004-2010, hereafter called calibration data). To this end, we extracted the predicted climate suitability values for all invasive records from the evaluation data. Although the species displays one of the fastest invasion rates, with founder queens in flight mill experiments able to travel over 40 km a day [42] or fly an average of 18 km per day covering up to 200 km over 10 days [43], dispersal (in all cardinal directions) remains likely to be a limiting factor to the natural spread. V. v. nigrithorax is therefore more likely to colonize areas close to where it first invaded. Thus, we need to compare the predicted climate suitability values of evaluation data to those of random points located at the same distance from the first invasion record. Predictive accuracy is better when the climate suitability of a validation point is higher than expected (given its distance to the first invasion record in this case). For a given validation point, we can thus compare its predicted climate suitability to the distribution of climate suitability values of all points located at the same distance from the first invasion record. That way, we can infer in which percentile of the distribution the validation point falls. The higher the percentiles of validation points, the better the predictive accuracy.
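
The distance-controlled percentile described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: "same distance" is approximated by a hypothetical tolerance band (band_km), ties are counted as half, and all values are invented.

```python
def percentile_rank(value, reference):
    """Percentage of reference values below value (ties counted as half)."""
    below = sum(1 for r in reference if r < value)
    ties = sum(1 for r in reference if r == value)
    return 100.0 * (below + 0.5 * ties) / len(reference)


def validation_percentile(point, background, band_km=10.0):
    """Percentile of a validation point among background points lying at a
    similar distance from the first invasion record.

    point      -- (distance_km, suitability) of the validation record
    background -- iterable of (distance_km, suitability) for all grid cells
    band_km    -- assumed tolerance defining "the same distance"
    """
    distance, suitability = point
    same_distance = [s for d, s in background if abs(d - distance) <= band_km]
    if not same_distance:
        raise ValueError("no background points within the distance band")
    return percentile_rank(suitability, same_distance)


# Hypothetical background cells and one validation record 200 km away.
background = [(195, 0.10), (198, 0.25), (202, 0.40), (205, 0.55), (400, 0.90)]
print(validation_percentile((200, 0.50), background))
```

A percentile above 50 indicates that the record falls in a cell more suitable than the typical cell at that distance, which is the sense in which a prediction is "better than expected".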
As results might depend on the cut-off year chosen to split the invasive data into calibration data and evaluation data, sensitivity analyses were carried out by applying different cut-off years. With that in mind, all analyses (SDM calibration and SDM evaluation) were carried out nine times, with cut-off years going from 2006 to 2014.
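
The sensitivity analysis above amounts to repeating the calibration/evaluation split for each cut-off year. A minimal sketch follows, assuming that records observed up to and including the cut-off year are used for calibration (consistent with the 2004-2010 versus 2011-2015 split); the record tuples are hypothetical.

```python
def split_by_cutoff(records, cutoff_year):
    """Split occurrence records into calibration and evaluation sets.

    records     -- iterable of (year, longitude, latitude) tuples
    cutoff_year -- last year included in the calibration set
    """
    calibration = [r for r in records if r[0] <= cutoff_year]
    evaluation = [r for r in records if r[0] > cutoff_year]
    return calibration, evaluation


# Nine cut-off years, from 2006 to 2014 inclusive.
CUTOFF_YEARS = range(2006, 2015)

# Hypothetical occurrence records (year, longitude, latitude).
records = [(2004, 0.6, 44.2), (2008, 0.1, 45.0), (2011, 1.5, 47.3), (2015, 2.9, 50.1)]
for year in CUTOFF_YEARS:
    calibration, evaluation = split_by_cutoff(records, year)
    print(year, len(calibration), len(evaluation))
```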

Results
The climatic niche occupied by V. v. nigrithorax in its invasion range clearly extended during the past few years (Fig 1A and 1B), as there is only a 45% overlap between the climatic niche occupied between 2004 and 2010 and the climatic niche occupied between 2011 and 2015. Statistical tests show that both niches are similar but not equivalent (Fig 1D and 1E). As the climatic niche first occupied by the species in its invasion range is still occupied, we could have expected both niches to be similar. Both niches not being equivalent further shows that part of the climatic niche occupied by the species between 2011 and 2015 was not occupied between 2004 and 2010. This means that in 2010 the species was not yet at equilibrium with its environment in Europe.
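
To make the overlap percentage concrete, the sketch below computes Schoener's D between two occurrence densities defined on the same grid of cells in climate PCA space. That the reported overlap is Schoener's D is an assumption here (the metric is not named in the text above), and the density values are invented.

```python
def schoener_d(density_a, density_b):
    """Schoener's D between two non-negative densities on the same grid.

    Each density is rescaled to sum to 1; D ranges from 0 (no overlap)
    to 1 (identical distributions).
    """
    if len(density_a) != len(density_b):
        raise ValueError("densities must be defined on the same grid")
    total_a, total_b = sum(density_a), sum(density_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each density must have a positive total")
    return 1.0 - 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(density_a, density_b)
    )


# Hypothetical flattened density grids for the early and late invasion stages.
early = [0.0, 2.0, 5.0, 3.0, 0.0, 0.0]
late = [0.0, 1.0, 3.0, 3.0, 2.0, 1.0]
print(f"niche overlap D = {schoener_d(early, late):.2f}")
```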
Climate suitability predictions differ depending on whether native data were accounted for (Fig 2 & S2 Fig). When accounting only for invasion data, climate suitability is high mainly in the southwestern part of France, where the invasion started (Fig 2A). However, when we accounted for both native and invasive data, high climate suitability is further predicted in the north of France, in Belgium, in northern Italy and in northern Spain (S3A Fig). In both cases, the predicted climate suitability of evaluation points (2011-2015, or "late invasion" data) is higher than expected by chance given their distance to the first invasion data (Fig 2 & S2 Fig). Indeed, out of the 2,534 evaluation points that are more than 150 km away from the first invasion record, more than 60% have their predicted climate suitability above the 70th percentile of all background points within the same distance, whereas less than 3% have their predicted climate suitability below the 30th percentile (Fig 2 & S2 Fig). Besides, although the predicted suitability of the evaluation points that are the farthest away (more than 450 km from the first invasion occurrence, but within 850 km) is lower than for closer evaluation points (Fig 2 & S2 Fig), it is still higher than expected given their distance. Indeed, in both cases, more evaluation points have their predicted climate suitability above the 70th percentile of all background points within the same distance of the first invasion occurrence (31% vs 56% with or without accounting for native data) than below the 30th percentile (0% in both cases) (Fig 2 & S2 Fig).
Results do not depend on the cut-off year that was used to split the data into calibration data and evaluation data, as results were very similar for the other cut-off years that were tested (S3 & S4 Figs). Such overall results thus mean that climate influences, at least to some extent, the ongoing invasion of V. v. nigrithorax in Europe and that this influence can be predicted by SDMs, despite the species not being at equilibrium yet.
Furthermore, even though modeling methods provide good predictions, the predictions still differ according to whether or not native records were taken into account (Fig 2 & S2 Fig). We can thus further investigate whether one option provides more accurate results than the other. In our case study, percentiles of validation points were significantly higher when the climate suitability was predicted by models accounting for invasive data only (for all cut-off years except 2006, for which the opposite result was found) (Fig 3 & S1 Table). Overall, models thus seem to have a better predictive accuracy when accounting for invasive data only.
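
The comparison above relies on paired tests: each validation point has one percentile under each model type, and the differences are tested against zero (the Fig 3 caption mentions paired t-tests). A minimal sketch of the paired t statistic is given below, using the standard library only; the function name and percentile values are invented for illustration, and converting the statistic to a p-value would require a Student t distribution, which is not included here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(values_a, values_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(values_a) != len(values_b):
        raise ValueError("samples must be paired")
    n = len(values_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    differences = [a - b for a, b in zip(values_a, values_b)]
    spread = stdev(differences)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all paired differences are identical")
    return mean(differences) / (spread / sqrt(n)), n - 1


# Hypothetical percentiles of the same validation points under both model types.
invasive_only = [82.0, 75.0, 91.0, 68.0, 88.0]
native_and_invasive = [79.0, 70.0, 90.0, 66.0, 81.0]
t_value, dof = paired_t_statistic(invasive_only, native_and_invasive)
print(f"t = {t_value:.2f}, df = {dof}")
```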

Discussion
Using the unique features of an invasion closely monitored in space and time, we demonstrated that despite some known limitations, SDMs can be a powerful tool to predict where invasive species will spread next. In fact, our case study does show that V. v. nigrithorax is not at equilibrium with its environment in its European invaded range (Fig 1). This finding is consistent with studies focusing on other invasive species [30,36]. The equilibrium hypothesis being an important assumption, its violation needs to be acknowledged when interpreting SDM predictions [26]. Indeed, violating the equilibrium hypothesis has consequences when modeling species distributions, among which underestimating the potential climatic niche of the species, which can in turn lead to underestimating the geographical area the species can invade [36]. However, predicting the full potential invasive range of an invasive species may not be as relevant as accurately predicting the areas that are more likely to be colonized next. Indeed, given the cost of species monitoring and surveillance for the early detection of invasive species, it is more relevant to predict areas that might be invaded next rather than all potential areas that could be reached by the invader if the species achieved its climate equilibrium. Information regarding the areas that might be invaded next could indeed be used by managers for a cost-effective effort on monitoring and controlling such areas. For example, in the case of V. v. nigrithorax, whose invasion can be most efficiently controlled by early detection followed by nest removal [43,60], monitoring efforts need to be implemented within the most suitable areas within the already invaded range, as well as within the most suitable areas that are the closest to the already invaded range. Improved detection techniques would further increase the efficiency and decrease the costs of monitoring/controlling the invasion [61]. Therefore, even if invasive species distribution models cannot predict the full potential invasion range of an invasive species that has just established [36,62], they can still be very valuable for invasive species management. Yet, validation is needed for model reliability and credibility, especially when management decisions are based upon it [63].

Evaluation (red) points above the median have a higher predicted suitability than expected given their distance to the first invasion occurrence. (c) Boxplots representing the range of climate suitability values for all possible points (grey) and invasive data (calibration data in blue and evaluation data in red) depending on their distance to the first invasion data. In all three cases, the predicted suitability of evaluation points is lower than the predicted suitability of calibration points, but is higher than expected given their distance to the first invasion occurrence (all possible points, in grey).
Here, we showed that models calibrated with data from the earlier stage of invasion adequately predicted the recent invasive data of V. v. nigrithorax: new invasion observations had higher climatic suitability than expected from their distance to the first invasion occurrence. Although invasive species present a good opportunity to test SDM predictions with independent data, this has rarely been done so far for the ongoing range expansions of invasive species. The few studies that pioneered this approach used evaluation metrics that have since been shown not to be appropriate when the species is not at equilibrium. Previous studies also showed that SDMs can be used to predict invasions under climate change, through validation from field data [64], as the performance in the field of three plant species was highly correlated with SDM-predicted climate suitability. Although further studies will be needed with other species, our study indicates that SDMs can be used in invasion biology to better predict where the species is most likely to spread next, once an invasion has started. It is thus very important to monitor invasive species from the start of the invasion, so as to gather a large enough amount of information to run predictive SDMs. Besides, here we only considered climate variables, but model predictions can likely be improved by also considering non-climatic drivers such as land-use variables [65], since predictions depend on the variables used to compute the models [66,67].

Comparing SDM predictive accuracy when trained with or without native data. Percentiles of validation points (further than 150 km from the first invasion record) depending on whether or not native data were accounted for to calibrate the models and on the cut-off year that was used to split the invasive data into calibration and evaluation data. Percentiles are obtained by comparing the predicted climate suitability of a given validation point to the distribution of climate suitability values of all points at the same distance from the first invasion record as the validation point (i.e., grey points in Fig 2B). Percentiles above the 50th thus mean that the predicted climate suitability of the validation point is higher than expected given its distance to the first invasion record. For all cut-off years, paired t-tests were computed to assess the difference between models with and without native data: a red star indicates significantly higher values (S1 Table).
Numerous studies on invasive species distribution have advocated using distribution data from both the native and the invasive range [68,69]. In fact, if the species' climatic niche is conserved from its native range to its invaded range, distribution data from the native range can be very valuable to characterize the full potential climatic niche of the species and thus the full geographical space it can invade. In this context, SDMs calibrated with distribution data from only the invasive range might under-predict the potential invasive range if the species is not at equilibrium yet [36]. However, although niche conservatism during invasions has been shown for some species [70,71], other studies revealed niche shifts during invasions [29,72,73], highlighting an inconsistent pattern of niche conservatism during invasions [74]. Accounting for native distribution data when calibrating invasive SDMs may thus not improve their predictive accuracy in all cases. Here, we took advantage of having independent validation data to investigate this issue. In the case of V. v. nigrithorax, the invasive range is clearly predicted to be larger when accounting for native data (Fig 2 & S2 Fig). It is thus clear that at this time of the invasion, the climatic niche occupied by V. v. nigrithorax in its invaded range differs from the one occupied in its native range. If the native climatic niche is conserved, the full potential invasive range of the species might be better predicted by accounting for both invasive and native data, as the species is not at equilibrium with its environment in its invasive range yet (Fig 1). However, if the native climatic niche of V. v. nigrithorax is not conserved, accounting for native data when modeling its potential invasive range might lead to overprediction. Furthermore, from a management point of view, predicting the full potential invasive range might not be as relevant as predicting the areas that are most likely to be invaded next.
In fact, the model predictive accuracy is slightly but significantly better when accounting for invasive data only (Fig 3). Thus, if the modeling purpose is to predict which areas the species is most likely to invade next, it is better to perform the SDMs without accounting for native data. Of course, similar studies need to be carried out for other species to know whether we can generalize such results or whether it depends on the species (as appears to be the case for niche conservatism during invasions). Furthermore, although significant, the difference in model predictive accuracy is slight (Fig 3 & S1 Table), highlighting a very good predictive accuracy even when performing SDMs with both native and invasive data.

Conclusions
SDMs are increasingly used in ecology, whether to predict the potential impact of global change or to predict the potential invasive range of introduced species. Yet, they are often criticized, especially because their predictive accuracy cannot be truly estimated due to a lack of independent validation data. Our study of the invasion of V. v. nigrithorax showed that the predicted climate suitability of independent validation points was very good. Such a result means that the spread of V. v. nigrithorax is, at least partially, climatically driven and can be accurately predicted by SDMs. In the case of V. v. nigrithorax, SDM predictive accuracy was slightly but significantly better when models were calibrated with invasive data only, excluding native data. Although more validation studies for other cases of alien invasion are needed to generalize our results, our findings validate the use of SDMs in invasion biology.

Supporting information
S1 Table. Comparing SDM predictive accuracy for models trained with or without native data.