Figure 1.
Data locations in New Zealand.
Tree fern occurrence locations (orange) and “absence” locations (blue) are based on (a) herbarium data extracted from GBIF; (b) NVS ecological survey data extracted from GBIF; and (c) LUCAS plot data. For the herbarium and NVS datasets, “absences” are background locations based on locations of other vascular plants; for the LUCAS dataset, true absences are shown.
Table 1.
Effects of correcting for geographical sampling bias on the predictive performance of New Zealand tree fern distribution models trained on herbarium and NVS datasets.
Table 2.
Effects of correcting for geographical sampling bias on the rates of false presences and absences, and on the predicted extent of tree ferns (as a percentage of the total land area of New Zealand).
Figure 2.
Density distribution plots of environmental variables.
Tree fern occurrences (orange) and background locations based on locations of other vascular plants (blue) are compared to all NZ locations (∼1 km resolution; black) for the herbarium dataset (upper row) and the NVS dataset (lower row). Temperature seasonality is represented as standard deviations multiplied by 10.
Figure 3.
Comparison of presence-only calibration (POC) plots.
MaxEnt LQ models were trained on (a) herbarium and (b) NVS data, correcting for geographical sampling bias; plots were derived from the average predictions of 40 runs. Values above the diagonal line signify model underestimation of species prevalence, and values below the line signify overestimation. The calibration curve is shown in cyan and the orange lines represent ±2 standard deviations. Presence and background data are marked at the bottom of each graph at their corresponding predicted probabilities of presence: presences are orange and background data are black.
Figure 4.
Box plots of AUC values.
AUC values were derived from MaxEnt models fitted using different functional forms (“feature types”) and two different training datasets: herbarium (a–d) and NVS (e–h). Evaluations using randomly withheld test data without and with correction for geographical sampling bias are shown in (a & e) and (b & f), respectively; evaluations using independent LUCAS data without and with correction for sampling bias are shown in (c & g) and (d & h), respectively. Box plots indicate variation in AUC among 40 runs (boxes encompass the 25th and 75th percentiles, whiskers approximate 99% of the data range, and points are outliers).
Figure 5.
LUCAS presence/absence locations with predicted presences and absences generated from average LQ model predictions (with geographical sampling bias correction).
Agreement between predicted presences/absences and LUCAS presences/absences is shown in green and disagreement is shown in orange. LUCAS presence locations are shown with predictions from the (a) herbarium dataset and (b) NVS dataset; LUCAS absence locations are shown with predictions from the (c) herbarium dataset and (d) NVS dataset.