The Effects of Sampling Bias and Model Complexity on the Predictive Performance of MaxEnt Species Distribution Models

doi:10.1371/journal.pone.0055158

Figure 1.

Data locations in New Zealand.

Tree fern occurrence locations (orange) and “absence” locations (blue) are based on (a) herbarium data extracted from GBIF, (b) NVS ecological survey data extracted from GBIF, and (c) LUCAS plot data. In the case of the herbarium and NVS datasets, “absences” are background locations based on locations of other vascular plants; in the case of the LUCAS dataset, true absences are shown.

Table 1.

Effects of correcting for geographical sampling bias on the predictive performance of New Zealand tree fern distribution models trained on herbarium and NVS datasets.

Table 2.

Effects of correcting for geographical sampling bias on the rates of false presences and absences, and on the predicted extent of tree ferns (as a percentage of the total land area of New Zealand).
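
As a rough illustration of the quantities summarised in this table, the sketch below computes false presence and false absence rates from thresholded predictions, plus predicted extent as a percentage of grid cells. The function names, the threshold argument, and the rate definitions (errors as a share of observed absences or presences) are assumptions made for illustration; the paper may define these rates differently, and equal-area grid cells are assumed for the extent calculation.

```python
import numpy as np


def error_rates(predicted, observed):
    """Return (false presence rate, false absence rate).

    predicted, observed: boolean sequences, True = presence.
    Assumed definitions (hypothetical, for illustration only):
      false presence rate = share of observed absences predicted as present
      false absence rate  = share of observed presences predicted as absent
    """
    predicted = np.asarray(predicted, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    false_presence = np.mean(predicted[~observed])
    false_absence = np.mean(~predicted[observed])
    return float(false_presence), float(false_absence)


def predicted_extent_percent(suitability, threshold):
    """Percentage of grid cells with suitability at or above the threshold.

    Assumes every cell covers the same land area.
    """
    suitability = np.asarray(suitability, dtype=float)
    return float(100.0 * np.mean(suitability >= threshold))
```

Both functions expect at least one observed presence and one observed absence; with an empty class, the corresponding rate is undefined.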

Figure 2.

Density distribution plots of environmental variables.

Tree fern occurrences (orange) and background locations based on locations of other vascular plants (blue) are compared with all NZ locations (∼1 km resolution; black) for the herbarium dataset (upper row) and the NVS dataset (lower row). Temperature seasonality is represented as standard deviations multiplied by 10.

Figure 3.

Comparison of presence-only calibration (POC) plots.

MaxEnt LQ models were trained on (a) herbarium and (b) NVS data, correcting for geographical sampling bias; plots were derived from the average predictions of 40 runs. Values above the diagonal signify model underestimation of species prevalence, and values below it signify overestimation. The calibration curve is shown in cyan and the orange lines represent ±2 standard deviations. Presence and background data are marked at the bottom of each graph at their corresponding predicted probabilities of presence: presences are orange and background data are black.

Figure 4.

Box plots of AUC values. AUC values derived from MaxEnt models fitted using different functional forms (“feature types”) and two different training datasets: herbarium (a–d) and NVS (e–h).

Evaluations using randomly withheld test data are shown without (a & e) and with (b & f) correction for geographical sampling bias; evaluations using independent LUCAS data are shown without (c & g) and with (d & h) correction. Box plots indicate variation in AUC among 40 runs (boxes encompass the 25th and 75th percentiles, whiskers approximate 99% of the data range, points are outliers).
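
For readers unfamiliar with the statistic plotted here, the following is a minimal sketch of how an AUC value can be computed from model scores at presence and absence (or background) test locations, and how a set of per-run AUC values reduces to the quartiles a box plot draws. The function names are hypothetical, and this rank-based formulation is one standard way to compute AUC, not necessarily the routine used in the study.

```python
import numpy as np


def auc(scores_presence, scores_absence):
    """AUC as the probability that a randomly chosen presence location
    scores higher than a randomly chosen absence location (ties count half)."""
    p = np.asarray(scores_presence, dtype=float)[:, None]
    a = np.asarray(scores_absence, dtype=float)[None, :]
    # Compare every presence score with every absence score.
    return float(np.mean((p > a) + 0.5 * (p == a)))


def box_summary(values):
    """Return (25th percentile, median, 75th percentile) of per-run values."""
    q1, median, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return float(q1), float(median), float(q3)
```

Calling `auc` once per run and passing the resulting list to `box_summary` gives the box edges and centre line for one panel. The pairwise comparison uses memory proportional to the product of the two sample sizes, so very large test sets would call for a rank-based implementation instead.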

Figure 5.

LUCAS presence/absence locations with predicted presences and absences generated from average LQ model predictions (with geographical sampling bias correction).

Agreement between predicted and LUCAS presences/absences is shown in green and disagreement in orange. LUCAS presence locations are shown with predictions from the (a) herbarium and (b) NVS datasets; LUCAS absence locations are shown with predictions from the (c) herbarium and (d) NVS datasets.
