A machine learning framework for estimating the probability of blacklegged tick population establishment in eastern Canada using Earth observation data

doi:10.1371/journal.pone.0332582

Fig 1.

Geographic distribution of surveillance sites and environmental context. A) Distribution of tick surveillance sites across Quebec (2014–2023) and Ontario (2015–2018), B) land cover classification map, and C) degree days above 0°C (DD > 0°C) for 2022.

Fig 2.

Workflow for estimating the probability of established tick populations.

(A) Model development process; (B) Model implementation to produce a map showing the probability of tick population establishment at 250 m resolution.

Table 1.

Candidate predictor variables included in the initial parameter selection.

Fig 3.

Pearson correlations among the candidate predictor variables, calculated at surveillance sites using a 1 000-m radius buffer.

White cells indicate non-significant correlations (p-value > 0.05). Bolded variables were retained for model training. Asterisks indicate predictor variables that were initially excluded during the VIF analysis but subsequently reinstated based on point-biserial correlation results.
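
The point-biserial correlation mentioned above measures the association between a binary variable (for example, established vs. not established) and a continuous predictor. The following is a minimal sketch of the textbook formula, not the authors' code; the function name `point_biserial` and the toy inputs in the usage line are hypothetical and chosen only for illustration.

```python
import math


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Illustrative helper (hypothetical name); numerically identical to the
    Pearson correlation computed with the binary variable coded as 0/1.
    """
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    if not group1 or not group0:
        raise ValueError("both classes must be present")

    mean_all = sum(values) / n
    # Population standard deviation (divide by n), as this form of the formula requires.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)

    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * math.sqrt(len(group1) * len(group0) / n**2)


# Toy data only: four sites, two per class.
r_pb = point_biserial([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0])
```

A predictor with a large absolute value of `r_pb` is strongly associated with the binary outcome, which is one plausible reason to reinstate it after a collinearity filter has removed it.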

Table 2.

Performance of ML models for estimating tick population establishment at Quebec surveillance sites from 2022–2023, using predictor variables calculated at the surveillance sites using a 1 000-m radius buffer. Models are ordered by ROC AUC.

Table 3.

List of predictor variables included in the final models, calculated at the surveillance sites using a 1 000-m radius buffer.

Fig 4.

Global and local feature (predictor variable) explanations from the XGBoost model trained with Quebec data (2014–2021).

Left: bar chart showing the global importance of each feature, measured as the mean absolute SHAP value across all observations. Higher values indicate greater overall influence on the model’s predictions. Right: local explanation summary plot showing how each feature value contributes to the model’s prediction at each site. Each dot represents a site, with colour indicating the feature value (red = high, blue = low). Dot position along the x-axis is the SHAP value, which shows how much that feature shifts the model’s prediction from the baseline on a log-odds scale: positive values increase the prediction and negative values decrease it. The baseline prediction (the model’s average output) was a log-odds of approximately −0.212, corresponding to a probability of about 0.44. Considering a single feature in isolation, the predicted log-odds for a site is the baseline plus that feature’s SHAP value. For example, a high DD > 0°C value contributing a SHAP value of +2.2 would increase the predicted probability from the baseline of 0.44 to 0.88: log-odds = baseline + SHAP(DD > 0°C) = −0.212 + 2.2 = 1.988, giving a final probability p = 1/(1 + e^(−1.988)) ≈ 0.88.
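
The arithmetic in this caption can be reproduced with a few lines of Python. This is a minimal sketch using only the two numbers quoted above (baseline log-odds of −0.212 and a SHAP value of +2.2); the function names `sigmoid`, `predicted_probability`, and `mean_abs_shap` are illustrative and are not taken from the authors' code or from the SHAP library.

```python
import math


def sigmoid(log_odds):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def predicted_probability(baseline_log_odds, shap_values):
    """Add SHAP contributions to the baseline on the log-odds scale, then convert.

    Passing a single SHAP value reproduces the one-feature example in the
    caption; passing every feature's SHAP value for a site would give that
    site's full model prediction.
    """
    return sigmoid(baseline_log_odds + sum(shap_values))


def mean_abs_shap(shap_values):
    """Global importance of one feature: mean absolute SHAP value over sites."""
    return sum(abs(v) for v in shap_values) / len(shap_values)


BASELINE_LOG_ODDS = -0.212  # baseline quoted in the caption
SHAP_DD = 2.2               # SHAP value for a high DD > 0°C, from the caption

baseline_p = sigmoid(BASELINE_LOG_ODDS)
shifted_p = predicted_probability(BASELINE_LOG_ODDS, [SHAP_DD])
print(f"{shifted_p:.2f}")  # prints 0.88
```

The conversion is applied only after the contributions are summed because SHAP values for this kind of model are additive on the log-odds scale, not on the probability scale.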

Fig 5.

XGBoost model output for 2022 at 250 m resolution.

A) Map of the predicted probability of tick population establishment, and B) associated uncertainty quantification (UQ). C) and D) Zoomed-in sections of the UQ map with corresponding land cover maps, illustrating areas of high uncertainty in Montreal, Quebec and Toronto, Ontario, respectively.
