Fig 1.
Scanned map of historical locations with banana in 1958.
Each dot represents 2.02 km² of standing banana. A total of 2400 dots are spread across Uganda, equivalent to 4,856 km² of banana. Image reproduced with permission of Giles Clark, the copyright holder.
Fig 2.
Sampled locations with banana in 2016.
Each dot represents the centroid of a 10,000 m² quadrat used for collecting banana presence/absence information during the Geosurvey. Data acquired with permission of Markus Walsh, Africa Soils Information Service (https://doi.org/10.17605/OSF.IO/J8Y3Z).
Fig 3.
Correlation among selected covariates. A) 29 covariates after recursive feature elimination; B) 17 covariates with absolute Pearson’s correlation coefficient (|r|) below 0.7; C) 12 covariates selected using a subjective approach.
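The correlation filter behind panel B can be illustrated with a minimal sketch. It assumes a greedy rule (keep a covariate only if its |r| with every covariate already kept is below 0.7); the source does not state which member of a correlated pair was dropped, so the ordering rule, the function name `drop_correlated` and the toy data are all assumptions.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.7):
    """Greedily keep covariates whose |Pearson r| with all kept ones is below threshold.

    Assumption: covariates are visited in column order; the source does not
    specify how ties between correlated covariates were resolved.
    """
    r = np.corrcoef(X, rowvar=False)  # columns of X are covariates
    kept = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical): b is almost a copy of a, c is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)
c = rng.normal(size=200)
X = np.column_stack([a, b, c])

selected = drop_correlated(X, ["a", "b", "c"])
print(selected)
```

A greedy pass is order-dependent, so ranking covariates by importance before filtering would change which ones survive.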
Table 1.
List of 29 covariates selected from 71 candidate variables using recursive feature elimination. The 17 uncorrelated covariates retained after further selection are shaded grey.
Table 2.
List of 12 covariates selected subjectively and the reasons for their selection.
Table 3.
Average nearest neighbour analysis before and after filtering of data points.
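For readers unfamiliar with the statistic, the sketch below computes an average nearest neighbour ratio in its common Clark-Evans form: the observed mean nearest-neighbour distance divided by the value expected under complete spatial randomness, 0.5 / sqrt(n / A). Whether the authors used exactly this form, or applied an edge correction, is not stated in the source; the function name and the toy point sets are assumptions.

```python
import numpy as np

def average_nearest_neighbour_ratio(points, area):
    """Observed / expected mean nearest-neighbour distance (no edge correction).

    A ratio below 1 suggests clustering, above 1 suggests dispersion.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise distance matrix
    np.fill_diagonal(dist, np.inf)             # ignore distance to self
    observed = dist.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)
    return observed / expected

# Hypothetical point sets inside a 10 x 10 study area.
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
clustered = np.random.default_rng(1).uniform(0.0, 1.0, size=(100, 2))

ratio_grid = average_nearest_neighbour_ratio(grid, area=100.0)
ratio_clustered = average_nearest_neighbour_ratio(clustered, area=100.0)
print(ratio_grid, ratio_clustered)
```

The pairwise distance matrix is quadratic in the number of points, so a spatial index would be preferable for large surveys.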
Fig 4.
Performance metrics (A: Adjusted F-measure, B: Brier score, C: Geometric mean, D: Cohen’s Kappa, E: PR AUC, F: ROC AUC) for random forest (RF), gradient boosted machines (GBM) and neural networks (NN) trained on the 12 covariates chosen via subjective feature selection. Each algorithm was trained under three different sampling scenarios: Oversampling (OS), undersampling (US) and without sampling (WS). The black line and red dot inside the box are the median and mean, respectively.
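Two of the metrics named above have short closed forms, sketched here under their standard definitions: the Brier score as the mean squared difference between predicted probability and observed class, and the geometric mean as sqrt(TPR × TNR). The function names, the toy labels and the 0.5 cut-off used to binarise predictions are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def brier_score(y_true, prob):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    return float(np.mean((p - y) ** 2))

def geometric_mean(y_true, y_pred):
    """Square root of (true positive rate * true negative rate)."""
    y = np.asarray(y_true, dtype=bool)
    pred = np.asarray(y_pred, dtype=bool)
    tpr = (y & pred).sum() / y.sum()
    tnr = (~y & ~pred).sum() / (~y).sum()
    return float(np.sqrt(tpr * tnr))

# Hypothetical imbalanced sample: 2 presences, 4 absences.
y_true = [1, 1, 0, 0, 0, 0]
prob = [0.9, 0.4, 0.2, 0.1, 0.6, 0.3]

bs = brier_score(y_true, prob)
gm = geometric_mean(y_true, np.asarray(prob) >= 0.5)
print(bs, gm)
```

Lower Brier scores indicate better calibrated probabilities, whereas the geometric mean rewards balanced accuracy on both classes, which matters when presences are rare.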
Fig 5.
Performance metrics (A: Adjusted F-measure, B: Brier score, C: Geometric mean, D: Cohen’s Kappa, E: PR AUC, F: ROC AUC) for random forest (RF), gradient boosted machines (GBM) and neural networks (NN) trained on the 17 covariates selected using recursive feature elimination. Each algorithm was trained under three different sampling scenarios: Oversampling (OS), undersampling (US) and without sampling (WS). The black line and red dot inside the box are the median and mean, respectively.
Fig 6.
Performance metrics (A: Adjusted F-measure, B: Brier score, C: Geometric mean, D: Cohen’s Kappa, E: PR AUC, F: ROC AUC) for the ensemble models. The black line and red dot inside the box are the median and mean, respectively. Wilcoxon rank test significance values: not significant (ns) p > 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.
Fig 7.
Predicted banana distribution map (2016) using an ensemble model from RF, GBM and NN trained on A) 12 covariates and B) 17 covariates. The maps were refined using the SAGA majority filtering tool within QGIS. Probabilities were converted into categories of banana presence using a probability threshold of 0.25, the value that maximizes the sum of the true positive rate and true negative rate (Max TPR+TNR).
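The Max TPR+TNR rule can be sketched as a simple scan over candidate thresholds, keeping the one with the largest sum of true positive rate and true negative rate. This is an illustration on made-up data, not the authors' code; the function name, the candidate grid of 0.01 steps and the toy probabilities are assumptions.

```python
import numpy as np

def max_tpr_tnr_threshold(y_true, prob, candidates=None):
    """Return (threshold, score) maximising TPR + TNR over candidate thresholds."""
    y = np.asarray(y_true, dtype=bool)
    p = np.asarray(prob, dtype=float)
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)  # assumed grid, step 0.01
    best_t, best_score = None, -np.inf
    for t in candidates:
        pred = p >= t                      # predicted presence at this threshold
        tpr = (pred & y).sum() / y.sum()
        tnr = (~pred & ~y).sum() / (~y).sum()
        if tpr + tnr > best_score:         # ties keep the lowest threshold
            best_t, best_score = float(t), float(tpr + tnr)
    return best_t, best_score

# Hypothetical validation data: presences have probabilities of 0.3 and above.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.22, 0.80]

threshold, score = max_tpr_tnr_threshold(y_true, prob)
print(threshold, score)
```

With rare presences the selected threshold often falls well below 0.5, which is consistent with a value such as 0.25.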
Table 4.
Comparison between logistic regression models of different complexity and structure derived from the 12 covariates chosen using subjective feature selection.
Table 5.
Summary of the logistic regression model M2-12, including the significant two-way interactions retained to maximise the log-likelihood.
Fig 8.
Predicted banana distribution map (2016) using logistic regression model M2-12 fitted using 12 covariates and their significant two-way interactions.
Fig 9.
Spatial distribution of banana: A) historical banana distribution (1958); B) latest banana distribution (2016) predicted using the ensemble model of RF, GBM and NN trained on the 12 covariates; C) percentage share of banana among administrative regions: Northern, Eastern, Central and Western. The share of banana was computed as the count of pixels with banana in each region divided by the total number of pixels with banana in Uganda.
Fig 10.
Geographic shifts of banana in Uganda.
A) geographic shift patterns generated by overlaying the historical distribution (1958) and the latest banana distribution (2016); B) percentage distribution of banana geographic shift between administrative regions: Northern, Eastern, Central and Western; C) percentage distribution of banana geographic shift among agroecological zones: 1: West Nile Farmlands; 2: Northwestern Farmlands-Wooded-Savanna; 3: Northern Moist Farmlands; 4: Northeastern Central Grass-Bush Farmlands; 5: Northeastern Semi-arid Short Grass Plains; 6: Western Mid-Altitude Farmlands and the Semuliki Flats; 7: Central Wooded Savanna; 8: Southern and Eastern Lake Kyoga Plains; 9: Mount Elgon Farmlands; 10: Western Medium High Farmlands; 11: Southwestern Grass Farmlands; 12: Lake Victoria Crescent and Mbale Farmlands; 13: Ssese Islands and Sango Plains; 14: Southwestern Highlands. The percentages were computed as the number of pixels in each region that correspond to each shift category divided by the total number of pixels in the geographic shift map of Uganda.
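The overlay and percentage computation described in this caption can be sketched as follows, assuming both maps are aligned binary rasters (1 = banana present). The category labels are hypothetical placeholders, since the caption does not name the shift classes, and the per-region masking step is omitted for brevity.

```python
import numpy as np

# Hypothetical labels; the source does not name its shift categories.
LABELS = {0: "absent in both", 1: "2016 only", 2: "1958 only", 3: "both years"}

def shift_percentages(hist, recent):
    """Overlay two aligned binary rasters and return % of pixels per category."""
    hist = np.asarray(hist, dtype=int)
    recent = np.asarray(recent, dtype=int)
    code = 2 * hist + recent                       # unique code per combination
    counts = np.bincount(code.ravel(), minlength=4)
    return {LABELS[k]: 100.0 * counts[k] / code.size for k in range(4)}

# Toy 2 x 4 rasters (made-up values).
hist_1958 = [[1, 1, 0, 0],
             [1, 0, 0, 0]]
pred_2016 = [[1, 0, 1, 0],
             [1, 0, 0, 0]]

shares = shift_percentages(hist_1958, pred_2016)
print(shares)
```

To reproduce panels B and C, the same counts would be taken within each region or zone mask while keeping the national pixel total as the denominator.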
Fig 11.
Classification and regression tree (CART) showing the biophysical factors associated with geographic shift in banana at national level.
Probabilities for each geographic shift class are shown within the coloured boxes. The node numbers at which splits occur are shown above the coloured boxes.