Fig 1.
(a) Expert-based mapped image of Maupiti island and (b) Pleiades image of Maupiti island segmented with Felzenszwalb’s method.
Table 1.
Percentage of the number of pixels of each class on Maupiti data, based on expert mapping.
Fig 2.
Difference between the original SMOTE algorithm and SMOTE-CD.
The blue points are the points to oversample. (a) The points to oversample belong to the same class (here, class 1). (b) The points to oversample are those that share the same majority class in their compositional label vector.
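The selection rule in panel (b) can be illustrated with a minimal sketch. It assumes that the "majority class" of a point is the index of the largest component of its compositional label vector; the function name `majority_class_mask` and the toy labels `Y` are hypothetical and are not taken from the SMOTE-CD implementation.

```python
import numpy as np

def majority_class_mask(labels, target_class):
    """Boolean mask of rows whose largest label component is target_class.

    Assumption: 'majority class' means the argmax of the compositional label.
    """
    labels = np.asarray(labels, dtype=float)
    return labels.argmax(axis=1) == target_class

# Hypothetical compositional labels: each row sums to 1 over 3 classes.
Y = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
    [0.2, 0.3, 0.5],
])

# Points whose majority class is class 0 are the candidates to oversample.
mask = majority_class_mask(Y, target_class=0)
print(mask)
```

Under this assumption, the first and third points are selected even though neither belongs purely to class 0, which is the contrast with panel (a).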
Fig 3.
Simulation of 400 points using (a) B(a) and (b) B(b).
Fig 4.
(a) The original imbalanced dataset and (b) the balanced output dataset, with the created points displayed as crosses.
Table 2.
Comparison of simulated raw data (4 classes) and oversampled data, repeated 100 times. Displayed results are mean (s.d.).
Fig 5.
Performance of the Dirichlet model on raw and oversampled data as a function of dataset imbalance (indicated by the % of observations in class 0), based on 16 features and 4 classes.
Fig 6.
Average R² and F1-score per class of the Dirichlet model on raw and oversampled simulated data.
Bars represent the mean score, vertical lines represent the standard deviation.
Table 3.
Results comparing raw and oversampled Maupiti data (4 classes) with 5-fold cross-validation. Displayed results are mean (s.d.).
Fig 7.
Average R² score per class of the Gradient Boosting tree on raw and oversampled Maupiti data.
The red dotted lines represent the weight of each class, and the value below the class is its weight. Bars represent the mean score, vertical lines represent the standard deviation.
Table 4.
Results comparing raw and oversampled Tecator data (3 classes) with 10-fold cross-validation, iterated 100 times. Displayed results are mean (s.d.).