Fig 1.
Separating hyperplanes in the linearly separable case.
The hyperplane is shown as a solid line, the margins as dashed lines, and the support vectors are enclosed in red circles.
Fig 2.
An SVM classifier trained only on support vectors compared with one trained on the whole dataset.
The decision boundaries are shown as solid lines, the margins as dotted lines, and the support vectors are enclosed in circles. (a) A classifier trained on the entire dataset. (b) A classifier trained only on support vectors identified earlier by an SVM classifier trained on the whole dataset.
Fig 3.
Identification of a layer of border points in our proposed method (DBI).
(a) The input classes of a synthetic 2-dimensional dataset. (b) The calculated core scores for all points of the dataset. (c) The three identified types of points, i.e., border points (red), core points (green), and outliers (black). (d) The final reduced training set.
Fig 4.
Effect of border overlap on the decision boundaries of the trained classifier on the Banana dataset.
The whole dataset is compared to our proposed DBI and BRIX methods, both at a ratio of 0.2. The test dataset is plotted against the decision boundaries (solid lines) and margins (dashed lines). (a) Whole dataset. (b) Reduced dataset (DBI). (c) Reduced dataset (BRIX variant). (d) Test dataset plotted against decision boundaries (whole dataset). (e) Test dataset plotted against decision boundaries (DBI). (f) Test dataset plotted against decision boundaries (BRIX variant). Note the misclassified points in the central region of the plot in (e), caused by the overlapping borders of the opposite class around the central part.
Fig 5.
Reduced dataset selection using the proposed SVO method for a non-overlapping dataset.
(a) A moons-shaped dataset. (b) Identified support vectors. (c) Reduced subset using SVO at k = 15.
Table 1.
Results of the proposed methods (DBI, BRI & BRIX) on the Banana dataset with different reduction ratios.
Fig 6.
Results for our proposed methods (DBI, BRI and BRIX) on the Banana dataset with different reduction ratios.
Table 2.
Results of the proposed SVO & SVOX methods on the Banana dataset with different values of k.
Fig 7.
Results for the proposed SVO and SVOX methods on the Banana dataset with different values of k.
Table 3.
Results of the proposed methods compared to other methods from the literature on the Banana dataset.
Table 4.
Pareto set of different methods on the Banana dataset.
Fig 8.
Ranking of Pareto set methods on the Banana dataset based on closeness to the optimal point.
The optimal point is the zero point, representing the ideal of minimizing all the optimized metrics. The score is calculated as the reciprocal of the Euclidean distance from the optimal point.
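The scoring rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `closeness_score`, the method labels, and the objective values are hypothetical, and it assumes every objective is minimized and already scaled to a comparable range.

```python
import math

def closeness_score(objectives):
    """Reciprocal of the Euclidean distance from the zero (optimal) point.

    Assumes every objective is to be minimized, so the origin is the ideal.
    """
    distance = math.sqrt(sum(v * v for v in objectives))
    if distance == 0.0:
        return math.inf  # a point at the origin is already optimal
    return 1.0 / distance

# Hypothetical objective vectors, for illustration only (not values from the paper).
candidates = {
    "method_a": (0.3, 0.4, 0.0),
    "method_b": (0.6, 0.8, 0.0),
}

# Rank from closest to farthest from the optimal point (highest score first).
ranking = sorted(candidates, key=lambda name: closeness_score(candidates[name]), reverse=True)
```

A higher score means the method lies closer to the optimal point, so `ranking` lists the best trade-off first.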
Fig 9.
Pareto set for the Banana dataset.
After the elimination of 70 dominated solutions, the Pareto set comprises only five elements, all of which are proposed methods (BRIX and SVOX). A point dominates another if it is better or equal in all objectives and strictly better in at least one objective.
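The dominance rule stated in this caption can be illustrated with a short Python sketch. It assumes all objectives are minimized; the helper names `dominates` and `pareto_set` and the candidate values are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if a is better or equal to b in every objective and strictly
    better in at least one (all objectives are assumed to be minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_set(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical three-objective candidates, for illustration only.
candidates = [
    (0.10, 0.30, 0.20),
    (0.20, 0.20, 0.20),
    (0.15, 0.35, 0.25),  # dominated by the first candidate
    (0.30, 0.10, 0.10),
    (0.25, 0.25, 0.20),  # dominated by the second candidate
]

front = pareto_set(candidates)
```

The pairwise check is quadratic in the number of candidates, which is adequate for sets of the size reported here (tens to about a hundred solutions).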
Fig 10.
Comparison of methods on the Banana dataset.
The distribution of the different methods in the solution space of the three optimization objectives, in addition to the ratio of the reduced dataset, is shown for each pair of objectives. The proposed methods, except DBI and SVO, are predominantly closest to the optimal point.
Fig 11.
Results for the proposed methods (DBI, BRI and BRIX) on the USPS dataset with different reduction ratios.
Table 5.
Results of the proposed methods (DBI, BRI & BRIX) on the USPS dataset with different reduction ratios.
Table 6.
Results of the proposed SVO & SVOX methods on the USPS dataset with different values of k.
Fig 12.
Results for the proposed SVO and SVOX methods on the USPS dataset with different values of k.
Table 7.
Results of the proposed methods compared to other methods from the literature on the USPS dataset.
Table 8.
Pareto set of different methods on the USPS dataset.
Fig 13.
Ranking of Pareto set methods on the USPS dataset based on closeness to the optimal point.
The optimal point is the zero point, representing the ideal of minimizing all the optimized metrics. The score is calculated as the reciprocal of the Euclidean distance from the optimal point.
Fig 14.
Pareto set for the USPS dataset.
The Pareto set is composed of 17 elements selected from 106 candidate solutions.
Fig 15.
Comparison of methods on the USPS dataset.
The distribution of the different methods in the solution space of the three optimization objectives, in addition to the ratio of the reduced dataset, is shown for each pair of objectives. The proposed methods, except SVO, are predominantly closest to the optimal point.
Table 9.
Results of the proposed methods (DBI, BRI & BRIX) on the Adult9a dataset with different reduction ratios.
Fig 16.
Results for our proposed methods (DBI, BRI, and BRIX) on the Adult9a dataset with different reduction ratios.
Table 10.
Results of the proposed SVO & SVOX methods on the Adult9a dataset with different values of k.
Fig 17.
Results for the proposed SVO and SVOX methods on the Adult9a dataset with different values of k.
Table 11.
Results of the proposed methods compared to other methods from the literature on the Adult9a dataset.
Table 12.
Pareto set of different methods on the Adult9a dataset.
Fig 18.
Ranking of Pareto set methods on the Adult9a dataset based on closeness to the optimal point.
Fig 19.
Pareto set for the Adult9a dataset.
The Pareto set is composed of 17 elements selected from 68 candidate solutions.
Fig 20.
Comparison of methods on the Adult9a dataset.
The distribution of the different methods in the solution space of the three optimization objectives, in addition to the ratio of the reduced dataset, is shown for each pair of objectives. BRIX and Gaffari’s method are the closest to the optimal point. SVO and SVOX are more clustered in the solution space than the other methods.