Table 1.
List of base classifiers used in GA-EoC.
Fig 1.
The steps in preprocessing the training dataset and generating the base classifier models.
The process starts by taking the training dataset as input. First, it balances the class distribution if the training data are imbalanced. Next, if a feature set is not already available, it selects features using the (α, β) − k Feature Set selection method. Then, it creates train and validation folds from the training dataset for 10-fold cross-validation. These folds are saved and used for the internal validation of ensembles. Finally, it generates the models for each classifier on each training fold (Train 1 to Train 10) and saves them for future use.
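The fold-generation step above can be sketched as follows. This is a minimal Python sketch, assuming stratified folds built with the standard library. The helper name `make_stratified_folds` and the round-robin assignment of shuffled indices are illustrative assumptions, not necessarily how GA-EoC builds its folds.

```python
import random
from collections import defaultdict


def make_stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k (train, validation) index pairs.

    Hypothetical helper: indices of each class are shuffled and dealt
    round-robin into k folds, so every fold keeps roughly the same class
    proportions. Fold i is the validation set; the other k-1 folds form Train i.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # continues across classes so fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1

    pairs = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        pairs.append((train, val))
    return pairs
```

For example, with 30 samples of class 0 and 20 of class 1, each of the 10 validation folds receives 3 class-0 and 2 class-1 samples. The resulting index pairs would then be saved so that the same folds are reused when evaluating every candidate ensemble.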
Fig 2.
Overall process flow of the proposed GA-EoC algorithm.
In GA-EoC, each individual represents an EoC, and the genetic algorithm is used to find the best EoC based on its performance on the validation folds. For each individual, an EoC is constructed using the base classifier models of a training fold, and the MCC score of the EoC is calculated on the corresponding validation fold generated beforehand (Fig 1). The average MCC score over the 10 folds is taken as the fitness value of the individual. The algorithm iteratively creates a new population from the current one until a terminating condition is satisfied. The individual with the best fitness value from the final population is returned as the solution.
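The fitness evaluation and generational loop described above could look roughly like the sketch below. MCC is computed from the confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and fitness is its mean over the validation folds. The GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, the fixed generation budget used as the terminating condition, and the names `fitness`, `run_ga` and `ensemble_predict` are all assumptions for illustration; the paper's own GA settings may differ.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fitness(individual, fold_data, ensemble_predict):
    """Average MCC of the EoC encoded by `individual` over all validation folds.

    fold_data: list of (fold_models, y_val) pairs, one per fold (hypothetical layout).
    ensemble_predict: hypothetical callable returning the EoC's predictions on a fold.
    """
    scores = [mcc(y_val, ensemble_predict(individual, fold_models))
              for fold_models, y_val in fold_data]
    return sum(scores) / len(scores)


def tournament(pop, fitness_fn, rng):
    """Binary tournament selection (assumed operator)."""
    a, b = rng.sample(pop, 2)
    return a if fitness_fn(a) >= fitness_fn(b) else b


def run_ga(n_bits, fitness_fn, pop_size=20, generations=30,
           crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve bit-string individuals; returns (best_individual, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness_fn)
    # Assumed terminating condition: a fixed number of generations.
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: keep the best individual found so far
        while len(new_pop) < pop_size:
            p1 = tournament(pop, fitness_fn, rng)
            p2 = tournament(pop, fitness_fn, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness_fn)
    return best, fitness_fn(best)
```

In use, `fitness_fn` would be something like `lambda ind: fitness(ind, fold_data, ensemble_predict)`, so every candidate EoC is scored on the same saved folds. Elitism makes the best fitness non-decreasing across generations.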
Fig 3.
Representation of an individual in GA-EoC and its mapping into the corresponding base classifiers for ensemble combination.
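One plausible reading of the representation in Fig 3 is a bit string with one gene per base classifier, where a 1 includes that classifier in the EoC. The sketch below assumes that encoding and a plurality (majority) vote as the combination rule. Both assumptions, and the helper names `decode`, `majority_vote` and `ensemble_predict`, are hypothetical and may not match the paper's exact scheme.

```python
from collections import Counter


def decode(individual, base_classifiers):
    """Map a bit-string individual to the base classifiers it selects (bit i -> classifier i)."""
    return [clf for bit, clf in zip(individual, base_classifiers) if bit == 1]


def majority_vote(predictions):
    """Combine per-classifier label lists by plurality vote for each sample.

    Ties go to the label seen first among the voters, because
    Counter.most_common orders equal counts by first occurrence.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def ensemble_predict(individual, fold_predictions):
    """Predictions of the EoC encoded by `individual` on one validation fold.

    fold_predictions[i] holds base classifier i's predicted labels on that fold,
    in the same order as the genes of the individual (assumed layout).
    """
    selected = decode(individual, fold_predictions)
    if not selected:
        raise ValueError("individual selects no base classifier")
    return majority_vote(selected)
```

For instance, `decode([1, 0, 1, 0], ["A", "B", "C", "D"])` yields `["A", "C"]`. Because the predictions can be precomputed once per fold from the saved models, scoring a new individual only requires re-voting, not retraining.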
Table 2.
Characteristics of the datasets used for experiments.
Table 3.
Distribution of the training and testing data in PubFig05 dataset.
Table 4.
Outcome of the (α, β) − k Feature Set selection method for three different setups (UAB, IAB, UEAB), showing the number of selected features per binary-class dataset of PubFig05.
Table 5.
Classification performances (in MCC scale) of the base classifiers and GA-EoC for all experiments.
Table 6.
Classification accuracies achieved by the base classifiers and GA-EoC for all experiments.
Fig 4.
Confusion matrices comparing the best classification performances using the 18-protein biomarker.
(a-b) Classification performances achieved by [Ray et al., 07], (c-d) by [R.Moscato, 08] and (e-f) by the proposed GA-EoC, for TestSetAD and TestSetMCI, respectively.
Table 7.
Average classification performances (in terms of accuracy and MCC) using the 18-protein biomarker.
Table 8.
Average classification performances (in terms of accuracy and MCC) using the 5-protein biomarker.
Fig 5.
Best classification performances of the state-of-the-art method vs. the proposed method with the 5-protein biomarker.
Comparison of the best classification performances using the 5-protein biomarker (RavettiMoscato-AD-Trn-5) as the training dataset and TestSetAD and TestSetMCI as the test datasets. (a-b) Classification performances achieved by [R.Moscato, 08], (c-d) classification performances achieved by GA-EoC for TestSetAD and TestSetMCI, respectively.
Table 9.
Average classification performances on UAB setup.
Fig 6.
Classification performances of GA-EoC and other ensembles of classifiers on the PubFig05 datasets.
The classification performances of AdaBoostM1, Bagging, Random Forest and GA-EoC are compared in terms of Precision, Accuracy and F-Measure scores for (a) UAB datasets, (b) IAB datasets and (c) UEAB datasets.
Table 10.
Average classification performances on IAB setup.
Table 11.
Average classification performances on UEAB setup.
Fig 7.
Comparison of MCC scores achieved by GA-EoC and other ensembles of classifiers (AdaBoostM1, Bagging and Boosting) for all experiments.
Fig 8.
The accuracies of the base classifiers and the average accuracies of GA-EoC over all experiments.
Fig 9.
The MCC scores of the base classifiers and the average MCC scores of GA-EoC over all experiments.
Table 12.
The number of different ensembles (with common base classifiers in them) constructed by GA-EoC over repeated experimental runs.
Table 13.
Classification performances of common ensembles of classifiers vs. GA-EoC for all experiments.