Adaptive TreeHive: Ensemble of trees for enhancing imbalanced intrusion classification | PLOS One


Fig 1.

Intrusion detection systems and local area networks.
The diagram illustrates a corporate network protected by a layered security architecture. Traffic from the internet, including traffic from a potential attacker, passes through a firewall and a switch before reaching the internal network components: a web server, an email server, and a centralized management system. An Intrusion Detection System (IDS) monitors the traffic passing through the network to detect and report suspicious activity. The design emphasizes the role of the IDS in safeguarding network assets by identifying potential security breaches in real time.

Table 1.

Decision tree algorithms.

Table 2.

Hyperparameter values of the machine learning algorithms.

Table 3.

Description of the five large-scale public benchmark datasets.

Fig 2.

Flow diagram of the Adaptive TreeHive intrusion detection framework.
Adaptive TreeHive groups features by gain ratio, builds randomized trees on each feature subset, and merges their outputs via weighted majority voting. The NSL-KDD, UNSW-NB15, CIC-IDS2017, CSE-CIC-IDS2018, and CICDDoS2019 datasets are pre-processed, their features are ranked, informative instances are selected by clustering, redundancies are removed, and the data are then split into training and testing sets. The chosen trees (those exceeding a performance threshold) are trained on the processed training set and assigned weights based on their error rates, and their predictions are aggregated by weighted voting. Performance is evaluated using accuracy, precision, recall, and F1-score.
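
The selection and voting steps described in this caption can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the caption does not state how weights are derived from error rates, so the AdaBoost-style formula log((1 - err) / err), the 0.5 error threshold, and the names tree_weight, select_trees, and weighted_vote are all assumptions made for the example.

```python
import math
from collections import defaultdict


def tree_weight(error_rate, eps=1e-9):
    """Assumed AdaBoost-style weight: a lower error rate gives a larger weight."""
    err = min(max(error_rate, eps), 1 - eps)  # clamp to avoid log(0) or division by zero
    return math.log((1 - err) / err)


def select_trees(error_rates, max_error=0.5):
    """Keep the indices of trees whose error is below a hypothetical threshold."""
    return [i for i, err in enumerate(error_rates) if err < max_error]


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Made-up error rates and predictions for four trees on a single sample.
error_rates = [0.05, 0.30, 0.60, 0.20]
all_predictions = ["normal", "dos", "dos", "dos"]

kept = select_trees(error_rates)                     # the tree with error 0.60 is dropped
weights = [tree_weight(error_rates[i]) for i in kept]
predictions = [all_predictions[i] for i in kept]
result = weighted_vote(predictions, weights)
```

With these made-up numbers the single most accurate tree outvotes the two weaker trees that survive selection, which is the behavior that distinguishes weighted voting from a plain majority vote.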

Table 4.

Comparison of the quantitative performance of existing methods after applying SMOTE.

Table 5.

Comparison of the quantitative performance of existing methods after applying random under-sampling.
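
For readers unfamiliar with the balancing step named in this caption, random under-sampling can be sketched as follows. This is a minimal standard-library illustration, assuming that every class is reduced to the size of the smallest class; the function name random_undersample, the fixed seed, and the toy data are hypothetical, and the paper may have used a library implementation with different settings.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Assumption: balance to the minority-class count.
    target = min(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):  # sampling without replacement
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy imbalanced data: six "normal" records and two "attack" records.
samples = list(range(8))
labels = ["normal"] * 6 + ["attack"] * 2
balanced_samples, balanced_labels = random_undersample(samples, labels)
class_counts = Counter(balanced_labels)
```

The trade-off is that majority-class records are discarded, so information is lost in exchange for balanced class counts.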

Table 6.

Comparing the empirical outcomes of various ensemble-based methods on the intrusion classification task.

Fig 3.

Confusion matrix of NSL-KDD dataset.

Fig 4.

Confusion matrix of UNSW-NB15 dataset.

Fig 5.

Confusion matrix of CIC-IDS2017 dataset.

Fig 6.

Confusion matrix of CSE-CIC-IDS2018 dataset.

Fig 7.

Confusion matrix of CICDDoS2019 dataset.

Table 7.

The influence of dataset size (CICDDoS2019) on the performance of the proposed method.

Table 8.

A comprehensive comparison of our proposed method against two deep learning baselines (BiLSTM and CNN-GRU) for the intrusion classification task. The evaluation was performed on five standard datasets, with metrics including Accuracy (ACC), Precision (PR), Recall (RE), and F1-Score.
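
The four metrics named in this caption can all be derived from a confusion matrix such as those in Figs 3 to 7. The sketch below assumes rows are true classes, columns are predicted classes, and per-class scores are macro-averaged; the caption does not say which averaging scheme the paper uses, and the function name macro_metrics and the example matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Compute ACC and macro-averaged PR, RE, F1.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    precisions, recalls, f1_scores = [], [], []
    for i in range(n):
        true_positive = confusion[i][i]
        predicted_as_i = sum(confusion[r][i] for r in range(n))  # column sum
        actually_i = sum(confusion[i])                           # row sum
        precision = true_positive / predicted_as_i if predicted_as_i else 0.0
        recall = true_positive / actually_i if actually_i else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "ACC": correct / total if total else 0.0,
        "PR": sum(precisions) / n,
        "RE": sum(recalls) / n,
        "F1": sum(f1_scores) / n,
    }


# Made-up two-class matrix: 10 true "normal" rows, then 10 true "attack" rows.
example = [[8, 2],
           [1, 9]]
scores = macro_metrics(example)
```

On imbalanced intrusion data, accuracy alone can look high even when a rare attack class is mostly missed, which is why the per-class precision, recall, and F1 values are reported alongside it.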

Fig 8.

The empirical outcome of Adaptive TreeHive on three different-sized training sets in terms of accuracy.
Accuracy of Adaptive TreeHive on three training set sizes using different data-balancing strategies—balanced dataset (ours), oversampling, and undersampling—across five intrusion detection datasets. Line plots show that the balanced strategy consistently outperforms the others in classification accuracy.

Fig 9.

The empirical outcome of Adaptive TreeHive on three different-sized training sets in terms of F1-score.
F1-score of Adaptive TreeHive on three training set sizes using our balanced dataset, oversampling, and undersampling across five intrusion detection benchmarks. Line plots show that the balanced approach consistently delivers the highest F1-scores, followed by oversampling and then undersampling.
