Peer Review History

Original Submission
May 15, 2025
Decision Letter - Issa Atoum, Editor

PONE-D-25-26240
Adaptive TreeHive: Ensemble of Trees for Enhancing Imbalanced Intrusion Classification
PLOS ONE

Dear Dr. Farid,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Thank you for your submission. Both reviewers found the topic relevant and the proposed method promising, but several critical revisions are needed to strengthen the paper’s validity and clarity: 

  • Evaluation Metrics and Statistical Rigor

Reviewer 1 highlights the lack of essential metrics, such as precision, recall, F1-score, and class-wise performance for minority classes (e.g., U2R, R2L), which are crucial for validating performance on imbalanced datasets. Reviewer 2 reinforces this need and also notes the absence of statistical significance testing.

  • Comparative Scope and Model Description

Both reviewers stress the limited comparison to classical ensemble methods. Reviewer 1 recommends including deep learning models (e.g., BiLSTM, CNN-GRU), while Reviewer 2 suggests at least discussing how TreeHive compares conceptually. Additionally, Reviewer 1 finds the architecture insufficiently described. Reviewer 2 requests more details on how datasets were balanced.

  • Generalization, Efficiency, and Presentation

Reviewer 1 raises concerns about overfitting, especially on high-dimensional datasets, and recommends including learning curves, confusion matrices, or other diagnostics. Reviewer 2 suggests providing runtime or complexity comparisons to support efficiency claims. Both reviewers recommend enhancing result presentation with visualizations and improving overall clarity. Please address all comments constructively and revise the manuscript accordingly. Ensure all figures and tables are embedded within the main text. Provide a clear and reasonable justification if any comment cannot be addressed. Responses should follow the journal’s guidelines and be submitted in a separate supplementary file, with edits highlighted in yellow.

==============================

Please submit your revised manuscript by Jul 31 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Issa Atoum

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1- While the manuscript reports very high classification accuracies across all datasets (99.96% on NSL-KDD, 95.54% on CICDDoS2019), it fails to consistently report other crucial metrics such as precision, recall, F1-score, and especially class-wise recall for minority classes. This omission obscures the true performance on rare attacks, which is critical in imbalanced intrusion datasets.

2- The comparison of Adaptive TreeHive with baseline models lacks any statistical validation. The authors report percentage improvements but provide no confidence intervals, standard deviations, or hypothesis testing to confirm the observed performance gains are statistically significant and not due to random variation.

3- Although the paper claims to address imbalanced classification, per-class metrics for rare classes such as U2R or R2L in NSL-KDD are not reported. These classes are historically difficult to classify, and without class-specific recall or detection rates, the claim of handling imbalanced data remains insufficiently validated.

4- The work compares Adaptive TreeHive only with traditional ensemble methods like Random Forest, Bagging, and AdaBoost. It omits comparison with modern deep learning models such as BiLSTM, CNN-GRU hybrids, and attention-based methods, which are now commonly used on CIC-IDS2017 and CSE-CIC-IDS2018 datasets. This limits the positioning of the proposed method within the state-of-the-art.

5- Despite high reported accuracies, there is no detailed analysis of potential overfitting, especially on high-dimensional datasets like CIC-IDS2017 (78 features) and CSE-CIC-IDS2018. Training and test accuracy curves, confusion matrices, or learning curves are not shown to substantiate the model's generalization ability.

6- The construction of the “Adaptive TreeHive” architecture remains vaguely described. It is not clearly explained how the ensemble is built, how informative instances are selected, or how dimensionality reduction is integrated. Key hyperparameters and internal structure are only briefly mentioned in tabular form without accompanying rationale or ablation studies.

7- While Random Forest and AdaBoost use Decision Tree (C4.5) as the base classifier, the Bagging ensemble uses Naïve Bayes. The choice of inconsistent base classifiers across ensemble methods compromises the fairness of the comparisons and should be better justified or unified.

Reviewer #2: Please address these comments in your revised manuscript to strengthen the technical rigor, reproducibility, and clarity of your work. Addressing these points will enhance the paper’s overall contribution and impact in the field of intrusion detection.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment
Submitted filename: Comments for Authors.docx
Revision 1

Reviewer #1, Concern #1 (Clarify Dataset Creation and Informative Instance Selection):

Author response: Thank you for this crucial question. We define "informative instances" as the data points most representative of their respective classes, identified through a two-phase balancing process; updated details appear in the "Data Balancing" subsection of our paper.

Informative Instance Selection via Clustering: First, for each dominant class in a dataset, we use K-Means clustering (with k = 1) to find the class centroid. Instances are then ranked by their Euclidean distance to this centroid, and those closest to the centroid are selected as the most "informative" (Ψ), since they best represent the core characteristics of that class. Scattered instances, which lie farther from the centroid, are discarded to reduce noise and redundancy. This process effectively undersamples the majority classes while preserving their essential patterns.

Balancing with SMOTE: After selecting informative instances, we address the remaining class imbalance, particularly for minority classes, by applying the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE generates synthetic samples for the minority classes by interpolating between existing instances and their nearest neighbors.

This combined approach ensures our final datasets are not only balanced but are also built from high-quality, representative instances, which is vital for robust model training and reproducibility.
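The two-phase process described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code: the data, class sizes, and the `n_keep`/`n_new` values are invented for demonstration, and the minimal interpolation routine stands in for the standard SMOTE implementation (e.g. imbalanced-learn's).

```python
import numpy as np

def select_informative(X, n_keep):
    """Rank one class's instances by distance to the class centroid
    (equivalent to K-Means with k = 1) and keep the closest n_keep."""
    centroid = X.mean(axis=0)
    dist = np.linalg.norm(X - centroid, axis=1)
    order = np.argsort(dist)                  # closest first
    return X[order[:n_keep]]

def smote_like(X, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampling: interpolate between a random
    minority instance and one of its k nearest neighbours."""
    rng = np.random.default_rng(rng)
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()
        new.append(X[i] + lam * (X[j] - X[i]))
    return np.vstack(new)

rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(500, 4))   # synthetic majority class
minority = rng.normal(3.0, 1.0, size=(40, 4))    # synthetic minority class

kept = select_informative(majority, n_keep=100)  # phase 1: undersample
synth = smote_like(minority, n_new=60, rng=1)    # phase 2: oversample
balanced_minority = np.vstack([minority, synth])
print(kept.shape, balanced_minority.shape)       # (100, 4) (100, 4)
```

Both classes end up with 100 instances each: the majority class reduced to its most centroid-representative points, the minority class grown by interpolation.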

Reviewer #1, Concern #2 (Hyperparameter Justification.):

Author response: We appreciate the opportunity to clarify our methodology. The hyperparameters listed in Table 2 were not chosen arbitrarily but were the result of a systematic tuning and validation process to ensure optimal performance.

Within the "Adaptive TreeHive" section, we provide justifications for these choices. For example:

● The number of feature groups K used in our data randomization strategy was determined through ablation studies to be optimal in the range k = 15-40, as these values offered the best trade-off between feature coverage and redundancy reduction.

● The accuracy threshold ϕ for selecting classifiers into our final ensemble was set to ϕ > 0.5 based on cross-validation, as this value ensured the inclusion of reliable classifiers without sacrificing ensemble diversity.

● Tree growth parameters, such as a maximum depth (dmax = 15) and a minimum of 5 samples per leaf node, were set to prevent model overfitting while maintaining discriminative power.

This tuning process was integral to guiding the algorithm’s optimization and achieving the reported results.
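The classifier-selection rule ϕ > 0.5 can be sketched as a simple filtering step. The thresholds below (ϕ = 0.5, d_max = 15, 5 samples per leaf) come from the response above, but the candidate accuracies and the function name are invented for illustration, not taken from the paper.

```python
# Thresholds as stated in the author response.
PHI = 0.5        # accuracy threshold for ensemble admission
D_MAX = 15       # maximum tree depth (anti-overfitting)
MIN_LEAF = 5     # minimum samples per leaf node

def select_ensemble(candidate_accuracies, phi=PHI):
    """Keep only candidate classifiers whose validation accuracy exceeds phi."""
    return [i for i, acc in enumerate(candidate_accuracies) if acc > phi]

# Mock validation accuracies for ten candidate trees (illustrative values).
accs = [0.92, 0.48, 0.75, 0.51, 0.33, 0.88, 0.66, 0.49, 0.95, 0.71]
kept = select_ensemble(accs)
print(kept)      # [0, 2, 3, 5, 6, 8, 9] -> trees admitted to the ensemble
```

Weak candidates (accuracy at or below 0.5, i.e. no better than chance on a balanced binary split) are excluded, while the surviving trees retain enough variation to preserve ensemble diversity.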

Reviewer #1, Concern #3 (Comparative Analysis with Modern Deep Learning Models.):

Author response: We thank the reviewer for this important point. We conducted a direct comparative analysis between our proposed Adaptive TreeHive and established deep learning models. This comparison is presented in our "Ablation Study" section and detailed in Table 8.

The results show that Adaptive TreeHive is highly competitive and often superior to the deep learning baselines (BiLSTM and CNN-GRU).

● Notably, our model surpassed the next-best model, CNN-GRU, by significant accuracy margins of 5.16% on UNSW-NB15, 2.61% on CIC-IDS2017, and 2.95% on CICDDoS2019.

● Beyond accuracy, we justify our focus on tree-based models by highlighting their computational efficiency. Deep learning architectures like BiLSTM and CNN-GRU are notoriously resource-intensive and require prolonged training times. In contrast, our tree-based framework is inherently more lightweight, enabling faster training and inference without specialized hardware, making it a more pragmatic and scalable solution for real-world deployment.

Reviewer #1, Concern #4 (Computational Complexity and Runtime Evaluation.):

Author response: We acknowledge the reviewer's feedback. While we did not include a specific table of execution times, our claim of reduced computational requirements is based on two core aspects of our model's design.

1. Built-in Dimensionality Reduction: Our methodology incorporates a unique data randomization and feature grouping strategy. As detailed in the "Adaptive TreeHive" section, each decision tree in the ensemble operates on a reduced feature subset (approximately 35% of the original features), significantly lowering the computational load for each base learner.

2. Inherent Efficiency of Tree Ensembles: As discussed in the "Ablation Study," our tree-based ensemble framework is inherently more lightweight and computationally efficient compared to deep learning architectures like BiLSTM and CNN-GRU, which are known to be resource-intensive and demand substantial computational overhead.

Further, the process of selecting only informative instances reduces the overall size of the training data, which directly leads to faster training times.
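The feature-grouping step behind point 1 can be sketched as follows. This is a hedged reconstruction: the subset fraction (~35%) is taken from the response above, but the tree count, seed, and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_feature_groups(n_features, n_trees, subset_frac=0.35, seed=0):
    """Assign each tree a random feature subset (~35% of all features),
    so every base learner trains on a reduced-dimensional view of the data."""
    rng = np.random.default_rng(seed)
    subset_size = max(1, int(round(subset_frac * n_features)))
    return [rng.choice(n_features, size=subset_size, replace=False)
            for _ in range(n_trees)]

# CIC-IDS2017 has 78 features; each of 20 hypothetical trees gets 27 of them.
groups = make_feature_groups(n_features=78, n_trees=20)
print(len(groups), len(groups[0]))   # 20 27
```

Each base learner thus splits on roughly a third of the original dimensions, which is what lowers the per-tree computational load relative to training every tree on the full feature space.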

Reviewer #1, Concern #5 (Visualization and Result Presentation.):

Author response: We agree that strong visualizations are key to understanding model performance. In our updated manuscript, we have included detailed confusion matrices for each of the five benchmark datasets to provide these insights:

● Figure 3: NSL-KDD dataset

● Figure 4: UNSW-NB15 dataset

● Figure 5: CIC-IDS2017 dataset

● Figure 6: CSE-CIC-IDS2018 dataset

● Figure 7: CIC-DDoS2019 dataset

These figures are supported by an extensive, multi-paragraph analysis within the "Experimental Results" section. This analysis delves into the model's class-by-class performance, with a particular focus on its exceptional success in detecting extremely rare minority attacks, which directly highlights the model's strength and its robustness against class imbalance.

Reviewer #1, Concern #6 (Writing and Organization):

Author response: We have separated the “Limitations and future work” and “Conclusion” sections in the updated manuscript.

Reviewer #1, Concern #7 (Reproducibility):

Author response: The Experimental Setup section provides explicit details of the hardware and software environment used in our experiments.

Reviewer #1, Concern #8 (Minor Points):

Author response: We have corrected the inconsistent terminology used for the base classifier throughout the revised manuscript to ensure consistency. Furthermore, we have performed a thorough check of the entire paper, verifying all figure labels, captions, and in-text references for clarity, accuracy, and completeness.

Attachment
Submitted filename: Response to Reviewers PONE-D-25-26240.pdf
Decision Letter - Issa Atoum, Editor

Adaptive TreeHive: Ensemble of Trees for Enhancing Imbalanced Intrusion Classification

PONE-D-25-26240R1

Dear Dr. Farid,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging in to Editorial Manager® and clicking the 'Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Issa Atoum

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have successfully addressed all critical points raised in the previous review. The revisions enhance the clarity and robustness of the study's findings.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Formally Accepted
Acceptance Letter - Issa Atoum, Editor

PONE-D-25-26240R1

PLOS ONE

Dear Dr. Farid,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Issa Atoum

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.