Peer Review History
Original Submission (January 28, 2024)
PONE-D-24-03825
Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data
PLOS ONE

Dear Dr. Uddin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 07 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Nagarajan Raju
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly.
Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Additional Editor Comments (if provided):

I suggest the authors go through the reviewers' comments and address them properly in the revised manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: No
Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know
Reviewer #2: Yes
Reviewer #3: No
Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: No
Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: No
Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:

1. The study relies on tabular datasets from the UCI Machine Learning Repository and Kaggle, which are popular repositories for machine learning datasets. However, there is a risk of selection bias, as these datasets may not be representative of real-world data or may have biases inherent in their collection process.

2. The study chooses five machine learning algorithms, but the rationale for selecting these specific algorithms is not thoroughly justified. While tree-based and non-tree-based algorithms are commonly used, there should be a discussion of why these particular algorithms were chosen over others and how they complement each other in addressing the research question.

3. While the study uses common performance metrics such as accuracy, precision, recall, and F1 score, there is limited discussion of why these metrics were chosen and how they align with the research objectives. Additionally, there is no mention of other important metrics such as area under the receiver operating characteristic curve (AUC-ROC) or specificity, which are crucial for evaluating classification models.

4. The study mentions using the Scikit-learn library for implementing machine learning algorithms and IBM SPSS Statistics for conducting paired-sample t-tests. While these are widely used tools, the lack of detailed information on specific parameter settings and preprocessing techniques could hinder reproducibility. Providing a clear and detailed description of the experimental setup would enhance the study's transparency and reproducibility.

5. The study splits the data into training and test sets using an 80:20 ratio and performs five-fold cross-validation during model development. While cross-validation helps assess the model's performance, there is no external validation using independent datasets.

6. The study focuses solely on classical tree-based and non-tree-based supervised ML algorithms, neglecting other important techniques such as deep learning algorithms or unsupervised learning algorithms.

7. Measurement metrics (i.e., accuracy, recall, etc.) are well known and have been used in previous biomedical studies such as PMID: 36642410 and PMID: 28155651. Therefore, the authors are encouraged to refer to more such works in this description to attract a broader readership.

8. The study does not consider ensemble approaches beyond Random Forest (RF), such as AdaBoost or XGBoost, which have shown significant performance improvements in various classification tasks.

9. While the study demonstrates the superiority of tree-based ML algorithms, it fails to explore the underlying reasons behind this dominance.

10. The study suggests that tree-based algorithms consistently outperform non-tree-based algorithms across all datasets, without considering potential dataset-specific factors that may influence algorithm performance.

11. While the study briefly mentions future research opportunities, such as exploring ensemble tree-based algorithms and investigating the underlying reasons for algorithmic performance, it lacks depth in discussing specific research avenues and methodologies.
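The evaluation protocol discussed in these comments (train/test split, five-fold cross-validation, and a paired-sample t-test on the resulting scores) can be sketched in a few lines of Python. The dataset, models, and hyperparameters below are illustrative assumptions, not the authors' exact setup; the point is only to show how per-fold scores from the same folds feed a paired comparison:

```python
# Sketch of the comparison protocol: five-fold cross-validation scores
# for a tree-based and a non-tree-based classifier on the same folds,
# compared with a paired-sample t-test. Dataset and hyperparameters
# are illustrative assumptions only.
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=42)  # identical folds for both models

rf_scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
lr_scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)

# Paired t-test on the per-fold accuracies; pairing is valid because
# both models were evaluated on exactly the same folds.
t_stat, p_value = stats.ttest_rel(rf_scores, lr_scores)
print(f"RF mean={rf_scores.mean():.3f}, LR mean={lr_scores.mean():.3f}, p={p_value:.3f}")
```

Using the same `cv` object for both calls is what makes the pairing legitimate; independent random folds per model would break the paired-test assumption.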
Reviewer #2: My comments are as follows:

1) The study focuses on classical algorithms and may not include recent advancements in machine learning, such as deep learning techniques that have shown promise in handling tabular data. It is highly recommended to include these for a more contemporary perspective.

2) The abstract does not highlight the novelty of the proposed work. It would be better to add more specific details of your work.

3) The introduction is not focused, and the literature review can be reorganised to strengthen it in line with the contributions, discussing a few relevant works, i.e.:
a) A Benchmark Dataset and Learning High-level Semantic Embeddings of Multimedia for Cross-media Retrieval
b) Unsupervised pre-trained filter learning approach for efficient convolution neural network
c) CSFL: A novel unsupervised convolution neural network approach for visual pattern classification
d) Optimization of CNN through novel training strategy for visual classification problems
e) Face recognition: A novel un-supervised convolutional neural network method
f) ModPSO-CNN: an evolutionary convolution neural network with application to visual recognition
g) Two-stage domain adaptation for infrared ship target segmentation

4) The work does not delve deeply into the impact of feature engineering and data preprocessing steps, which are crucial for the performance of machine learning algorithms. Add a detailed discussion of this.

5) While the proposed work effectively compares tree-based algorithms with non-tree-based counterparts, it might lack a deeper analysis of why certain algorithms perform better than others. A more thorough investigation into the intrinsic properties of the datasets that favour tree-based methods is needed.

Reviewer #3: The paper is not scientifically sound to be published in this form.

Reviewer #4: The study aims to investigate the statistical significance of the performance of decision tree-based algorithms over other classical machine learning algorithms.
Some points need modification in a final version. The manuscript's idea is interesting, since it seems inappropriate for articles on machine learning algorithms not to conduct statistical comparisons between the accuracies obtained by these algorithms in classification tasks.

Abstract and Introduction

- "no study has shown such supremacy through a statistical significance test." and "However, none shows such supremacy by employing any statistical significance comparison, such as a t-test." This is not true; below I indicate an example that used statistics to compare the accuracy of machine learning algorithms, and it is possible that others have proceeded similarly. I suggest the authors rewrite the sentence and indicate that it is not usual to find statistical comparisons between the classification performance of machine learning algorithms.

Farias, F. M., Salomão, R. C., Rocha Santos, E. G., Sousa Caires, A., Sampaio, G. S. A., Rosa, A. A. M., Costa, M. F., & Silva Souza, G. (2023). Sex-related difference in the retinal structure of young adults: a machine learning approach. Frontiers in Medicine, 10, 1275308. https://doi.org/10.3389/fmed.2023.1275308

Methods

- Figure 1: Use a dot instead of a comma for decimal numbers. Include the label name for the X-axis.

- It would be important to provide more information about the type of data used. Time series for subsequent feature extraction? Was feature extraction performed? If yes, how many features were extracted, and which ones? Were they the same for all comparisons? How many groups were used in the different datasets?

- Why was the t-test chosen over an analysis of variance? I think it would be more appropriate to use an analysis of variance or a Kruskal-Wallis test, or to perform a Bonferroni correction on the t-test results.

- I suggest performing at least a 10-fold cross-validation.

- Was there data preprocessing? Any normalization? I think it would be important.
- Does it make sense to compare the performance of random forest and decision tree?

Results

- Indicate the standard deviation of the mean values in Table 1 and Table 3.

- Table 3 shows an accuracy of 1. Does it imply overfitting? Or do the groups exhibit very large differences, leading to easier classification? This could be debated in the Discussion section.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
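Reviewer #4's statistical suggestions above, namely an omnibus test (analysis of variance or Kruskal-Wallis) across all algorithms followed by Bonferroni-corrected pairwise t-tests, can be illustrated with SciPy. The per-fold accuracy arrays below are synthetic values invented purely for illustration, not results from the manuscript:

```python
# Sketch of the reviewer's suggested alternatives when comparing more
# than two algorithms: a Kruskal-Wallis omnibus test, then pairwise
# paired t-tests with a Bonferroni correction. The per-fold accuracy
# values are synthetic, for illustration only.
from itertools import combinations
from scipy import stats

scores = {
    "decision_tree": [0.90, 0.91, 0.89, 0.92, 0.90],
    "random_forest": [0.94, 0.96, 0.93, 0.95, 0.94],
    "logistic_reg":  [0.88, 0.87, 0.89, 0.88, 0.86],
}

# Omnibus test: do any of the algorithms' score distributions differ?
h_stat, p_omnibus = stats.kruskal(*scores.values())

# Pairwise paired t-tests; Bonferroni divides alpha by the number
# of comparisons to control the family-wise error rate.
pairs = list(combinations(scores, 2))
alpha = 0.05 / len(pairs)  # corrected significance threshold
for a, b in pairs:
    t, p = stats.ttest_rel(scores[a], scores[b])
    verdict = "significant" if p < alpha else "n.s."
    print(f"{a} vs {b}: p={p:.4f} ({verdict} at corrected alpha={alpha:.4f})")
```

With three algorithms there are three pairwise comparisons, so the corrected threshold is 0.05/3; this is the correction the reviewer proposes as an alternative to uncorrected paired t-tests.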
Revision 1
Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data
PONE-D-24-03825R1

Dear Dr. Uddin,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at http://www.editorialmanager.com/pone/ and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Nagarajan Raju
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A
Reviewer #2: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: My previous comments have been addressed; therefore, the manuscript can be accepted in its current form.

Reviewer #2: All my comments have been successfully answered. Please take a good look at the grammar and typos while submitting the final version of the manuscript.

Reviewer #4: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #4: No

**********
Formally Accepted
PONE-D-24-03825R1
PLOS ONE

Dear Dr. Uddin,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Nagarajan Raju
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.