Peer Review History
| Original Submission: October 30, 2024 |
|---|
|
PCOMPBIOL-D-24-01878 NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models PLOS Computational Biology Dear Dr. Robertson, Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process, in particular the problems of contamination that might occur in the dataset as pointed out by reviewer #1. Please submit your revised manuscript within 60 days Mar 24 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. 
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. Kind regards, Lin Hou Academic Editor PLOS Computational Biology Arne Elofsson Section Editor PLOS Computational Biology Journal Requirements: 1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full. At this stage, the following Authors require contributions: John Robertson. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form. The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions 2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'. 3) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines: https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission 4) We do not publish any copyright or trademark symbols that usually accompany proprietary names, e.g. ©, ®, or TM (e.g. next to drug or reagent names). Therefore please remove all instances of trademark/copyright symbols throughout the text, including: - © on page: 1. 5) Your manuscript is missing the following section: Discussion. Please ensure all required sections are present and in the correct order.
Make sure section heading levels are clearly indicated in the manuscript text, and limit sub-sections to 3 heading levels. An outline of the required sections can be consulted in our submission guidelines here: https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission 6) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures 7) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list. 8) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published. 1) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)." 2) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.". Reviewers' comments: Reviewer's Responses to Questions Reviewer #1: The manuscript by Robertson et al. presented a multi-class viral classification tool that adapts three genomic foundation models to detect oncoviruses from sequencing reads. The results showed that foundational models can be fine-tuned to perform viral classification at high accuracy, with performance comparable to well-established binary viral classifiers. However, several major issues should be carefully addressed before it can be published as a rigorous research paper, listed below. 1. 
NextVir is designed to identify oncoviruses from cancer cells. However, there are also diverse bacteria (Tekle et al., 2023) that are known or suspected to cause cancer. In addition, samples can be easily contaminated by environmental bacteria during sampling, processing, DNA extraction, library preparation, and sequencing. Thus, these scenarios should be considered during the design of NextVir or other similar tools. 2. The training dataset was semi-experimental, so it may not perfectly emulate actual sequencing data, particularly the diversity, highly variable coverage, and complexity of actual samples. Actually, the accuracy dropped significantly with the increase of mutations, and particularly indels, which are typical for real samples. 3. When benchmarking NextVir with other classifiers, the authors should A) use golden standard real-world metagenomes of human-associated, soil, marine, and other well-studied environments to capture the complexity of metagenomic samples. B) the authors should benchmark them at different read/contig lengths since the reads can be generated from different sequencing platforms, and it’s a common practice to detect viruses from assembled contigs, particularly those with long sequence lengths but low sequencing coverage. C) the authors should also compare the computational and memory costs when benchmarking these tools, which is vital to be adopted by the community. Reviewer #2: The authors present a novel method for multi-class oncoviral read detection and classification utilizing next-generation sequencing data. The proposed method, NextVir, builds on several recent genomic foundation models, including DNABERT-S, NucleotideTransformer, and HyenaDNA, by fine-tuning and adapting their read embeddings to achieve accurate classification of reads based on their origin. Notably, NextVir also achieves state-of-the-art performance in binary viral detection tasks. 
Overall, the proposed framework has a superior performance and can solve not only the task of oncolytic virus read classification, but also various other problems in genomics. 1. In the Methods section, under Input Preprocessing, please provide a more detailed explanation of the padding method used and indicate whether it will have an effect on the final result. 2. The authors propose a NextVir approach based on three foundation models, DNABERT-S, NucleotideTransformer, and HyenaDNA. It is recommended to discuss the advantages and disadvantages of the three models in depth, describing the scenarios and potential limitations of each model. 3. It is recommended that the authors explore the possibility of extending the NextVir approach to other foundational models. This will help to assess the generalizability of NextVir and its potential for application in different genomics tasks. 4. It is recommended that the authors further discuss whether it is possible to further improve the performance of the model under low coverage conditions. 5. The authors mention that despite the scarcity of MCV reads, its genome length is much shorter than that of HHV-8, which may account for the more stable accuracy of MCV classification. It is recommended that the authors further explore the specific impact of genome length on model performance. Reviewer #3: Reviewer’s comment: Summary: This paper presents NextVir, a novel framework for classifying oncogenic viruses using genomic foundation models. The study evaluates three advanced models—DNABERT-S, Nucleotide Transformer, and HyenaDNA—to distinguish between different viral genomes, including tumor-associated viruses such as HHV-8 and HPV-16. 
The models are tested on simulated and real genomic datasets, emphasizing their ability to handle indels, substitutions, and long-context dependencies. Key findings include DNABERT-S outperforming other models in accuracy and robustness due to its advanced fine-tuning with Low-Rank Adaptation (LoRA). Additionally, the paper highlights the computational trade-offs between accuracy and efficiency, with HyenaDNA showcasing speed advantages but struggling with certain viral classes. The robustness of NextVir to mutations and its potential application in metagenomic sequencing are discussed, positioning it as a promising tool for viral research and diagnostics. However, the study identifies limitations such as the reliance on simulated data and emphasizes the need for further validation on clinical samples. Major comments: 1. While the use of DNABERT-S, Nucleotide Transformer, and HyenaDNA is explained, no comparison is provided against other potential foundational models. Discuss why alternatives like DeepVirFinder were excluded. 2. The reliance on simulated reads raises concerns about real-world generalizability. Clarify how well the simulated dataset mimics actual tumor samples and address potential biases. 3. Beyond accuracy, consider reporting F1 score, precision, and recall to provide a more comprehensive evaluation, especially for low-representation classes like HHV-8. 4. Discuss in more depth why some models are more robust to indels and substitutions. Relate this to tokenization schemes and architectural differences. 5. Include comparisons to simpler deep learning models (e.g., CNNs or LSTMs) trained on the same dataset to contextualize the performance gains of NextVir. 6. Clarify how the random and context-supported data splits were constructed and whether they accurately reflect sequencing scenarios in real-world studies. Minor comments: 1. Tables: Include confidence intervals or standard deviations in accuracy results to highlight variability across runs. 2.
Inconsistent verb tenses: In the "Results" section, the tense alternates between present and past. Ensure consistency by choosing either present (preferred for describing findings) or past tense. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: None Reviewer #3: Yes ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Jinhao Bi [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] 
Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions. Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols |
| Revision 1 |
|
PCOMPBIOL-D-24-01878R1 NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models PLOS Computational Biology Dear Dr. Robertson, Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript within 30 days Aug 19 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. 
Kind regards, Lin Hou Academic Editor PLOS Computational Biology Arne Elofsson Section Editor PLOS Computational Biology Journal Requirements: 1) Please upload the figures in the online submission form in a correct numerical order. 2) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published. 1) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.". Reviewers' comments: Reviewer's Responses to Questions Reviewer #1: The manuscript has been comprehensively revised with additional supporting materials. I'm now satisfied with the current version and recommend acceptance as it is. Reviewer #2: We appreciate the authors for revising the manuscript. Compared with the initial version, the manuscript has been significantly enhanced in multiple dimensions, particularly in the detailed elaboration of the NextVir model's performance, which makes the content presentation more coherent and clear. However, upon reviewing the revised manuscript, we identified that some issues remain inadequately addressed. To ensure the high quality of the article, we suggest the authors further refine it accordingly. In the response letter, the authors stated that they had "dedicate the second paragraph of the new Discussion section to comparing the three foundation models used in our work". Nevertheless, the revised content remains at a general level of discussion without providing specific model evaluation metrics or citing authoritative scientific references. We recommend that the authors supplement concrete model evaluation indicators or incorporate relevant authoritative research when revising, thereby strengthening the objectivity and persuasiveness of the comparison. 
Reviewer #3: The author answered my doubts. ********** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: None Reviewer #3: None ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Peng Wang Reviewer #3: No |
| Revision 2 |
|
PCOMPBIOL-D-24-01878R2 NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models PLOS Computational Biology Dear Dr. Robertson, Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript within 30 days Aug 31 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. 
Kind regards, Lin Hou Academic Editor PLOS Computational Biology Arne Elofsson Section Editor PLOS Computational Biology Journal Requirements: 1) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list. Reviewers' comments: [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] |
| Revision 3 |
|
Dear Mr. Robertson, We are pleased to inform you that your manuscript 'NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Lin Hou Academic Editor PLOS Computational Biology Arne Elofsson Section Editor PLOS Computational Biology *********************************************************** |
| Formally Accepted |
|
PCOMPBIOL-D-24-01878R3 NextVir: Enabling Classification of Tumor-Causing Viruses with Genomic Foundation Models Dear Dr Robertson, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Zsofia Freund PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.