Peer Review History
| Original Submission November 14, 2023 |
|---|
|
PONE-D-23-35773 WilsonGenAI a deep learning approach to classify pathogenic variants in Wilson Disease PLOS ONE Dear Dr. BK, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Mar 03 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Jianhong Zhou Staff Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. 3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. 
When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. 4. Thank you for stating the following financial disclosure: “Funding from the Council of Scientific and Industrial Research (CSIR) through the IndiGenApp Grant and OLP2301” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 5. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process. 6. We note that Figure 1 in your submission contains copyrighted images.
All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright. We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission: 1. You may seek permission from the original copyright holder of Figure 1 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].” 2. 
If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only. 7. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file).
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #1: Yes Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: This paper presents genetic variant classification using machine learning techniques, specifically TabNet and XGBoost, to classify ATP7B gene variants associated with Wilson's Disease. The study's strength lies in its robust training and validation on a high-confidence dataset and its practical application, as evidenced by successful independent verification and potential utility in clinical and research settings. I have several comments that need to be addressed.
Major:
(1) Why were TabNet and XGBoost chosen as the primary models for this analysis over other deep learning or machine learning models? What specific advantages do they offer for this type of data and problem? Please provide a comparison with other relevant deep learning methods.
(2) The authors mentioned that TabNet uses sequential attention for feature selection, which is instance-wise. How does this impact the generalizability of the model across different datasets or variants? Is there a risk of overfitting to specific features in the training dataset?
(3) The authors note that XGBoost is effective in handling sparse data. However, it is not clear how this capability was specifically advantageous in the current study, given the characteristics of the dataset used.
(4) For the models, the authors have set specific hyperparameters. The manuscript needs more details about how these parameters were chosen.
(5) The authors adjusted the scale_pos_weight in XGBoost for class imbalance. How significant was the class imbalance in the dataset used, and how did this adjustment impact the model's performance, especially in terms of precision and recall?
(6) TabNet stopped training at the 187th epoch out of a possible 1000. Was this due to an early stopping criterion based on validation accuracy? A large number of epochs does not necessarily increase the accuracy of the model. How was the risk of overfitting addressed given the excessive number of epochs (>100)?
(7) The authors mentioned the top 20 features in feature importance plots for both models. Could the authors provide insights into what these features represent and how they contribute to the pathogenicity classification? How interpretable are these models in terms of understanding the biological significance of these features?
(8) The manuscript needs more details on how specificity, negative predictive value (NPV), or area under the precision-recall curve (AUPRC) were considered.
(9) The test sets' composition (number of benign vs. pathogenic variants) and their source (whether they were balanced or reflective of real-world distributions) are not detailed. How might this affect the models' generalizability to other datasets or real-world scenarios?
(10) The comparison with CADD and other models like RENOVO and MLVar suggests superior performance of your models. However, were these comparisons made under similar conditions (e.g., same datasets, metrics)? How do the models compare in terms of computational efficiency and scalability?
(11) When reclassifying variants of uncertain significance, how did the authors validate the accuracy of these reclassifications? Is there a risk of introducing bias or errors in this process, given the uncertain nature of these variants?
(12) The discussion section of the manuscript needs to be significantly expanded. These are a few points the authors may consider while revising the discussion section. In the discussion, the authors should interpret and explain the findings, placing them in the context of the broader field. Begin by summarizing the main findings of the study, highlighting how they address the research questions or hypotheses stated in the introduction. Then, contextualize these results within the existing literature, discussing how these findings align with or differ from previous research and the potential reasons for these similarities or differences. Discuss the significance and implications of the results, considering both their theoretical and practical applications. Acknowledge the limitations, discussing how they might affect your findings and suggesting areas for future research to address these gaps. This section should bridge the gap between the presented research and the larger scientific community, demonstrating how this work contributes to and advances the field.
Minor:
(1) It would help readers to introduce Wilson's disease in the introduction section.
(2) The relevance of choosing the ATP7B gene needs to be added in the introduction.
(3) "Non-exonic variants and VUS were removed from the analysis and this resulted in a variant dataset of 723 unique variants, ..." Explain why. What is VUS? Expand all the abbreviations at first use.
(4) Lines 106-113: The parameters could be presented in a table.
Reviewer #2: Summary: Vatsyayan et al. applied two ML models (TabNet and XGBoost) to classify ATP7B genetic variants of Wilson disease based on highly engineered features of each variant. Both models show very high classification accuracy, indicating the potential usability to reduce manual evaluation efforts such as following the guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. However, because of the lack of comparison with other variant classification methods, e.g. disease-agnostic models, it is hard to tell the novelty of WilsonGenAI and whether WilsonGenAI really adds value to Wilson's disease-specific variant classification. Due to the large storage requirement of WilsonGenAI, I have not evaluated the software itself. Please see the following comments for major revision:
Major:
1. In the introduction, please review and discuss related works. Lines 174-180 should be part of the introduction.
2. In the results, in addition to CADD, please compare the WilsonGenAI results with more state-of-the-art methods such as Eigen-PC, REVEL, AlphaMissense, etc. Moreover, the argument for not comparing the proposed methods with RENOVO and MLVar is not convincing. Please also include these results in Table S3. Without seeing these baseline results, it is difficult to conclude that TabNet and XGBoost are necessary as Wilson's disease-specific models. The model comparison figure (e.g. a barplot of Table S3) might be the main figure highlighted by this paper.
3. Line 68: there are many more pathogenic variants than benign ones. Have the authors considered whether the imbalanced distribution will influence the results?
4. Lines 74-76: it seems the three populations used for annotation are different from the population of the WilsonGen dataset. Can the authors discuss the potential problems of this inconsistency further?
5. Figure S1.
Can the authors show both training and validation loss in order to easily see whether the model is overfitting or not?
6. Figure S2. It seems the important features identified by XGBoost and TabNet are quite different but their ROC curves are similar. Can the authors discuss this further?
7. Figure S3. It seems the accuracies are very unstable. Can the authors comment on this problem?
8. Figure S4. It is weird that the total number of variants is different between the methods.
9. Since the proposed models only consider two classes, in practice, for variants with a predicted probability of around 0.5, should the user regard them as VUS? Is there a recommended threshold? This is especially important as the authors claimed WilsonGenAI could be used for clinical diagnosis.
10. For the VUS of the independent datasets (line 124), are their predicted scores around the margin between the two classes? Is there a trend or correlation between the predicted score and the 5 ordinal classes?
11. Lines 189-200: was there a specific reason for choosing S855 and C271X for validation? Ideally, it would be very interesting to see if some VUS with very high predicted pathogenic probability can be validated to lead to low copper concentration.
12. It seems the independent dataset is not available.
Minor:
1. Please define the abbreviations WD and VUS before using them.
2. Line 149: please round the number.
********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No Reviewer #2: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
WilsonGenAI a deep learning approach to classify pathogenic variants in Wilson Disease PONE-D-23-35773R1 Dear Dr. BK, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Muhammad Salman Bashir, M.S.C Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: |
| Formally Accepted |
|
PONE-D-23-35773R1 PLOS ONE Dear Dr. BK, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission, * There are no issues that prevent the paper from being properly typeset If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Muhammad Salman Bashir Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.