Peer Review History
Original Submission: October 10, 2023
PONE-D-23-33052
A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement
PLOS ONE

Dear Dr. Moseley,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 28 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org.

When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Manikkam Rajalakshmi
Academic Editor
PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

“This work has been supported by the National Science Foundation [NSF 2020026 to HNBM].”

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."
If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work has been supported by the National Science Foundation [NSF 2020026 to HNBM]. We thank Dr. Robert Flight for feedback on the machine learning evaluation methodology utilized in this work. We thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and use of the Lipscomb Compute Cluster and associated research computing resources.”

We note that you have provided funding information that is currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work has been supported by the National Science Foundation [NSF 2020026 to HNBM].”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This article, “A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement,” is properly structured and the authors performed the statistical analysis. However, the following changes are required:

1. Proofread the entire manuscript.
2. The authors should provide a graphical abstract.
3. Compare the study with existing systems.

Reviewer #2: The manuscript entitled “A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement” has many mistakes; the authors need to rectify many portions.

- Should there be more precision in explaining the machine learning applications, such as the specific methodologies used by Hu et al, Baranwal et al, Yang et al, and Du et al?
- Could there be confirmation or additional details on the machine learning model used by Hu et al [4], especially the dataset they used and its characteristics?
- Should there be more information on why the 12th label was excluded in the machine learning predictions, and what criteria were used for inclusion or exclusion?
- Should there be confirmation or additional details on the availability of datasets, particularly the KEGG-SMILES dataset, and how researchers can access and utilize it?
- Could there be more explanation on the format of the KEGG-SMILES dataset, such as the structure of each line and how the SMILES data is represented?
- Should there be more clarification on how the dataset size varies across different publications, and what factors led to the variations?
- Could there be more precision in reporting metrics, especially regarding the accuracy, precision, recall, and F1 score calculations?
- Should there be an explanation or investigation into the reasons for the large numbers of exact duplicate entries in the KEGG-SMILES dataset, and how they might affect the validity of the results?
- Could there be more information on how the validity of results presented by Baranwal et al [7], Yang et al [11], and Du et al [13] is being questioned, and what the specific concerns are?
- Should there be more details on how the scripts were modified to process and train/evaluate the model across all ten cross-validation folds, and how the original repository handled these folds?
- Could there be an explanation for not being able to exactly reproduce the published results due to the lack of seeding in model training? How did running each of the ten folds multiple times contribute to approximating the original results?
- Should there be more details on the training/model evaluation script, especially regarding the parameters used, the role of stochastic gradient descent, and the selection process for reporting scores?
- Should there be more information on the modifications made to Du et al's original scripts to accommodate either the KEGG-SMILES dataset containing duplicates or the de-duplicated version?
- Could there be more details on the statistics collected and the results derived from the scripts, especially the summary statistics and the nature of the statistical analyses performed?
- Should there be clarification on why the fractions in Table 5, representing the proportion of compounds in the dataset with a given pathway label, do not add up to one? How does the possibility of entries being associated with more than one pathway label contribute to this?
- Could there be an explanation of the rationale behind de-duplicating the dataset? How does removing recurring duplicates impact the subsequent analyses?
- Should there be more interpretation of the two notable observations in Table 5, especially the higher percentage of duplicates in every pathway category and the lower proportion of the dataset in each pathway category in the de-duplicated dataset?
- Could there be more details on the suspected relation between the number of times a unique entry occurs in the original dataset and the number of pathway labels it has? How does Table 6 illustrate this relationship?
- Should there be clarification on how Table 6 is compressed to create Table 7? What does the resulting contingency table reveal about the relationship between the number of occurrences and the number of labels?
- Could there be more information on the significance of the metrics in Table 8, especially the average number of labels and the percentage of entries with multiple labels? How do these metrics differ between the non-duplicate subset, the original dataset, and the subset containing duplicates?
- Should there be more context on the statistical tests performed in Table 9, such as the Chi Square test, Fisher's Exact test, and Mann-Whitney U test? How do these tests contribute to the overall interpretation of the results?
- Could there be more interpretation of the effect sizes mentioned, such as Cramér's V, common language effect size, and phi coefficient? How do these effect sizes contribute to understanding the strength of the observed relationships?
- Should there be more discussion on the pragmatic implications of the findings, especially the assertion that the chance that the entries were randomly duplicated is pragmatically zero? What does this imply for the validity and reliability of the dataset?

Good Luck!

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement
PONE-D-23-33052R1

Dear Dr. Hunter N. B. Moseley,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Manikkam Rajalakshmi
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.