Peer Review History
Original Submission: May 14, 2020
PONE-D-20-14331

Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

PLOS ONE

Dear Dr. Park,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 07 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,
Julian C Hong
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Additional Editor Comments (if provided):

Thank you to the authors for their submission. The reviewers were positive regarding this submission with recommended revisions. In particular, the perspectives regarding the use of the term "validation" vary across the reviewers given their respective backgrounds, and it would be helpful to discuss both viewpoints in the manuscript to accommodate the diversity of potential interested readers. Please see reviewer responses for specific recommendations.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors are concerned about the varying definitions of the term "validation" in medical imaging AI papers. In the machine learning world, this term usually refers to a dataset used to tune model hyperparameters and decide when to stop training optimization, but in the medical world this term usually refers to the testing of an algorithm on data not used as part of the training process at all. They show that there are many papers that use the term in the first way, and many papers that use it in the second way. The analysis seems to be appropriate and carefully done, and the writing is very clear.

In the spirit of the PLOS One data sharing policy, I think the authors need to share their raw dataset now, so the reviewers can better assess the results. There are no anonymity issues that would keep them from sharing this.

It would be helpful to include a table showing a few specific quotes from the 201 papers that illustrate the two uses of the term "validation". For readers not familiar with machine learning, this will help them understand the problem. It would also be interesting to know if all 201 papers included a separate held-out test set, or if some never tested their model on a held-out set (which would be bad).

Since it is unlikely the use of the term "validation set" to describe a tuning set will change in the machine learning world, I do not think medical AI papers will ever have completely standardized terminology. As long as each paper carefully explains its definitions, I do not think it is a major issue. So I am not completely on board with the authors' conclusions about the need for strict guidelines on this topic.
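The distinction Reviewer #1 draws can be made concrete with a minimal sketch, assuming simulated data and scikit-learn utilities (train_test_split, LogisticRegression, accuracy_score), with the regularization strength C standing in as the tuned hyperparameter; this illustration is not drawn from the reviews or the manuscript itself:

```python
# A minimal sketch of the two dataset roles contrasted above, using simulated
# data and scikit-learn (hypothetical illustration; not code from the study).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                        # stand-in for image-derived features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # synthetic binary labels

# Three-way split: train / validation (tuning) / held-out test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# "Validation" in the machine-learning sense: use the tuning set to pick a
# hyperparameter (here, the regularization strength C).
best_C, best_acc = None, -1.0
for C in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=C).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_C, best_acc = C, acc

# "Validation" in the clinical-literature sense: a single, final evaluation on
# data never used for training or tuning (what ML papers call the test set).
final_model = LogisticRegression(C=best_C).fit(X_train, y_train)
print("held-out test accuracy:", accuracy_score(y_test, final_model.predict(X_test)))
```

In the machine-learning usage, only the middle split is the "validation" set; in the usage common in clinical papers, the final evaluation on the last split is what would be called "validation".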
Reviewer #2: This paper provides a systematic review and meta-analysis of the term ‘validation’ as used in deep learning studies regarding medical diagnosis. There is a major inconsistency in the medical imaging community regarding the correct use of ‘validation’ in the technical sense. This paper is well written and the topic is important to readers of both PLoS One and the medical imaging community. However, there are several aspects of this paper that I feel need to be addressed prior to publication. My general comments are as follows.

First, I don’t believe this is an inconsistency problem (which implies that there is a lack of standard terminology), but rather an inaccuracy problem regarding the correct use of a well-defined, technical term. As the authors correctly point out, the inconsistency of the term ‘validation’ is most likely because the term is used in general communication to refer to testing the accuracy of a completed algorithm. However, the term means something very specific in the science of machine learning, where it refers to the tuning of hyper-parameters in a model (i.e., validating the training procedure, not the generalization capacity). This definition is canonically accepted, and in general, the term ‘validation’ should not be misused in the colloquial sense when referring to technical work. In my opinion, we simply need to educate the medicine community to use the appropriate terminology when applying machine learning approaches, and this confusion won’t be a problem. As such, I do not agree with lines 204-206, where the authors argue, “It would be ideal if medical journals would unify the use of the term validation to refer to the testing step instead of the tuning step”. In my opinion, there is nothing to ‘unify’ here, and instead the medical journals need to enforce the correct use of terminology. If research related to machine learning (including deep learning) is to be published in scientific journals (including imaging journals), it should use canonical terminology. Superficially changing these technical terms to colloquial terms would only cause more confusion, and call into question the sound scientific reasoning published in the journals.

Next, while I find it interesting that papers published in high impact factor journals tend to use ‘validation’ to refer to model testing (instead of the more appropriate and accurate model tuning), I don’t think this statistic should be used as a rationale to justify the proposal in Lines 206-207 as stated above. If anything, it makes more sense to test if there was a statistically significant difference in use of ‘validation’ between technical machine learning journals and imaging journals, regardless of impact factor. The authors argue at Lines 206-207, “Such a unification in terminology use may be difficult in disciplines such as machine learning, where the term is relatively widely used to refer to the tuning step”. This line similarly does not make much sense to me; machine learning is the discipline being used, regardless of the application (i.e., medical imaging or otherwise). In the context of this paper, it is all machine learning, and imaging is simply the application. It is not a one-to-one fair comparison.

On Lines 219-236, the authors discuss “internal validation vs. external validation”. Here, they are using the term in the colloquial sense, but in the paragraphs that follow, they actually explain a process that is technically already defined as “internal testing vs. external testing”. By definition, a model that has been validated on an internal dataset (i.e., the learning procedure has been (cross-)validated to identify optimal hyper-parameters) has to be tested on a separate dataset not included in the original training and validation procedure, either internally (on a subset of the dataset held out) or externally (on a completely independent dataset). The notion of “internal validation vs. external validation” is ill-defined and goes against the entire purpose of this paper. In technical machine learning workflows, it is often implied that model validation is based on some form of a cross-validation procedure, where the training of a machine learning algorithm is applied to subsets (e.g., folds) of the data set to best learn the optimal hyper-parameters. The authors do not mention this at all, and it is important, as it may help to better define and implement proper terminology.
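The workflow Reviewer #2 outlines here, cross-validation within the development data to select hyper-parameters followed by a single internal (or external) test, can be sketched roughly as follows; this is a hypothetical illustration using simulated data and scikit-learn's KFold and train_test_split, not code from the study under review:

```python
# Hypothetical sketch (simulated data, scikit-learn) of the workflow described
# above: k-fold cross-validation within the development data to choose a
# hyper-parameter, then a single test on a held-out internal set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 15))
y = (X[:, 0] - X[:, 1] + rng.normal(size=600) > 0).astype(int)

# Hold out an internal test set BEFORE any tuning takes place.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# 5-fold cross-validation over the development data: each fold serves once as
# the (machine-learning-sense) validation fold for hyper-parameter selection.
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
mean_cv_acc = {}
for C in (0.01, 0.1, 1.0, 10.0):
    fold_accs = []
    for train_idx, val_idx in kfold.split(X_dev):
        model = LogisticRegression(C=C).fit(X_dev[train_idx], y_dev[train_idx])
        fold_accs.append(accuracy_score(y_dev[val_idx], model.predict(X_dev[val_idx])))
    mean_cv_acc[C] = float(np.mean(fold_accs))
best_C = max(mean_cv_acc, key=mean_cv_acc.get)

# Refit on all development data; test once on the held-out internal set.
# "External testing" would replace X_test with data from an independent site.
final_model = LogisticRegression(C=best_C).fit(X_dev, y_dev)
print("internal test accuracy:", accuracy_score(y_test, final_model.predict(X_test)))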
Finally, I think that the Discussion section needs to be substantially expanded to include specific examples that are related to imaging. Further, I would like to see the authors also take more of a stance on promoting the appropriate use of the term ‘validation’, as stated in the above points. While the Discussion section does a nice job of documenting the different uses of the term ‘validation’, in my opinion it falls short in promoting a good solution, and should focus primarily on well-defined definitions as a reference point.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael Gensheimer
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

PONE-D-20-14331R1

Dear Dr. Park,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Julian C Hong
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Thank you to the authors for their work and revisions. There are minor recommendations for additional citations to include in the manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)
Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: (No Response)
Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)
Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: In the Introduction, please include some references when discussing cross-validation and how it fits into a training/cross-validation + testing experimental design. This paper is a good example of a good deep learning implementation: C. Wang, et al. Dose-Distribution-Driven PET Image-based Outcome Prediction (DDD-PIOP): A Deep Learning Study for Oropharyngeal Cancer IMRT Application. Frontiers in Oncology. 2020. https://doi.org/10.3389/fonc.2020.01592

Also in the Introduction, please cite example references when discussing the fine-tuning of a model during the validation phase. This paper is a good example of a network that was tuned during validation, using the correct terminology very well: Y. Chang, et al. Development of realistic multi-contrast textured XCAT (MT-XCAT) phantoms using a dual-discriminator conditional-generative adversarial network (D-CGAN). Physics in Medicine and Biology. 2020 Mar;65(6).

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Michael F. Gensheimer
Reviewer #2: No
Formally Accepted
PONE-D-20-14331R1

Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

Dear Dr. Park:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Julian C Hong
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.