Peer Review History
Original Submission (August 14, 2024)
PONE-D-24-32590
Extraction of clinical data on major pulmonary diseases from unstructured radiologic reports using a large language model
PLOS ONE
Dear Dr. Choi,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by October 28, 2024, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org.
When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions, see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Harpreet Singh Grewal
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.
Before we proceed with your manuscript, please address the following prompts:
a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.
b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible.
We will update your Data Availability statement on your behalf to reflect the information you provide.
Additional Editor Comments:
1. Handling of Medical Abbreviations and Acronyms: The manuscript should explicitly discuss whether and how the LLMs differentiated between medical abbreviations (e.g., "PNA" for pneumonia) and their full forms. This is crucial for understanding the models’ capability in interpreting radiologic reports, where abbreviations are commonly used.
2. Addressing Clinical Uncertainty: The manuscript must address how the LLMs handled clinical uncertainty present in radiologic reports, such as ambiguous diagnoses (e.g., "emphysema vs. COPD" or "lung cancer vs. metastasis"). A discussion of how the models interpreted these uncertainties and whether they required clinical correlation should be included.
3. Interobserver Pulmonologist Qualifications: The qualifications of the pulmonologists involved in setting the gold standard need to be clearly stated, including their experience, training, and roles within their institutions. Additionally, the manuscript should discuss whether the LLMs accounted for differences between trainee and attending radiologist reads, which could affect the results.
4. Error Rate of GPT-4: The reported error rate of 0% for GPT-4 in generating JSON output seems implausible. The authors should re-examine this claim, providing a more nuanced explanation of what "error-free" means in this context and whether any minor or non-critical errors were overlooked.
5. Introduction of JSON Format: The manuscript introduces the JSON format without prior explanation, which may confuse readers. The authors should include a brief introduction to JSON formatting, explaining in particular its relevance and use in the study.
6. Generalizability of Results: The study’s results are based on radiologic reports from Korean institutions, which may not generalize to other regions with different reporting styles. The manuscript should discuss this limitation and suggest future validation on more diverse datasets, including those from different languages and cultural contexts. The style of English used in Korean reports may not generalize elsewhere (especially to native English-speaking countries). This remains a potential limitation of this study and should be discussed, especially since LLMs such as ChatGPT-4 seem to show heterogeneous performance when translating resource-poor languages, as demonstrated in this recent study: https://www.mdpi.com/2075-4426/14/9/923
7. Challenges with ILD and Pneumonia Classification: The manuscript should expand on why the models, particularly GPT-4, had lower accuracy in diagnosing conditions like ILD and pneumonia. A discussion of the radiologic features of these diseases and the need for clinical correlation would help contextualize the models' performance.
8. Clarification on Data Exclusion Criteria: The authors should provide a rationale for excluding reports with minimal findings, such as "no active lung lesions." Explaining how this exclusion might affect the models' learning and generalizability would strengthen the study's methodology section. This can also be added to the limitations section.
9. Broader Dataset and Multimodal AI Research: Future research should consider incorporating datasets from multiple regions with diverse reporting styles. Additionally, exploring multimodal AI approaches that include image data alongside text could enhance the models' ability to classify complex conditions like ILD and pneumonia.
10. Practical Implementation and Clinical Impact: A discussion of how these LLMs could be integrated into clinical workflows, particularly in reducing manual chart reviews without sacrificing diagnostic accuracy, would add practical value to the study.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No
Reviewer #2: Yes
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians, and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters.)
Reviewer #1: The authors present an interesting manuscript on using a large language model to extract data on major pulmonary diseases from unstructured radiologic reports. The authors found that GPT-4 was the most accurate.
Regarding statistics, the authors need to explain in explicit detail how sensitivity, specificity, and accuracy were quantified in this study. This is not described in the statistical analysis section.
Is this large language model able to differentiate between acronyms and spelled-out words (PNA for pneumonia, TB for tuberculosis, etc.)? How does this model account for clinical uncertainty? If a radiologist reads "emphysema vs. COPD" or "lung cancer vs. metastasis," can the model account for these clinical ambiguities, given that they often require clinical correlation?
What are the qualifications of the interobserver pulmonologists? Does the language model take into account whether a report was read by a trainee or an attending? Perhaps this model could help trainees better understand their shortcomings when over-reads by attendings yield different reads or diagnoses.
The error rate of 0% for GPT-4 seems impossible, as even attending pulmonologists are wrong a small percentage of the time when they read these images. There is also no introduction to the JSON format before it appears in Table 3, which makes the meaning of that table difficult to understand. The more applicable use of AI will be the interpretation of the images themselves rather than just the radiology reports.
Reviewer #2: This manuscript presents a compelling study on the application of large language models (LLMs), such as GPT-4, GPT-3.5, and Google's Gemini Pro, for extracting clinical data on major pulmonary diseases from unstructured radiologic reports. The study's design, methodology, and results highlight the potential for AI to enhance radiologic data interpretation, addressing a critical challenge in the healthcare field.
**Strengths:**
1. **Novel Application of LLMs**: The study explores a novel use case for LLMs, showing their capacity to accurately interpret radiological reports. This is a valuable addition to the growing literature on AI in healthcare, particularly in contexts where manual review is labor-intensive.
2. **High Accuracy and Specificity**: The results demonstrate that GPT-4, in particular, performed with impressive accuracy (0.89–0.99) and sensitivity across a range of pulmonary conditions, such as pleural effusion and emphysema. This confirms the utility of LLMs in clinical settings, especially for standardizing unstructured data.
3. **Ethical Considerations and Methodology**: The inclusion of ethical approval from multiple institutions and a clear explanation of the retrospective nature of the study, combined with the involvement of experienced pulmonologists, gives the study a robust foundation.
**Areas for Improvement:**
1. **Limited External Validation**: The study focuses on radiologic reports written by Korean radiologists, which may limit the generalizability of the findings to other regions where radiologic reporting styles differ significantly. Future research should consider validating these findings on more diverse datasets to assess whether the models perform consistently across languages and cultural differences in medical practice.
2. **Interpretation Challenges for Specific Conditions**: The models, particularly GPT-4, showed lower performance for diseases like interstitial lung disease (ILD) and pneumonia. It would be beneficial to provide a more in-depth discussion of why these specific conditions pose a challenge for LLMs, potentially incorporating additional medical context.
3. **Data Limitations**: The exclusion of cases with "no active lung lesions" or "no significant interval changes" may introduce bias into the dataset. It would be helpful to include a rationale for this decision and an explanation of how it may affect the study's overall results. Additionally, clarifying whether the inclusion of normal cases might help improve the models’ learning and prediction capabilities would be useful.
**Suggestions for Further Research:**
1. **Broader Dataset and Multimodal AI**: Incorporating datasets from multiple regions with diverse radiologic reporting styles and languages would provide a more comprehensive understanding of the LLMs' capabilities and limitations. Furthermore, exploring multimodal AI approaches, which incorporate image data along with text, could enhance the classification of complex conditions like ILD and pneumonia.
2. **Clinical Impact and Practical Implementation**: Future studies could explore the practical application of LLMs in clinical workflows, particularly how they can complement the work of radiologists. Assessing how these models can reduce time spent on manual chart reviews without sacrificing diagnostic accuracy will be critical for real-world implementation.
**Conclusion:** The manuscript contributes significantly to the literature on AI in radiology, offering valuable insights into the application of LLMs for structured data extraction from unstructured reports. While there are some limitations regarding dataset diversity and the challenges posed by certain conditions, the study lays the groundwork for further advancements in AI-driven healthcare solutions.
**Minor Revision Suggestions:**
1. **Expand on the challenges with ILD and pneumonia classification**
**Section: Results (Page 8, Lines 169-170)**
Add more details explaining the reasons behind the lower accuracy in diagnosing interstitial lung disease (ILD) and pneumonia. It would be helpful to mention that both diseases often present with overlapping radiologic features that require clinical correlation for accurate diagnosis. You could also discuss how human expertise still plays a role in diagnosing these complex conditions and why AI models may struggle in such cases.
**Example addition:** “The lower performance for conditions like ILD and pneumonia could be attributed to the complex and often overlapping radiological features of these diseases. For instance, ILD presents with varied findings such as interlobular septal thickening and fibrosis, which can be easily confused with other conditions like pulmonary edema or post-radiotherapy fibrosis. This emphasizes the need for clinical correlation, which AI models like GPT-4 currently lack, thus affecting their sensitivity and specificity in these cases.”
2. **Clarification on data exclusion criteria**
**Section: Methods (Page 7, Lines 129-132)**
Add a brief explanation as to why reports with "no active lung lesions" or other minimal findings were excluded. This will help address potential concerns about selection bias.
**Example addition:** “These reports were excluded to focus the analysis on instances where pulmonary diseases were present, thus allowing the models to evaluate complex cases more effectively. While this exclusion helps concentrate the model's learning, it may also introduce selection bias, and future research should consider including such cases to improve the model’s generalization.”
3. **Include a note on dataset generalizability**
**Section: Discussion (Page 13, Lines 267-270)**
Briefly discuss the limitations of using a dataset that originates solely from Korean radiologists. Mention the need for future validation on a broader scale.
**Example addition:** “While this study provides a solid foundation for evaluating LLMs in radiology, the dataset's origin, which is limited to Korean radiologists, may affect the generalizability of the findings. Different radiology practices and reporting styles across countries may influence the model’s performance. Therefore, validation on more diverse datasets, including reports written in various languages and medical contexts, is necessary to assess the broader applicability of the models.”
4. **Clarification on JSON format errors**
**Section: Results (Page 10, Lines 232-237)**
You mention error rates for JSON formatting but do not explain the impact of these errors.
A brief clarification would help contextualize this finding.
**Example addition:** “While GPT-4 demonstrated higher accuracy in producing error-free JSON output, the few errors observed in Gemini Pro 1.0 and GPT-3.5 primarily involved incorrect formatting or incomplete answers for specific diseases. These errors, although minor, highlight the importance of refining model outputs for seamless integration into clinical systems.”
**********
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
Extraction of clinical data on major pulmonary diseases from unstructured radiologic reports using a large language model
PONE-D-24-32590R1
Dear Dr. Choi,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice will be generated when your article is formally accepted. Please note that if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up to date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Harpreet Singh Grewal
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Please expand JSON at its first mention. This has not been addressed yet.
Reviewers' comments:
Formally Accepted
PONE-D-24-32590R1
PLOS ONE
Dear Dr. Choi,
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.
At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset
If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.
Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
If we can help with anything else, please email us at customercare@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Harpreet Singh Grewal
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.