Peer Review History
Original Submission: August 2, 2024
PONE-D-24-28092
ChatGPT-4o Can Serve as the Second Rater for Data Extraction in Systematic Reviews
PLOS ONE

Dear Dr. Motzfeldt Jensen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================
ACADEMIC EDITOR: After going through the manuscript and the reviewers' comments, most reviewers and I suggest a Minor Revision. Please review the following details.
==============================

Please submit your revised manuscript by Oct 13 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Weiqiang (Albert) Jin, Ph.D.
Academic Editor
PLOS ONE

Journal requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf.

2. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption.
Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

3. Please include a separate caption for each figure in your manuscript.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Dear authors: After going through the manuscript and the reviewers' comments, I suggest a Minor Revision (as an associate editor). The study is promising and offers some great insights into using AI / GPT-4o for data extraction in systematic reviews, but there are a few areas that need a bit more work. The key points to address include adding more details about the methods, such as how the sample size was chosen, the specific prompts used, and how the human reviewers did their work. Also, expanding the results and discussion sections to clarify the types of errors found, ways to reduce them, and any potential limitations would help. It's also important to make sure the manuscript follows the proper guidelines for reporting AI research and to polish the language for better readability.
Please consider all the suggestions given by the four reviewers, revise accordingly, and provide us with a detailed point-by-point response letter.

Regards,
Weiqiang Jin.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Partly
Reviewer #3: Yes
Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Editor and Authors,

Thank you for submitting your manuscript. I have carefully read through the entire paper and believe this study presents an innovative approach with notable potential for enhancing data extraction processes in systematic reviews. The experimental design is robust, and the results are promising. However, there are a few areas where the manuscript could benefit from minor revisions to further improve clarity and impact. In summary, I recommend minor revision. Below are my specific comments, which I hope will assist you in revising and improving the paper.

1. Insufficient Detail in Background Description: The background section mentions that systematic reviews are a key tool for translating clinical trial evidence into guidelines but does not sufficiently explain why data extraction is a crucial step in systematic reviews and how it impacts the quality of the review. It is recommended to add a discussion on the limitations of current manual data extraction methods and the potential benefits of AI intervention in this section.

2. Results Section Needs More Comparative Detail: The results section compares data extraction by ChatGPT-4o with that of human reviewers but does not provide detailed information about the data extraction process used by human reviewers.
To enhance the persuasiveness of the comparison, it is suggested to include detailed information about the background and data extraction criteria of human reviewers to ensure fairness and accuracy. Additionally, a description of the human reviewers' workflow could help better understand the differences and advantages between AI and human review processes.

3. Emphasis on Study Limitations in Discussion: While the conclusion affirms the potential of ChatGPT-4o as an auxiliary tool for systematic reviews, the study does not address its limitations, such as the potential for AI to overlook complex contextual relationships or inherent biases. It is recommended to discuss these potential limitations in the discussion section and explore ways to address these issues in future research.

4. Accuracy of Keywords: The keywords cover the core concepts of the study, but it may be beneficial to include more targeted terms such as “automated data extraction” and “machine learning” to enhance search relevance.

5. Data Sources and Representativeness: The study mentions randomly selecting 11 articles to test the capabilities of ChatGPT-4o but does not elaborate on the specifics of the random selection method or the representativeness of the sample. It is recommended to provide more information on how the sample was ensured to be representative, such as whether different types of studies or data reporting quality were considered, to ensure the generalizability of the test results.

Incorporating these adjustments will enhance the manuscript's overall quality and provide readers with a clearer understanding of the study's implications and applications. Thank you for considering these suggestions.

Best regards,

Reviewer #2: This manuscript contributes valuable insights into the use of AI, specifically ChatGPT-4o, for data extraction in systematic reviews. However, there are some points that need further clarification and refinement.

1.
External Validity:

1.1 The study's focus on a single ongoing systematic review raises concerns about the generalizability of the findings. Systematic reviews can encompass a wide range of research questions, including interventional, diagnostic, and prognostic studies. It would be beneficial to consider whether the inclusion of different types of studies or systematic reviews might improve the validity and reproducibility of ChatGPT-4o's data extraction capabilities.

1.2 The sample size of 11 studies in one systematic review used to evaluate validity is relatively small. This limited sample size may not be sufficient to draw robust conclusions about the general applicability of ChatGPT-4o. Further justification for the adequacy of this sample size would strengthen the manuscript.

2. Introduction:

2.1 The Introduction would benefit from a more comprehensive review of existing literature on the use of large language models (LLMs) for data extraction. For example, including references such as DOI:10.1016/j.ymeth.2024.04.005, among others, would help contextualize the novelty and contributions of the current study.

3. Methods:

3.1 ChatGPT-4o data extraction: The manuscript lacks details on the specific prompts used for data extraction with ChatGPT-4o. Providing this information would enhance the transparency and reproducibility of the study.

3.2 Comparison of data extracted by ChatGPT-4o with the reference standard: It is unclear how the comparison between ChatGPT-4o and human data extraction was conducted. Specifically, were the evaluations conducted by one or two researchers? Clarifying this aspect of the methodology is important for assessing the rigor of the study.

3.3 Validity goals and expected utility of ChatGPT as a data extraction tool: Providing a clearer reference to the categories for validity assessment would improve the clarity of the methods.
3.4 The manuscript mentions 22 data points extracted from each study but does not specify what these data points are. Detailing the specific data points would aid readers in understanding the scope of the data extraction process.

4. Results:

4.1 Consider providing examples of the typical data extraction results in the appendix. This would allow readers to better assess the performance of ChatGPT-4o.

4.2 The manuscript mentions that 5.2% of the data extracted by ChatGPT-4o was incorrect. A brief discussion of the types of hallucinations observed and potential strategies for identifying or mitigating these errors would be useful.

Other Comments:

5. Adherence to Reporting Guidelines: The manuscript should align with reporting guidelines for AI-related clinical research, such as those mentioned in Flanagin et al. (2024), "Reporting Use of AI in Research and Scholarly Publication - JAMA Network Guidance." Providing details on the prompts used, the time frame of ChatGPT-4o usage, and other relevant methodological aspects would be beneficial.

6. Language and Clarity: The manuscript contains some language that could be further refined for clarity. For example, the sentence "this finding was supported when looking across information domains where agreement was lower for outcome data and a larger proportion of information was not reported" could be simplified for better readability.

Reviewer #3: This is an interesting and novel study exploring the use of AI, and LLMs specifically, in collecting data for systematic reviews. However, I have a few comments that I would like the authors of this study to address.

1. Please explain how you determined that a sample size of 11 would be adequate to test the validity and reproducibility of ChatGPT-4o.

2. I would be interested to know the agreement rate between the two authors. Suppose there was frequent disagreement between the two assessors, requiring a third author to intervene.
Does this mean the data extracted by human standards was subjective, and hence assessing ChatGPT against this subjective measure would not have shown its true capacity for extracting data from full-text papers?

3. How did you arrive at the prespecified assessment of the validity? Were there any previous similar studies using this model?

4. What was the prespecified goal of determining acceptable reproducibility?

5. All figures and tables should have a legend summarising and explaining their results. All figures and tables from this study are missing legends. Graphs are also missing titles.

Reviewer #4:

1. The study notes that the questions were written without special knowledge of LLMs, which reflects an average user’s experience. However, exploring how optimized prompts could enhance AI performance might provide valuable insights. This could lead to recommendations on best practices for researchers using ChatGPT-4o.

2. The study mentions that 5.2% of the data extracted by ChatGPT-4o was false, particularly when information was not reported in the papers. A deeper analysis of these errors, beyond reporting their frequency, could provide useful guidance on how to mitigate such risks.

3. While the manuscript discusses the validity and reproducibility of ChatGPT-4o, it might benefit from a brief discussion on the ethical implications of using AI in systematic reviews, particularly regarding accountability and transparency when AI-generated data is integrated into research outputs.

4. The study is based on a specific set of 11 RCTs focusing on fall prevention in older adults. While the methodology is robust, the generalizability to other fields or types of studies remains uncertain. What are the authors' suggestions to expand the study to include a broader range of topics or study types that would strengthen the conclusions?

5.
To add relevant information that supports the manuscript and benefits the readers, we suggest a paper investigating how ChatGPT can find and return real references. You may cite it as follows: https://doi.org/10.1016/j.jormas.2024.101842.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Suodi Zhai
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
ChatGPT-4o Can Serve as the Second Rater for Data Extraction in Systematic Reviews
PONE-D-24-28092R1

Dear Dr. Authors of Paper PONE-D-24-28092R1,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Weiqiang (Albert) Jin, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Congratulations! The reviewers have expressed their appreciation for your work and have acknowledged its quality by recommending acceptance of your article. Well done. Before the final proofreading, please ensure that all citations in the manuscript adhere to the publication's formatting guidelines.
Additionally, verify the accuracy of information for each referenced article, prioritizing published DOIs over preprints such as those on arXiv.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author:

I recommend citing the following two references that utilize GPT for information processing:

ChatAgri: Exploring potentials of ChatGPT on cross-linguistic agricultural text classification [DOI: 10.1016/j.neucom.2023.126708]
Prompt learning for metonymy resolution: Enhancing performance with internal prior knowledge of pre-trained language models [DOI: 10.1016/j.knosys.2023.110928]

Reviewer #1: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.
Reviewer #1: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)
Reviewer #4: Thanks for addressing the comments raised in the previous round of review. I feel that this manuscript is now acceptable for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #4: No

**********
Formally Accepted
PONE-D-24-28092R1
PLOS ONE

Dear Dr. Motzfeldt Jensen,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Weiqiang (Albert) Jin
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.