Peer Review History

Original Submission: November 6, 2021
Decision Letter - Tom J. Pollard, Editor, Imon Banerjee, Editor

PDIG-D-21-00099

Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan

PLOS Digital Health

Dear Dr. Okumura,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 23 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Tom J. Pollard, Ph.D.

Academic Editor

PLOS Digital Health

Journal Requirements:

1. Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

2. Please update your Competing Interests statement. If you have no competing interests to declare, please state: “The authors have declared that no competing interests exist.”

3. In the online submission form, you indicated that “Our corpus, created in this article, is available upon request. The NHO data is not publicly available for privacy reason.”. All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons by return email and your exemption request will be escalated to the editor for approval. Your exemption request will be handled independently and will not hold up the peer review process, but will need to be resolved should your manuscript be accepted for publication. One of the Editorial team will then be in touch if there are any issues.

4. Please provide separate figure files in .tif or .eps format only and remove any ensure that all files are under our size limit of 10MB.

For more information about how to convert your figure files please see our guidelines: https://journals.plos.org/digitalhealth/s/figures

5. Please ensure that you provide a single, cohesive .tex source file for your LaTeX revision. You may upload this file as the item type 'LaTeX Source File.' As stated in the PLOS template, your references should be included in your .tex file (not submitted separately as .bib or .bbl). Please also ensure that you are making any formatting changes to both your .tex file and the PDF of your manuscript. If you have any questions, please contact Latex@plos.org. You can find our LaTeX guidelines here: https://journals.plos.org/digitalhealth/s/latex

Additional Editor Comments (if provided):

The study is interesting and explores an important topic. Our reviewers - in particular Reviewer 1 - have raised some points that I think should be addressed prior to publication. I would be grateful if you could submit a new version of the paper after responding to these points.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I don't know

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: An insightful presentation of the state of the art for clinical health records and discharge summaries in Japanese hospitals as well as the application of natural language processing (NLP) on said clinical text data. NLP in Japanese, let alone for clinical discharge summaries, presents a unique challenge and we applaud the authors for gaining access to the largest clinical text database and the efforts for conducting this study.

However, a few key elements of the paper could be revised to showcase and highlight the novelty of the study. Suggestions include, but are not limited to:

● The paper did not describe the unique difficulties in conducting NLP in Japanese, such as the lack of spaces between characters and words. This is further exacerbated by the lack of open source information to combat this, and including such limitations may further emphasize the quality of this study.

● The authors do not declare any other NLP related models or research available in Japan, and the study does not list many related studies such as the following which would have been helpful in creating comparisons:

→ Preliminary development of a deep learning-based automated primary headache diagnosis model using Japanese natural language processing of medical questionnaire: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7827501/

→ Predicting Inpatient Falls Using Natural Language Processing of Nursing Records Obtained From Japanese Electronic Medical Records: Case-Control Study: https://medinform.jmir.org/2020/4/e16970/

→ A clinical specific BERT developed using a huge Japanese clinical text corpus: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0259763

● Related to the aforementioned point, due to the lack of related studies, the authors do not mention how different or novel their research is in comparison to other Japanese or overseas models.

● Furthermore, the authors mention that the results might be applicable only to Japanese clinical settings, yet since no other examples are given, this cannot be generalized, and further examples as well as background information and explanation are required to make this claim.

● The authors mention that their model requires entire documents for training, and that the number of documents in their corpus was too small to be used for the training of the model; however, they do not explain why this is the case. Perhaps including examples from Japan, if not from abroad, would help readers understand their model’s novelty.

● The authors mention that in this experiment, they used three contextual units instead of n-gram units to evaluate their impact on the summarization performance and determine which unit performs the best. While this is interesting, the authors once again do not describe why they did this and why this is meaningful.

● The authors discuss that this study used the largest multi-institutional health records archive in Japan and thus it would be worthwhile to validate the results in multiple languages, but if there are actually no other databases, perhaps the authors should first validate in other Japanese clinical text databases, or in other Asian languages that have similar grammatical structures, before applying the method directly to other languages that may have more open source resources available.

● The authors raised an interesting point that Japanese hospitals do not use dictation to produce discharge summaries, which could result in frequent copying and pasting from sources to summaries. However, it would be stronger for the authors to provide examples for how this enhances (or does not enhance) the model, as well as further expand on the clinical significance or rationale to dictate or not dictate, and whether there are differences in clinical outcomes or clinical workflow to explore the strengths and weaknesses of the authors’ model.

● Finally (and related to the point above), the authors do not clearly discuss why this study is useful, especially its application to real world clinical settings.

While the English and Japanese translations available are also interesting, some of the translations seem non-native or slightly incorrect, and thus could perhaps use proofreading and reconsideration (e.g. 髄液糖定量 - glucose determination → cerebrospinal fluid glucose level or glycorrhachia).

Given all these considerations, priority for PLOS Digital Health will be based on significant revisions, only after being able to convince the editor that their model has significance in real-world clinical settings in comparison to pre-existing models.

Reviewer #2: The study aims to identify the optimal granularity among three proposed granularities for extractive summarization of Japanese clinical texts.

The paper is clear and well-written.

Comments:

1) Given that the optimal granularity for extractive summarization has been explored previously, the innovation of this research is the choice of the clinical domain. However, it is not clear whether the results differ from those reached in non-clinical domains.

2) The fact that “20-31% of the sentences in discharge summaries were created by copying and pasting" led the authors to conclude that a certain amount of content can be automatically generated by extractive summarization.

However, since most of the summary needs to be generated by other methods (e.g. abstractive summarization), the fact that extractive summarization cannot be used to generate a full summary raises questions about its use at all in the summarization of clinical records. Would other methods make any use of the 20% that can be generated by extractive summarization?

3) The background should be further expanded. For example, there is no mention of abstractive methods that create summaries without hallucinations; the authors just mention that abstractive summaries "often produces fake contents that do not match the reference summary". A more complete review of previous studies should also include abstractive methods, such as those implemented in the CliniText system and BT-45, that did not produce any hallucinations.

Minor comments:

1) Typos: "In this study,105we adopt both of the two methods. in106Japanese"

2) "However, it remains unclear how the summaries should be generated from the unstructured source"

It is unclear what the authors meant by "unstructured source": images? Free text from other clinical experts? Raw data?

--------------------

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Attachment
Submitted filename: Rebuttal letter-20220627.pdf
Decision Letter - Tom J. Pollard, Editor, Imon Banerjee, Editor

Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan

PDIG-D-21-00099R1

Dear Dr. Okumura,

We are pleased to inform you that your manuscript 'Exploring Optimal Granularity for Extractive Summarization of Unstructured Health Records: Analysis of the Largest Multi-Institutional Archive of Health Records in Japan' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Imon Banerjee

Section Editor

PLOS Digital Health

***********************************************************

Many thanks for your detailed response to the reviewer comments. I am satisfied that the concerns have been addressed and would be happy to move ahead with publication.

Reviewer Comments (if any, and for reference):

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.