Peer Review History
| Original Submission August 27, 2020 |
|---|
|
PONE-D-20-27001 Cohort profile: St. Michael’s Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing PLOS ONE Dear Dr. Batt, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Dec 11 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Natalia Grabar Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. 
b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide. 3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. 
Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Overview: The study details the construction of the SMH-TB database, which the authors claim could be an important resource for future research on tuberculosis. Medical knowledge was used to build a set of regular expressions (regex) to extract structured data from clinical narratives and to increase the coverage of patient variables. The regex ruleset is now available on GitHub, and SMH-TB can be shared upon proper request. 
The evaluation covered a set of 200 manually annotated patients’ texts, including the calculation of Precision, Recall and F1 metrics. Main comments: In general, the article is well written and easy to understand. The study fulfills its objectives and the conclusion is in line with the results achieved. On the other hand, the work does not present any innovative methodology or technique, and has no depth in the use of natural language processing techniques, since it is limited to the use of regular expressions only. Extra pre-processing steps, POS-tagging algorithms, n-grams, and vector representations are examples of techniques that could improve the results and make the method more generalizable. In addition, there are several systems available to extract clinical concepts from clinical narratives, such as cTAKES, MetaMap, and CLAMP (refer to this study to find more: https://www.sciencedirect.com/science/article/pii/S1532046417301685). The only real novelty is associated with the fact that the authors state that this is the first corpus of its kind available to the scientific community. More evidence about the quality of the corpus and its usefulness would be important for the study, for example, testing it in some of the tasks associated with tuberculosis research, such as those mentioned in the introduction and discussion (e.g., using the corpus as the gold standard to train a Machine Learning algorithm). Since the authors used the CHARTextract tool and straightforward techniques, the methodology could be replicated to construct new datasets focusing on diverse diseases and cohorts, although, as in this work, it requires the time of annotators with clinical experience for the construction of rules and manual annotation, which can sometimes be the most difficult resource to obtain. It is not clear to me how the manual labelling occurred. Have you simply marked all the variables (from Table 1) encountered within the text? 
Have you defined different categories for each entity labeled? Did you use only exact-match counts? A figure or some examples could easily clarify these questions. In the “Binomial proportions estimated from extracted variables” section, extra clarifications are needed, such as: a) Which variables were normalized (because some of them are continuous and not categorical)? b) Could using 0/1 binary values affect, for instance, the use of the corpus for clinical trial screening algorithms? A Negative result is very different from Unknown/Not Recorded, and sometimes these values could be the difference between recruiting and not recruiting a patient. So, why not use values that are more granular? Some text samples could be provided for the reader, so it could be easier to visualize the text extraction challenges, as well as excerpts of the final database. In conclusion, the article presents the construction of a resource that can be used in future research; however, it presents a shallow methodology of natural language processing for its construction and does not prove its real usefulness and generalizability through an extrinsic evaluation. Minor corrections: “mathematical modelling studies” >>> “mathematical modeling studies”; “surrounding immigration, housing status, and insurance, and clinical information” >>> “surrounding immigration, housing status, insurance, and clinical information”; “co-morbidities” >>> “comorbidities”; consider enhancing the quality of Figure 3. Reviewer #2: Dear authors, thank you very much for the insights into your interesting work, I enjoyed reading it. See below my comments. TYPE OF SUBMISSION: Research article. SUMMARY: The authors describe the creation of a retrospective clinical data warehouse using structured and unstructured data specifically from tuberculosis outpatients. Rule sets were applied to extract relevant attributes from unstructured narratives and evaluated regarding precision, recall and F-measure. 
A descriptive analysis was calculated on the final data set. ESTABLISHMENT OF THE WORK: The motivation is clear and the paper is well-structured. The related work is insufficient. Creating a database using structured and unstructured data from EHRs for specific cohorts has been done before. The authors should provide some examples of related work (retrospective cohort building from structured and unstructured data with rule sets). SUITABILITY OF THE METHODS: The methods are well described and the rule sets are available via GitHub. Precision, recall and F-measure have been applied to the information extraction task; these are the measurements expected by the clinical NLP community. APPROPRIATENESS OF THE ANALYSES AND IMPORTANCE OF THE RESULTS: The analyses are appropriate. OTHER COMMENTS ABOUT THE PAPER: Major Revision: major_001 - Related work: In general, this is not the first time such an investigation has been done, although it may be the first specifically for tuberculosis data sets. Please review in detail and improve the related work in this direction. Minor Revisions: minor_001 “We developed the first digital retrospective clinical database that combines structured data, unstructured (text) data, and variables derived from transforming unstructured data to structured data using natural language rulesets, among patients assessed in an inner-city outpatient TB clinic at St Michaels Hospital (SMH) of Unity Health Toronto in Toronto, Ontario, Canada.” the first digital -> a digital minor_002 Table 3: You can get rid of True Positive*, True Negative* and Accuracy. Precision, recall, and F-measure are enough. It would be interesting to know how many rules you had to adjust in total and also per attribute. Were there any pitfalls when developing them? Briefly discuss the limitations of the approach and your experiences. Were there attributes that were very hard to fetch with a rule set? Have you thought about applying a machine learning based approach for the information extraction task? 
minor_003 Are the structured entries further on mapped to a terminology for standardization e.g. SNOMED CT? (Just of interest) All the best, Markus ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
Cohort profile: St. Michael’s Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing PONE-D-20-27001R1 Dear Dr. Batt, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Natalia Grabar Academic Editor PLOS ONE Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. 
Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: All comments and questions raised by me have been addressed in some way in the article. Although the work continues to rely on straightforward NLP approaches, the authors make this clear in the discussion section. An additional experiment using a regression algorithm has been added, making the claim that the corpus can be useful for other tasks more robust. A new figure has been added to make it clear to the reader how they performed the manual annotation process. In addition, more examples of textual excerpts were presented throughout the article. Therefore, I consider the issues I raised resolved. Reviewer #2: Dear authors, thank you very much for the insights into your interesting work, I again enjoyed reading it. See below my comments. OTHER COMMENTS ABOUT THE PAPER: Thank you for your detailed response to my comments. You clearly considered them in the current version of the paper. Overall, together with the other reviewer's suggestions, the quality of the manuscript has clearly improved to a level ready for publishing. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No |
| Formally Accepted |
|
PONE-D-20-27001R1 Cohort profile: St. Michael’s Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing Dear Dr. Batt: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Natalia Grabar Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.