Peer Review History
| Original Submission, January 19, 2021 |
|---|
|
PONE-D-21-01941
Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs.
PLOS ONE
Dear Dr. Noble,
Thank you for submitting your manuscript to PLOS ONE, and for your continued patience. I am aware that this first round of review took longer than usual, but I trust that you and your co-authors understand the current difficulties in finding researchers with availability to contribute their time to the peer-review process. I am happy to report that we found two experienced reviewers who, after careful consideration, suggested moderate changes, as pasted below. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Jun 11 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Fernanda C. Dórea
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and
2. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere. [Figure 6c is very similar to a figure in a paper submitted for publication (Radford et al 2021). In the current paper, this panel shows the trajectory of a disease outbreak which is described in the Radford et al paper as a comparison to the signal detected by topic modelling which is under discussion in the current paper. The Radford paper is primarily reporting the disease outbreak.
In the current paper, we are illustrating how a novel method of automated record annotation could have highlighted the outbreak reported in the Radford et al paper and feel that its inclusion clearly illustrates the utility of topic modelling whilst not in any way recapitulating the findings of the Radford et al study.] Please clarify whether this [conference proceeding or publication] was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.
3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts:
a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.
b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication.
For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide.
4. Thank you for stating the following in the Acknowledgments Section of your manuscript: [The data for this project was supported by generous funding from BBSRC, Dogs Trust as part of SAVSNET-Agile, and previously BSAVA. We wish to thank data providers in veterinary practice (VetSolutions, Teleos, CVS and independent practitioners) without whose support and participation this research would not be possible. Finally, we are especially grateful for the help and support provided by SAVSNET team members Beth Brant, Susan Bolan and Steven Smyth.] We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: [ADR, PJN, GN: No funder reference number, DogsTrust, https://www.dogstrust.org.uk; ADR, PJN, GN: BB/N019547/1, Biotechnology and Biological Sciences Research Council, https://bbsrc.ukri.org; ADR, PJN: No funder reference number, British Small Animal Veterinary Association, https://www.bsava.com. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.] Additionally, because some of your funding information pertains to [commercial funding//patents], we ask you to provide an updated Competing Interests statement, declaring all sources of commercial funding.
In your Competing Interests statement, please confirm that your commercial funding does not alter your adherence to PLOS ONE Editorial policies and criteria by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests. If this statement is not true and your adherence to PLOS policies on sharing data and materials is altered, please explain how. Please include the updated Competing Interests Statement and Funding Statement in your cover letter. We will change the online submission form on your behalf.
Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests
5. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Partly
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: I Don't Know
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The authors used topic modeling to mine veterinary clinical narratives. They find that the mixture probability of one topic (topic 17) over time is indicative of gastroenteric main presenting complaint (MPC). Using a Bayesian binomial linear model, they confirm that the finding is statistically significant. The study shows that it is possible to use topic models to detect an outbreak of novel disease. Although there are no methodological innovations in topic modeling and the analyses are standard topic model analyses, it is a novel application and an interesting angle that the authors took in looking into the outbreak using topic models.
Major comments:
1. The authors were modeling temporal changes of the text data using LDA from the standard Gensim library. The authors should use a dynamic topic model instead to capture the changes of topics over time.
2. Evaluating only topic coherence when choosing the number of topics is not sufficient, because it only evaluates each topic separately, not across topics. You can have coherent topics, but the topics can all look the same. A proper metric is topic quality, which combines topic coherence and topic diversity (Mimno et al 2011).
3. The authors mention computational overhead caused by choosing more topics. This is possibly because they were using full-batch LDA training on the 3.5 million records in their data. LDA can be fitted by stochastic variational inference using minibatches of the records (also known as online LDA). This should already be implemented in the Gensim library.
4. I am confused about whether the model was trained on real data and then applied to the simulated data, or trained and applied directly on the simulated data. It seems that the former is true.
I would argue that the latter is more realistic because, in a real-world application, you can't train on text in the absence of the outbreak, at least not in the same time interval.
5. Figures 2 and 3a are of very poor quality. A more compact display is needed; otherwise, Figures 2 and 3a should be moved to the supplementary material.
Minor comments:
6. The flow and the format of the paper make it sometimes hard to follow. In particular, in the Results section, figure legends were inserted in the middle of the text with large margins around them (e.g., page 14).
7. The authors call the veterinary clinical narratives of dogs electronic health records (EHR). I wonder to what extent this is the case. There are no standardized data such as LOINC-coded lab tests or RxNorm-coded prescriptions.
Reference: D. Mimno, H. M. Wallach, E. Talley, M. Leenders, and A. McCallum. 2011. Optimizing semantic coherence in topic models. In Conference on Empirical Methods in Natural Language Processing.
Reviewer #2: The objective of the study was well motivated as to the use of topic modelling to classify EHR into categories that would allow for surveillance. What was not well explained were the criteria and definition of a disease outbreak. The point was made that the utility of labelling would be to facilitate detection of outbreaks. When in the development of the outbreak should the system be making the decision that an outbreak is present? If it waits until the outbreak is at its worst, then obviously there would be little utility for prevention or control. On lines 186-8, an outbreak is equated to an increase in gastroenteric MPC, but that is a very vague definition and does not take into account many different sources of variation (dog show in town, new clinic hours, holidays, etc.). These sources of variation are what make syndromic surveillance so difficult, because of the problem of false positives.
It was also mentioned that there is a great deal of uncertainty in topic modelling with regard to negatives; this is indeed a problem, and I did not clearly see any explanation of how this was mitigated in this study. The other large issue with natural language data sets is misspellings, short forms, etc. My own experience with human hospital and veterinary clinic records points to huge variation in spelling unless the doctor/vet is constrained or there is post-processing to correct spellings or expand abbreviations (which is only possible for a known set of abbreviation variations). I did not see an explanation of how the data from the EHR were processed, and since I am not familiar with the UK systems for this type of data collection, I am unsure whether this is actually an issue. I was also unclear as to what the effect was if an EHR described more than one condition of interest. It is also unclear whether the reason this topic modelling works well is that GI disease has mainly two symptoms (vomiting and diarrhoea). This was briefly discussed with respect to the overlap between cardiac and respiratory symptoms and diagnoses, but it does leave the reader waiting for a subsequent study that looks at this question. I would caution against justifying the choice of the number of topics (30) by computational cost without putting this into perspective. In other words, what type of computational system was used, and how long did it take to produce results? I did find the use of unsupervised methods appropriate and useful for this task and think that this study is an excellent first step in a more comprehensive use of this technique.
**********
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: Yes: Deborah Stacey
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
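Reviewer #1's comment 2 describes topic quality as topic coherence combined with topic diversity. As an illustration only, the sketch below assumes one common formulation: diversity is the fraction of unique words among the top-N words of all topics, coherence is the mean normalized pointwise mutual information (NPMI) over pairs of each topic's top words using document co-occurrence, and quality is their product. The function names, the default top-N values and the toy records are hypothetical; they are not taken from the manuscript or from any library.

```python
import math
from itertools import combinations


def topic_diversity(topics, top_n=25):
    """Fraction of unique words among the top_n words of every topic.

    1.0 means no word is shared between topics; values near 1/len(topics)
    mean the topics largely repeat one another.
    """
    top_words = [word for topic in topics for word in topic[:top_n]]
    return len(set(top_words)) / len(top_words)


def npmi_coherence(topic, documents, top_n=10):
    """Mean NPMI over all pairs of a topic's top_n words.

    Probabilities are document frequencies: p(w) is the share of documents
    containing w, p(w_i, w_j) the share containing both words.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    scores = []
    for w_i, w_j in combinations(topic[:top_n], 2):
        p_i = sum(w_i in d for d in doc_sets) / n_docs
        p_j = sum(w_j in d for d in doc_sets) / n_docs
        p_ij = sum(w_i in d and w_j in d for d in doc_sets) / n_docs
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(1.0)  # co-occur in every document: upper bound
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    if not scores:
        raise ValueError("each topic needs at least two top words")
    return sum(scores) / len(scores)


def topic_quality(topics, documents):
    """Mean per-topic coherence multiplied by topic diversity."""
    coherence = sum(npmi_coherence(t, documents) for t in topics) / len(topics)
    return coherence * topic_diversity(topics)


if __name__ == "__main__":
    # Hypothetical tokenised clinical records, for illustration only.
    records = [
        ["vomiting", "diarrhoea", "gastro"],
        ["vomiting", "diarrhoea"],
        ["cough", "heart", "murmur"],
        ["cough", "murmur"],
    ]
    distinct = [["vomiting", "diarrhoea"], ["cough", "murmur"]]
    redundant = [["vomiting", "diarrhoea"], ["vomiting", "diarrhoea"]]
    print(round(topic_quality(distinct, records), 3))   # 1.0
    print(round(topic_quality(redundant, records), 3))  # 0.5
```

In this toy example both topic sets are perfectly coherent, but the redundant set scores half the quality because its two topics share every top word, which is the situation comment 2 warns about. Which reference corpus and how many top words to use for each score are choices this sketch leaves open.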
| Revision 1 |
|
Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs.
PONE-D-21-01941R1
Dear Dr. Noble,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double-check that your user information is up-to-date. If you have any billing-related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Fernanda Dórea
Academic Editor
PLOS ONE
Additional Editor Comments:
I believe that this was enough to be able to track what has changed and the additions made, although, as pointed out by one of the reviewers, in the future you may want to specify clearly the lines/sections where additions were made, and preferably quote them in the response letter.
Based on the acceptance recommendation from reviewer 2, plus my own judgement of how satisfactorily you answered the comments from reviewer 1, I have recommended acceptance of the paper. Although some comments remain, I deemed that another round of review would be unlikely to bring further improvements to the current methods. I do urge you to read thoroughly through the comments from reviewer 1, whose pointers on methodological aspects of the paper can be an important consideration for future work. But I do agree with the authors that, for the purpose of early disease detection, the methods presented and the conclusions are sound, and therefore I recommended acceptance.
Reviewers' comments:
1. We are all busy. It is difficult to find the revised content when the authors simply say “it’s revised somewhere in the manuscript; go find it yourself”. I would appreciate it if the authors wrote the revised part directly in their response to my comments AND indicated the page number where those revised parts appear in the revised main text. Only this way can I properly and efficiently assess whether they have satisfactorily addressed my comments. This pertains to all my original comments. Please also do this in the next round of revision.
2. In answering my comment 2 about evaluating topic quality as the product of topic diversity and topic coherence rather than just topic coherence, the authors did not give a satisfactory answer. Please perform a topic quality analysis.
3. The authors didn’t understand my comment 3. They said that they chose a small subset of data to experiment with the number of topics, which IS ultimately due to the inefficient computational algorithm of the LDA they used, because otherwise they would not need to subset the data to experiment with topic numbers. Therefore, I suggested using online LDA to run on all their data, not just a subset.
4. For my comment 6, figures and figure legends typically go at the top or bottom of the page, not in the middle of the page.
5.
The revised PDF is merged with the PDF of the original submission. I can only find figures from the original submission, not the revised figures. |
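Reviewer #1's comment 3 rests on online LDA updating the topic-word parameters from one minibatch of records at a time rather than from the whole corpus. A sketch of that update, as given for online variational Bayes for LDA by Hoffman, Blei and Bach (2010), is shown below. Here $\lambda$ are the variational parameters of the topic-word distributions, $\eta$ is the topic-word Dirichlet prior, $D$ is the total number of records, $B_t$ is the minibatch used at step $t$, $n_{dw}$ is the count of word $w$ in record $d$, and $\phi_{dwk}$ is the probability, estimated in the local E-step for that minibatch, that word $w$ in record $d$ belongs to topic $k$. Each step touches only the $|B_t|$ records in the minibatch, and the decaying step size $\rho_t$ controls how much of the previous $\lambda$ is kept.

```latex
\tilde{\lambda}_{kw} = \eta + \frac{D}{|B_t|} \sum_{d \in B_t} n_{dw}\,\phi_{dwk},
\qquad
\lambda \leftarrow (1 - \rho_t)\,\lambda + \rho_t\,\tilde{\lambda},
\qquad
\rho_t = (\tau_0 + t)^{-\kappa}, \quad \kappa \in (0.5, 1].
```

As we understand the Gensim `LdaModel` documentation, `chunksize` sets the minibatch size, `decay` corresponds to $\kappa$, `offset` to $\tau_0$, and a non-zero `update_every` selects online rather than batch updates; these mappings should be checked against the documentation of the Gensim version in use.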
| Formally Accepted |
|
PONE-D-21-01941R1
Using topic modelling for unsupervised annotation of electronic health records to identify an outbreak of disease in UK dogs.
Dear Dr. Noble:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
If we can help with anything else, please email us at plosone@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Fernanda C. Dórea
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.