Peer Review History
| Original Submission: September 18, 2024 |
|---|
|
PONE-D-24-41279 Improving Topic Modeling Performance on Social Media Through Semantic Relationships within Biomedical Terminology PLOS ONE Dear Dr. Wei, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Dec 12 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, Zhe He, PhD Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. 3. 
We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Before we proceed with your manuscript, please address the following prompts: a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible. We will update your Data Availability statement on your behalf to reflect the information you provide. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 
Additional Editor Comments: The reviewers raised a number of issues including comparing with traditional topic modeling approaches, the rationale of structural topic modeling, and various technical details. Please address these issues in the revision. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: No Reviewer #3: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No Reviewer #3: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Summary: This paper presents a novel approach to improving topic modeling performance on social media data in the healthcare domain. The authors introduce a semantic-type-based topic modeling pipeline that integrates UMLS concept recognition and SNOMED CT relationship-based concept decomposition into traditional topic modeling. They demonstrate the effectiveness of this approach using discussions about statin use from Reddit as a case study. The main contributions include: 1. Enhancing topic modeling by leveraging semantic relationships within biomedical terminology. 2. Improving the extraction of health-related themes from noisy social media data. 3. Validating findings using real-world clinical data from electronic health records. The innovation lies in the combination of UMLS concept recognition, SNOMED CT relationship-based concept decomposition, and traditional topic modeling to create a more robust method for analyzing health-related social media content. Specifics While the introduction does mention some related work (e.g., lines 36-52 discuss various applications of topic modeling in healthcare), the authors don't explicitly compare their approach to these existing methods in terms of performance or outcomes. Some potential comparisons could include: a) Traditional topic modeling approaches (e.g., Latent Dirichlet Allocation) on the same dataset without semantic enrichment. 
b) Other semantic-enhanced topic modeling methods that have been applied to healthcare social media data. c) Alternative approaches to extracting health-related information from social media, such as supervised machine learning methods or rule-based systems. Introduction: - Line 36: "Topic modeling is an unsupervised machine learning technique in natural language processing (NLP) used" - Consider rephrasing to improve flow, e.g., "Topic modeling, an unsupervised machine learning technique in natural language processing (NLP), is used" Methods: - Line 84: "We conducted web scraping of a repository containing dump files to collect the submissions and comments" - Consider providing more details on the ethical considerations of data collection from social media Results: - Figure 2: Consider adding a brief explanation of how to interpret the topic proportions in the figure caption Discussion: - Line 315: "Along with these widely recognized topics associated with statin use, we also noted novel findings relative to topic 2:" - Consider elaborating on the potential implications of these novel findings for healthcare providers and researchers Limitations and future work: - Consider expanding on potential solutions or approaches to address the limitations mentioned - Some sentences are quite long and complex. Consider breaking them down for improved readability - Ensure consistent capitalization of terms throughout the paper (e.g., "Topic Modeling" vs. "topic modeling") - Consider adding a brief section on the ethical considerations of using social media data for healthcare research Reviewer #2: The paper presents a method for improving the extraction of meaningful health-related themes from social media, which could lead to better patient insights and enhanced clinical decision-making. Comments: 1) The rationale behind using STM is not clear, as STM is used to identify the temporal variation and difference in source data. 
As for temporal evidence, how the topics changed over the time period is not specified. 2) The figures provided are not clear; a pipeline or algorithm describing the entire flow would add value to the manuscript. 3) How do you address the 329 documents that did not have a matching UMLS concept from the restricted list? 4) How was the value of K between 2 and 6 selected, and what is the maximum number of topics generated by the model? 5) In line 124 the authors explain the decomposition of concepts to parent concepts. What are the criteria for identifying the concepts to be decomposed? Please explain the logic behind this step. 6) In the EHR validation study, an association analysis between statin exposure and mental health was performed, even though the subreddit was cholesterol. Why was the association study of topic 2 selected? Please justify the reasoning for deciding on the EHR validation study. 7) Please provide more examples of the topics generated, and clarify with execution output whether the results for each topic had metadata text attached. Reviewer #3: This study effectively demonstrates the integration of UMLS concept recognition and concept decomposition, based on SNOMED CT relationships, into traditional topic modeling frameworks to enhance the identification of meaningful biomedical topics. The manuscript is well-structured and clearly written, making the methodology and findings accessible. However, I have the following suggestions for the authors’ consideration that could further improve the overall clarity and impact of the manuscript. 1. In the Background section, the authors discuss the limitations of traditional topic modeling. However, in the Methods section, they focus on Structural Topic Modeling (STM). It would enhance clarity to explicitly define the gap between STM and existing methods, reinforcing why STM is a better choice for this analysis. 2.
As there are several topic modeling approaches available, including STM and LDA, the authors should provide a clear rationale for selecting Structural Topic Modeling (STM) over the more widely used Latent Dirichlet Allocation (LDA). This would make the choice of approach more convincing, particularly in terms of how STM enhances the research in comparison to other methodologies. 3. Since this study uses unsupervised approaches, it is important to explain the selection of k (the number of topics) and the initialization process for clustering. The selection of k can greatly influence the results, and more detail here would enhance the paper’s rigor. 4. In the UMLS Concept Recognition section, the authors mention four semantic types. It would strengthen the paper to explain why these specific types were chosen and how they align with the context of the analysis. 5. In the STM Model Setup section, providing more information on how the number of topics (k) was chosen, and elaborating on the qualitative analysis performed to assess the top words in each topic would improve transparency. 6. In the Concept Decomposition Based on SNOMED CT Relationships section, it would be beneficial to include an example to demonstrate the concept decomposition process. This will help clarify how granularity issues with SNOMED CT were handled. 7. The process of identifying pre-coordinated expressions is unclear. Explaining whether this was done manually or automatically, and outlining the identification method would enhance understanding. The authors cited a JAMIA paper, “The use of SNOMED CT, 2013-2020: a literature review.” However, that paper states that “no evaluation was made as to whether the post-coordinated expression improved the performance of normalization tasks compared to only pre-coordination coding schemes,” and it does not describe the process of identifying pre-coordinated concepts. Could the authors please explain how they identified pre-coordinated expressions? 9.
The decomposition methodology needs further illustration, especially regarding how it handles pre-coordinated concepts. Without more information, it is difficult to understand how this process works in practice. 10. If the method is not intended to be generalized, this should be explicitly addressed in the Discussion section, especially given the complexity of handling a large list of concepts. 11. Clarifying terms like "word-topic distribution" early in the paper would make the content more accessible to a broader audience. 12. The comparison of concept decomposition in the STM model is presented without sufficient explanation. If concept decomposition is a key step in improving the topic modeling, this needs to be made more explicit in the manuscript. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free.
Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
Improving Topic Modeling Performance on Social Media Through Semantic Relationships within Biomedical Terminology PONE-D-24-41279R1 Dear Dr. Wei, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Zhe He, PhD Academic Editor PLOS ONE Additional Editor Comments (optional): The authors have adequately addressed the reviewers' comments. Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #3: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #3: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #3: N/A ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #3: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #3: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: (No Response) Reviewer #3: Thanks to the authors for thoroughly addressing my previous comments. I have reviewed the revised manuscript and the authors' detailed responses to the review feedback. I believe the authors have adequately addressed my concerns. I appreciate their thoughtful consideration and the effort they have put into the revisions. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #3: No ********** |
| Formally Accepted |
|
PONE-D-24-41279R1 PLOS ONE Dear Dr. Wei, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission, * There are no issues that prevent the paper from being properly typeset If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Zhe He Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.