Peer Review History
Original Submission (February 14, 2023)
PONE-D-23-04283
SCREENER: Streamlined Collaborative Learning of NER and RE model for discovering Gene-Disease Relations
PLOS ONE

Dear Dr. Kim,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 28, 2023, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Nguyen Quoc Khanh Le
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

3. We note that Figure 1 in your submission contains copyrighted images.
All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figure 1 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b.
If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license, or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No
Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No
Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes
Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors describe a complex tool for discovering gene-disease relations from text, combining the tasks of named entity recognition, relation extraction, and entity linking. The work is not bad, but the scientific value and novelty do not look significant. The approaches used for NER and entity linking are well known. Joint solutions for NER and RE already exist, but the authors did not mention many of them. There is a lack of analysis of the SCREENER corpus: it would be worthwhile to analyze its lexical diversity and how it differs from or resembles AGAC in terms of the diseases and genes mentioned in it. The implemented service may be seen only as an addition to the research article, and not as a result of any study. The following drawbacks must be corrected:

- The Results and Discussion sections must be centered around the results of the authors' experiments. They should not contain continuations of the literature review.

- The markup quality of the developed corpus should be evaluated. How many annotators participated, and what is the inter-annotator agreement score?

- A description for Figure 2 is required: what are D, G, T?
Disease, Gene, and Trigger?

- Does the NER module extract only the Gene and Disease types, or all 12 types? The same for relations: are they extracted for the two entity types or for all entities? If for all, do Indirect Edges connect everything with everything?

- How exactly are entities extracted: was the BIO annotation scheme used, or a span-based method?

- "We use a classifier from ELECTRA [19] model for fine-tuning biomedical literature to identify name entities (NER)". ELECTRA is a language model that encodes text; it does not include a classifier. And what is meant by "fine-tuning biomedical literature"? How could literature be fine-tuned?

- The RE method looks like the most original idea in this paper, but a supporting figure with a scheme, or a more detailed description, should be provided. First, what method is used to get the entity vector if the entity includes multiple tokens? Second, it is written that the max-pooled vector returns "the most likely relation pair from candidates". How is that? What exactly is fed into the max-pooling operation: embedding vectors of all extracted candidates? Third, it is written that four vectors are concatenated, including the distance embedding vector between a pair of entities. What pair of entities? Are such concatenations formed for all possible pairs among candidates? How is this concatenated vector used? Is it classified by a fully connected layer?

- The authors do not specify how the RE loss is calculated and how it is combined with the final loss.

- Was the SCREENER corpus used only in training, with the test part taken from AGAC? Or was the test subset selected from the SCREENER corpus?

- Were Direct Edges added to the test set? If yes, then when comparing the models with DE and without DE there were different numbers of relations in the test sets, were there not? Or were only the initial connections considered during evaluation?

- If the open corpus AGAC was used, are there any results of other researchers on it in the literature? RENET2 and SCREENER were tested on the onlyRE task for comparison. Again, did they have the same number of relations to predict, or were DEs added in the test for SCREENER? On what task was BioBERT-GAD tested: onlyRE or NER+RE?

- If a cross-validation was performed with 5 folds, why are no deviation estimates given for the accuracy? Are the given accuracies 5-fold averages? The authors should double-check or explain the different accuracy scores in the article. The abstract gives an F1 of 0.7789; this result is not in the article. The authors write that SCREENER_onlyRE outperforms BioBERT-GAD in recall by 32.7%; how was this number obtained if the recall scores are 0.963 and 0.444 respectively? The same applies to the F1 scores. The Discussion section says that RENET2 detects correlations in a sentence. What kind of correlations are in view?

- The conclusion is very weak. Since this is a research article, the authors should emphasize what they propose in their research and what the scientific novelty of their method is. The authors also declare that "collaborative learning of NER and RE tasks is the model's novelty", but joint models for NER+RE tasks already exist, for example: Span-based Joint Entity and Relation Extraction with Transformer Pre-training (Markus Eberts & Adrian Ulges); End-to-end Neural Coreference Resolution (Kenton Lee et al.); Selivanov A. et al., Relation Extraction from Texts Containing Pharmacologically Significant Information on base of Multilingual Language Models; Sboev A. et al., Accuracy Analysis of the End-to-End Extraction of Related Named Entities from Russian Drug Review Texts by Modern Approaches Validated on English Biomedical Corpora // Mathematics. – 2023. – Vol. 11. – No. 2. – P. 354. The novelty may lie either in the distinctive features of the proposed method, which distinguish it from analogues (but then they need to be highlighted), or in its application to a new problem. The manuscript is not ready to print and must be improved.
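Both reviewers ask how the RE loss is calculated and how it is combined with the NER loss for end-to-end training. The manuscript under review does not specify this, but a common scheme in joint NER+RE models is a weighted sum of per-task cross-entropies. The sketch below illustrates that reading only; the `re_weight` hyperparameter and the mean reductions are assumptions for illustration, not the authors' actual method:

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of raw scores
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, gold):
    # negative log-likelihood of the gold class index
    return -math.log(softmax(logits)[gold])

def joint_loss(ner_logits, ner_gold, re_logits, re_gold, re_weight=1.0):
    # NER: mean token-level cross-entropy over the sequence
    ner_loss = sum(cross_entropy(l, g) for l, g in zip(ner_logits, ner_gold)) / len(ner_gold)
    # RE: mean pair-level cross-entropy over candidate entity pairs
    re_loss = sum(cross_entropy(l, g) for l, g in zip(re_logits, re_gold)) / len(re_gold)
    # one plausible combination: weighted sum optimized end-to-end
    return ner_loss + re_weight * re_loss
```

With confident correct predictions on both tasks the combined loss approaches zero, which is the behavior the reviewers would expect an end-to-end objective to exhibit.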
Reviewer #2: In this paper, the authors present a new transformer-based machine learning model that performs NER for gene and disease entities and RE for gene-disease relations from the biomedical literature. Relation extraction is performed at the document level, and a new joint learning objective for NER and RE is proposed. Evaluation is conducted on a newly annotated dataset and compared to two solid baselines. The results seem promising, but the manuscript misses several key details needed for a realistic assessment.

Major issues

I have two major issues with the paper: one with the description of the relation extraction module in the Methods section, and one with the evaluation setting.

- How does the Relation Extraction Module work exactly?
  - BioBERT also uses an attention module with the [CLS] token for classifying the relation. How does the proposed attention module differ from this? Concrete explanations of how the attention module is implemented are missing.
  - You mention that you are also using direct edge connections between entities in addition to indirect ones. How exactly are the direct edges incorporated in the relation extraction module? How do they differ from the indirect ones?
  - How is the NER loss integrated into the RE loss for end-to-end training? This is not clear from the text.
  - Building on the above, how does the relation extraction module help facilitate the cross-sentence extraction aspect?
  - Some ablation studies would be helpful for this: e.g., apply your method only on sentences vs. whole documents, then compare the results.

- The evaluation setting needs more details. It is not really clear which datasets evaluation is conducted on, and whether both the competitors and the proposed method are trained/fine-tuned on them.
  - E.g., in result tables 2 and 3 it is not clear which dataset is evaluated: the AGAC or the new SCREENER dataset?
  - If evaluation is on the SCREENER dataset, are both competitors also trained on it, or on their respective datasets (BioBERT on GAD and RENET2 on the RENET2 dataset)?
  - Is it actually an in-corpus evaluation (all methods trained and evaluated on SCREENER), and not cross-corpus for BioBERT and RENET2 (evaluated on a different dataset than they were trained on)?
  - If all methods are trained and evaluated in a comparable way (all in-corpus), then the results tables need some more discussion.
  - A large improvement in F1-score is reported for the SCREENER model compared to the baselines; where does it come from? You have reported effects from adding direct edges; maybe add some ablation studies for other effects and discuss their influence.

Minor issues

- Some more datasets for evaluation would be nice to have, as you already mention in the discussion.
  - Why do you not also evaluate on the same dataset as RENET2 for comparison (or the GAD dataset for BioBERT)? This would allow a more direct comparison of the results.
  - GDA is also a widely used benchmark dataset; it would be nice to also evaluate on that. Paper: RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature (Wu et al., 2019).
  - Have a look at some current state-of-the-art models for the GDA dataset given in: Document-level Relation Extraction as Semantic Segmentation (Zhang et al., 2021) and SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction (Xiao et al., 2021). It would be nice if the results could also be compared to some of those models.

- Regarding your newly annotated dataset, some more information on the annotation quality would be helpful.
  - How many annotators worked on the dataset? What was the inter-annotator agreement? How were conflicts during annotation resolved?
  - What were the annotation guidelines of the dataset?
- In the Material and Methods section, maybe rename "SCREENER web service" to "SCREENER entity linking and visualization web service". This makes its belonging to the Methods section clearer.

- I do not understand some percentage-point calculations in the results; see lines 181 and 201. Are they measured in percentage points (pp) or percent? Please check this.

- An additional ablation study for the NER module would be nice to have. How well does it perform in isolation, e.g., in comparison to BioBERT?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
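Both reviewers ask for an inter-annotator agreement score for the new corpus. A standard statistic to report for that is Cohen's kappa, which corrects observed agreement for chance agreement. The sketch below is illustrative only; the label sequences are hypothetical and not drawn from the SCREENER annotation effort:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(labels_a)
    # observed agreement: fraction of items both annotators labeled identically
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected chance agreement from each annotator's marginal label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical entity-type annotations (e.g. Gene / Disease / Trigger)
ann1 = ["G", "D", "G", "T", "D", "G"]
ann2 = ["G", "D", "G", "T", "G", "G"]
```

Kappa equals 1.0 for perfect agreement and 0.0 when agreement is no better than chance; values above roughly 0.8 are conventionally read as strong agreement, which is the kind of figure the reviewers are asking the authors to report.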
Revision 1
PONE-D-23-04283R1
SCREENER: Streamlined Collaborative Learning of NER and RE model for discovering Gene-Disease Relations
PLOS ONE

Dear Dr. Kim,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 29, 2023, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Nguyen Quoc Khanh Le
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #3: All comments have been addressed

**********

2.
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors responded to most of the comments and corrected the article accordingly. Still, minor revision is needed.

- There is probably a mistake in the phrase "span-based tokens" in the description of Fig 2. Usually a span is a sequence of tokens; how can tokens be span-based?

- "annotation protocol are available at https://doi.org/10.5281/zenodo.7445644". The files have closed access; the annotation protocol is not available.

- One of the reasons for the improvement of the F1 score is stated as follows: "the SCREENER utilizes trigger word enabling a richer understanding of the context". But it is unclear what the authors mean by that. If SCREENER, RENET2, and BioBERT-GAD were trained and tested on the same edges, they all used edges that connect trigger words with a disease or a gene, and in that way utilized trigger words. Other methods also build token embeddings considering the surrounding context.

- "Detailed information on the annotation process is described in Supplemental Text 3". The archive with supplemental materials is not attached to revision 1.

Reviewer #3: In this paper, the authors present a novel model which learns and extracts document-level relations between genes and diseases. By adding direct and indirect edges between genes and diseases, they improve the performance of the model and achieve an F1 score of 0.875, which is superior to the other existing approaches. This makes their approach novel and state-of-the-art. The complementary SCREENER web platform adds to the significance of their approach and the paper overall. Given that this is an R1 version of the paper, many problematic parts have already been addressed. This leaves the paper in a good state, where no major issues or concerns remain. The only concern I have is the novelty and importance of this new approach.
It is obviously an incremental improvement, but the authors should provide more arguments as to why this is worth publishing in the journal, and not at a conference or workshop, for instance. Other than that, the paper is overall very good, understandable, and easy to follow.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 2
PONE-D-23-04283R2
SCREENER: Streamlined Collaborative Learning of NER and RE model for discovering Gene-Disease Relations

Dear Dr. Kim,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Nguyen Quoc Khanh Le
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:
Formally Accepted
PONE-D-23-04283R2
SCREENER: Streamlined Collaborative Learning of NER and RE model for discovering Gene-Disease Relations

Dear Dr. Kim:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Nguyen Quoc Khanh Le
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.