Peer Review History
Original Submission: March 17, 2023
PONE-D-23-07957
Entity Linking for real-time geolocation of natural disasters from social network posts
PLOS ONE

Dear Dr. Caillaut,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 26 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Baby Gobin
Academic Editor
PLOS ONE

Journal Requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. In your Methods section, please include additional information about your dataset and ensure that you have included a statement specifying whether the collection and analysis method complied with the terms and conditions for the source of the data.

3. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript: "The work presented in this article was carried out with funding from the Agence Nationale de la Recherche (ANR) within the framework of the CARNOT institutes, as well as within the RéSoCIO project co-funded by ANR under the grant ANR-20-CE39-001. Opinions expressed in this paper solely reflect the authors’ view; the ANR is not responsible for any use that may be made of information it contains. The authors would like to thank F. Boulahya, Y. Retout, C. Mato, A. Montarnal, B. Farah and F. Smai for their help in annotating data, as well as the MAIF Foundation for its support to the design and development of the SURICATE-Nat platform. We also thank BCSF for giving us access to its French macroseismic data distribution webservice, and L. Bernede from Météo France for providing us with the rainfall data recorded during Alex storm."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "The work presented in this article was carried out with funding from the French Research Agency (ANR - https://anr.fr/en/) within the framework of the CARNOT institutes (Gaëtan Caillault), as well as within the RéSoCIO project co-funded by ANR under the grant ANR-20-CE39-001 (Samuel Auclair & Cécile Gracianne). Opinions expressed in this paper solely reflect the authors' view; the ANR is not responsible for any use that may be made of information it contains. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter.

6.
We note that Figures 4 and 6 in your submission contain [map/satellite] images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a. You may seek permission from the original copyright holder of Figures 4 and 6 to publish the content specifically under the CC BY 4.0 license. We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text: “I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.” Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.
In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:
USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/
The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/
Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html
NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/
Landsat: http://landsat.visibleearth.nasa.gov/
USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#
Natural Earth (public domain): http://www.naturalearthdata.com/

7. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 5 in your text; if accepted, production will need this reference to link the reader to the Table.

Additional Editor Comments: You are required to update your manuscript based on the comments of the reviewers.

[Note: HTML markup is below. Please do not edit.]
Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper discusses the challenges of using social networks to develop situational awareness during natural disasters and proposes a method for detecting and geolocating French messages on Twitter to build maps in real-time. The authors demonstrate that their system performs as well as state-of-the-art systems and can contribute to automatic social network analysis for crisis managers. In my opinion, this work is technically valid, even if it shows the limitations specified below:

- Novelty aspects: better explain the novelty of the paper, including through concrete examples. The paper does not seem very innovative, i.e. it applies known techniques adapted to the French context.
- Related work (1): For each cited manuscript, the main differences from the proposed approach must be highlighted. A comparative table could help readers understand the differences among the works present in the literature and the strengths of this work.
- Related work (2): The paper "Using Social Media for Sub-Event Detection during Disasters" appears to be a related work very close to the one proposed. In particular, it proposes a technique for identifying the sub-events that occur after a disaster. How is your work different?
- Dataset and code: To allow the reproducibility of the experiments, it is necessary to publish the datasets in a public repository and share the link in the paper. The method's code should also be made public so that the experiments can be reproduced and the results validated.
- Experiments: Since the authors of this paper have a dataset with labels, why didn't you carry out quantitative analyses of the accuracy of your method (F1-score) in detecting events?
Reviewer #2: In this paper, the task of detecting and geolocating information in the French language is addressed. The contributions are a data set and an entity-linking pipeline. For the following reasons, I would like to recommend that the paper undergo significant revisions before publishing:

- the proposed models need more detailed descriptions
- the narrative of the paper needs to be enhanced to ensure that the reader can clearly see "the plot"
- the task is not well described and changes from EL (place mentions in texts) to geolocation of natural disasters - the latter is definitely not in scope of the involved methods, as single tweets are processed (geo-located)
- the training data itself is partially generated from ML models, of which the actual performance is unclear - this needs to be further investigated
- more quantitative experiments related to the proposed models are missing but definitely required: how well is the NER step working? how do errors propagate? what is the effect of the cross-encoder?
- an interpretation of the scores in Table 6 is missing: what are the reasons for the poor F1-scores?
- while also highlighted in the manuscript title, the real-time aspect is not addressed at all
- why are entities involved which will actually have no coordinate (e.g. person)?

More detailed comments:

### Abstract

"show that despite these additional constraints" --> not sure which constraints are referred to here?

### Introduction

32/33 "one of the main issues of these automatic analyses is the ability to correctly place the information extracted in a map" - might be, but not the first one - how about overload reduction?

While reading the first two chapters, a question comes up: what is the actual motivation of entity linking here? Why do the authors expect a benefit for the task (hypothesis)? It may be clear, or the reader might have an idea, but reading the authors' perspective would give more insights here.
61: "we propose a pipeline to automatically geolocate natural disasters from tweets" --> this sentence suggests that events will be detected, but the task is related to single tweets, if I understand correctly. Hence, geolocating "natural disasters" might not be the appropriate term here. It would also be good to read about the obtained results / performance abilities of the proposed method in the introduction.

### Task description and related works

I like that we can find a description of the task first. It may be good to provide a list of examples to explicitly show (1) the different types of toponyms the authors are actually addressing, and (2) the target types of place description (points, polygons, ...?).

Later (2.2), we find that the task "mentioned location prediction" is also known as geoparsing.

"but we will focus more on methods addressing a similar crisis management context" - I can understand this approach, but sometimes it is worth doing a review independently of the domain - it may turn out that there are approaches around that are not well known in disaster management but worth testing? Just a thought.

In line with reference 26, this one might also be of interest: https://ieeexplore.ieee.org/abstract/document/9711571

In addition to the approaches mentioned in the related work, it would be good to also read about the open issues / drawbacks they have - at least those that are addressed in this paper.

While reading 2.2, I am wondering why the description of the EL task is not part of the introduction of the chapter (where the task is described)? This would give a comprehensive overview of all the sub-tasks addressed in this work, followed by related work. As it stands, chapter 2.2 focuses a lot on the task and only a few related works are mentioned. I would recommend re-structuring this chapter.

134: The meaning of BIO could be explained.

156: dated --> outdated?
180: "Detecting and geolocating natural disasters" --> this suggests that events are detected. However, this is not in line with the task description, which works on a document level.

3.1: Here, the first part of the model is described. For better understandability, I would prefer a first overarching overview of the proposed method, followed by the detailed descriptions.

Figure 1: In its current form, the figure does not contain all relevant information. For instance, what do the green vectors/matrices contain/represent? How is the final embedding actually computed?

211: "At inference time, entity embeddings can be pre-computed,": This statement is a bit confusing, as the embeddings for the entities are already pre-computed (i.e., at inference time, only the mention embedding has to be computed)? Which metric is actually used to compare embedding vectors?

217ff: "Then, we propose to rely on the first token, instead of the CLS token, of an entity mention to produce mention embeddings, thus allowing to embed all the entity mentions of a document at the same time." --> It would then be required to explain how multiple mentions - especially the expected varying number - are handled.

218: "we propose to rely on the first token": why are B and I not considered (as some mentions might consist of more than 1 or 2 tokens)? OK, addressed in line 225.

3.2 Cross-Encoder: The description is not very detailed and could be enhanced. How does it eventually help to mitigate the aforementioned potential errors?

288: "the best performances are obtained with": this statement seems quite sweeping - I would rather say that this is true according to the experiments conducted in the mentioned paper?

324: "beforehand, The" -> the

342ff: When I see the list of annotation labels considered, I am a bit confused. Doesn't taking into account types that do not describe places differ from the task?
360: "We then applied this classifier to classify all the French Wikipedia entities": This tends to be critical, as the labels will potentially not have the quality of manual annotations. It would be necessary to read about test results on unseen test data.

385ff: These two sentences are representative of the writing style of the whole article, where the order in which information is presented appears a bit confusing. I just had a look at the paper "Entity Linking in 100 Languages" - the writing style is very crisp, clear and structured. I would recommend revising the manuscript so that readers can follow a bit more easily.

4.3: It is not clear how the data is used when the EL-related part is missing?

400: "it then appears to be extremely valuable to help the model build coherent representations for tweets" --> this statement needs to be confirmed.

406: "geolocate natural disasters from social networks": well, even if this is the application, it is still about place mentions, right? I would say that it of course depends on the involved document types - if Twitter is the target platform, a model needs to be trained with data from Twitter. But it might not be necessary to use disaster tweets to detect and link the place names. Might be a good idea for an experiment?

Table 5 seems not to be mentioned or referenced in the text.

Table 6: EL performance is shown here. It would also be interesting to see how well the mentions are actually detected, as this would directly influence the linking. A (maybe stupid) question related to this: what would happen if we simply took the NER results and searched the entities with these keywords (maybe allowing for some letter permutations)?

473: "given in Appendix 7 and 7" --> 7 appears twice

493: "Since our model detects non-geographical entities too": this raises the question of why the other types are then actually still included? Doesn't this make the task more complex for the models?
Table 7: "The number between parenthesis indicates the number of tweets, mentions or entities which have been localized inside the area impacted by the earthquake/storm.": as there are around 250 distinct entities found, it would be of interest (and feasible in terms of effort) to know the quality here (i.e., false positive rates or other metrics?).

498: EMSC should be explained.

548: "While this last analysis does not prove that our model predictions are correct": this was exactly what I was thinking here: I would at first be interested in how well the proposed models actually perform - this is necessary. Another aspect: it is well known that keyword-based filtering of tweets comes at the cost of bad precision. How is it ensured that non-related tweets are not used in this analysis?

6.2: As the title is "Alex storm", it is a bit surprising/unexpected to read about floods in 6.2.2?

598 (Discussion): "Our model seems to be able to capture coherent representations of real natural disasters." --> very vague - this has to be underpinned with quantitative experiments.

638: "Without being able to assess in detail the capacity of the model to detect all the geo-locatable features with precision": why not take the usual approach of random samples?

640: "the model is able to capture the overall footprint of earthquakes and flash floods": the proposed model can actually do EL, and since the linked entities have geo-coordinates, they can be shown on a map. The fact that a footprint is visible is related to the data and the users that report on an event. However, it is not clear whether the data contains many eyewitness reports or (as commonly observed) sympathy/support messages. As the tweets are basically identified based on keywords, it is likely that many false positives are contained. This needs to be investigated with quantitative and qualitative experiments.

650: "Furthermore, external expert knowledge, such as maps, ..." this is actually a crucial point. No emergency response manager will solely decide based on a Twitter system; they in fact already have well-established routines and local knowledge, so we need to identify the information gaps.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
PONE-D-23-07957R1
Entity Linking for real-time geolocation of natural disasters from social network posts
PLOS ONE

Dear Dr. Caillaut,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The provided revision of the paper is valid and overall of good quality; however, the reviewers have raised some further points to address which I believe would improve the manuscript and may allow a revised version to reach an acceptable level for publication of the paper. You are therefore required to update your manuscript based on the comments of the reviewers.

Please submit your revised manuscript by Apr 19 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Sergio Consoli
Academic Editor
PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #3: All comments have been addressed
Reviewer #4: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #3: I Don't Know
Reviewer #4: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #3: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors, while providing concise responses to the reviewers' inquiries, have successfully enhanced the paper. Consequently, I recommend accepting the submission.
Reviewer #3: In general, the paper looks well worked upon and useful; I especially need to praise the authors for their willingness to improve a concrete practice, rather than elaborate on a method that will never actually be used in real life. I am not questioning the general approach; nor am I against the method in its major elements (design, pipeline, evaluation etc.). I appreciate the work with the datasets; they include nearly all available relevant data, as well as the new training dataset created by the authors. But I have several reservations that, from my viewpoint, need to be addressed to make the paper less ‘technical’ and a bit closer to real life, including practical disaster fighting by both professionals and ordinary people. These include conceptual issues and issues of methodology.

CONCEPTUAL ISSUES

1. The information on disasters is usually gathered not from social media (which may be highly misleading, as no fact-checking is done), but by special taskforces. Thus, why use your tool instead of the fact-checked evidence collected by professionals? And what to do when the data from the former contradict the latter? What would your advice be? I think this issue needs to be mentioned in the paper, as stating unequivocally that such a tool only helps may, too, be misleading. We know from many examples, even before Twitter, e.g. from fires in Greece and other European countries, that people tended to listen to the media that reported on fires, not the firemen, and came to the wrong places for help. Social media are an even more distorted mirror of events; they may intensify, distort, misrepresent, and diminish the scale of the event. Thus, it needs to be clarified in the paper how exactly the tool is used and what for, knowing that social media may mislead the viewer, and how the information is double-checked; this will explain why and how you wish to enhance its work.

2.
However, even if your tool represents the social media reality 100%, there is no guarantee that it corresponds 100% to what is happening on the ground. This, in turn, means that the uses of the tool need to be limited to, say, disaster alerts – or, on the contrary, the tool needs to be ML-enhanced so as to go beyond a naïve search for locations and keywords and be trained to detect markers of both disasters and locations beyond particular names/keywords, in order to critically improve its quality.

METHODS AND RESULTS

3. The authors are right in saying that there are too many papers on Twitter geolocation; but I am not sure they are right in disregarding all those that tackle issues beyond natural disaster detection. This is especially wrong in terms of methodology: many papers that do not focus on monitoring disasters still provide valuable improvements for the detection method itself. A short review may be found in Blekanov et al. (2022) – see https://dl.acm.org/doi/abs/10.1007/978-3-031-22131-6_2. As an example of what is omitted by the authors, this paper itself might also be of help, as it compares three approaches to enhancing the geolocation of Twitter users, including via ML-based analysis of their text corpora, and reports improved user geolocation. I am not sure how it will work at the community/local level, but it worked well at the national level, as the authors claim.

4. In connection with (4): Analysing individual tweets may be productive for ‘red alerts’, but may remain endlessly non-productive for geolocation detection, however good the annotation, model training, and fine-tuning may be. As a ‘future work’ prospect, I would still mention the analysis of tweet pools.
You state that parsing the history of the network is unfeasible when you discuss such options, and claim you do not use prior knowledge in your model; but this is not about parsing the whole network, only the tweets of a particular account, whose tweets would allow for quicker recognition of the residence of a given user who posted the information on a disaster. The same goes for the friendship network, but here the data may be more misleading (e.g., if one were to detect my location based on my Facebook feed, they would be wrong by 4,400 km).

5. Low Recall and F-score are not explained (I think the reviewers had already mentioned that). Either explain why this result is sufficient and cannot be higher (is it so for tweets because of their length?) or at least highlight the results that you orient to and explain why Recall and F-score are not that important (which I honestly doubt).

Reviewer #4: This manuscript presents an entity-linking pipeline for geolocating French tweets posted during natural disasters. Furthermore, the manuscript introduces an annotated entity-linking data set consisting of a French Wikipedia corpus and French tweets collected during major disasters. I would recommend accepting this paper after clarifying a few points (elaborated below).

Strengths:
• The qualitative evaluation is good, demonstrating the utility of the proposed model for disaster response.
• The paper is well-written. The organization of the paper could be improved, but it remains easy to follow.
• The creation of an annotated data set that can be used for disaster response Named Entity Recognition and Entity Linking tasks in other languages is a valuable contribution, given that mainstream research in this area focuses on English data.

Concerns:
• The authors should clarify whether the entity-linking pipeline presented in this manuscript constitutes a novel contribution.
Are there similar entity-linking models that have been proposed to address this specific issue? Is the proposed architecture unique? A comparison with previous works is recommended.
• The cross-encoder component in the proposed pipeline is not clear. Clarification is needed on whether this cross-encoder was trained separately or on the output of the bi-encoder. It would be helpful to add a figure illustrating all the pipeline’s components and how they are linked to each other.
• The categories 'RISKNAT' and 'DAMAGES' (page 11) are unclear. Are they location-related categories? How do they differ from the 'GEOLOC' category? The authors provide a few examples of the 'RISKNAT' category on page 17, “information describing the phenomenon” such as “heavy rain” and “strong gusts of wind”, which do not seem to be location-related. It would be beneficial if the authors explained more fully what these two categories refer to on page 11.
• In section 5.2 (Results), the authors state: “Recently, a comparison of several NER models showed that state-of-the-art F1 scores are smaller than 75% [64], meaning that the results of the Bi-Encoder are excellent”. It is not clear whether the F1 scores mentioned here come from models trained and tested on French data. If that is the case, I think the F1 scores obtained on French and English data are not directly comparable. There might be some variation in the way locations are mentioned in these two languages. Even though the scope of this paper is French tweets, I would suggest training and testing the NER component of the proposed pipeline on an existing English benchmark (e.g., the one proposed in “Suwaileh R, Elsayed T, Imran M, Sajjad H. When a disaster happens, we are ready: Location Mention Recognition from crisis tweets. International Journal of Disaster Risk Reduction. 2022; p. 103107”) and comparing its performance to state-of-the-art NER models to support the above conclusion.

**********

7.
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: No
Reviewer #4: Yes: Ghaith Rabadi

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
|
| Revision 2 |
|
Entity Linking for real-time geolocation of natural disasters from social network posts

PONE-D-23-07957R2

Dear Dr. Caillaut,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Sergio Consoli
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #4: The authors have sufficiently addressed my previous comments. They highlighted the novelty of the proposed approach, which is an acceptable contribution worthy of publication. The experiments were conducted using a manually annotated data set comprising 339 tweets. The small size of the data set used for evaluation is a main limitation of this research. The authors have provided a justification for the direct comparison of their NER on French data against the state-of-the-art English NER results, which I find satisfactory at this stage. This presents a promising direction for future work, particularly in understanding how NER works across different languages, with the aim of developing cross-language NER models. Overall, the manuscript was well written and clear, with no major issues identified.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #4: Yes: Ghaith Rabadi

********** |
| Formally Accepted |
|
PONE-D-23-07957R2

PLOS ONE

Dear Dr. Caillaut,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Sergio Consoli
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.