Peer Review History

Original Submission
January 29, 2025
Decision Letter - Thiago P. Fernandes, Editor

PONE-D-25-05072
Exploring prompts to elicit memorization in masked language model-based named entity recognition
PLOS ONE

Dear Dr. Xia,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 27 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Thiago P. Fernandes, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure: 

“This research has been funded by the Vienna Science and Technology Fund (WWTF) [10.47379/VRG19008] “Knowledge-infused Deep Learning for Natural Language Processing”.”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

3. Please note that your Data Availability Statement is currently missing the repository name. If your manuscript is accepted for publication, you will be asked to provide these details on a very short timeline. We therefore suggest that you provide this information now, though we will not hold up the peer review process if you are unable.

4. We are unable to open your Supporting Information file [supporting.zip]. Please kindly revise as necessary and re-upload.

Additional Editor Comments:

Please respond to all comments and highlight the corresponding changes in the revised version of the manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I think the content of the manuscript is novel and appropriate, but the structure could be improved. Specifically,

1. Throughout the manuscript, both generalized and model-specific findings are reported. I believe the findings would be easier to follow and refer back to (and possibly be more impactful) if either all the model-specific findings were reported first, then all the generalized findings (or if each sub-section reported all the model-specific findings first followed by all the generalized findings for that sub-section).

2. More precise definition of “memorization”. For example, reference 3 in this version of the manuscript includes

> Definition 3.1. A string s is extractable with k tokens of context from a model f if there exists a (length-k) string p, such that the concatenation [p || s] is contained in the training data for f, and f produces s when prompted with p using greedy decoding.

followed by two paragraphs contrasting this definition with other definitions of the term. If the definition used here is the same as one used in one of the references, a simple “as defined in [reference]” would be perfectly clarifying.

3. I think answering these questions is beyond the scope of this manuscript, but the questions themselves should be acknowledged:

a. How much memorization poses an unacceptable risk?

b. How do we weigh risk versus performance in determining acceptable risk in various contexts?

This work can be an answer to “how do we compare the memorization-related risk of a given NER model?” but why we’re measuring memorization and how to interpret the M-MEM scores relative to the risk could be clearer.

4. The very end of the conclusion (lines 352‒354) states, “Overall, we demonstrate the importance of using diverse prompts (or generating target prompts by following guidance in Sections 5.4 and 5.5) to detect memorization of NERs for future privacy studies with a comprehensive analysis.” Rather than recommending readers refer to the middle of the Results and Analysis section of this paper, the guidance could be presented as a step-by-step guide (either as its own (sub-section) in the manuscript or in an online resource or a supplemental document). Having the information in a paper like this is useful, but if we want people who read the paper to follow the guidance, we should reduce the friction to doing so (by making it easy to refer to the guidance in an isolated way).

5. Some editorial pedantry:

a. Lines 6‒8: I think you mean, “While training data can be extracted through direct prompt querying and text generation in GPT models, that method is not applicable to Masked Language Models (MLMs).” (“that method” instead of “a method that”)

b. Line 48: I don’t think we want the word “only” here. This work shows me that prompt engineering can increase the performance gap, but not that nothing else can.

c. Line 323: I think “Our work fills a research gap” rather than “Our work fills the research gap” is more appropriate; the work does fill a gap, but “fills the […] gap” implies there’s no need for further research here.

d. Lines 343‒344: I think you mean “400 automatically generated prompts” instead of “automatically generated 400 prompts.”

Reviewer #2: Summary:

This paper studies named-entity recognition (NER) in masked language models (MLMs) and assesses whether models finetuned on NER datasets will be more biased toward named-entities seen during finetuning (referred to as memorization in the paper). The paper focuses on how different types of linguistic contexts (referred to as prompts in the paper) affect the surprisal scores models assign to entity tokens. Specifically, the authors collect a large sample of linguistic contexts using an LLM and construct NER examples by inserting a controlled set of named-entities (ones seen vs. unseen during finetuning) into the context sentences. The results show that model surprisal scores differ substantially with respect to the context sentence used, supporting an argument that a diverse set of contexts should be used when evaluating models’ abilities to memorize text.

Strengths:

- Results show that different contexts used to prompt the model can substantially affect the amount of memorization reflected in model outputs, raising the importance of trying different contexts when assessing a model’s ability to memorize text.

- A diverse set of experiments have been performed to assess the effect of linguistic context on memorization of named-entities.

Weaknesses:

- The paper presents its novelty by claiming that masked language models cannot perform the same type of evaluations as autoregressive language models (LMs) because of MLMs’ “inability to generate text”. However, it is possible to generate text using MLMs, which refutes the authors' claims in several places in the paper (lines 12 and 341).

- The choice of the evaluation metric in Equation 1 needs to be justified (see questions for more details). Under the current formulation, the metric is inherently biased toward seen entities. The validity of the experimental results is in question because this metric is relevant in all subsequent experiments.

- The authors chose to use non-standard machine learning terminology that makes the paper unnecessarily confusing to read and can potentially mislead the reader about what the contributions are. Specifically, the following terms seem to deviate from their standard use in the ML/NLP literature:

1) ‘in-train’ and ‘out-of-train’: It is more standard to refer to such examples as ‘in-sample’ and ‘out-of-sample’. Alternatively, they can be referred to as ‘seen’ and ‘unseen’ examples.

2) What the authors have been referring to as ‘prompts’ in most of the paper are really ‘linguistic contexts’ or ‘context sentences’. Prompt usually refers to the language used to describe a specific task, but in this paper, the models are evaluated against different linguistic contexts that contain certain types of named entities. In other words, it is the contexts that are different when two different ‘prompts’ are compared, not the task description. Furthermore, the authors referred to one of their experimental conditions as ‘prompt engineering’ (line 180) which further exacerbates the abuse of terminology.

3) The term ‘memorization’ in the context of language modeling typically refers to the case where an LM outputs parts of its training text verbatim during inference time. In the case of NER specifically, memorization would refer to the case where the model reproduces, or assigns high probability scores for, the exact pair of linguistic context and named-entity. Under the paper’s setup, however, the context is not necessarily paired with the inserted named-entity during training/finetuning, so I would refer to what the authors are measuring as “model bias towards seen named-entities” rather than memorization.

Questions/comments:

- Line 12 and 341. Why are MLM-based models unable to generate text? MLMs work differently from autoregressive LMs but can still be used to perform text generation. These statements should be clarified.

- What’s so special about MLMs that sets them apart from autoregressive LMs when it comes to memorization? Is there something special about your experimental approach that differentiates it from previous approaches applied to autoregressive LMs?

- In equation 1, what’s the rationale behind taking the maximum probability score of an entity’s beginning (B-e) and inside (I-e) tokens instead of taking the average? Under the current formulation, as long as one of the inside tokens attains a high probability score, that specific entity would receive a high score. This formulation would cause a bias towards seen (i.e., in-train) entities. For example, a model may assign equally low probabilities to the first token of a seen entity and an unseen entity (which suggests equal surprisal), but as soon as you condition on the first token, the model would assign much higher likelihood scores to the inside tokens of seen entities because they’ve been seen during finetuning. Under the current formulation, therefore, the seen entity would receive much higher scores despite the model being equally surprised when the first token of the entity is evoked.

Typos & Suggestions:

- Line 7. Broken sentence.

- Line 76 and 342. ‘previous works’ -> ‘previous work’

- Line 114. Extra number?

- Line 123. “In-train names are randomly sampled from Dout for each entity type”. Do you mean out-train here? Otherwise, you are sampling training examples from the unseen name set?

- Line 339. ‘pre-training or’ -> ‘pre-training of’

- Line 343. ‘generated’ -> ‘generate’

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jon Cluce

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Dear Editors and Reviewers,

We thank the reviewers for their thoughtful and constructive feedback on our manuscript, titled "Exploring Prompts to Elicit Memorization in Masked Language Model-based Named Entity Recognition" (Manuscript ID: [PONE-D-25-05072]). We greatly appreciate the time and effort dedicated to evaluating our work, and we have carefully considered each comment in preparing our revised manuscript.

Below, we provide a point-by-point response to each reviewer’s comment. We believe that the revisions have significantly improved the quality, clarity, and impact of the paper, and we hope the updated version addresses the reviewers’ concerns satisfactorily.

Reviewer #1: I think the content of the manuscript is novel and appropriate, but the structure could be improved. Specifically,

1. Throughout the manuscript, both generalized and model-specific findings are reported. I believe the findings would be easier to follow and refer back to (and possibly be more impactful) if either all the model-specific findings were reported first, then all the generalized findings (or if each sub-section reported all the model-specific findings first followed by all the generalized findings for that sub-section).

Response: We address the comment by reporting the model-specific findings first and then reporting the generalized findings in Sections 5.1, 5.2, and 5.4.

2. More precise definition of “memorization”. For example, reference 3 in this version of the manuscript includes

> Definition 3.1. A string s is extractable with k tokens of context from a model f if there exists a (length-k) string p, such that the concatenation [p || s] is contained in the training data for f, and f produces s when prompted with p using greedy decoding.

followed by two paragraphs contrasting this definition with other definitions of the term. If the definition used here is the same as one used in one of the references, a simple “as defined in [reference]” would be perfectly clarifying.

Response: We clarified our definition of memorization in Section 3.3 by adding Definition 3.1.

3. I think answering these questions is beyond the scope of this manuscript, but the questions themselves should be acknowledged:

a. How much memorization poses an unacceptable risk?

b. How do we weigh risk versus performance in determining acceptable risk in various contexts?

This work can be an answer to “how do we compare the memorization-related risk of a given NER model?” but why we’re measuring memorization and how to interpret the M-MEM scores relative to the risk could be clearer.

Response: We addressed the two questions mentioned above and the interpretation of the M-MEM scores in Section 6, Discussion.

4. The very end of the conclusion (lines 352‒354) states, “Overall, we demonstrate the importance of using diverse prompts (or generating target prompts by following guidance in Sections 5.4 and 5.5) to detect memorization of NERs for future privacy studies with a comprehensive analysis.” Rather than recommending readers refer to the middle of the Results and Analysis section of this paper, the guidance could be presented as a step-by-step guide (either as its own (sub-section) in the manuscript or in an online resource or a supplemental document). Having the information in a paper like this is useful, but if we want people who read the paper to follow the guidance, we should reduce the friction to doing so (by making it easy to refer to the guidance in an isolated way).

Response: We changed the corresponding sentence in the Conclusion section, as we provide all the prompts and implementation code as a supplemental document.

5. Some editorial pedantry:

a. Lines 6‒8: I think you mean, “While training data can be extracted through direct prompt querying and text generation in GPT models, that method is not applicable to Masked Language Models (MLMs).” (“that method” instead of “a method that”)

b. Line 48: I don’t think we want the word “only” here. This work shows me that prompt engineering can increase the performance gap, but not that nothing else can.

c. Line 323: I think “Our work fills a research gap” rather than “Our work fills the research gap” is more appropriate; the work does fill a gap, but “fills the […] gap” implies there’s no need for further research here.

d. Lines 343‒344: I think you mean “400 automatically generated prompts” instead of “automatically generated 400 prompts.”

Response: We changed the manuscript according to the above points.

Reviewer #2:

- The paper presents its novelty by claiming that masked language models cannot perform the same type of evaluations as autoregressive language models (LMs) because of MLMs’ “inability to generate text”. However, it is possible to generate text using MLMs, which refutes the authors' claims in several places in the paper (lines 12 and 341).

Response: We acknowledge that MLMs can generate text, but less naturally or sequentially than autoregressive LMs (e.g., GPT). Techniques such as Gibbs sampling have been used to generate text with pre-trained MLMs [1]. However, this cannot be applied to MLMs fine-tuned for NER, because the token reconstruction head used during the pre-training of MLMs is replaced with a task-specific classification head that is restricted to the target space, e.g., named entity labels [2, 3]. This makes standard generation-based memorization probing inapplicable to NER models. We address this point by revising the sentence on Line 12 (“inability to generate text”) to read: “During fine-tuning for NER, the model's objective shifts entirely to sequence labeling, and the token reconstruction head used during pre-training is replaced with a task-specific classification head. This new output layer predicts only NER tags and no longer retains the capability to generate raw text. Hence, NER models cannot be directly assessed for memorization via text generation, such as prompting them to reproduce verbatim in-sample entities given a prefix, a strategy commonly applied to auto-regressive models”. We also extend this in the Related Work and Conclusion (original line 341, now Line 417) Sections.

[1] Wang A, Cho K. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. In: Bosselut A, Celikyilmaz A, Ghazvininejad M, Iyer S, Khandelwal U, Rashkin H, et al., editors. Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation. Minneapolis, Minnesota: Association for Computational Linguistics; 2019.

[2] Yang Z, Ding M, Guo Y, Lv Q, Tang J. Parameter-Efficient Tuning Makes Good Classification Head. In: Goldberg Y, Kozareva Z, Zhang Y, editors. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics; 2022. p. 7576–7586.

[3] Keraghel I, Morbieu S, Nadif M. Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study; 2024.

- The choice of the evaluation metric in Equation 1 needs to be justified (see questions for more details). Under the current formulation, the metric is inherently biased toward seen entities. The validity of the experimental results is in question because this metric is relevant in all subsequent experiments.

Response: We address the reviewer’s concern about the formula in our response to the corresponding question below.

- The authors chose to use non-standard machine learning terminology that makes the paper unnecessarily confusing to read and can potentially mislead the reader about what the contributions are. Specifically, the following terms seem to deviate from their standard use in the ML/NLP literature:

1) ‘in-train’ and ‘out-of-train’: It is more standard to refer to such examples as ‘in-sample’ and ‘out-of-sample’. Alternatively, they can be referred to as ‘seen’ and ‘unseen’ examples.

Response: We address this comment by replacing in-train and out-of-train with “in-sample” and “out-of-sample”.

2) What the authors have been referring to as ‘prompts’ in most of the paper are really ‘linguistic contexts’ or ‘context sentences’. Prompt usually refers to the language used to describe a specific task, but in this paper, the models are evaluated against different linguistic contexts that contain certain types of named entities. In other words, it is the contexts that are different when two different ‘prompts’ are compared, not the task description. Furthermore, the authors referred to one of their experimental conditions as ‘prompt engineering’ (line 180) which further exacerbates the abuse of terminology.

Response: While we acknowledge that “prompt” can refer to task instructions in autoregressive models, in the context of masked language models, it has also been widely used to describe input templates used for probing or prediction [1, 2, 3]. We follow this precedent in referring to our sentence-level templates as prompts. We added these papers as citations in our paper for clarification (Line 20).

[1] Jiang Z, Xu FF, Araki J, Neubig G. How Can We Know What Language Models Know? Transactions of the Association for Computational Linguistics. 2020;8:423–438. doi: https://doi.org/10.1162/tacl_a_00324

→ They automatically construct sentence-level prompts to evaluate factual knowledge in masked LMs.

[2] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2023. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 55, 9, Article 195 (September 2023), 35 pages. https://doi.org/10.1145/3560815

→ This comprehensive survey discusses various prompting techniques across different language models, including the use of prompts as input templates for MLMs to perform specific tasks.

[3] Ziyang Xu, Keqin Peng, Liang Ding, Dacheng Tao, and Xiliang Lu. 2024. Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15552–15565, Torino, Italia. ELRA and ICCL.

→ This paper provides a pertinent example of using the term "prompt" to refer to sentence-level templates in the context of masked language models (MLMs).

3) The term ‘memorization’ in the context of language modeling typically refers to the case where an LM outputs parts of its training text verbatim during inference time. In the case of NER specifically, memorization would refer to the case where the model reproduces, or assigns high probability scores for, the exact pair of linguistic context and named-entity. Under the paper’s setup, however, the context is not necessarily paired with the inserted named-entity during training/finetuning, so I would refer to what the authors are measuring as “model bias towards seen named-entities” rather than memorization.

Response: We acknowledge that in generative language modeling, "memorization" typically refers to a model reproducing verbatim sequences from its training data during free-text generation [1]. In our setup, NER models are not evaluated on reproducing exact entities. Rather, we assess whether the model assigns systematically higher confidence to entities seen during training (in-sample entities) than to out-of-sample ones. We use the term memorization in alignment with prior work that extends this notion beyond verbatim reproduction to include biased confidence or likelihood assignment toward seen inputs in structured prediction tasks [2, 3]. In our work, this manifests as a model that assigns higher probabilities to in-sample entities, which we argue constitutes a privacy-relevant form of memorization in NER models.

We have revised the manuscript to explicitly add clarification to avoid confusion with the generative case. (Definition 3.1)

[1] Carlini N, Tramer F, Wallace E, Jagielski M, Herbert-Voss A, Lee K, et al. Extracting training data from large language models. In: 30th USENIX Security Symposium (USENIX Security 21); 2021. p. 2633–2650.

[2] Carlini N, Liu C, Erlingsson Ú, Kos J, Song D. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. In: USENIX Security Symposium; 2019.

[3] Ali RS, Zhao BZH, Asghar HJ, Nguyen T, Wood ID, Kaafar D. Unintended Memorization and Timing Attacks in Named Entity Recognition Models. In: Proceedings on Privacy Enhancing Technologies. PETS 2023; 2023. Available from: http://arxiv.org/abs/2211.02245.

Questions/comments:

- Line 12 and 341. Why are MLM-based models unable to generate text? MLMs work differently from autoregressive LMs but can still be used to perform text generation. These statements should be clarified.

Response: We clarified this point in our response to the first comment above.

- What’s so special about MLMs that sets them apart from autoregressive LMs when it comes to memorization? Is there something special about your experimental approach that differentiates it from previous approaches applied to autoregressive LMs?

Response: Auto-regressive language models (e.g., GPT models) can be directly probed for memorization by prompting them to generate text given a prefix. If a model reproduces long sequences seen during training, especially verbatim, this is a strong signal of memorization. Thus, memorization in autoregressive models is typically studied through generation-based evaluations.

However, this cannot be applied to MLMs fine-tuned for NER, because the token reconstruction head used during the pre-training of MLMs is replaced with a task-specific classification head that is restricted to the target space, e.g., named entity labels. This makes standard generation-based memorization probing inapplicable to NER models.

What’s unique about our experimental approach:

Our approach is tailored to evaluate memorization within the fine-tuned MLM framework, particularly in NER. Instead of prompting the model to generate text, we:

(1) Create diverse prompt templates (contextual sentences) in which candidate named entities (in-sample vs. out-of-sample) are inserted.

(2) Measure prediction confidence scores on the [MASK] token for both in-sample and out-of-sample entities.

(3) Define memorization as the model’s preference for training entities (higher confidence), even when inserted into novel linguistic contexts.

This allows us to quantify memorization behavior in a structured prediction task, without relying on generative capacity. While previous work has examined memorization in autoregressive models largely through text generation tasks, ours is among the first to provide a systematic and quantifiable approach to studying memorization in fine-tuned MLMs.
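The scoring logic described in the steps above can be sketched minimally. The function names, the max aggregation over entity tokens, and the mean-difference gap shown here are illustrative reconstructions under our reading of the text, not the paper's exact Equation 1 or M-MEM definition; toy probabilities stand in for real model outputs.

```python
# Hedged sketch (not the paper's actual implementation): entity-level
# confidence scoring and a memorization gap, using toy per-token
# probabilities in place of a fine-tuned NER model's outputs.

def entity_score(gold_label_probs):
    """Aggregate an entity's per-token probabilities for its gold
    labels (B-e / I-e) into one score by taking the maximum."""
    return max(gold_label_probs)

def memorization_gap(in_sample_scores, out_of_sample_scores):
    """Confidence gap between entities seen during finetuning and
    unseen ones; a positive gap suggests memorization-like bias."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(in_sample_scores) - mean(out_of_sample_scores)

# Toy example: gold-label probabilities for a two-token seen entity
# and a two-token unseen entity inserted into the same prompt.
seen = entity_score([0.91, 0.75])      # entity seen during finetuning
unseen = entity_score([0.64, 0.58])    # entity not in the training set
gap = memorization_gap([seen], [unseen])  # roughly 0.27 here
```

In practice the per-token probabilities would come from the classification head of the fine-tuned model, evaluated on each prompt template with the candidate entity inserted.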

We added this part in the Introduction, Related Work and Conclusion Sections to further distinguish our work from methods used for the autoregressive models.

- In equation 1, what’s the rationale behind taking the maximum probability score of an entity’s beginning (B-e) and inside (I-e) tokens instead of taking the average? Under the current formulation, as long as one of the inside tokens attains a high probability score, that specific entity would receive a high score. This formulation would cause a bias towards seen (i.e., in-train) entities. For example, a model may assign equally low probabilities to the first token of a seen entity and an unseen entity (which suggests equal surprisal), but as soon as you condition on the first token, the model would assign much higher likelihood scores to the inside tokens of seen entities because they’ve been seen during finetuning. Under the current formulation, therefore, the seen entity would receive much higher scores despite the model being equally surprised when the first token of the entity is evoked.

Response: We appreciate the reviewer’s careful attention to Equation 1 and the thoughtful critique. However, we believe there may be a misunderstanding regarding how the entity-level confidence score is computed.

To clarify: when evaluating each token in a target entity, the NER model outputs probabilities over all possible labels (e.g., B-PER, I-PER for person entities). Since

Attachments
Attachment
Submitted filename: Response to Reviewers.pdf
Decision Letter - Thiago P. Fernandes, Editor

PONE-D-25-05072R1

Exploring prompts to elicit memorization in masked language model-based named entity recognition

PLOS ONE

Dear Dr. Xia,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please respond to all comments and highlight the changes in the revised manuscript.

Please submit your revised manuscript by Sep 06 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Thiago P. Fernandes, PhD

Academic Editor

PLOS ONE

Journal Requirements:

1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. 

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing the concerns I raised in my review. I'm satisfied with this revision, with a couple tiny revisions:

1. The GitHub link currently points to the "main" branch, which is a moving target. Please either create a tag to point to or point to an SHA so the paper will reference a static version of the code.

2a. Reference 12 has subscript "a0" in place of "_a_0" in the DOI.

2b. Some references use DOIs and others use HTTP links where DOIs are available, and some don't have links despite being available online. I suggest using DOIs where available, and HTTP links where DOI links aren't available but the source is available online. This recommendation is very soft; I don't think this should prevent publication.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jon Cluce

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 2

Dear Editors and Reviewers,

We thank the reviewers for their thoughtful and constructive feedback on our manuscript, titled "Exploring Prompts to Elicit Memorization in Masked Language Model-based Named Entity Recognition" (Manuscript ID: [PONE-D-25-05072]). We greatly appreciate the time and effort dedicated to evaluating our work, and we have carefully considered each comment in preparing our revised manuscript.

Below, we provide a point-by-point response to each reviewer’s comment. We believe that the revisions have significantly improved the quality, clarity, and impact of the paper, and we hope the updated version addresses the reviewers’ concerns satisfactorily.

1. The GitHub link currently points to the "main" branch, which is a moving target. Please either create a tag to point to or point to an SHA so the paper will reference a static version of the code.

Response: We have created a tag named v1.0 and updated the GitHub link in the paper to point to a static version of the code.

2a. Reference 12 has subscript "a0" in place of "_a_0" in the DOI.

Response: We have revised "a0" to "_a_0".

2b. Some references use DOIs and others use HTTP links where DOIs are available, and some don't have links despite being available online. I suggest using DOIs where available, and HTTP links where DOI links aren't available but the source is available online. This recommendation is very soft; I don't think this should prevent publication.

Response: We have revised the references by adding necessary DOIs or HTTP links for each referred paper.

Sincerely,

Yuxi Xia

On behalf of all co-authors

Attachments
Attachment
Submitted filename: Response_to_Reviewers_auresp_2.pdf
Decision Letter - Thiago P. Fernandes, Editor

Exploring prompts to elicit memorization in masked language model-based named entity recognition

PONE-D-25-05072R2

Dear Dr. Xia,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging in to Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Thiago P. Fernandes, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing my final comments. I am satisfied with this version of the manuscript. I think there's still some weirdness in the formatting of some of the references, but editorial should sort that out prior to publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jon Cluce

**********

Formally Accepted
Acceptance Letter - Thiago P. Fernandes, Editor

PONE-D-25-05072R2

PLOS ONE

Dear Dr. Xia,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Thiago P. Fernandes

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.