Peer Review History
Original Submission (May 28, 2025)
Dear Dr. Hinds,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Aug 16 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Hessameddin Ghanbar
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #2: Partly
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: No
**********
3. Have the authors made all data underlying the findings in their manuscript fully available (see the PLOS Data policy)?
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes
**********
Reviewer #1: This manuscript presents an important and well-structured study investigating the impact of instruction detail on the quality of sentiment annotation. The study design is rigorous and scientifically sound, with appropriate sample size and power analyses. The data robustly support the conclusions drawn by the authors. The statistical analyses are appropriate and carefully conducted, including the use of Krippendorff’s alpha for inter-annotator agreement, which is well suited for nominal data. Both descriptive and inferential statistics are applied correctly. The authors have made all relevant data and materials publicly available on the Open Science Framework, promoting transparency and reproducibility. The manuscript is clearly written in standard English, with no major language or structural issues, facilitating reader comprehension.
Suggestions:
1. Future work could benefit from exploring annotator diversity across different linguistic and cultural backgrounds.
2. A more detailed discussion on the advantages of soft labeling for model development would strengthen the manuscript.
3. Including a brief reflection on the ethical considerations related to participant exclusion due to failed attention checks would enhance transparency.
Overall, this manuscript is a valuable contribution to the field, with novel insights and sound methodology. I recommend acceptance after minor revisions.
Reviewer #2: Dear Authors,
While the study is well-intentioned and thematically relevant, it suffers from critical weaknesses across nearly every major section of the paper. These limitations undermine both the theoretical contribution and the reliability of the empirical findings.
The abstract, while clearly structured, fails to convey the most unexpected and important finding of the study, namely that detailed annotation instructions did not improve, and in fact appeared to reduce, inter- and intra-annotator agreement. This counterintuitive result, which contradicts the study’s own hypotheses, is central to the paper's impact and should be explicitly flagged in the abstract. Moreover, the methodological framing is vague. Key elements such as dataset size, the number of annotators, the nature of the statistical measures employed, and the form of qualitative analysis are all omitted, reducing the abstract's utility as an informative summary.
The introduction, while providing adequate background on the importance of sentiment annotation in NLP, suffers from conceptual underdevelopment. The authors introduce the idea of “interpretive ambiguity” but fail to define it rigorously or connect it to established theoretical frameworks from discourse analysis, pragmatics, or subjectivity studies. The discussion of sentiment is similarly reductive, conflating it with emotion, evaluation, and stance without engaging with the rich multidisciplinary literature that differentiates these concepts. Furthermore, the critique of existing datasets is primarily descriptive and lacks the depth needed to justify the construction of a new dataset. There is insufficient examination of how previous datasets have handled ambiguity or why existing annotation protocols are inadequate.
This results in a justification for AmbiSent that appears more speculative than necessary.
The articulation of hypotheses and research questions is another area of concern. The authors posit that detailed instructions will increase both inter- and intra-annotator agreement, but they offer no compelling theoretical or empirical rationale to support this assumption. Moreover, the operational definitions of key terms such as “annotation quality,” “diversity of reasons,” and “challenge” are imprecise. The research questions are loosely formulated, overlapping in scope and lacking clear analytical boundaries. There is also no hierarchical organization among the hypotheses and exploratory questions, which complicates interpretation of results and weakens the study’s inferential structure.
The methodology section is detailed in parts but lacks essential clarity in others. Most notably, the instructional manipulation (comparing “minimal” versus “detailed” instructions) raises concerns about ecological validity. The level of detail provided in the “detailed” condition is extensive and arguably unrealistic for real-world annotation workflows, which typically involve minimal training to ensure scalability. There is no empirical evidence that annotators actually absorbed or adhered to the complex instruction set, nor is there any direct measure of cognitive load or comprehension. Additionally, while the authors emphasize their commitment to interpretive subjectivity by using soft labels, they paradoxically use native speaker judgments to define preliminary ground truths, a move that contradicts their own critique of fixed labeling. This contradiction undercuts the theoretical coherence of the study.
The construction of the AmbiSent dataset introduces further problems. The dataset consists entirely of artificially constructed sentences, which, while enabling control over sentence types, severely limits ecological validity.
The sentences are not sampled from naturalistic corpora, such as Twitter, Reddit, or customer reviews, but are instead created by crowdworkers for the express purpose of being ambiguous. This makes it unclear whether the types of ambiguity captured are reflective of those typically encountered in real-world sentiment classification tasks. Moreover, the dataset lacks any baseline set of unambiguous items, making it difficult to isolate the effect of ambiguity from the effect of instructions. There is also no linguistic analysis of the dataset in terms of sentence complexity, length, syntactic structure, or lexical difficulty, all of which are known to influence annotation behavior.
The statistical analyses conducted are limited in scope. The authors rely primarily on Krippendorff’s alpha as their measure of agreement, without triangulating results with other relevant metrics such as Cohen’s kappa, Fleiss’ kappa, or Gwet’s AC1. Confidence intervals and effect sizes are not reported, and there is no mention of multiple comparison corrections, despite the number of tests conducted. The analysis does not account for annotator- or item-level variation through mixed-effects modeling, nor does it explore potentially confounding variables such as annotator demographics, prior experience, or annotation speed. These omissions weaken the robustness of the quantitative claims.
The qualitative strand of the study, based on thematic analysis of annotator feedback, is underdeveloped. The analysis was conducted by a single coder, with no evidence of reliability checks, coding framework development, or triangulation. The themes presented are not supported by participant quotations or tied back to theoretical constructs. There is no engagement with established methods in qualitative research, such as grounded theory or discourse-oriented coding, and no attempt to integrate qualitative insights meaningfully with the quantitative findings.
Most notably, the authors miss the opportunity to use qualitative feedback to explain the unexpected result that minimal instructions outperformed detailed ones, a finding that could have significantly benefited from nuanced qualitative interpretation.
In the discussion and conclusion, the authors acknowledge the surprising outcome that more instructions led to less consistent annotation, but their explanation is superficial and largely speculative. There is no engagement with cognitive theories that might explain such findings, such as cognitive overload, decision fatigue, or schema misalignment. The implications for future annotation design are also vague. While the authors suggest that more attention should be paid to instruction crafting, they provide no practical guidelines or frameworks to support this claim. Similarly, there is little discussion of how their findings might inform automated methods for managing label noise or disagreement in machine learning systems.
Overall, while the manuscript tackles a compelling problem and demonstrates a commendable attempt at methodological triangulation, the study suffers from serious conceptual and methodological limitations. These include a lack of theoretical grounding, ecological validity issues in dataset design, incomplete statistical reporting, a superficial qualitative analysis, and insufficient integration between study components. These limitations collectively reduce the manuscript’s contribution to the field of annotation science and sentiment analysis.
Kind regards,
Reviewer
**********
If the peer review history is published, it will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
Dear Dr. Hinds,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Oct 25 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Wei Lun Wong
Academic Editor
PLOS ONE
Journal Requirements:
If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.
Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes
Reviewer #2: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available (see the PLOS Data policy)?
Reviewer #1: Yes
Reviewer #2: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes
Reviewer #2: Yes
**********
Reviewer #1: Thank you for the opportunity to review this rigorously conducted and highly relevant manuscript. The study addresses a critical, yet often overlooked, aspect of NLP research: the human annotator. The mixed-methods design is a particular strength, providing deep, explanatory insights that purely quantitative work often misses. The manuscript is well-structured, the methodology is sound, and the conclusions are well-supported by the data. It is a valuable contribution to the field.
Below are my comments, which are largely minor suggestions to further strengthen an already excellent manuscript:
1. Data Integrity & Transparency:
Exclusion Process: Please state the total number of participants initially recruited before exclusions (e.g., "Out of XXX initially recruited, 18 were excluded..."). This provides full transparency on attrition.
'Correct Response' Benchmark: Briefly clarify in the Methods that the "correct response" was strictly defined by alignment with the preliminary ground truth established by the three native speakers. A short sentence on the rationale for using this benchmark would be helpful.
2. Statistical Rigor:
Multiple Comparisons: The exploratory comparisons of inter-rater reliability across sentence types are informative. Please explicitly state in the figure caption or results text that these comparisons are exploratory and were not adjusted for multiple testing, acknowledging this as a limitation for this specific analysis.
3. Qualitative Depth:
Inter-coder Reliability: The thematic analysis is insightful.
To further bolster its rigor, please report inter-coder reliability metrics (e.g., Cohen's Kappa on a sample) if a second coder was involved. If the analysis was conducted by a single coder, please acknowledge this as a standard limitation in the Methods.
Theoretical Discussion: The link to dual-process theory in the discussion is excellent. Please deepen this slightly by more explicitly mapping the identified annotation approaches (e.g., 'word-focused' -> Type 1 processing, 'empathic role-taking' -> Type 2 processing).
4. Language and Copyediting:
The manuscript is very well-written. Please perform a final proofread to correct minor formatting artifacts from the track changes process and ensure consistency. Specific items to check:
Page 24, Abstract: Correct "text-classificationnatural" to "natural" and "thntused" to "used".
Page 33, Procedure: Correct subject-verb agreement: "The succession... are" should be "The succession... is".
Page 122, Introduction: Correct the citation error [1, 2, 3],[4, 5, 6, ... ,33] to the intended format (likely [1-3]).
Consistency: Ensure consistent formatting of "Fig" vs. "Fig." and "Session" vs. "session" throughout.
5. Minor Clarifications:
Preliminary Ground Truth: In the Procedure section, briefly note that the native speakers validating the stimuli were independent raters (if they were).
Krippendorff's Alpha Interpretation: A brief interpretation of the alpha values (e.g., α=0.50-0.59 indicates "moderate" disagreement) in the context of sentiment annotation would aid readers.
This is a strong paper. My comments are intended as constructive suggestions to help you achieve the highest possible level of clarity and rigor in the final version. I congratulate the authors on a fine piece of work.
Reviewer #2: (No Response)
**********
If the peer review history is published, it will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.
Revision 2
Disambiguating sentiment annotation: A mixed methods investigation of annotator experience and impact of instructions on annotator agreement
PONE-D-25-28779R2
Dear Dr. Hinds,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible (no later than 48 hours after receiving the formal acceptance). Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Wei Lun Wong
Academic Editor
PLOS ONE
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #2: Yes
Reviewer #3: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #2: Yes
Reviewer #3: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available (see the PLOS Data policy)?
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #2: Yes
Reviewer #3: Yes
**********
Reviewer #2: (No Response)
Reviewer #3: The authors have diligently revised the manuscript in response to the previous round of reviews. It's clear that careful attention was paid to Reviewer #1's comments, and the manuscript is significantly strengthened as a result.
The additions regarding data integrity and transparency are welcome. Specifically, the clarification of the participant exclusion process, including the initial recruitment numbers and reasons for exclusion, provides valuable context. Similarly, defining the "correct response" benchmark based on the initial ground truth established by independent native English speakers enhances methodological clarity.
In terms of statistical rigor, the authors have appropriately clarified that the comparisons of inter-rater reliability across sentence types were descriptive and exploratory, thereby addressing the concern about multiple comparisons adjustments in this specific context. The added interpretation of the obtained Krippendorff's alpha values, contextualizing them within common benchmarks and considering the impact of the large annotator sample size, is also a helpful addition for readers.
The qualitative depth has been enhanced, particularly through the more explicit mapping of the identified annotation approaches (word-focused, holistic, empathic) onto Type 1 and Type 2 processing within the dual-process theory framework. This deepens the theoretical contribution regarding annotator behavior.
The requested copyediting fixes and minor clarifications appear to have been successfully implemented, including corrections to grammar, consistent terminology (e.g., "Session"), noting the independence of raters validating stimuli, defining "soft labels" upon introduction, and incorporating the point about motivating instruction adherence even when it conflicts with intuition. The discussion regarding the generalizability limitation due to the use of crowdworkers has also been added appropriately.
The study remains a strong contribution, offering valuable insights into the complexities of sentiment annotation through its robust mixed-methods design and the creation of the useful AmbiSent dataset. The investigation into how instructions impact agreement and the exploration of annotator strategies are highly relevant to the NLP community.
For future work, it would be valuable to explore how task engagement and cognitive framing could be enhanced through adaptive instruction systems or dynamic feedback loops that sustain annotator attention and align task framing with underlying cognitive tendencies. Extending this research beyond crowdworkers and native English speakers to include multilingual, cross-cultural, or domain-expert annotators could also illuminate how cultural context and professional expertise modulate interpretive ambiguity. Finally, leveraging the AmbiSent dataset to train or fine-tune machine learning models on soft labels could empirically demonstrate how modeling uncertainty improves the robustness and fairness of sentiment classifiers. If the authors wish, they can include these points in the future work section.
**********
If the peer review history is published, it will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No
Reviewer #3: No
**********
Formally Accepted
PONE-D-25-28779R2
PLOS ONE
Dear Dr. Hinds,
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.
At this stage, our production department will prepare your paper for publication. This includes ensuring the following:
* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset
You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.
Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.
If we can help with anything else, please email us at customercare@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Wei Lun Wong
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.