Peer Review History
| Original Submission: April 14, 2025 |
|---|
|
Dear Dr. Zaitsu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 19 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Emine Ozdemir Kacer
Academic Editor
PLOS ONE

Journal Requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

2. Please provide a complete Data Availability Statement in the submission form, ensuring you include all necessary access information or a reason for why you are unable to make your data freely accessible. If your research concerns only data provided within your submission, please write "All data are in the manuscript and/or supporting information files" as your Data Availability Statement.

3. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please delete it from any other section.

4.
If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

Reviewer #1:

Summary

This manuscript presents two complementary studies on distinguishing AI-generated from human-written Japanese text. In Study 1, the authors compare 350 texts from seven LLMs (ChatGPT variants, Claude, Gemini, Copilot, Llama3.1, Perplexity) against 100 human-written public comments using Japanese stylometric features (function-word unigrams, POS bigrams, and phrase patterns). Multidimensional scaling (MDS) and a random-forest classifier (with leave-one-out cross-validation) showed that stylometry can almost perfectly separate AI vs. human text (RF accuracy ~99.8%). In Study 2, 403 Japanese participants judged a set of 9 example texts (eight LLM-generated, one human) as “AI” or “Human” and rated their confidence.
Participants performed poorly (often at chance) and were especially fooled by more advanced models (ChatGPT(o1)), aligning with prior findings that untrained humans struggle to detect AI-generated text. The topic is timely given the rapid evolution of LLMs, and the use of stylometry for authorship analysis is well-motivated.

Major Comments

Scope and Generalizability: The corpus in both studies is limited to public comments on government documents. While this domain allows controlled comparisons, it is quite specific. The authors should explicitly acknowledge that stylometric patterns may differ in other genres (e.g., creative writing, social media) or languages. Clarify in the Discussion that the findings may not generalize beyond Japanese public-comment texts. It would strengthen the paper to mention whether the authors plan to test other text types in future work.

Classification Methodology: The authors use a random forest (RF) with leave-one-out cross-validation (LOOCV) to classify texts (Study 1). This approach is reasonable for avoiding overfitting, but some details are missing. The paper should specify the RF parameters (number of trees, depth, etc.) and whether class imbalance was handled (350 AI vs. 100 human texts). The reported accuracy (99.8%) is extremely high. To validate this, the authors could report a confusion matrix or feature importances to ensure no trivial features dominate. For example, length differences (human texts avg. 661 chars vs. LLM texts up to 1137 chars) might partly drive the result; clarifying that features were normalized (which they were, by percentage) is good. Overall, the methodology seems sound, but more transparency about the RF and CV would help readers assess reproducibility.

Study 2 Design Limitation: Study 2’s design is inherently constrained by using only one example per LLM. The authors themselves note: “Study 2 was limited because we presented only one public comment for each LLM”. This is a serious limitation.
With a single text per model, any idiosyncratic content or phrasing could bias participants’ judgments. The manuscript should emphasize how this limitation affects the interpretation: participants might have been reacting to the specific text content rather than general style. A stronger discussion of this point is needed. For future work, multiple texts per LLM (and human) should be used to average out content effects. The authors should make clear that Study 2 primarily illustrates feasibility rather than providing definitive, generalizable detection rates.

Connection to Prior Work: The Introduction and Discussion cite relevant studies, but the paper could better contextualize the human-evaluation results. For example, Clark et al. (2021) found that untrained evaluators distinguish GPT-3 from human text only at random chance (aclanthology.org). Similarly, Köbis & Mossink (2020) report that participants failed to reliably detect GPT-2–generated poetry (arxiv.org). The current findings (participants’ confusion, low accuracy) align well with these, but the paper should explicitly mention such parallels. Citing these works in the Discussion would strengthen the claim that “humans’ AI detection abilities are limited.” The authors do cite Clark et al. in passing, but an explicit linkage to these results, and perhaps a brief summary (e.g., “in line with previous reports of chance-level human performance”), would help readers understand the broader context.

Interpretation of Stylometric Differences: The finding that Llama3.1 texts occupy a different stylistic space (Figs 1–4) is interesting. The authors speculate that it might be due to Llama’s smaller parameter count. This is plausible but speculative. The Discussion should present it more cautiously (e.g., “one possibility is”). Other factors (different training data, tokenization, or style by design) could also play a role. If possible, the authors could consider a simple follow-up analysis (e.g.
train an RF to differentiate Llama vs. the other LLMs) or at least frame the parameter-size explanation as a hypothesis.

Presentation of Results: Some results would benefit from clearer explanation. For example, the logistic regression on confidence (Table 3) is technically sound, but the text could better highlight the main takeaway (ChatGPT(o1) significantly increased the odds of human-like judgments and higher confidence). Also, the labeling in Table 1 (participants’ judgments) is a bit dense; the authors might spell out the key points in the text. More generally, ensure every table and figure is referenced and explained clearly.

Minor Comments

Language and Typos: The manuscript contains numerous small language issues. These should be carefully corrected. Examples include: “detection of disabilities by humans” (L691-692), likely meant as “detection by humans” or “detection difficulties for humans”. In the Abstract, “overly AI detection abilities were limited” should read “overall AI detection ability was limited.” The term “judgement confidents” appears (L1955-1958); it should be “confidence.” The acronym mix-up “LMM-generated” (L864) should be “LLM-generated.” Please proofread for such errors. It may help to have a native English speaker review the phrasing.

Clarity of Terms: The manuscript uses “ChatGPT (GPT-4o and o1)” and later simply “ChatGPT(o1)” vs. “ChatGPT(GPT-4o)”. This notation is confusing. It would be clearer to fully name these variants once (e.g., “ChatGPT (based on GPT-4.0)” vs. “ChatGPT (GPT-4o1)”) and then use a consistent shorthand. Similarly, the term “LLM (Large Language Model)” should be defined at first use in the Introduction.

Figure Captions: Ensure all figures have descriptive captions and legends. The text refers to “Figs 1–4”, but readers should not have to guess axes or symbols. Since the file we received lacked figure content, I cannot check fully, but the authors should verify that figure captions are complete in the submitted version.
Data Availability: The manuscript’s Data Availability statement is incomplete. PLOS ONE requires that all data (the Japanese text corpus, survey responses, etc.) be made available. The authors should state where the data and code can be accessed (e.g., a public repository) or justify any restrictions. If not already included, please add a full Data Availability statement.

Ethics Statement: Study 2 involved human participants. It is good that ethics approval from Mejiro University is mentioned (Ethics Committee No. 24人-036). Ensure this is clearly stated in the Methods (it currently appears only in the submission interface and references). It might also be helpful to mention that participants gave informed consent.

Reference to Self-Published Work: The authors cite their prior PLOS ONE studies [10–11]. While building on those is fine, the manuscript should emphasize the new contributions here (multiple new LLMs, human survey). Avoid reusing large verbatim text from those works. (I did not detect obvious overlap, but it is worth verifying against [44].)

Recommendation

This work addresses an important and timely issue with a generally sound approach and interesting findings. The combination of stylometric classification and human-judgment experiments is valuable. However, the manuscript needs minor revisions before it is suitable for publication. Key improvements include: clarifying methodological details (e.g., RF parameters, cross-validation), acknowledging the limitations of the survey design more strongly, and correcting language/typos. Addressing the above points will considerably strengthen the paper.

Reviewer #2: The manuscript, “Stylometry can reveal artificial intelligence authorship, but humans struggle: A comparison of human and seven large language models in Japanese”, presents a technically sound and well-structured study. It addresses an important and timely issue by comparing machine-based stylometric detection with human judgments in the Japanese language context.
The methodology is rigorous, the analyses are appropriate, and the data support the conclusions. Ethical approval and data availability are also adequately addressed. There are, however, several areas where the manuscript could be strengthened:

1. In the Study 2 design, only one text per LLM was used in the human judgment task. This limits the generalizability of the findings, as responses may partly reflect topic effects rather than model stylistics. While the authors acknowledge this, it should be emphasized more strongly, with a clearer discussion of how future studies could incorporate multiple texts per model.

2. Although generally intelligible, some phrases are awkward (e.g., “overly AI detection abilities were limited”). A professional English edit would improve readability and ensure the results and interpretations are conveyed with precision.

3. The MDS and violin plots are central to the findings but may not be immediately interpretable for all readers. Figure legends should be expanded to explain more clearly what the axes, scales, and distributions represent.

4. The finding that ChatGPT-o1 “misleads humans more effectively” is supported, but the discussion should avoid overgeneralization. Alternative explanations, including topic familiarity, surface-level linguistic cues, and cognitive biases, should be considered more explicitly.

Reviewer #3: The design of this complex study and the technical application of several complex statistical analyses are well done and documented clearly in the manuscript. The explanation of the data in support of the five clearly delineated hypotheses is expertly and concisely done. The data source is appropriately identified, and the ethical-standards review is documented. Illustrated formulas support the documentation of the statistical analysis, and the tables are relevant. The manuscript is well written and organized for the complexity of the study.

Reviewer #4: Comments to authors:

1.
The authors restrict their analysis to three features (phrase patterns, part-of-speech bigrams, and unigrams of function words). While these are established, the exclusion of higher-level syntactic or semantic features (e.g., dependency relations, embedding-based features) is a limitation. The paper should justify why only these features were used and discuss whether the results might change with richer stylometric representations.

2. The multinomial logistic regression model includes distance as a covariate, but the authors do not clearly explain how distances were computed or standardized across features. More methodological clarity is needed.

3. The reliance on metric MDS is justified in terms of interpretability, but other dimensionality-reduction techniques (e.g., t-SNE, PCA) might capture non-linear relationships more effectively. A comparative analysis, or at least a sensitivity check, would strengthen the claim that MDS is the best choice.

4. The Random Forest classifier achieved near-perfect accuracy (99.8%), which raises concerns of overfitting. Was cross-validation (other than LOOCV) performed? Were hyperparameters tuned? Were the results stable across different random seeds? The authors should provide more transparency on classifier robustness.

5. The ‘human-written’ texts come only from the Japanese government's public comments. This corpus is domain-specific, formal, and possibly homogeneous. The findings may not generalize to other text genres (e.g., creative writing, casual conversation). The authors should either include a second human corpus or clearly acknowledge this limitation.

6. All LLM outputs were generated with zero-shot prompts. However, the prompting strategy strongly influences output style. Without testing few-shot or instruction-tuned prompts, the conclusions about stylometric detectability may not generalize. At a minimum, this limitation should be discussed.

7. The authors should write a limitations section.
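To illustrate the kind of robustness reporting requested above (a confusion matrix, explicit hyperparameters, class-imbalance handling, and a seed-stability check alongside LOOCV), a minimal sketch follows. This is not the authors' code: the data are synthetic, and the class sizes (scaled down from the paper's 350:100 split) and hyperparameters are invented for demonstration.

```python
# Illustrative sketch only -- not the authors' pipeline. Synthetic
# "stylometric" features with a small mean shift between classes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_ai, n_human, n_feat = 70, 20, 20  # imbalanced classes, scaled down from 350:100

# Rows are texts, columns are normalized feature frequencies.
X = np.vstack([
    rng.normal(0.50, 0.1, size=(n_ai, n_feat)),     # "AI" texts
    rng.normal(0.40, 0.1, size=(n_human, n_feat)),  # "human" texts
])
y = np.array([1] * n_ai + [0] * n_human)  # 1 = AI, 0 = human

accuracies = []
for seed in (0, 1):  # seed-stability check: rerun LOOCV under different seeds
    clf = RandomForestClassifier(
        n_estimators=100,         # report such hyperparameters explicitly
        class_weight="balanced",  # one way to handle the class imbalance
        random_state=seed,
    )
    y_pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())
    accuracies.append(accuracy_score(y, y_pred))

# The confusion matrix shows whether errors concentrate in the minority class.
cm = confusion_matrix(y, y_pred)
print("per-seed LOOCV accuracy:", accuracies)
print("confusion matrix (rows = true human, AI):\n", cm)
```

Reporting per-seed accuracies and the confusion matrix in this way would answer the stability and imbalance questions without changing the LOOCV design itself.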
**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous, but your review may still be made public. For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
Stylometry can reveal artificial intelligence authorship, but humans struggle: A comparison of human and seven large language models in Japanese

PONE-D-25-18282R1

Dear Dr. Zaitsu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Emine Ozdemir Kacer
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes
Reviewer #3: Yes

**********

3.
Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes
Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #2: Yes
Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes
Reviewer #3: Yes

**********

Reviewer #2: I have reviewed the revised submission thoroughly and am satisfied with the amendments made. The authors have addressed the points raised in the initial review comprehensively.

Reviewer #3: (No Response)

**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous, but your review may still be made public. For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No
Reviewer #3: No

********** |
| Formally Accepted |
|
PONE-D-25-18282R1

PLOS ONE

Dear Dr. Zaitsu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Emine Ozdemir Kacer
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.