Peer Review History

Original Submission: February 20, 2025
Decision Letter - Alessio Luschi, Editor

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 23, 2025, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Alessio Luschi, Ph.D.

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.   

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: The paper addresses the challenge of distinguishing human translations from those generated by Large Language Models (LLMs) by utilizing dependency triplet features and a Multi-Layer Perceptron (MLP) classifier. The paper is well written, but I recommend that the authors add pseudocode for the proposed approach.

Reviewer #2: This paper presents a study on distinguishing between human translations and LLM-generated translations using dependency triplet features. The integration of dependency syntax and part-of-speech combinations into a classification framework is novel and insightful. The work represents a shift from the conventional focus on original vs. human-translated texts toward the more timely distinction between human and AI-generated translations. The classification results are strong, and the linguistic insights derived from the feature analysis are valuable for understanding the stylistic and structural patterns characteristic of LLM translations. The emphasis on interpretable features is a welcome aspect that strengthens the applicability of the findings. That said, I have several reservations:

1. Framing the Methodology as Deep Learning:

The paper refers to the approach as deep learning, but the model is a multilayer perceptron with two hidden layers. It is relatively shallow and operates on a low-dimensional input (~100 features). While MLPs are technically neural networks and universal approximators, this setup lacks the hierarchical representation learning typically associated with deep models. The framing could be made more precise to avoid overstating the complexity of the approach. A promising direction that would better justify the deep learning label is the transferability of the model or features: the paper could address pretraining on one large corpus and fine-tuning on another to check the generality of the features and align more closely with deep learning practices around domain generalization.

2. Hyperparameter Tuning and Overfitting Concerns:

Table 5 presents hyperparameters obtained via random search, but the values (e.g., dropout of 0.25212 and L2 regularization of 0.00081) seem overly specific. Such precision suggests potential overfitting to the validation set. A sensitivity analysis would help assess how robust the model is to small changes in these hyperparameters.

3. Unclear Data Split:

Table 6 shows 156 test examples, while Table 2 describes a training set of 388. However, it’s unclear how the test set was constructed. Is it held out from the 388 examples?

4. Missing Simple Baselines: The paper would benefit from comparison with simpler baselines. For instance, a Naive Bayes classifier could serve as a lightweight alternative that may reveal whether LLM translations disproportionately rely on certain syntactic patterns/triplets (e.g., ChatGPT overusing particular phrasing). Would Naïve Bayes identify the discriminative power of individual features similar to MLP’s results?

I recommend clarifying the issues above to enhance the technical rigor of this interesting study.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Olcay Kursun

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Revision 1

Manuscript ID: PONE-D-25-08735

Title: Machine Translationese of Large Language Models: Dependency Triplets, Text Classification, and SHAP Analysis

Authors: Shukang Zhang; Chaoyong Zhao

Dear Editor and Reviewers,

We sincerely appreciate your time and constructive feedback. Below is our point-by-point response to all comments.

Response to Academic Editor

1. Style Requirements: We have carefully reviewed the manuscript and believe it now aligns with PLOS ONE's formatting guidelines, including file naming conventions. The text has been adjusted to follow the provided style template to the best of our knowledge. Please let us know if any further modifications are needed.

2. Data/Code Sharing: The data and author-generated code have been uploaded to GitHub: https://github.com/KiemaG5/LLM-translationese.

Response to Reviewer 1

Comment: “Add pseudocode for the proposed approach.”

Response:

We appreciate the suggestion and have added detailed pseudocode (Fig 2) to clarify the methodology. The pseudocode outlines the following steps (a simplified sketch of steps 1-2 follows this list):

1. Dependency Parsing: extraction of triplets.

2. Feature Engineering: vectorization of triplet frequencies.

3. Model Training: cross-validation workflow.

4. SHAP Interpretation: global, local, and dependence analysis of discriminative features.

5. Feature Reduction: iterative pruning of low-importance features while monitoring the F1-score to assess robustness.
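A minimal sketch of steps 1 and 2, assuming spaCy's en_core_web_sm model and a head-POS/relation/dependent-POS triplet format, is shown below. It is illustrative only and not the authors' released implementation; see the GitHub repository above for the actual code.

```python
# Illustrative sketch of steps 1-2 only; not the authors' released code.
from collections import Counter

import spacy
from sklearn.feature_extraction import DictVectorizer

nlp = spacy.load("en_core_web_sm")  # assumed English pipeline for dependency parsing

def triplet_counts(text):
    """Step 1: count (head POS, dependency relation, dependent POS) triplets in one text."""
    doc = nlp(text)
    return Counter(f"{tok.head.pos_}-{tok.dep_}-{tok.pos_}" for tok in doc)

def build_feature_matrix(texts):
    """Step 2: turn per-text triplet counts into a matrix of relative frequencies."""
    vectorizer = DictVectorizer(sparse=False)
    counts = vectorizer.fit_transform(triplet_counts(t) for t in texts)
    frequencies = counts / counts.sum(axis=1, keepdims=True)  # normalize by tokens per text
    return frequencies, vectorizer.get_feature_names_out()
```

The resulting matrix and the human/LLM labels then feed the cross-validated classifiers in step 3.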

Response to Reviewer 2

Comment 1: “Framing the Methodology as Deep Learning”

Response:

We appreciate the suggestion and have revised terminology to avoid overstatement:

1. Replaced "deep learning" with "neural network-based approach".

Comment 2: "Hyperparameter Tuning and Overfitting"

Response:

We appreciate the reviewer's insightful suggestion regarding model optimization. However, our primary goal was to validate the general utility of the features, not to optimize a single model. Thus:

1. Expanded the Methodology to include 16 classifiers (SVM, MLP, CatBoost, etc.) from scikit-learn and other libraries, all using default parameters, to demonstrate feature robustness without hyperparameter tuning.

2. Top F1 scores on the test set: SVM (93%), MLP (92%), and CatBoost (91%), all significantly above chance (50%). Lower performers (e.g., Decision Tree: 74%, Naïve Bayes: 79%) still surpassed chance, confirming feature discriminability.

Comment 3: “Unclear Data Split”

Response:

We appreciate the suggestion and have clarified the data split in the Text classification and Model performance sections:

1. 10-fold cross-validation on the full dataset (776 samples); an illustrative sketch of this evaluation loop follows.
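The evaluation loop might look like the sketch below, shown with a handful of the 16 classifiers; this is an illustration under stated assumptions (X and y stand for the triplet-frequency matrix and labels), not the authors' exact script.

```python
# Illustrative only: mean 10-fold cross-validated F1 for untuned classifiers.
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_default_classifiers(X, y):
    """X: triplet-frequency matrix; y: labels (0 = human, 1 = LLM translation)."""
    models = {
        "SVM": SVC(),
        "MLP": MLPClassifier(max_iter=2000),  # max_iter raised only to avoid convergence warnings
        "Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(),
        "K-Nearest Neighbors": KNeighborsClassifier(),
    }
    for name, model in models.items():
        f1 = cross_val_score(model, X, y, cv=10, scoring="f1")
        print(f"{name:20s} mean F1 = {f1.mean():.2f} (+/- {f1.std():.2f})")
```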

Comment 4: “Missing Simple Baselines”

Response:

We appreciate this constructive suggestion and have added:

1. 16 models, including Decision Tree (74%), Naïve Bayes (79%), and K-Nearest Neighbors (81%), as baselines.

2. SHAP analysis focused on SVM, MLP, and CatBoost versus Decision Tree, Naïve Bayes, and K-Nearest Neighbors (the three best- and three worst-performing models) to identify similarities and dissimilarities in the globally impactful features; a sketch of this comparison follows.
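For illustration, the global comparison in point 2 could be approximated as follows; the helper name and the use of KernelExplainer over class-1 probabilities are assumptions rather than the authors' documented procedure.

```python
# Hypothetical sketch: rank triplet features by mean |SHAP| for one fitted classifier,
# then compare the rankings across strong (SVM, MLP, CatBoost) and weak models.
import numpy as np
import shap

def top_shap_features(model, X_background, X_explain, feature_names, k=10):
    predict_llm = lambda data: model.predict_proba(data)[:, 1]  # probability of "LLM translation"
    explainer = shap.KernelExplainer(predict_llm, shap.sample(X_background, 50))
    shap_values = explainer.shap_values(X_explain)
    importance = np.abs(shap_values).mean(axis=0)  # global importance per feature
    ranked = np.argsort(importance)[::-1][:k]
    return [(feature_names[i], float(importance[i])) for i in ranked]
```

Overlap between the rankings returned for the best- and worst-performing models indicates whether the same triplets carry the discriminative signal.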

We believe these revisions address the concerns raised. Thank you again for your valuable input.

Sincerely,

Shukang Zhang

East China Normal University

mikeashjd@163.com

Attachments
Submitted filename: Response to Reviewers.docx
Decision Letter - Alessio Luschi, Editor

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by August 31, 2025, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Alessio Luschi, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise. 

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #1: I am satisfied with the current version. The authors addressed all comments in a well-presented manner.

Reviewer #2: I appreciate the effort to revise the manuscript and add new experiments. However, my first three concerns remain unresolved, so I cannot recommend acceptance at this time.

1. Deep models: The authors relabeled the approach as neural network-based. Yet the manuscript still states: "More recent research has employed deep models such as BERT and Llama". If those systems are relevant, please explain why they were not compared, or supply a lightweight fine-tuned baseline to show whether they outperform your MLP.

2. Hyperparameter precision and overfitting: The first draft reported highly specific hyperparameters (for example, a dropout rate of 0.25212) without a robustness check. The revision now uses many more classifiers but with default settings. This change sidesteps, rather than answers, the overfitting question. The defaults are sometimes not competitive. Please optimize hyperparameters, but without overfitting.

3. Data split and SHAP analysis: Cross-validation is not a substitute for a held-out test set, and computing SHAP on CV folds risks explaining overfit patterns. Requested action: adopt a three-way split or nested CV, recalculate SHAP on unseen data, and list the top triplets across models and folds. The SHAP-based feature importance analysis appears to have been computed on data used in CV, potentially overlapping with training folds. This undermines the reliability of the interpretation. SHAP values should ideally be computed on a held-out validation set to avoid explaining model behavior on the same data it was trained on. A final check can then be performed on the left-out, never-before-seen test set.
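For concreteness, the protocol Reviewer #2 requests could be implemented along the lines of the sketch below; the split sizes, parameter grid, and function names are illustrative assumptions only, not a prescribed implementation.

```python
# Illustrative sketch: tune hyperparameters on training data only, then report the
# final F1 and compute SHAP values on a never-before-seen test set.
import shap
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

def tune_and_explain(X, y, feature_names):
    # Hold out a test set that is never touched during tuning.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Cross-validated hyperparameter search restricted to the training portion.
    search = GridSearchCV(
        MLPClassifier(max_iter=2000, random_state=0),
        param_grid={"hidden_layer_sizes": [(64,), (64, 32)], "alpha": [1e-4, 1e-3, 1e-2]},
        scoring="f1",
        cv=5,
    )
    search.fit(X_train, y_train)
    print("Held-out test F1:", search.score(X_test, y_test))

    # SHAP computed only on held-out data the tuned model never saw.
    model = search.best_estimator_
    predict_llm = lambda data: model.predict_proba(data)[:, 1]
    explainer = shap.KernelExplainer(predict_llm, shap.sample(X_train, 50))
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test, feature_names=feature_names)
    return search.best_params_, shap_values
```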

Reviewer #3: This paper explores how to tell apart human translations and those generated by large language models using dependency triplet features and machine learning classifiers. The idea is clear, the methods are solid, and the results are impressive — especially the high F1 scores across models and the use of SHAP for explaining model behavior.

What I liked:

- The dependency triplet feature design is a smart and interpretable way to capture syntax.

- Testing 16 different classifiers shows the authors really wanted to check robustness.

- SHAP analysis adds a lot of value by explaining why the models work the way they do.

- The public availability of the dataset and code is great — it makes the work reproducible.

Minor suggestions:

- The writing is mostly clear, but a few parts are a bit dense or technical. A light language check would help.

- The dataset is well-constructed, but since it’s from one book and one language pair (Chinese-English), it would be good to briefly mention this as a possible limitation.

- The authors say they used default parameters in the classifiers. A quick line explaining why they didn’t tune them would help readers understand the choice.

- Some figures (especially SHAP plots) could be a bit clearer or higher resolution in the final version.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org

Revision 2

Response to Reviewer #1

We sincerely thank Reviewer #1 for their positive assessment and are pleased that they found our revisions satisfactory.

Response to Reviewer #2

We thank Reviewer #2 for their detailed and rigorous feedback. We have addressed the three main concerns as follows:

1. On the comparison with deep models: We appreciate the suggestion to compare with models like BERT and Llama. The primary focus of our paper is to evaluate the power of interpretable, linguistic features (dependency triplets) in distinguishing translation types. This feature-engineering approach is fundamentally different from end-to-end deep learning models, which operate on embeddings and are less directly interpretable. We have clarified this scope in the manuscript and acknowledged that a direct comparison with fine-tuned deep models is a valuable direction for future research.

2. On hyperparameter tuning and overfitting: We thank the reviewer for raising this important point. Our decision to use default parameters for the 16 classifiers was deliberate. Our central hypothesis was that the proposed features are inherently discriminative. By demonstrating strong performance (e.g., F1-score of 0.92 with MLP) across a wide range of models without specific tuning, we provide robust evidence for the general utility of our features. We believe this is a stronger testament to the features’ power than optimizing a single model. We have added a clear explanation of this rationale in the Methodology section.

3. On the data split and SHAP analysis: We thank the reviewer for this insightful comment. We agree that a held-out test set is a gold standard. We chose k-fold cross-validation as it provides a more robust and less biased estimate of generalization performance than a single random split.

Regarding the SHAP analysis, our goal was to explain what patterns the models learned during this robust CV process. Our methodology for calculating SHAP values aligns with standard examples in the official SHAP documentation. To address the valid concern about explaining overfit patterns, we highlight that our findings are based on the consistently high performance across 16 different classifiers. The fact that our features work well across such a diverse set of models strongly suggests that the patterns identified by SHAP are genuinely discriminative and not artifacts of a single, potentially overfit model. We have clarified our methodology and reasoning in the revised manuscript. We also acknowledge that a three-way split or nested CV is a highly rigorous standard, which we will certainly consider for future extensions of this work.

Response to Reviewer #3

We sincerely thank Reviewer #3 for their encouraging and constructive feedback. We have addressed all minor suggestions in the revised manuscript:

1. A thorough language check has been performed to improve clarity.

2. We have added a discussion of the dataset’s limitations (single book and language pair) in the Discussion and Conclusion sections.

3. We have included a rationale in the Methodology section for using default classifier parameters.

4. All figures, especially the SHAP plots, have been regenerated in higher resolution for better clarity.

Attachments
Submitted filename: Response_to_Reviewers_auresp_2.docx
Decision Letter - Alessio Luschi, Editor

Machine Translationese of Large Language Models: Dependency Triplets, Text Classification, and SHAP Analysis

PONE-D-25-08735R2

Dear Dr. Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alessio Luschi, Ph.D.

Academic Editor

PLOS One

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #3: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?


Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: Yes

Reviewer #3: Yes

**********

Reviewer #2: This is the second revision of this manuscript. The authors have adequately addressed my comments.

Reviewer #3: Thank you for addressing all the previous comments.

1. The language has been improved and the manuscript now reads clearly and professionally.

2. The discussion now includes the dataset limitation (single book and language pair), which adds transparency to the scope of your findings.

3. The rationale for using default parameters in the classifiers is clearly stated and acceptable for the comparison-focused goals of the study.

4. The updated figures, especially the SHAP plots, are now much clearer and publication-ready.

I recommend acceptance.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

**********

Formally Accepted
Acceptance Letter - Alessio Luschi, Editor

PONE-D-25-08735R2

PLOS One

Dear Dr. Zhang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr Alessio Luschi

Academic Editor

PLOS One

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.