Peer Review History

Original Submission: July 28, 2025
Decision Letter - James R Faeder, Editor, Alexey Onufriev, Editor

PCOMPBIOL-D-25-01510

Nucleotide context models outperform protein language models for predicting antibody affinity maturation

PLOS Computational Biology

Dear Dr. Matsen IV,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Oct 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Alexey Onufriev

Academic Editor

PLOS Computational Biology

James Faeder

Section Editor

PLOS Computational Biology

Additional Editor Comments:

One of the critical comments to address is the novelty of the proposed model; including a detailed comparison with ARMADiLLO, as proposed by one of the reviewers, makes sense.

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following authors require contributions: Mackenzie M. Johnson, Kevin Sung, Hugh K. Haddox, Yun S. Song, Julia Fukuyama, Ashni A. Vora, Tatsuya Araki, Gabriel D. Victora, and Frederick A. Matsen IV. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

3) Thank you for stating "Upon publication, processed data files will be made available on Zenodo." We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. 

4) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

1) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

5) Thank you for stating that "G.D.V. is an advisor for and holds stock of the Vaccine Company. T.A. is currently an employee of Pfizer Inc." Please amend your 'Competing Interests' statement and declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests:"

Note: If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note that one review is uploaded as an attachment.

Reviewer #1: In their manuscript “Nucleotide context models outperform protein language models for predicting antibody affinity maturation”, Johnson et al. develop and benchmark a model that takes nucleotide sequence context into account when modelling affinity maturation. This approach has scientific relevance, as hypermutation is strongly influenced by the nucleotide sequence context, which can either promote or prevent substitution. The authors conclude that such an approach performs better than other models. The authors might, however, also consider the following aspects:

The ARMADiLLO tool (e.g. 10.1093/nar/gkad398) has previously used a similar concept to assess the evolution of antibodies. How do these methodologies compare in terms of the accuracy of evolution prediction?

As the authors predict, evolution depends not only on the linear protein sequence but also on the nucleotide sequence that encodes it. However, evolution also depends on the structural context, which neither methodology really addresses. Some residues (e.g. but not limited to C41, W41, W52, and C104) are so important for the structure that they never evolve. Others may evolve but are restricted to conservative substitutions (e.g. L, I, M; nucleotide sequence prediction software such as ARMADiLLO would typically overestimate substitutions at such sites), while others are able to accommodate essentially any residue. Other parts of the domains not specifically discussed, such as the upper and lower cores, may have similar restrictions. In some cases, germline genes encode unusual residues (not encoded by other genes, such as residues 46 and 81 of human IGHV1-8-derived antibodies) that are highly mutated (see for instance DOI: 10.3389/fimmu.2017.01433); such sites may be missed by nucleotide-focused prediction tools. The authors ought to discuss the structural context and its role in prediction. Are combinations of models, as applied here, better able to capture the structural perspective without the need to explicitly incorporate a structural component into the computational process?

The authors removed sequences that had >10 mutations in any window of 20 consecutive sites (lines 442-445). I understand the rationale, but how does this affect, in particular, the analysis of diversification of CDRs, where higher levels of diverse substitutions may be observed? Please discuss.

Minor comments

Lines 338-346 appear to be Discussion and should be moved to that section.

The authors should indicate the reference sets (including version numbers) that were used to annotate germline gene origin.

Have models been trained solely on sequences derived from the same germline gene or also on sequences derived from other genes? Residues far apart in the linear sequence may come into close proximity in the folded protein and this may impact mutability in particular of residues that are structurally important. Please comment.

Reviewer #2: Timely benchmark: EPAM (code released) shows nucleotide-context SHM models outperform protein LMs across human repertoires and a Replay system. With the revisions, this would be an important community reference.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: Raw data has been obtained from external sources. They, not the present authors, might not be able to release the data.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.

After uploading your figures to PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis), NAAS will process the files provided and display the results in the "Uploaded Files" section of the page once processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed" above. If NAAS is unable to fix the files, a red "failed" label will appear above. When NAAS has confirmed that the figure files meet our requirements, please download the file via the download option, and include these NAAS processed figure files when submitting your revised manuscript.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachments
Attachment
Submitted filename: Final_version_comments_20Aug2025.docx
Revision 1

Attachments
Attachment
Submitted filename: rebuttal.pdf
Decision Letter - James R Faeder, Editor, Alexey Onufriev, Editor

Dear Dr. Matsen IV,

We are pleased to inform you that your manuscript 'Nucleotide context models outperform protein language models for predicting antibody affinity maturation' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Alexey Onufriev

Academic Editor

PLOS Computational Biology

James Faeder

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The comments made by the reviewers have been appropriately addressed.

Reviewer #2: All my comments have been addressed. I have no additional comments.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Formally Accepted
Acceptance Letter - James R Faeder, Editor, Alexey Onufriev, Editor

PCOMPBIOL-D-25-01510R1

Nucleotide context models outperform protein language models for predicting antibody affinity maturation

Dear Dr Matsen IV,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
