Peer Review History

Original Submission: September 18, 2025
Decision Letter - Fei Guo, Editor, Ferhat Ay, Editor

PCOMPBIOL-D-25-01907

Controllable protein design via autoregressive Direct Coupling Analysis conditioned on principal components

PLOS Computational Biology

Dear Dr. CAREDDA,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 15, 2026, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Fei Guo

Academic Editor

PLOS Computational Biology

Ferhat Ay

Section Editor

PLOS Computational Biology

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

1) Your manuscript is missing the following section: Methods. Please ensure all required sections are present and in the correct order. Make sure section heading levels are clearly indicated in the manuscript text, and limit sub-sections to 3 heading levels. An outline of the required sections can be consulted in our submission guidelines here:

https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

2) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines:

https://journals.plos.org/ploscompbiol/s/figures

3) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.

4) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

a) State the initials, alongside each funding source, of each author who received each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

b) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

c) If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The manuscript describes an extension of the standard DCA, routinely used in connection with structure prediction of proteins, to include annotated functional data. In my opinion, this is a very relevant study that further improves a technique which has already proven seminal in the field of computational biology. Therefore, I recommend its publication with minor revision, mainly focused on the clarity of the text.

1. "Lack of control over the output distribution" (line 26) sounds overly pessimistic; maybe it is not exactly what the authors meant to say.

2. Little is said in the main text about the functional features included in the model (the y_k quantities). It would benefit the reader to have a clear idea of what they are from the very beginning.

3. As usual, gaps are treated as an additional amino acid type, but I would find it strange if they contributed actively to defining function, as eq. 2 would suggest. Do they obey some special rule?

4. A classic work by Sander and Schneider (Proteins: Struct. Funct. Genet. 9, 56, 1991) defines a curve in the space of Hamming distance versus sequence length that separates homologs from pairs with unknown relation, saturating at a distance of 0.75 for long chains. Can the maximum Hamming distance found in the generated sequences (line 179) be related to that?

5. Predictions are compared with experimental structures using the RMSD (line 200), but it is well known that the RMSD behaves very nonlinearly: structural variations produce large changes in the RMSD when it is small and negligible changes when it is large. Can the mismatch observed in the case of PF00014 be analysed in terms of the number of common contacts, defined in some appropriate way?

6. I would have preferred to see some results concerning mutations (line 288) in the main text. Is it possible to distinguish between the effects of mutations on thermodynamic stability and on function (i.e., all the rest)?

Reviewer #2: Minor Comments

Overall, the manuscript presents an interesting approach for conditioning autoregressive protein sequence models on PCA-derived features. The method is clearly explained at a high level, and the comparisons against existing DCA-based models are valuable. I have several minor comments that, if addressed, would improve clarity, reproducibility, and contextual grounding of the work.

1. Clarification of model architecture

The manuscript refers to the use of an autoregressive model but does not describe the underlying architecture. Please specify whether the model is transformer-based, attention-based, convolutional, or uses another design, and include the total number of parameters or layers.

2. Dataset sizes and splitting strategy

The authors should provide explicit details on the size of the training MSAs (number of sequences, average length), the size of the evaluation datasets, and the procedure used to split or select data for training and evaluation. Clarifying whether they rely on standard Pfam splits or their own selection criteria would improve experimental transparency.

3. Discussion of recent MSA-based generative models

The introduction would benefit from situating the method within the context of more recent MSA-based generative models such as PoET-2 and Tranception, in addition to bmDCA and ArDCA. A brief comparison or rationale for focusing on DCA-based baselines would help readers understand the positioning of this work.

4. Structural prediction confidence metrics

The structural evaluation relies on RMSD comparisons between predicted and experimental structures. It would be helpful to also report pLDDT or other confidence metrics from AlphaFold 3 or ESMFold when folding generated sequences.

5. Amino acid composition analysis

To further validate the generative capabilities of the model, the authors could provide amino acid frequency distributions for natural versus generated sequences. This would allow readers to assess whether the model captures basic sequence-level statistics in addition to PCA and Hamming distance metrics.

6. Details of DMS evaluation

For the DMS experiments, please report the number of mutations or samples included and provide the performance metrics of each model (FeatureDCA, bmDCA, ArDCA) directly in the main text. This would help clarify the extent and rigor of the DMS evaluation.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure resubmission:

While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.

After you upload your figures to PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis), NAAS will process the files provided and display the results in the "Uploaded Files" section of the page once processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed" above. If NAAS is unable to fix the files, a red "failed" label will appear above. When NAAS has confirmed that the figure files meet our requirements, please download the files via the download option and include these NAAS-processed figure files when submitting your revised manuscript.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Submitted filename: Risposte Reviewers PLoS.pdf
Decision Letter - Fei Guo, Editor, Ferhat Ay, Editor

Dear Dr CAREDDA,

We are pleased to inform you that your manuscript 'Controllable protein design via autoregressive Direct Coupling Analysis conditioned on principal components' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Fei Guo

Academic Editor

PLOS Computational Biology

Ferhat Ay

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #1: All my criticisms have been addressed satisfactorily. The result is a convincing and interesting manuscript that can be published in its present form.

Reviewer #2: The authors have addressed all of my comments!

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

Reviewer #1: None

Reviewer #2: None

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Formally Accepted
Acceptance Letter - Fei Guo, Editor, Ferhat Ay, Editor

PCOMPBIOL-D-25-01907R1

Controllable protein design via autoregressive Direct Coupling Analysis conditioned on principal components

Dear Dr CAREDDA,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.