Peer Review History

Original Submission: May 22, 2025
Decision Letter - Ilya Ioshikhes, Editor, Lun Hu, Editor

PCOMPBIOL-D-25-01030

Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction

PLOS Computational Biology

Dear Dr. Makarenkov,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Aug 25 2025 11:59PM. If you need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Lun Hu

Academic Editor

PLOS Computational Biology

Ilya Ioshikhes

Section Editor

PLOS Computational Biology

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following authors require contributions: Jeremy Charlier, Zeinab Sherkatghanad, and Vladimir Makarenkov. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

4) We have noticed that you have uploaded Supporting Information files, but you have not included a complete list of legends. Please add a full list of legends for your Supporting Information files after the references list.

5) Some material included in your submission may be copyrighted. According to PLOS's copyright policy, authors who use figures or other material (e.g., graphics, clipart, maps) from another author or copyright holder must demonstrate or obtain permission to publish this material under the Creative Commons Attribution 4.0 International (CC BY 4.0) License used by PLOS journals. Please closely review the details of PLOS's copyright requirements here: PLOS Licenses and Copyright. If you need to request permissions from a copyright holder, you may use PLOS's Copyright Content Permission form.

Please respond directly to this email and provide any known details concerning your material's license terms and permissions required for reuse, even if you have not yet obtained copyright permissions or are unsure of your material's copyright compatibility. Once you have responded and addressed all other outstanding technical requirements, you may resubmit your manuscript within Editorial Manager. 

Potential Copyright Issues:

i) Figures 1B and 3. Please confirm whether you drew the images / clip-art within the figure panels by hand. If you did not draw the images, please provide (a) a link to the source of the images or icons and their license / terms of use; or (b) written permission from the copyright holder to publish the images or icons under our CC BY 4.0 license. Alternatively, you may replace the images with open source alternatives. See these open source resources you may use to replace images / clip-art:

- https://commons.wikimedia.org

- https://openclipart.org/.

6) Please provide a detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

1) Please clarify all sources of financial support for your study. List the grants, grant numbers, and organizations that funded your study, including funding received from your institution. Please note that suppliers of material support, including research materials, should be recognized in the Acknowledgements section rather than in the Financial Disclosure.

2) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

3) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

4) If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: "The authors received no specific funding for this work."

7) Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript focuses on showing the impact of transfer learning on CRISPR-Cas9 off-target identification. The number of similarity metrics used is sufficient. The datasets used are comprehensive. The problem has been extensively studied before, but transfer learning is the main novelty here. Even though the manuscript is written quite well, I have the following comments:

Major Comments:

1- Why is 7 used in the matrix? The discussion at the top of page 11 does not make this clear to me.

2- Can you explain in a bit more detail why MLP surprisingly performs better than sequence-based models such as GRU?

Minor Comments:

1- Can you mathematically describe the evaluation metrics?

2- The number of references provided is sufficient. However, regarding transfer learning and transformers in the financial domain, can you also cite the following paper?

a- Gezici, A.H.B. and Sefer, E., 2024. Deep transformer-based asset price and direction prediction. IEEE Access, 12, pp.24164-24178

Reviewer #2: 1. Strengthen Introduction Regarding Algorithmic Novelty:

While you effectively highlight the challenge of data limitations and the benefit of transfer learning, more explicitly state that existing transfer learning applications in CRISPR-Cas9 (e.g., DeepCRISTL, C-RNNCrispr) often lack a principled method for source dataset selection. This will better position your similarity-based pre-evaluation as the key innovation, rather than claiming novelty in the transfer learning algorithms themselves. Clarify that your contribution is in optimizing the transfer learning process through intelligent source selection, not in inventing new deep learning architectures for transfer.

2. Elaborate on Data Representation and Distance Metrics (Methods):

Theoretical Basis for Distance Metrics: In Section 1.7, beyond presenting the formulas, delve deeper into the theoretical reasons why cosine, Euclidean, and Manhattan distances might behave differently with your specific 7×L encoded sgRNA-DNA sequence pairs, which are flattened into 7L vectors. Discuss how each metric responds to features like point-wise mismatches, insertions, or deletions given your encoding strategy (e.g., cosine's sensitivity to direction in high-dimensional sparse spaces, Euclidean's sensitivity to magnitude, Manhattan's robustness to outliers). This will provide a more sophisticated understanding of why cosine distance ultimately performs best.

Normalization Details: Clearly specify the exact Min-Max normalization formula used for "1-normalized_average_distance" in Section 1.7 or a new "Data Processing" subsection. Explicitly state whether this normalization is applied per distance metric independently or across all metrics collectively. This is crucial for reproducibility and transparent interpretation of "similarity values" in Table 3.
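For orientation, the quantities named in this comment can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each encoded sgRNA-DNA pair has already been flattened into a numeric vector, and it assumes the standard Min-Max formula, applied to the average distances of one metric at a time. Whether the manuscript normalizes per metric or across metrics is exactly what the reviewer asks the authors to state. All function names are hypothetical.

```python
import numpy as np


def _as_vectors(u, v):
    """Convert two flattened encodings to float arrays of equal length."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    if u.shape != v.shape:
        raise ValueError("vectors must have the same length")
    return u, v


def cosine_distance(u, v):
    """1 - cos(angle); depends on direction only, not on magnitude."""
    u, v = _as_vectors(u, v)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return float(1.0 - np.dot(u, v) / denom)


def euclidean_distance(u, v):
    """L2 norm of the difference; squaring emphasizes large coordinate gaps."""
    u, v = _as_vectors(u, v)
    return float(np.linalg.norm(u - v))


def manhattan_distance(u, v):
    """L1 norm of the difference; every coordinate gap counts linearly."""
    u, v = _as_vectors(u, v)
    return float(np.abs(u - v).sum())


def min_max_similarity(avg_distances):
    """Assumed formula: 1 - (d - min) / (max - min) over one metric's values."""
    d = np.asarray(avg_distances, dtype=float)
    span = d.max() - d.min()
    if span == 0.0:
        # All candidate sources are equally distant from the target.
        return np.ones_like(d)
    return 1.0 - (d - d.min()) / span
```

Under this assumed formula, the closest source dataset always receives similarity 1 and the farthest receives 0, so the resulting values are only comparable within a single metric.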

3. Detailed Justification of Sampling and Bootstrapping Strategies (Methods):

Provide a clearer rationale for choosing a bootstrapped target dataset size of 250. Is this based on empirical observation of minimum samples needed for stable predictions, or a specific computational constraint? Similarly, justify the choice of n=5,000 iterations for random sampling within source datasets for distance calculation. Explain how these choices balance computational feasibility with the statistical representativeness of the similarity assessment, particularly in relation to the original large datasets.
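The trade-off raised here can be made concrete with a small sketch. It is only one plausible reading of the procedure, since this letter does not describe the sampling scheme in detail: the target dataset is bootstrapped to a fixed size (250 by default), and at each of n iterations (5,000 by default) one random source row and one random bootstrapped target row are compared. The function names and the pairing scheme are assumptions made for illustration.

```python
import numpy as np


def manhattan(u, v):
    """L1 distance, used here only as an example metric."""
    return float(np.abs(u - v).sum())


def average_sampled_distance(source, target, distance=manhattan,
                             n_iter=5000, target_size=250, seed=0):
    """Monte Carlo estimate of the mean source-to-target distance.

    source, target: 2-D arrays with one flattened encoding per row.
    The pairing scheme below is an assumption, not the authors' method.
    """
    rng = np.random.default_rng(seed)
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    # Bootstrap: resample target rows with replacement to a fixed size.
    boot = target[rng.integers(0, len(target), size=target_size)]

    total = 0.0
    for _ in range(n_iter):
        s = source[rng.integers(0, len(source))]
        t = boot[rng.integers(0, len(boot))]
        total += distance(s, t)
    return total / n_iter
```

The cost grows linearly with n_iter and is independent of the source dataset size, which is the computational side of the balance the reviewer asks about. The statistical side is that the standard error of such an estimate shrinks roughly with the square root of n_iter.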

4. In-depth Discussion of Similarity-Performance Link (Results and Discussion):

Interpreting Metric Discrepancies: While you note that Euclidean and Manhattan distances provide larger differences between recommended and non-recommended sources but cosine gives higher overall similarity, delve deeper into why this occurs. For instance, do Euclidean/Manhattan distances, being sensitive to absolute differences, better highlight stark dissimilarity, even if overall magnitude is less important for transferability than feature direction (captured by cosine)? Connect these observations more directly to the underlying biological or sequence characteristics.

Illustrating Negative Transfer Avoidance: Explicitly discuss how your similarity-based pre-selection strategy mitigates negative transfer. Can you point to instances where a source dataset, if chosen solely by size or availability (without similarity assessment), would have led to suboptimal or negative transfer, and how your method successfully avoided this? Providing even a hypothetical or qualitative example could strengthen this point.

Performance Beyond Similarity: While you effectively demonstrate that high cosine similarity often correlates with top model performance, also discuss cases where a source dataset with slightly lower cosine similarity might still yield competitive or even superior performance for specific models due to other factors (e.g., a richer representation of specific rare patterns important for the target task, or the specific model's inductive biases aligning better with that particular source's data distribution). This would add nuance to the "reliability" claim.

5. Refine Conclusion and Future Work:

Concrete Future Directions: Beyond proposing "more advanced similarity measures" or "adaptive transfer learning strategies", offer concrete examples. For instance, could your similarity framework be extended to incorporate biological contextual information (e.g., cell type, experimental conditions) beyond just sequence similarity? Could new deep learning architectures for domain adaptation (which explicitly address distribution shifts between source and target) be integrated with your similarity-based selection?

Broader Impact: Elaborate on the broader impact of your dual-layered framework. How specifically does it "streamline the transfer learning process"? Does it reduce trial-and-error, save computational resources, or enable faster development of CRISPR-Cas9 off-target prediction models for new experimental data?

6. Minor Revisions:

Figure Titles and Labels: Ensure consistency between figure titles and the text. For example, Figure 4's subplot titles refer to "distance" (e.g., "Cosine distance for CD33 dataset"), while the main text and Table 3 refer to "similarity" (which is 1 - normalized distance). Please unify this terminology to avoid confusion. Suggestion: "Cosine Similarity (1 - Normalized Distance) for CD33 dataset".

Reviewer #3: The work is well written and has significant merit, but minor changes are required for acceptance.

1. Clarify what the term “discriminator score” refers to in the dual anomaly scoring mentioned in the abstract.

2. Rewrite the opening sentence of the introduction for brevity and improved clarity.

3. Reformat the literature review section into bullets or a table to clearly highlight key study comparisons.

4. Ensure consistent usage of the term “self-adaptive” throughout the manuscript.

5. Revise lengthy sentences for better grammar and readability, particularly in the introduction.

6. Add a summary paragraph highlighting similarities and differences among reviewed studies to contextualize SAD-GAN’s contribution.

7. Define and explain all mathematical symbols (α, β, G, D) at their first mention for clarity.

8. Elaborate on how alerts are triggered and communicated during anomaly detection.

9. Justify the use of non-overlapping time windows in Section 3.4 by comparing with alternatives like sliding windows.

10. Provide details on how static and dynamic thresholds are implemented in Section 4.5.

11. Clarify how the system enables autonomous anomaly detection and response without prior planning.

12. Include a discussion on the limitations or drawbacks of the proposed approach.

13. Expand the future work section by outlining 4–5 specific research directions.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Emre Sefer

Reviewer #2: No

Reviewer #3: Yes: surjeet dalal

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Submitted filename: PLOS_CB_rebuttal.pdf
Decision Letter - Ilya Ioshikhes, Editor, Lun Hu, Editor

Dear Dr. Makarenkov,

We are pleased to inform you that your manuscript 'Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Lun Hu

Academic Editor

PLOS Computational Biology

Ilya Ioshikhes

Section Editor

PLOS Computational Biology

***********************************************************

All reviewers were satisfied with the changes made in this revision, and they have no further comments.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my comments, so I accept.

Reviewer #2: The authors have carefully addressed all my previous comments. The revisions are adequate, and the additional explanations significantly improve the clarity and rigor of the manuscript. I believe the paper is now suitable for publication in its current form.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Emre Sefer

Reviewer #2: No

Formally Accepted
Acceptance Letter - Ilya Ioshikhes, Editor, Lun Hu, Editor

PCOMPBIOL-D-25-01030R1

Similarity-based transfer learning with deep learning networks for accurate CRISPR-Cas9 off-target prediction

Dear Dr Makarenkov,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom | ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.