Peer Review History

Original Submission: October 31, 2024
Decision Letter - Yang Lu, Editor

PCOMPBIOL-D-24-01884

Zimin patterns in genomes

PLOS Computational Biology

Dear Dr. Georgakopoulos-Soares,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Mar 03 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Yang Lu, Ph.D.

Academic Editor

PLOS Computational Biology

Jian Ma

Section Editor

PLOS Computational Biology

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following Authors/Authors require contributions: Nikol Chantzi, Ioannis Mouratidis, and Ilias Georgakopoulos-Soares. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

3) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines:

https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

4) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines:

https://journals.plos.org/ploscompbiol/s/figures

5) We notice that your supplementary Figures, and Tables are included in the manuscript file. Please remove them and upload them with the file type 'Supporting Information'. Please ensure that each Supporting Information file has a legend listed in the manuscript after the references list.

6) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published. Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.

- State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

- State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.".

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this work, the authors aimed to investigate Zimin words (sequences with identical prefix and suffix) in genomic sequences, specifically exploring "Zimin avoidmers" - sequences that do not contain any Zimin patterns - across multiple organisms. The authors analyzed the Telomere-to-Telomere (T2T) complete human genome as well as eight other model organisms for Zimin words and Zimin avoidmers. In the human genome, the authors found that all k-mers above 105 base-pairs contain Zimin words. The Zimin avoidmers have an inhomogeneous distribution in the human genome, and are most enriched in coding sequence (CDS) and human satellite 1 regions. Furthermore, the authors found that Zimin avoidmers are associated with loci that have increased sequence diversity in the human genome, but have lower insertion and deletion mutation rates. Lastly, the authors identified Zimin avoidmers across model organism genomes and found that they are inhomogeneously distributed across these genomes.
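For readers unfamiliar with the terminology in this summary, the following is a minimal Python sketch of the standard combinatorics-on-words definition: Z_1 = a and Z_n = Z_(n-1) x Z_(n-1), where a sequence is an instance of Z_n if it can be written as u v u with v non-empty and u an instance of Z_(n-1). The function names are hypothetical and the search is brute force; this is an assumption-laden illustration, not the authors' implementation, and the order n used in the manuscript is not stated in these letters.

```python
def is_zimin_instance(word: str, n: int) -> bool:
    """Return True if `word` is an image of the Zimin word Z_n
    under a non-erasing substitution (illustrative definition)."""
    if n == 1:
        # Z_1 is a single variable, so any non-empty word matches it.
        return len(word) >= 1
    # Z_n = Z_(n-1) x Z_(n-1): look for word = u + v + u with v non-empty
    # and u itself an instance of Z_(n-1).
    for u_len in range(1, (len(word) - 1) // 2 + 1):
        if word[:u_len] == word[-u_len:] and is_zimin_instance(word[:u_len], n - 1):
            return True
    return False


def contains_zimin(seq: str, n: int) -> bool:
    """Return True if any contiguous substring of `seq` is an instance of Z_n."""
    for start in range(len(seq)):
        for end in range(start + 1, len(seq) + 1):
            if is_zimin_instance(seq[start:end], n):
                return True
    return False


def is_avoidmer(seq: str, n: int) -> bool:
    """A sequence is treated here as an avoidmer of order n
    if no substring is an instance of Z_n."""
    return not contains_zimin(seq, n)
```

Under this definition, "ACA" is an instance of Z_2 (u = "A", v = "C") and "ACAGACA" is an instance of Z_3, whereas "AACCGGTT" contains no instance of Z_2 because no base recurs with a gap between its occurrences.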

The study is well organized and the manuscript is well written. While the question of Zimin words in genomes is interesting, I have several concerns and suggestions relating to the biological implications and applications of the study.

Major Comments:

1. While the study provides a novel computational approach to analyzing genomic sequences, the broader biological implications are not fully explored. The discussion section hints at potential connections to non-B DNA conformations but it would be better if the authors could expand the discussion on potential applications of Zimin words to synthetic biology and genome studies.

2. The authors note a significant difference between the theoretical maximum Zimin word length and their observed maximum in all the model organisms. While they attribute this to the finite genome size and genome repetitiveness, more in-depth mathematical and biological explanations of why this discrepancy arises through mutational processes and evolution would strengthen the manuscript.

Minor Comments:

1. Figure 5 is blurry and low resolution.

2. In the Figure 5 legend, there are two B panels.

Reviewer #2: This manuscript investigates the Zimin pattern—where words have identical prefixes and suffixes—in the set of k-mers from the telomere-to-telomere (T2T) human reference genome, as well as reference genomes of eight other model organisms. The authors first explored the pattern of Zimin avoidmers, which are sequences that avoid the Zimin Zn pattern in the T2T genome. They concluded that for k-mer lengths exceeding 104 bp, every k-mer contains a Zimin word. Zimin avoidmers are enriched in coding regions and HSAT1B repetitive regions, and there is a negative correlation between k-mer diversity and the density of the Zimin pattern.

Next, the authors conducted simulations to examine the invariance of Zimin sequences to the introduction of indels and single nucleotide substitutions. They observed a strong enrichment or depletion of Zimin patterns around indels but not substitutions, likely due to nearby short tandem repeats. Finally, they compared Zimin avoidmers across the genomes of model organisms and noted that the density of Zimin avoidmers is higher in prokaryotic genomes than in eukaryotic genomes.

Overall, the manuscript is well-written, and the analyses are robust. Below are some minor suggestions and general questions:

1. It would strengthen the manuscript to justify the examination of Zimin patterns in genomes by highlighting the repetitive nature of genomic sequences and presenting a specific hypothesis, rather than stating that “Zimin words have never been examined in bioinformatics.”

2. Figure 3 analysis: Please provide a justification for focusing solely on 8-mers in the analysis presented in Figure 3.

3. In Figure 4C, it would be beneficial to include a comparison of short tandem repeat enrichment in regions that are not Zimin avoidmers.

4. Zimin Words in the Human Genome: What is the pattern of Zimin words (as opposed to Zimin avoidmers) in the human genome?

5. Reverse Complements of Zimin Sequences: Similarly, what is the pattern of the reverse complements of Zimin sequences? Exploring this may further elucidate the characteristics of Zimin patterns, though it is not necessary to address.

Reviewer #3: The manuscript introduces the concept of Zimin avoidmers and examines their distribution and enrichment across the human reference genome and eight additional model organisms. Key findings include the inhomogeneous distribution of avoidmers, enrichment in coding regions and satellite sequences, and their association with sequence diversity, suggesting potential applications in understanding genomic sequence organization. While the manuscript presents an interesting mathematical concept applied to genomics, several significant concerns prevent me from recommending this submission for acceptance in its current form:

1. While the manuscript introduces an interesting application of a theoretical concept, it does not sufficiently explore its biological implications or potential applications, leaving the findings somewhat isolated from practical utility for computational biology. Without biological (either functional or evolutionary) interpretation, this manuscript might be more suitable for journals focusing on computer science or natural language processing.

2. The study assumes that exact matches to Zimin words are meaningful, but this premise lacks biological plausibility. In genomic contexts, spontaneous mutations occur at a certain rate, and motifs or secondary structures like hairpin loops are typically error-tolerant, allowing for mismatches or variations to some degree. Thus, focusing solely on exact matches to Zimin words is unrealistic and limits the biological relevance of the findings.

3. Statistical rigor is insufficient; there is no statistical significance reported throughout this manuscript. Claims regarding avoidmer distributions, enrichment in genomic compartments, and comparisons across organisms are made without appropriate statistical testing or measures of significance, undermining the reliability of the conclusions.

4. In the section "In the reference human genome every k-mer contains a Zimin word after 104 base-pairs," the statement "with a maximum Zimin avoidmer length of 104 bp emerging on chromosome 17" is inconsistent with Figure 1d, which indicates that chromosome 7 has the maximum avoidmer length.

5. In the section "In the reference human genome every k-mer contains a Zimin word after 104 base-pairs," the authors conclude that "Zimin avoidmers are significantly more GC-rich than the genome average." However, this conclusion should be drawn by comparing the GC content of Zimin avoidmers to that of k-mers containing Zimin words of the same specific length within the genome. Additionally, the authors should provide statistical evidence to support the significance of this difference.

6. In the section "Zimin avoidmers are inhomogeneously distributed across model organismal genomes," the authors claim that Zimin avoidmers are most enriched in genic and CDS regions of the genome. However, this claim is problematic because genic and CDS regions represent only a small fraction of the entire genome. The analysis does not account for the inherent bias introduced by sequence length, as longer sequences are inherently more likely to contain Zimin words due to their size. To make this claim more robust, the authors must normalize for sequence length to accurately assess enrichment and eliminate biases arising from differences in genomic region sizes.

7. In the section “In silico saturation and germline mutagenesis of Zimin avoidmers in the human genome,” the indels are constructed with a specific length of 1 (single-base insertion or deletion), which represents a highly idealized scenario. We suggest extending the analysis to include indels of varying lengths to better reflect real-world genomic mutational processes and evaluate whether the conclusions hold under more biologically realistic conditions.

8. Many of the parameter choices in the study require justification. It is important to assess whether the conclusions remain consistent when varying the bin window length of the genome. Exploring the validity of the findings under different parameter settings is necessary to ensure the robustness and generalizability of the results.

9. The introns, exons, and CDS regions of a gene are not consecutive but rather scattered as separate segments across the genome. It is unclear how Zimin avoidmers are calculated for these fragmented regions. Does the analysis involve concatenating these segments into a single sequence, or are avoidmers identified separately within each segment? This methodological detail is crucial and should be explicitly described to clarify the approach used for Zimin avoidmer identification in these regions.

10. Figures (such as Figure 2) are not aligned between the subplots, and the writing should also be improved.
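Comments 3 and 5 above ask for statistical evidence, in particular when comparing the GC content of Zimin avoidmers against k-mers of the same length that contain Zimin words. One way such a comparison could be run is a two-sided permutation test on the difference in mean GC content. The sketch below is only an illustration under stated assumptions: the function names, the default number of permutations, and the toy sequences are hypothetical, and nothing here is taken from the manuscript or the authors' response.

```python
import random
from statistics import mean


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in mean GC content
    between two groups of same-length k-mers (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    gc_a = [gc_content(s) for s in group_a]
    gc_b = [gc_content(s) for s in group_b]
    observed = abs(mean(gc_a) - mean(gc_b))
    pooled = gc_a + gc_b
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(gc_a)]) - mean(pooled[len(gc_a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy placeholder data, not real avoidmers or genomic k-mers.
toy_avoidmers = ["GGCC", "GCGC", "GGGC", "CCGC", "GCCG"]
toy_other_kmers = ["AATT", "ATAT", "AAAT", "TTAT", "TAAT"]
p_value = permutation_pvalue(toy_avoidmers, toy_other_kmers, n_perm=2000)
```

A permutation test is used here because it makes no distributional assumption about GC content. In a genome-scale analysis, overlapping k-mers are not independent, so a block-wise or region-level resampling scheme would likely be more appropriate than shuffling individual k-mers.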

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:


Revision 1

Attachments
Attachment
Submitted filename: Response_to_reviewers_Zimin_Avoidmers.docx
Decision Letter - Yang Lu, Editor

PCOMPBIOL-D-24-01884R1

Zimin patterns in genomes

PLOS Computational Biology

Dear Dr. Georgakopoulos-Soares,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days, by Nov 19 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Yang Lu, Ph.D.

Academic Editor

PLOS Computational Biology

Jian Ma

Section Editor

PLOS Computational Biology

Additional Editor Comments (if provided):

Reviewer #1:

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, the following Authors/Authors require contributions: Nikol Chantzi, Ioannis Mouratidis, and Ilias Georgakopoulos-Soares. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.

3) Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well.

- State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

- State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.".

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review comments (Remarks to the authors)

Thanks to the authors for their detailed and comprehensive additional analysis to address my comments. Most of my comments have been addressed in this revision. I just have a few more comments based on the authors’ response.

Major:

1. The authors addressed my second comment on mathematical and biological explanations of the Zimin avoidmer lengths. I like the authors’ analysis of random permutations of the studied genomes for the mathematical explanation. For the biological explanation, in addition to genome complexity, I’d like to see whether the Zimin avoidmer pattern is associated with mutation rates (there are multiple mutation rate models; for example, the authors could find mutation rates in the human genome here: https://www.nature.com/articles/s41588-023-01562-0) or with genomic constraint in the human genome (for example, the Gnocchi score, https://www.nature.com/articles/s41586-023-06045-0). Such an analysis would be interesting to a broader readership, especially in the human genetics field.

Minor:

1. The captions for Figures 5A and 5B are mismatched.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 2

Attachments
Attachment
Submitted filename: RTR 2 Zimin.docx
Decision Letter - Yang Lu, Editor

Dear Dr. Georgakopoulos-Soares,

We are pleased to inform you that your manuscript 'Zimin patterns in genomes' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Yang Lu, Ph.D.

Academic Editor

PLOS Computational Biology

Jian Ma

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks to the authors for addressing my comments on expanding the biological explanations for Zimin avoidmers using mutation rate models. The manuscript is now well written and comprehensive.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Formally Accepted
Acceptance Letter - Yang Lu, Editor

PCOMPBIOL-D-24-01884R2

Zimin patterns in genomes

Dear Dr Georgakopoulos-Soares,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Aiswarya Satheesan

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.