Peer Review History
Original Submission, February 4, 2025
PGENETICS-D-25-00135
Unravelling epigenetic regulation of gene expression with explainable AI - a case study leveraging degron data
PLOS Genetics

Dear Dr. Chhatbar,

Thank you for submitting your manuscript to PLOS Genetics. After careful consideration, we feel that it has merit but does not fully meet PLOS Genetics's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Jul 14 2025 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosgenetics@plos.org.

When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgenetics/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to any formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.
Kind regards,

Charles G. Danko, Ph.D.
Guest Editor
PLOS Genetics

John Greally
Section Editor
PLOS Genetics

Aimée Dudley
Editor-in-Chief
PLOS Genetics

Anne Goriely
Editor-in-Chief
PLOS Genetics

Additional Editor Comments:
Our sincere apologies for the delay in assessing your manuscript.

Journal Requirements:

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full. At this stage, the following authors require contributions: Kashyap Chhatbar, Adrian Bird, and Guido Sanguinetti. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form. The list of CRediT author contributions may be found here: https://journals.plos.org/plosgenetics/s/authorship#loc-author-contributions

2) We ask that a manuscript source file be provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’.

3) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines: https://journals.plos.org/plosgenetics/s/submission-guidelines#loc-parts-of-a-submission

4) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/plosgenetics/s/figures

5) We notice that your supplementary Figures are included in the manuscript file. Please remove them and upload them with the file type 'Supporting Information'.
Please ensure that each Supporting Information file has a legend listed in the manuscript after the references list.

6) Some material included in your submission may be copyrighted. According to PLOS’s copyright policy, authors who use figures or other material (e.g., graphics, clipart, maps) from another author or copyright holder must demonstrate or obtain permission to publish this material under the Creative Commons Attribution 4.0 International (CC BY 4.0) License used by PLOS journals. Please closely review the details of PLOS’s copyright requirements here: PLOS Licenses and Copyright. If you need to request permissions from a copyright holder, you may use PLOS's Copyright Content Permission form.

Please respond directly to this email and provide any known details concerning your material's license terms and permissions required for reuse, even if you have not yet obtained copyright permissions or are unsure of your material's copyright compatibility. Once you have responded and addressed all other outstanding technical requirements, you may resubmit your manuscript within Editorial Manager.

Potential Copyright Issues:

i) Figure 5. Please confirm whether you drew the images / clip-art within the figure panels by hand. If you did not draw the images, please provide (a) a link to the source of the images or icons and their license / terms of use; or (b) written permission from the copyright holder to publish the images or icons under our CC BY 4.0 license. Alternatively, you may replace the images with open source alternatives. See these open source resources you may use to replace images / clip-art:
- https://commons.wikimedia.org

7) Thank you for stating "The raw sequencing data used in this study were obtained from the Gene Expression Omnibus (GEO) and analyzed as described." Please note that your Data Availability Statement is currently missing the DOI/accession number of each dataset OR a direct link to access each dataset.
If your manuscript is accepted for publication, you will be asked to provide these details on a very short timeline. We therefore suggest that you provide this information now, though we will not hold up the peer review process if you are unable to do so.

8) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.
   1) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."
   2) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."
   3) If any authors received a salary from any of your funders, please state which authors and which funders.

9) Please ensure that the funders and grant numbers match between the Financial Disclosure field and the Funding Information tab in your submission form. Note that the funders must be provided in the same order in both places as well. Currently, the funders differ between the two locations.

Reviewers' comments:

Reviewer's Responses to Questions

Reviewer #1: Chhatbar et al. present an analysis of transcriptional regulation based on ChIP-seq data using methods of explainable AI in combination with perturbation data. The approach consists of learning models of RNA polymerase occupancy at promoters as the prediction target, using the binding patterns of chromatin-associated factors as features. These models are then interpreted using Shapley additive explanations (SHAP values). The authors make use of the gene-specific feature importance quantified by the SHAP values to prioritize the key regulators of transcription.
By comparing the SHAP values between direct targets of specific chromatin regulators and other genes in perturbation experiments, the authors attempt to infer causal relationships. This aspect of the study is very noteworthy. There are, however, a number of issues that need to be clarified.

The manuscript lacks depth in the description of the methods. For example:
- How are promoter regions defined?
- How exactly was the Pol2 signal per gene quantified?
- Which gene annotations were used?
- What was the architecture of the MLP model?
- Which hyperparameters were searched, and which combination was optimal?
- How were the direct targets in the perturbation analyses defined?

The manuscript lacks a comparison to other predictive modelling approaches (prediction of Pol2 from chromatin factors) that were carried out before. Notably, XGBoost has been used successfully, and there is also a specific implementation of SHAP for this algorithm.

The manuscript heavily relies on the interpretation of the model. Therefore, it is key to evaluate the ability of the model to generalise. The authors should demonstrate the ability of the model to make predictions in cell lines not used for training. How is the prediction performance on other cell lines?

The advantage of using SHAP over other feature importance metrics should be demonstrated more clearly. For example, does it provide better results than using Gini importance or similar metrics in random forest classifiers?

The authors should also compare their results to literature that uses graphical models (or partial correlations) to distinguish direct and indirect relations between chromatin regulators, such as https://doi.org/10.1016/j.celrep.2016.01.008 or https://doi.org/10.1093/nar/gku1234

Figure 4A is not easy to interpret. What do the grey lines represent?
Reviewer #2: In "Unravelling epigenetic regulation of gene expression with explainable AI - a case study leveraging degron data," Chhatbar and colleagues demonstrate how to successfully predict RNA Pol-II occupancy based on binding data of various epigenomic factors. The authors extensively use Shapley Additive Explanations (SHAP) values to interpret the biological underpinnings of their DNN-based predictions. Before publication, the authors should consider a couple of improvements and additional analyses.

1. Since this study primarily builds on the advantages of SHAP and because the readership of PLOS Genetics may not be familiar with Shapley values, I recommend including a paragraph explaining some of the fundamentals of SHAP theory. As the authors present a case study, such an introduction would be critical for all readers who want to conduct similar analyses. In particular, it would help to safeguard against problematic interpretations.

2. The interpretation of SHAP values can be problematic at times. This is particularly the case when the independence assumption is violated, for example, when using the approach of Lundberg and Lee. I understand that the marks investigated in the context of this study are, by design, dependent. While I acknowledge that this issue is inherent to the data, I want to encourage the authors to discuss this problem explicitly and explain how it should be addressed when interpreting the values.

3. There are some questions concerning the SHAP analysis. From the paper, the coalition sizes, i.e., the number of features, were somewhat limited. For instance, the analysis involving only GSE199805 used SET1A, ZC3H4, and H3K4me3 (M=3) to predict Pol-II occupancy. To what extent are these features correlated in the data? Is there a substantial difference between the correlations obtained from gene bodies and promoters?

4. It would be interesting to know how the overall Pol-II occupancy prediction performance changes for subsets of markers, e.g., when only using ZC3H4 and H3K4me3, or even a single marker.

5. To appreciate the feature importance differences for target and non-target genes, i.e., SHAP's potential to infer causality, it would be critical to know which strategies and thresholds were used to make this distinction: How were the target genes defined precisely? How many targets and non-targets are there for the individual analyses?

6. I understand that the authors' GitHub repo probably contains all the information about the architecture, training procedure, thresholds, etc. Nevertheless, I recommend including essential information, such as the exact training strategy using "Scikit-learn's data-splitting functionality" (p. 11), e.g., to explicitly state how overfitting problems were addressed ...

7. Since MLPs are rather bloated, it would be interesting to see how a decision tree-based approach, e.g., random forest or gradient boosting, would work.

Minor issues:
- I may have missed it, but $|med(SHAP)|$ shown in several figures is not defined in the paper.
- In Figure 1B, the test and train data points are not distinguishable.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No. It is not obvious from the manuscript where the data can be found.
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
Revision 1
Dear Dr Chhatbar,

We are pleased to inform you that your manuscript entitled "Modelling transcription with explainable AI uncovers context-specific epigenetic gene regulation at promoters and gene bodies" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.
If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Charles G. Danko, Ph.D.
Guest Editor
PLOS Genetics

John Greally
Section Editor
PLOS Genetics

Aimée Dudley
Editor-in-Chief
PLOS Genetics

Anne Goriely
Editor-in-Chief
PLOS Genetics

BlueSky: @plos.bsky.social

----------------------------------------------------

Comments from the reviewers (if applicable):

Congratulations!

Reviewer's Responses to Questions

Comments to the Authors:

Reviewer #2: The authors have addressed all my questions.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #2: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-25-00135R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.
Formally Accepted
PGENETICS-D-25-00135R1
Modelling transcription with explainable AI uncovers context-specific epigenetic gene regulation at promoters and gene bodies

Dear Dr Chhatbar,

We are pleased to inform you that your manuscript entitled "Modelling transcription with explainable AI uncovers context-specific epigenetic gene regulation at promoters and gene bodies" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research Articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!
With kind regards,

Anita Estes
PLOS Genetics

On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.