Predicting Affinity Through Homology (PATH): Interpretable binding affinity prediction with persistent homology

Yuxi Long; Bruce R. Donald

doi:10.1371/journal.pcbi.1013216

Peer Review History

Original Submission: February 9, 2025
6 Apr 2025 Decision Letter - Jeffrey Skolnick, Editor, Nir Ben-Tal, Editor PCOMPBIOL-D-25-00260 Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology PLOS Computational Biology Dear Dr. Donald, Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript within 30 days Jun 06 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: * A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below. * A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. * An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. 
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. We look forward to receiving your revised manuscript. Kind regards, Jeffrey Skolnick Academic Editor PLOS Computational Biology Nir Ben-Tal Section Editor PLOS Computational Biology Additional Editor Comments: This is a very interesting method for predicting binding affinities. Both the reviewers and we are enthusiastic about this work. Please address the reviewer comments in your revised version. Congratulations on a very nice contribution. Journal Requirements: 1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type ‘LaTeX Source File’ and leave your .pdf version as the item type ‘Manuscript’. 2) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: https://journals.plos.org/ploscompbiol/s/figures 3) Please ensure that all Figure files have corresponding citations and legends within the manuscript. Currently, Figures 8, 9, and 10 in your submission file inventory do not have in-text citations. Please include the in-text citations of the figures. 4) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list. Please cite and label the supplementary tables and figures as “S1 Table” and “S2 Table,” “S1 Figure,” “S2 Figure,” and so forth. 5) Some material included in your submission may be copyrighted.
According to PLOS's copyright policy, authors who use figures or other material (e.g., graphics, clipart, maps) from another author or copyright holder must demonstrate or obtain permission to publish this material under the Creative Commons Attribution 4.0 International (CC BY 4.0) License used by PLOS journals. Please closely review the details of PLOS's copyright requirements here: PLOS Licenses and Copyright. If you need to request permissions from a copyright holder, you may use PLOS's Copyright Content Permission form. Please respond directly to this email and provide any known details concerning your material's license terms and permissions required for reuse, even if you have not yet obtained copyright permissions or are unsure of your material's copyright compatibility. Once you have responded and addressed all other outstanding technical requirements, you may resubmit your manuscript within Editorial Manager. Potential Copyright Issues: i) Figure 3. Please confirm whether you drew the images / clip-art within the figure panels by hand. If you did not draw the images, please provide (a) a link to the source of the images or icons and their license / terms of use; or (b) written permission from the copyright holder to publish the images or icons under our CC BY 4.0 license. Alternatively, you may replace the images with open source alternatives. See these open source resources you may use to replace images / clip-art: - https://commons.wikimedia.org - https://openclipart.org/. 6) Thank you for stating "The PDBBind dataset can be found at http://pdbbind.org.cn/. The BioLiP dataset (containing BindingDB and Binding MOAD) can be found at https://www.aideepmed.com/BioLiP/weekly.html. The DUD-E dataset can be found at https://dude.docking.org/." Please provide direct links to access each dataset.
Reviewers' comments: Reviewer's Responses to Questions Reviewer #1: The authors present the development of a machine learning approach to predict binding affinities for molecular complexes based on their three-dimensional structures. The main algorithm is called PATH+, and its strength is not increased accuracy beyond existing methods, but rather increased generalizability across datasets and a potential for greater interpretability at comparable or better accuracy. PATH+ also has reduced computational complexity compared to other methods. The gains in these dimensions beyond accuracy are noteworthy, and it is important that algorithmic development in the field progress across all of these dimensions, and others as well. PATH+ makes use of advances in computational topology, including persistent homology, and develops useful persistence fingerprints. The authors also present a scoring algorithm PATH–, that distinguishes binders from non-binders. The paper is reasonably well written and reasonably clear. The suggestions below are made for the authors to consider and incorporate at their own discretion. (1) The standard sections of the paper are exceedingly short. By the end of page 12, one has completed abstract, author summary, introduction, results, and discussion. While some results are presented, much is only alluded to and a great deal is claimed but not shown at the level necessary to make a convincing case to the reader. As but one example, we are told that PATH+ reduces the 14,472 features of a previous method, TNet-BP, which also predicts binding affinity using persistent homology, to only 10 features in PATH+, but there is no real discussion of the differences and what insights we can gather from those differences. Too much of the paper is buried in the overly long methods section. (2) The claim is made that PATH+ is fully interpretable, but the point is made by argument and not effectively by illustration. 
We are told this is true because persistence fingerprints are readily interpretable in terms of atomic interactions, and that the decision-tree structure inherent in PATH+ leads to decision-tree-like constructs on atomic interactions, which are therefore interpretable (it reads like a proof). But the real question is whether the interpretations that result from machine learning correspond to those of structural biophysical chemistry, or whether they are something different? And if different, are they at least as useful? Two examples from the text, HIV and CA II, read much more like rationalizations than actual explanations that come from a complete analysis of the ML result. Working out examples in much more detail and much more critically than has been done here would more meaningfully support the point that the interpretability of PATH+ can lead to trust, which otherwise remains to be seen. (3) The exact problem statement may be somewhere in the paper, but this reader didn’t find it clearly stated at the start. Are unbound structures the input, bound structures, or both? The answer appears to be bound structures, but is this a requirement? Much has been written about conformational changes on binding and the effects that this can have on binding energy. Does the use of bound structures limit the accuracy that can be achieved? (4) The authors repeatedly state that the persistence fingerprint encodes information about topology at different scales, but what about other biophysical features thought to be important for binding? What is encoded in a persistence fingerprint, what additional information could be added by learning over the data, and how does this correspond to what a physics-based modeling approach could capture? Conformational energy changes upon binding? Conformational entropy losses upon binding? Desolvation penalties? The hydrophobic effect? Hydrogen bonds? Quantum-mechanical effects? pH dependence of binding due to titration changes?
Temperature dependence? How dependent is the algorithm on knowing accurate titration states? It appears that PATH+ uses a set of concepts that are not directly related to these biophysical concepts but that are nonetheless useful. The extent to which they can be related to biophysical explanations could be interesting to discuss, and might lead to suggestions for additional features for future investigation. The presentation here seems to claim too much and give short shrift to what are truly good questions. (5) The last sentence of the first paragraph should be reconsidered (“A reliable ranking of docking poses, based on affinity, potency, or other biophysical properties, is essential for accurate SBDD [69].”) The authors don’t seem to believe this based on their introduction of PATH–; they seem to realize that separating the vast majority of non-binders and weak binders from strong binders is valuable, and some reasonable ranking of strong binders is even better. But certainly correct ranking of the weak binders is not at all necessary. I point this out because the topic of the paper is important, and there are many places where a thoughtful argument is necessary to present important subtleties. (6) In the last sentence of the first paragraph of section 2.1, the word “predictions” should be “prediction”. (7) The opposition distance is not defined until the methods section, although it is used earlier. It would be simple enough to say what it was when first used. Reviewer #2: The study introduces PATH (Predicting Affinity Through Homology), an interpretable algorithm for predicting protein-ligand binding affinity. It uses a new method to compute persistent homology features for protein-ligand complexes that is more computationally efficient than previous methods and independent of protein size. The algorithm uses internuclear persistent contours (IPCs) and persistence fingerprints to represent protein-ligand interactions.
The SI details the persistent homology, construction of IPCs, and feature selection along with the benchmarking of PATH against previous binding affinity prediction algorithms. Overall, this manuscript is clear, concise, and well-written. The research topic is both interesting and relevant to PLOS Computational Biology. The results indicate that PATH performs comparably to other state-of-the-art methods while exhibiting less overfitting, and it also allows for interpretability. The source code for PATH is available in a GitHub repository. Below are some specific concerns: The persistence fingerprint used in the PATH algorithm consists of 10 features. While this reduction in dimensionality is presented as an advantage, does increasing the number of features to around 20, where the performance of trained GBRs saturates (see Fig. 14), enhance overall performance? Additionally, how does the computational time increase with the addition of more features? While the authors demonstrate the algorithm's performance on various datasets, such as PDBBind and BindingDB, it remains unclear how well the method generalizes to different protein families and ligand types. The authors provided a detailed comparison of PATH with other state-of-the-art approaches, but I did not find the algorithm's relative strengths and weaknesses compared to other deep learning-based affinity prediction methods. In SI: Fig. 14, the authors presented RMSE versus the number of remaining features, identifying 10 features as the persistence fingerprint with 100 estimators (trees). How do these RMSEs vary with 13 estimators, as mentioned in Fig. 15? SI: Table 6: R-squared values < -1? Please check the order of references, figures, and tables appearing in the main text. Reference in section heading: 4.2 The TNet-BP algorithm [13]; is not required. Reviewer #3: The authors described a topology-based approach, PATH, for structure-based drug affinity prediction.
They claimed that, in contrast to the more popular deep learning or physics-based approaches, PATH is interpretable, can avoid overfitting, is adaptable to new targets, and runs faster than existing topology-based approaches too. I must admit that this is not an area that I am familiar with, although I do dabble in the drug design area. I do not feel qualified to critique this work. I do have the following comments. First of all, are 120+ references necessary? Also, the numbering doesn’t follow the order in which they appear in the paper, which makes me think some of these texts may be taken from a thesis. Deep learning-based approaches do suffer from a lack of interpretability, but there are ways to remedy this to identify residues or chemical groups that are important in binding. I felt the Results section is a little thin, as they only tested on one dataset and provided one comparison figure (Fig 4) and one case study, HIV-1 protease. ******** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: Yes ****** PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions. Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols https://doi.org/10.1371/journal.pcbi.1013216.r001
Revision 1
3 Jun 2025 Author Response. Attachment (submitted filename): response-to-reviewers.pdf https://doi.org/10.1371/journal.pcbi.1013216.r002
10 Jun 2025 Decision Letter - Jeffrey Skolnick, Editor, Nir Ben-Tal, Editor Dear Dr. Donald, We are pleased to inform you that your manuscript 'Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Jeffrey Skolnick Academic Editor PLOS Computational Biology Nir Ben-Tal Section Editor PLOS Computational Biology *********************************************************** This revised paper has thoughtfully and appropriately addressed the concerns and comments of the reviewers. This revised version certainly merits publication. https://doi.org/10.1371/journal.pcbi.1013216.r003
Formally Accepted
Acceptance Letter - Jeffrey Skolnick, Editor, Nir Ben-Tal, Editor PCOMPBIOL-D-25-00260R1 Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology Dear Dr Donald, I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Zsofia Freund PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol https://doi.org/10.1371/journal.pcbi.1013216.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.