Enhancing ERα-targeted compound efficacy in breast cancer threapy with ExplainableAI and GeneticAlgorithm

Zeonlung Pun; Qiaoyun Xue; Yichi Zhang

doi:10.1371/journal.pone.0319673

Peer Review History

Original Submission: November 9, 2024
9 Nov 2024 Author Response https://doi.org/10.1371/journal.pone.0319673.r001
29 Dec 2024 Decision Letter - Manikkam Rajalakshmi, Editor

PONE-D-24-51400
Enhancing ERα-Targeted Compound Efficacy in Breast Cancer Threapy with ExplainableAI and GeneticAlgorithm
PLOS ONE

Dear Dr. Pun,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 12 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

• A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
• A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
• An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.
For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Manikkam Rajalakshmi
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. The American Journal Experts (AJE) (https://www.aje.com/) is one such service that has extensive experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. Please note that having the manuscript copyedited by AJE or any other editing services does not guarantee selection for peer review or acceptance for publication.
Upon resubmission, please provide the following:

• The name of the colleague or the details of the professional service that edited your manuscript
• A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a supporting information file)
• A clean copy of the edited manuscript (uploaded as the new manuscript file)

3. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

4. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

Please review your reference list to ensure that it is complete and correct.
If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Authors proposed "Enhancing ERα-Targeted Compound Efficacy in Breast Cancer Threapy with ExplainableAI and GeneticAlgorithm" the structure of the article is well structured. but authors should follow the following comments.

1. Proofread the entire manuscript.
2. Draw a graphical abstract of your proposed approach
3. compare your approach with existing approaches.

Reviewer #2: The manuscript entitled “Enhancing ERα-Targeted Compound Efficacy in Breast Cancer Threapy with ExplainableAI and GeneticAlgorithm” has many mistakes, authors need to rectify many portions.

• Is the connection between ERα absence and decreased Cyclin D1, PCNA, and TGFβ levels accurately supported by the cited studies?
• Is the term "traditional QSAR methods" appropriately defined, or should it be clarified for readers unfamiliar with QSAR modeling? Is it required IC50 or biological activity initially.
• Please explain AI and Genetic Algorithms in the proposed workflow adequately?
• Are the limitations of the discussed studies, such as over-reliance on specific software or lack of detailed criteria, presented accurately and objectively?
• Is "question D of the 18th Chinese Graduate Mathematical Modeling Contest" the correct source of the dataset, or should it be clarified further?
• Is the website link format correct and functional, as it lacks a clear structure (e.g., "www.shumo.com/wiki/doku.php")?
• Are "5 key ADMET properties" adequately defined for the target audience? Should explanations of binary classifications (e.g., "1" or "0") be elaborated further for clarity?
• What are the methods (SHAP and LassoNet) adequately introduced for a reader unfamiliar with these techniques, or should more context be provided about their use cases?
• Are the machine learning models (e.g., MLP, SVM, LightGBM) described in sufficient detail for understanding their roles and differences in predicting bioactivity?
• Should the evaluation metrics (e.g., R², MSE) for model performance be described in more detail to ensure clarity?

Good Luck!

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Shahzaib Ahamad

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool.
If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0319673.r002
Revision 1
6 Jan 2025 Author Response

We sincerely thank the reviewers for their valuable time and effort in providing insightful and constructive feedback on our manuscript. Your comments and suggestions have significantly contributed to improving the clarity, rigor, and overall quality of our work. We deeply appreciate your dedication and expertise, which have helped us refine our study and ensure its alignment with the highest academic standards. Once again, we are truly grateful for your thoughtful review and support.

Response to reviewer 1

Q: Proofread the entire manuscript.

A: Thank you for the reviewer’s suggestion. In response, we have carefully proofread the entire manuscript to ensure clarity, consistency, and accuracy. We have also updated the LaTeX template to comply with the PLOS ONE format. Additionally, we have thoroughly checked our results and ensured the use of unified terminology throughout the manuscript to improve readability and professionalism.

Q: Draw a graphical abstract of your proposed approach

A: Thank you for the reviewer’s suggestion. In response, we have provided a flowchart (Figure 1) that illustrates our proposed workflow, which serves as a graphical abstract. The flowchart outlines the four main components of our workflow: (1) molecular descriptor selection using SHAP and LassoNet, (2) bioactivity prediction with the LightGBM model, (3) ADMET property prediction, and (4) identification of the optimal compound using a genetic algorithm. We believe this visual representation provides a clear and concise overview of our approach.

Q: compare your approach with existing approaches.

A: Thank you for the reviewer’s insightful suggestion. We have compared our proposed workflow with existing approaches, drawing from the following key literature:

1. Wu J, Kong L, Yi M, Chen Q, Cheng Z, Zuo H, et al. Prediction and screening model for products based on fusion regression and xgboost classification. Computational Intelligence and Neuroscience. 2022
2. Singh K, Ghosh I, Jayaprakash V, Jayapalan S. Building a ML-based QSAR model for predicting the bioactivity of therapeutically active drug class with imidazole scaffold. European Journal of Medicinal Chemistry Reports. 2024;11:100148.
3. Jhanwar B, Sharma V, Singla R, Shrivastava B. QSAR-Hansch analysis and related approaches in drug design. Pharmacologyonline. 2011;1:306–344.
4. Daoui O, Elkhattabi S, Chtita S, Elkhalabi R, Zgou H, Benjelloun AT. QSAR, molecular docking and ADMET properties in silico studies of novel 4,5,6,7-tetrahydrobenzo[D]-thiazol-2-Yl derivatives derived from dimedone as potent anti-tumor agents through inhibition of C-Met receptor tyrosine kinase. Heliyon. 2021;7(7).

We have provided a brief introduction to these existing methods in the "Prediction of Bioactivity" section and presented the comparative results in the "Comparison of Different Existing Methods" section. The results demonstrate that our proposed workflow outperforms the existing methods by effectively utilizing variable selection techniques to identify the most important descriptors. By integrating these techniques with the LightGBM model, our workflow achieves superior performance, attaining an R-square value of 77.2%.

Response to reviewer 2

Q: Is "question D of the 18th Chinese Graduate Mathematical Modeling Contest" the correct source of the dataset, or should it be clarified further? Is the website link format correct and functional, as it lacks a clear structure (e.g., "www.shumo.com/wiki/doku.php")?

A: Thank you for your question regarding the source of our dataset. The dataset used in this study is publicly available and was provided by the China Association for Science and Technology. It was originally made available as part of a data mining competition organized by the Ministry of Education of China.
We have corrected the original URL and ensured its accuracy: https://www.shumo.com/wiki/doku.php?id=%E7%AC%AC%E5%8D%81%E5%85%AB%E5%B1%8A_2021_%E5%85%A8%E5%9B%BD%E7%A0%94%E7%A9%B6%E7%94%9F%E6%95%B0%E5%AD%A6%E5%BB%BA%E6%A8%A1%E7%AB%9E%E8%B5%9B_npmcm_%E8%AF%95%E9%A2%98

For the convenience of readers, we have also made all raw data and code available on GitHub, with the corresponding link provided in the Appendix.

Q: Is the connection between ERα absence and decreased Cyclin D1, PCNA, and TGFβ levels accurately supported by the cited studies?

A: Thank you for the reviewer’s question. Upon review, we acknowledge that the statement regarding the connection between ERα absence and decreased Cyclin D1, PCNA, and TGFβ levels is not directly supported by the cited studies in our paper. This was an oversight on our part. Our intention was to highlight the critical role of ERα in the development of breast cancer therapies, but this particular statement was not relevant to our argument. Therefore, we have removed this statement from the manuscript to better align the discussion with the core focus of our research.

Q: Is the term "traditional QSAR methods" appropriately defined, or should it be clarified for readers unfamiliar with QSAR modeling? Is it required IC50 or biological activity initially.

A: Thank you for the reviewer’s suggestion. The term "traditional QSAR methods" refers primarily to the early approaches that used statistical models, particularly linear models, to correlate molecular structure with biological activity for quantitative analysis. These methods were developed to address the limitations of subjective judgments based solely on chemists' experience. To clarify this for readers, we have added additional details to the Introduction section and introduced the classic Hansch model as an example. Additionally, we have provided a brief explanation of IC50 in the Introduction section to ensure clarity and logical flow.
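As an illustration of the two ideas in the answer above, the following sketch converts an IC50 to pIC50 and fits a one-descriptor linear model of the kind "traditional QSAR" refers to. It is not the authors' code: the function names and the toy logP and IC50 values are hypothetical, and the classic Hansch equation carries further terms (for example a squared logP term and electronic and steric parameters) that are left out here to keep the example short.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: one descriptor (logP) and measured IC50 values in nM.
logp = [1.0, 2.0, 3.0, 4.0]
ic50_nm = [10000.0, 1000.0, 100.0, 10.0]

# A lower IC50 means a more potent compound, so pIC50 rises as IC50 falls.
pic50 = [pic50_from_ic50_nm(v) for v in ic50_nm]

slope, intercept = fit_line(logp, pic50)
print(f"pIC50 = {slope:.3f} * logP + {intercept:.3f}")
```

A linear fit like this can only describe a straight-line relationship between a descriptor and activity, which is the limitation the response contrasts with nonlinear machine learning models.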
Q: Please explain AI and Genetic Algorithms in the proposed workflow adequately?

A: Thank you for the reviewer’s insightful comment. In response, we have added detailed explanations of AI and Genetic Algorithms, as well as their integration into the proposed workflow, at the beginning of the Methods section. Below is the explanation we have included:

In this section, we propose a comprehensive workflow that integrates machine learning models and Genetic Algorithms to assess the importance of each Molecular Descriptor, predict the bioactivity and ADMET properties of each component, and identify the most promising compounds for developing drugs for breast cancer therapy. Compared to traditional QSAR linear models, machine learning models excel in their ability to capture complex, nonlinear relationships between Molecular Descriptors and the bioactivity of each component, thereby achieving higher predictive accuracy. Additionally, the use of explainable AI algorithms enables us to interpret the prediction process, providing insights that break the ‘black box’ nature of machine learning models and enhancing their reliability in drug discovery. The Genetic Algorithm plays a crucial role in this workflow, as it allows the optimization process to consider both bioactivity and ADMET properties simultaneously. By combining the predictions of the bioactivity and ADMET models, the Genetic Algorithm ensures that the optimal potential compounds identified are not only highly bioactive but also exhibit favorable ADMET profiles, making them more suitable for drug development.

We hope this explanation adequately addresses the reviewer's question.

Q: Are the limitations of the discussed studies, such as over-reliance on specific software or lack of detailed criteria, presented accurately and objectively?

A: Thank you for the reviewer’s insightful question. In response, we have expanded the discussion of limitations in the Conclusion section to address this point more thoroughly.
Specifically, we added two key aspects:

1. Experimental Verification: We emphasized that the results obtained through data mining using machine learning must be experimentally verified to ensure their relevance and applicability in drug development. Without experimental validation, the utility of these predictive models may be limited.
2. Framework Limitation: We highlighted that this study relies solely on molecular descriptors for prediction, which falls within the framework of 1D-QSAR. While 1D-QSAR is effective to a certain extent, it does not account for molecular bonding relationships, such as structural and spatial interactions, which are critical in understanding molecular properties. This limitation may impact the predictive accuracy and generalization of the models. Future studies could incorporate higher-dimensional QSAR frameworks, such as 2D- or 3D-QSAR, to address this issue and improve the robustness of the models.

We believe these additions enhance the objectivity and accuracy of our discussion on the limitations of the study. Please let us know if further clarification or additional detail is required.

Q: Are "5 key ADMET properties" adequately defined for the target audience? Should explanations of binary classifications (e.g., "1" or "0") be elaborated further for clarity?

A: Thank you for your valuable feedback. We appreciate your suggestion to ensure that the 5 key ADMET properties and their binary classifications are clearly defined for the target audience. In response, we have revised the corresponding Dataset section of our manuscript to provide detailed definitions for each ADMET property and to elaborate further on the binary classification method. Below is a summary of the changes made:

1. We have clarified the biological significance of each ADMET property:
   • Caco-2 permeability (Y1): Measures the permeability of a compound across small intestinal epithelial cells.
   • CYP3A4 metabolism (Y2): Assesses the metabolizability of the compound by CYP3A4, a key metabolic enzyme.
   • hERG channel interaction (Y3): Evaluates the cardiotoxic potential of a compound.
   • Human Oral Bioavailability (HOB, Y4): Represents the proportion of the drug absorbed into systemic circulation when administered orally.
   • Mutagenicity (MN, Y5): Assesses genotoxic potential using the micronucleus test.
2. For each property, we explicitly explained the binary classification system (1 or 0) and its interpretation. For example: A '1' in Caco-2 permeability indicates good permeability, while a '0' indicates poor permeability. Similarly, a '1' in hERG channel interaction represents cardiotoxic potential, while a '0' indicates no cardiotoxicity.

By elaborating on these properties and their classification, we aim to make the manuscript more accessible and clear for a broader audience, ensuring that the ADMET properties are well-defined and their importance in drug development is evident. These updates are reflected in the revised manuscript under the Dataset section. We hope this addresses your concern and provides sufficient clarity for the target audience. Thank you again for your thoughtful and constructive feedback.

Q: What are the methods (SHAP and LassoNet) adequately introduced for a reader unfamiliar with these techniques, or should more context be provided about their use cases?

A: Thank you for your thoughtful comments and suggestions. We have revised the Descriptor Selection With explainable AI model section to provide a more detailed introduction to SHAP and LassoNet for readers unfamiliar with these techniques. Specifically:

1. We have elaborated on SHAP's origins in cooperative game theory and its use in assessing individual feature importance.
2. We have expanded the explanation of LassoNet, highlighting its hybrid approach combining neural networks and LASSO regression, as well as its skip layer mechanism for non-linear feature selection.
3. Additionally, we clarified how these methods complement each other, with SHAP focusing on individual descriptor importance and LassoNet addressing group-level interactions, both of which enhance the robustness and interpretability of our models.

We believe this will make the manuscript more accessible and informative for the target audience. Thank you again for your valuable feedback.

Q: Are the machine learning models (e.g., MLP, SVM, LightGBM) described in sufficient detail for understanding their roles and differences in predicting bioactivity?

A: Thank you for the reviewer’s insightful suggestion. In response, we have added a concise introduction to the machine learning models used in this study, including MLP, SVM, XGBoost, RF, and LightGBM, in the Methods section. This addition provides a brief overview of their strengths, limitations, and suitability for handling molecular descriptor data in the context of bioactivity prediction. Furthermore, we have compared the performance of these models in the Prediction of Bioactivity section, analyzing their respective outcomes in detail. Based on this comparative analysis, we have explained the rationale for selecting the LightGBM (LGB) model as the final predictive tool for bioactivity, emphasizing its superior performance, computational efficiency, and suitability for the dataset. We believe these revisions provide sufficient detail to help readers understand the roles and differences of the machine learning models in our study. Thank you again for your valuable feedback.

Q: Should the evaluation metrics (e.g., R², MSE) for model performance be described in more detail to ensure clarity?

A: Thank you for the reviewer’s question. In response, we have expanded the Evaluation Metrics subsection under the Results section to provide a more comprehensive interpretation of the metrics used in this study.
Specifically, the evaluation metrics are divided into two categories: those used for regression models (R-square, MSE) and those used for classification models (accuracy, ROC-AUC). For each metric, we now include detailed explanations of its formula, significance, and relevance to the evaluation of model performance. These additions aim to improve clarity and ensure that readers can fully understand the purpose and importance of each metric in assessing the predictive performance of the machine learning models.

Attachment
Submitted filename: Response to Reviewers.pdf

https://doi.org/10.1371/journal.pone.0319673.r003
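For readers who want the four metrics named in the final response above written out, the sketch below implements R-square, MSE, accuracy, and ROC-AUC in plain Python using their standard textbook definitions. It is not the authors' implementation: the function names and toy values are hypothetical, and in practice a library such as scikit-learn provides equivalent functions. ROC-AUC is computed here as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels, predictions):
    """Fraction of binary predictions that match the true labels."""
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical regression example (e.g. predicted vs. measured pIC50).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print("MSE:", mse(measured, predicted))
print("R-square:", r_squared(measured, predicted))

# Hypothetical classification example (e.g. one binary ADMET label).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
hard_predictions = [1 if s >= 0.5 else 0 for s in scores]
print("Accuracy:", accuracy(labels, hard_predictions))
print("ROC-AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen decision threshold (0.5 in this sketch), whereas ROC-AUC is computed from the ranking of the scores and does not depend on a threshold.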
6 Feb 2025 Decision Letter - Manikkam Rajalakshmi, Editor

Enhancing ERα-Targeted Compound Efficacy in Breast Cancer Threapy with ExplainableAI and GeneticAlgorithm
PONE-D-24-51400R1

Dear ZeonLung Pun,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Manikkam Rajalakshmi
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

https://doi.org/10.1371/journal.pone.0319673.r004
Formally Accepted
Acceptance Letter - Manikkam Rajalakshmi, Editor

PONE-D-24-51400R1
PLOS ONE

Dear Dr. Pun,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Manikkam Rajalakshmi
Academic Editor
PLOS ONE

https://doi.org/10.1371/journal.pone.0319673.r005

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.