Peer Review History

Original Submission: May 23, 2022
Decision Letter - Feilim Mac Gabhann, Editor, Inna Lavrik, Editor

Dear Professor Gomez,

Thank you very much for submitting your manuscript "Kinome Inhibition States and Multiomics Data Enable Prediction of Cell Viability in Diverse Cancer Types" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Inna Lavrik

Associate Editor

PLOS Computational Biology

Feilim Mac Gabhann

Editor-in-Chief

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The recent development of cancer treatment emphasizes targeted therapies. Protein kinases represent one of the largest classes of anti-cancer targeted therapies. Therefore, uncovering the correlation between kinome activity and cancer cell viability via regression models could provide a list of potent drugs based on predicted cell viability values ahead of treatment and infer potential key kinases regulating cell viability. Berginski et al. set out to determine whether a combination of kinase inhibition profiles at multiple doses and gene expression could predict the effect of kinase inhibitors on cancer cell viability. They integrated kinase inhibition profile data and gene expression data from different cancer cell lines to build a random forest model to predict cell viability. The final model achieved an R2 value of 0.79 in 10-fold cross-validation. This manuscript compared linear and non-linear models, utilized feature ranking via Pearson's correlation, and introduced data from various sources, such as copy number variation and proteomics, as additional input for model training. Beyond developing the model, this manuscript drew some insights from the model itself, pointing out that most top-ranking features come from the kinase inhibition data and that genes selected as model features have a high likelihood of interacting with kinases. This manuscript also considered experimental variation and evaluated the difference between the original PRISM values and lab assay results before validating the model on new compound–cell line combinations.
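The feature-ranking step the reviewer mentions (ranking candidate features by the strength of their Pearson correlation with viability) can be sketched in a few lines. This is an illustrative sketch only; the feature names and toy values below are hypothetical, not the authors' data:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical features: name -> values across treated samples,
# paired with observed viability for the same samples.
features = {
    "kinase_X_inhibition": [0.9, 0.7, 0.2, 0.1],
    "gene_Y_expression": [5.0, 4.8, 5.1, 4.9],
}
viability = [0.2, 0.4, 0.8, 0.9]

# Rank features by absolute correlation with viability (strongest first).
ranked = sorted(features, key=lambda f: abs(pearson(features[f], viability)),
                reverse=True)
```

Under this toy data, the strongly anti-correlated kinase inhibition feature ranks above the nearly flat expression feature, which mirrors the manuscript's finding that kinase inhibition features dominate the top ranks.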

Major comments

1. Conceptual bias in the training data. The kinase inhibition profiles come from Klaeger et al., whose data are derived from a mixture of four cell lines, while the cell viability and gene expression data (PRISM) come from a panel of over 400 cancer cell lines. Combining a kinase inhibition state measured in a mixture of cell lines with cell line-specific gene expression data might not be a proper input for training a model that predicts individual cell line viability. Have the authors tried using kinase inhibition profiles from gold-standard radioactive assays, which could avoid bias due to cell line variation? For example, GSK kinase inhibitor profiling or HMS LINCS profiling done at multiple doses.

2. Lack of innovation from a modeling perspective. To build the model, Berginski et al. applied linear regression, random forest, and XGBoost with a limited parameter tuning process. Introducing a neural network with different transformer architectures might improve performance for large-scale multi-dimensional data. To optimize the random forest model, parameters other than the number of trees should be adjusted, such as subsample size, maximum leaves, maximum depth, and minimum child weight. Please show the optimization of the random forest model. Could the model be improved further if these parameters were included in tuning, for example using grid search or randomized search for around 1,000 iterations?
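The broader search the reviewer suggests can be sketched generically. This is a library-agnostic illustration, not the authors' pipeline: the parameter names are hypothetical, and `cv_score` is a placeholder that in practice would run k-fold cross-validation and return a mean R2:

```python
import random

# Hypothetical random forest search space; names are illustrative only.
search_space = {
    "n_trees": [100, 250, 500, 1000],
    "max_depth": [4, 8, 16, None],
    "min_samples_leaf": [1, 5, 10],
    "subsample_frac": [0.5, 0.75, 1.0],
}

def cv_score(params):
    # Placeholder for a real cross-validated evaluation; returns a
    # dummy score in [-1, 0] so the sketch is self-contained.
    return -(hash(tuple(sorted(params.items(), key=str))) % 1000) / 1000

def randomized_search(space, n_iter=1000, seed=0):
    """Sample n_iter random parameter combinations and keep the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {k: rng.choice(v) for k, v in space.items()}
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = randomized_search(search_space)
```

Randomized search scales better than an exhaustive grid here: the full grid above already has 4 x 4 x 3 x 3 = 144 cells, and each added hyperparameter multiplies that count, while the randomized budget stays fixed at `n_iter` evaluations.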

3. I applaud the authors' effort to examine the difference between the original PRISM data and lab assay results, and the model prediction curves show a trend similar to the lab assay results; however, an R2 score of 0.5 is only moderately good. Moreover, this comparison is based on the assay results and the model prediction values, so it cannot indicate how the model would perform on the original PRISM data, given the observed difference between the PRISM values and the assay results.

Reviewer #2: Summary:

In this study, the authors develop a set of predictive models for cell viability across varying concentrations of kinome inhibitors, using kinome "inhibition states" as predictive features. The work builds on a 2017 paper by Klaeger et al. which profiled 243 kinase inhibitors to identify potential off-target/polypharmacology effects. This study integrates the kinome profiles from Klaeger et al. with -omics data from CCLE and PRISM, showing that kinome profiles tend to be more informative in viability prediction than baseline gene expression values, and additionally that kinome profiles and gene expression can be combined to produce slightly better models than either data type alone. The authors validate their modeling strategy on held-out kinase inhibitors and cell lines, showing that for a subset of breast cancer cell lines their models tend to generalize well.

The paper is clear and well-written, and the experiments provide compelling evidence that kinome inhibition profiles contain predictive signal for cell viability screening of kinase inhibitors. The validation on breast cancer cell lines and the predictions across all of PRISM will serve as a useful resource, and the GitHub repository containing the code is well-documented and easy to navigate. However, there are some aspects that could be explored further to ensure the robustness of the study's main conclusions, particularly with relevance to existing work in viability prediction.

Major comments:

1. The conclusions of the authors in this study seem slightly at odds with previous work, particularly Dempster et al. 2020 (https://doi.org/10.1101/2020.02.21.959627), which found that baseline expression was generally effective for predicting cell viability for various types of perturbations, more so than genomic profiles. Importantly, the studies are looking at different datasets and drug classes: the Dempster et al. study looked at a broader set of compounds rather than just kinase inhibitors, and at GDSC and several other datasets in addition to PRISM. However, the application to cell viability and overall conclusions of both studies still seem quite related.

Given the seemingly contradictory conclusions of these two studies, I think the question I would be most interested in is whether this is a biological difference (i.e. kinase inhibitors behave differently than the larger set of compounds as a whole), or whether a difference in experimental design is causing the discrepancy (e.g. model setup, label definition, cross-validation splitting strategy, etc., all of which have subtle differences between the two studies). Glancing through the methods of both papers I noticed a few differences - perhaps most notably the Dempster et al. study uses only the PRISM dose with the most variance over cell lines for each compound, while the present study looks at multiple doses as described in the Methods section and in Klaeger et al.

It would be informative if the authors tried running their gene expression pipeline on their set of kinase inhibitors with the labeling strategy (selecting a single representative dose) described in Dempster et al. - if the results are unchanged, it would inspire more confidence that the set of kinase inhibitors studied in this paper are indeed biologically different from the rest of the compounds tested in the Dempster et al. study, rather than reflecting an experimental design difference. If the results are different, this would also be a noteworthy outcome - it would support the idea that looking at multiple doses, rather than picking a single representative dose, is a necessary/important difference between the studies.
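The single-dose labeling strategy referenced from Dempster et al. (for each compound, keep only the dose whose viability values vary most across cell lines) can be sketched as follows; the data and names below are hypothetical toy values, not the actual PRISM measurements:

```python
from statistics import pvariance

# Hypothetical viability data: compound -> dose (uM) -> viability per cell line.
viability = {
    "inhibitor_A": {
        0.1: [0.95, 0.90, 0.97, 0.93],   # little variation across lines
        1.0: [0.80, 0.40, 0.85, 0.30],   # most variable dose
        10.0: [0.10, 0.08, 0.12, 0.09],  # uniformly lethal
    },
}

def most_variable_dose(doses):
    """Return the dose whose viability values have the highest variance
    across cell lines."""
    return max(doses, key=lambda d: pvariance(doses[d]))

# One representative dose per compound, as in the Dempster et al. labeling.
representative = {c: most_variable_dose(d) for c, d in viability.items()}
```

Under this toy data the intermediate dose is selected, which illustrates why the strategy tends to pick doses in the responsive part of the dose-response curve rather than the floor or ceiling.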

2. Related to the first point, the example of BRAF inhibitors in Figure 6 of the Dempster et al. paper (dabrafenib and vemurafenib) could be a useful positive control, or example to use to examine the differences between the studies. Their data suggest that baseline RXRG expression, particularly in melanoma cell lines, is a strong univariate predictor of response to dabrafenib and vemurafenib - is that still the case for the authors' pipeline (i.e. are BRAF inhibitors a class of drugs where gene expression is particularly effective, and the poor performance shown in Figure 4 is mostly driven by other cell lines or kinase inhibitors), or are the results different when the labels for multiple doses are introduced? Does the authors' multivariate gene expression-based model perform well for BRAF inhibitors?

3. Many of the figures in the paper (e.g. Figure 4, Figure 5) show results/performance metrics summarized across all cell lines and compounds in the dataset. This makes sense for a summary figure, but it seems to me that this could potentially be obscuring variation between cell lines or between compounds. For instance, despite the fact that gene expression does not perform well for viability prediction in general (shown in Figure 4), are there any compounds or any cell lines/tissues of origin where gene expression does outperform kinase inhibition data? If so, is there any biological interpretation for the variation? If this is not the case (i.e. gene expression performs uniformly poorly across cell lines and tissues of origin, and/or kinase inhibition profiles perform uniformly well) this would also be interesting to note.

4. Overall the paper is very clear and well-written. However, as a reader, I would appreciate a bit more clarification earlier on in the paper about exactly what a "kinome inhibition profile" is in the introduction/summary figure since this is central to understanding the paper, especially given that this is a general computational biology journal with readers from diverse fields. Reading through the Klaeger et al. paper, this quote seems like a good summary of why kinome inhibition profiles are useful:

"Owing to the fact many compounds target the structurally and functionally conserved adenosine 5′-triphosphate (ATP)–binding site, polypharmacology (that is, drugs that act on more than one target) is commonly observed. Target promiscuity may have advantageous or detrimental therapeutic consequences."

The idea seems to be that kinase inhibitors can be either targeted or more general, but almost all of them have some "off-target" or polypharmacology effects. Knowing the empirical profile of kinase inhibition for a drug, then, can give you more information than just knowing what it's designed to target. It would be good to put some form of this summary in one of the introductory paragraphs, and/or to explicitly mention exactly what the model features represent (my understanding is that the input is a drug/perturbation x kinase matrix, where each entry quantifies binding/drug interaction strength for the given drug and kinase) since many readers might not be immediately familiar with the data format, or with the observation that kinase inhibitors do not necessarily bind specifically to their intended targets.

Minor comments:

5. The authors should be more specific about what form of linear regression is being used in the Methods section. Does the tidymodels default use any form of regularization (e.g. a ridge/LASSO/elastic net penalty), or is it an ordinary least squares fit? It may also be good to describe the default hyperparameter selection strategy for the random forest and XGBoost models - does tidymodels test a variety of parameters using some kind of internal cross-validation, or does it choose a single set of default parameters? Is there any particular reason the authors chose to tune the number of trees in the random forest model and not any other hyperparameters? Hyperparameter selection details can have a considerable effect on performance, at least in my experience, and it is useful to be clear about what was explored/settled on for the final models and why.

6. Methods section, page 18 under "STRING": Ensemble -> Ensembl

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Submitted filename: ResponseToReviewers.pdf
Decision Letter - Feilim Mac Gabhann, Editor, Inna Lavrik, Editor

Dear Professor Gomez,

We are pleased to inform you that your manuscript 'Kinome Inhibition States and Multiomics Data Enable Prediction of Cell Viability in Diverse Cancer Types' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Inna Lavrik

Academic Editor

PLOS Computational Biology

Feilim Mac Gabhann

Editor-in-Chief

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: The comments from my previous review have been addressed - thank you for the great work. In particular, the experiments showing that kinase inhibitor profiles still provide the most predictive signal even when only a single representative (highest variability) dose is used emphasize the robustness of the results, and the added analyses exploring per-compound and per-cell line performance variation are another interesting aspect of the revised paper.

I have a few small comments/suggestions on the revised version of the manuscript:

1. One aspect of the response to reviewers that stood out to me was this: "We think of the kinase inhibitor profiles as providing a cell-line agnostic initial “state” that is expected to be induced by inhibitor treatment, with the RNA expression altering that state in a cell-line-specific manner (and thus the improvement in predictions with its inclusion)." I thought this was an interesting way to think about the findings in the paper - i.e. the kinase inhibition data provides information about each compound or perturbation, and the gene expression/omics data provides information about the cell line. Since the kinase inhibition profiles were so valuable in terms of predictive signal, I wonder whether there are any other data sources that could provide information about each compound in a cell-line-general manner, similarly or complementarily to the kinase inhibition profiles: maybe information on chemical structure, where applicable (e.g. Morgan fingerprints or some other kind of molecular embedding), could provide a performance improvement as well?

If so (or if there are other examples I'm not aware of) this could be a useful addition to the discussion section. To be clear, the findings in the paper are valuable in their own right and adding more compound-specific information is probably beyond the scope of this study, but mentioning it as a future direction or an existing area of research could help to put the study in a broader context.

2. In the first sentence of the revised abstract, should "compounds that inhibit kinase activity emerging as a primary focus" be "compounds that inhibit kinase activity *are* emerging as a primary focus"?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Formally Accepted
Acceptance Letter - Feilim Mac Gabhann, Editor, Inna Lavrik, Editor

PCOMPBIOL-D-22-00783R1

Kinome Inhibition States and Multiomics Data Enable Prediction of Cell Viability in Diverse Cancer Types

Dear Dr Gomez,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Timea Kemeri-Szekernyes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.