Statistical integration of multi-omics and drug screening data from cell lines

Said el Bouhaddani; Matthias Höllerhage; Hae-Won Uh; Claudia Moebius; Marc Bickle; Günter Höglinger; Jeanine Houwing-Duistermaat

doi:10.1371/journal.pcbi.1011809

Peer Review History

Original Submission July 12, 2023
10 Aug 2023 Decision Letter - Pedro Mendes, Editor, Marcel Holger Schulz, Editor

Dear Dr. Said,

Thank you very much for submitting your manuscript "Statistical integration of multi-omics and drug screening data from cell lines" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

In addition to the comments by the authors, please extend the introduction and cite other methods that predict drug effects from cell lines. Position your own previous work and the new work with respect to other studies. Provide guidance for the readers, in how far the current problem cannot be addressed by other methods. Please also include a baseline comparison as suggested by one of the reviewers.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Marcel Holger Schulz, Ph.D. Academic Editor PLOS Computational Biology
Pedro Mendes Section Editor PLOS Computational Biology

*********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors propose a computational pipeline to jointly analyze unpaired multi-omics data from cell lines. As a main part of the workflow, the authors introduce POPLS-DA, a probabilistic orthogonal partial least squares approach for discriminative analysis. For the main results, POPLS-DA was applied to a multi-omics dataset to identify a subset of relevant genes that help discriminate affected from non-affected (aSyn vs control) samples. Subsequently, the authors perform a gene set enrichment analysis on the signature of 200 genes, thereby uncovering significant gene sets / pathways of biological relevance for the dataset at hand. This review focuses mostly on the proposed POPLS-DA method, as it comprises the main source of novelty for the submission.
Major: My main concern with the paper is the lack of comparison with other baselines. Indeed, the authors suggest that existing approaches assume paired samples, while their proposed method does not. In that case, one may come up with a simple baseline such as analyzing the data modalities independently using existing and widely accepted methods such as tree-based approaches that provide means to quantify the feature/gene importance, and combining the results in a post-hoc manner. The authors also claim that the results shown in the paper can only be obtained when jointly analyzing the data, and not in a single-omic analysis. However, no evidence is provided to back up this claim.

Minor:
- Figure 1 has poor quality and it is quite difficult to read the text parts.
- In the supplementary material, the table numbering needs to be fixed as it currently shows [??] in the reference.
- There are some typos across the manuscript, e.g. page 2: “…the drug screening data to highlight(s) which part…”

Reviewer #2:

Summary

In their manuscript “Statistical integration of multi-omics and drug screening data from cell lines” el Bouhaddani and colleagues propose a workflow for multi-omics and drug screening data integration and apply this workflow to study synucleinopathies using transcriptomics, proteomics, and drug screening data on LUHMES cell lines (“cases” samples, overexpressing α-synuclein, and “control” samples, overexpressing GFP). In particular, the goal of the study is to find genes that distinguish cases and controls and at the same time are druggable. Multi-modal integration is a common problem in computational biology. The authors developed an extension of the PLS-DA method for multi-omics integration that works on non-overlapping samples. They also developed a workflow for co-analyzing protein-protein interactions between identified genes (that distinguish cases and controls) and drug target information for the most protective drugs (identified in the drug screen).

Overall the paper presents an interesting technical approach, but in applying this approach to the biological data in question there are a number of major issues that should be addressed.

Major issues

1. The authors integrate a transcriptomics dataset with 15660 transcripts and a proteomics dataset with 2577 proteins, and in the combined set they get 2292 genes. This means that less than 15% of the genes profiled with transcriptomics were used in the analysis. So, if one would use the full transcriptomics dataset (e.g. w/o integration with proteomics), the set of genes identified by a PLS-DA model would be different, and so would all the downstream findings with respect to the drug-target analysis. Limiting the analysis to the genes in the overlap between the two omics sets biases/limits the biological findings. One additional point here: since all samples in the analysis belong to the LUHMES cell line, I wonder whether the transcriptomics and proteomics samples can actually be considered as the same, i.e. “overlapping”, samples. If such a consideration is possible, one could also perform a “vertical” integration that would preserve all the information in both datasets.

2. The transcriptomics dataset consists of 6 samples, while the proteomics dataset has 18 samples. It’s unclear how well POPLS-DA deals with cases where the sizes of the integrated sets are imbalanced. It would be interesting to see whether the top 200 genes identified by a PLS-DA model applied to just the proteomics data are different from the top 200 genes obtained in the paper from the integrated dataset.

3. Page 13, line 311: “The correlation of the transcripts with the α-synuclein overexpressing cells versus the control cells was taken as a reference; if this correlation was negative, the corresponding protein feature was multiplied by minus one.” This sentence should be re-formulated so it’s clear for the reader what is correlated with what. Also it’s unclear why this per gene correlation is calculated, and what is the reason for the subsequent transformation of protein features.

Minor issues

Fig. 1: the resolution is very low, it’s difficult to read.
Fig. 2, right: although it’s clear that a score of 1 means “case” and a score of -1 means control, perhaps it should be clarified in the legend or in the main text.
Fig. 3: panel names are missing on the figure. Also, a bigger font for gene names would make this figure more readable.
Fig. 4: this is a really good and important figure, it could be moved to the front of the paper so the reader understands the whole workflow right from the beginning (I would also show the number of transcriptomics and proteomics samples in the panel (1)). And the current Fig. 1 could be moved to the materials and methods or to the supplement.
Supplementary materials, page 4, line 103: “??” symbols instead of Table numbers.
It would be good to have a visual summary / illustration for the POPLS-DA method in addition to the “mathematical model” and “interpretation” sections in the supplement.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes

Reviewer #2: Yes

******

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Roman Kurilov

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1011809.r001
Revision 1
27 Oct 2023 Author Response Attachments Attachment Submitted filename: ResponseReviewers3.docx https://doi.org/10.1371/journal.pcbi.1011809.r002
13 Nov 2023 Decision Letter - Pedro Mendes, Editor, Marcel Holger Schulz, Editor

Dear Dr. Said,

Thank you very much for submitting your manuscript "Statistical integration of multi-omics and drug screening data from cell lines" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

The reviewers are generally happy with the revised manuscript. Please address the minor points by Reviewer 2 before publication.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Marcel Holger Schulz, Ph.D. Academic Editor PLOS Computational Biology
Pedro Mendes Section Editor PLOS Computational Biology

*********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for addressing every point in my initial review. I believe the comparison with other baselines further demonstrates the relevance of their proposed method. Therefore, I lean towards accepting this paper.

Reviewer #2: The authors addressed my comments in the revision; however, I've noticed a couple of other statements in the text that would be good to clarify. I also have questions about the authors' response to my original point number 3.

Line 78: “This approach highlighted genes in the network that are associated with α-synuclein overexpression and targeted by validated drugs, and therefore are potential targets for a novel therapy for synucleinopathies”

If these discovered genes are already targeted by existing drugs, why is there a need for a novel therapy targeting the same genes? Conversely, if the goal is to discover new potential targets for new therapies, why is it important to look at the overlap of genes discovered by multi-omics analysis and genes that are targets of existing protective drugs (and not just to look at all genes discovered by multi-omics analysis)?

Line 320: “Our novel omics integration POPLS-DA identified a set of 200 relevant genes/proteins that discriminated between samples overexpressing α-synuclein and controls, as well as their drug targets”

It’s unclear to which part of the sentence “as well as their drug targets” belongs. Also, whose drug targets? Drugs can have drug targets, genes/proteins can be drug targets but they cannot have drug targets.

Line 366: “We determined the sign of the t-statistic of each gene and protein separately with respect to the case-control grouping. When the sign of the corresponding protein's t-statistic differed from that of the gene, the measurements for that protein were multiplied by minus one. This adjustment ensures that the difference in means between cases and controls had a consistent sign across all omics data.”

1) Since you are talking about expression / transcriptomics data, instead of “gene” you should use “mRNA” or “gene expression level”.
2) It’s unclear to me why the difference in means between cases and controls (for a gene) must have the same sign in transcriptomics and in proteomics data.
3) I understand that such genes (where the difference in means between cases and controls has a different sign in the transcriptomics and proteomics data) would not be good predictors for case / control prediction, but I am not sure that it is a justifiable data transformation to just multiply a subset of genes in the proteomics dataset by -1.

********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes

Reviewer #2: Yes

******

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Roman Kurilov

References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

https://doi.org/10.1371/journal.pcbi.1011809.r003
Revision 2
15 Dec 2023 Author Response Attachments Attachment Submitted filename: ResponseReviewers_R3.docx https://doi.org/10.1371/journal.pcbi.1011809.r004
8 Jan 2024 Decision Letter - Pedro Mendes, Editor, Marcel Holger Schulz, Editor

Dear Dr. Said,

We are pleased to inform you that your manuscript 'Statistical integration of multi-omics and drug screening data from cell lines' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,
Marcel Holger Schulz, Ph.D. Academic Editor PLOS Computational Biology
Pedro Mendes Section Editor PLOS Computational Biology

***********************************************************

Thank you for addressing the remaining points of reviewer 2.

https://doi.org/10.1371/journal.pcbi.1011809.r005
Formally Accepted
23 Jan 2024 Acceptance Letter - Pedro Mendes, Editor, Marcel Holger Schulz, Editor

PCOMPBIOL-D-23-01100R2

Statistical integration of multi-omics and drug screening data from cell lines

Dear Dr el Bouhaddani,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,
Zsofia Freund
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1011809.r006

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.