Peer Review History

Original Submission
May 14, 2021
Decision Letter - Kiran Raosaheb Patil, Editor

Dear Dr Ebbels,

Thank you very much for submitting your manuscript "Pathway analysis in metabolomics: pitfalls and best practice for the use of over-representation analysis" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Kiran Raosaheb Patil, Ph.D.

Deputy Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Wieder et al examine how different parameters for pathway over-representation analysis (ORA) influence the results of metabolomics data analysis. They use five experimental metabolomics data sets (from humans, mouse and E. coli) to test the relationship between ORA parameters and ORA results. The study is relevant for the field because it nicely illustrates the strong influence of ORA parameters on the outcome of the analysis. However, my main concern is that the authors cannot identify the best parameter configuration because they lack a proper reference of what is true (they mention this also in the discussion). Specific comments are below:

1) The authors claim in the abstract that they used in-silico simulations, thus I expected that they simulated metabolic changes (e.g. with a dynamic model) and used ORA to recover the true in silico perturbation. Instead they use real data, which is great, but it is difficult to judge which parameter configuration is best. The authors mention themselves that the study lacks a “ground-truth dataset”. The authors should at least better describe the nature of such a ground-truth dataset/result. They should also describe better what they mean by in silico simulation.

2) The authors selected 5 experimental data sets for their study. They could better describe in the main text (instead of Table 1) why they selected these data and which conditions/organisms are investigated. For example, why did they select (only) two strains out of 3800 E. coli strains in Fuhrer et al? In fact, these data contain some information about the "ground truth", because in many cases the deleted gene can be assigned to a metabolic pathway.

3) A main concern is the selection of Databases. Obviously, the best choice is a genome-scale reconstruction of metabolism of the respective organism and I wonder why the authors did not consider them; at least for the E. coli data, mouse and HeLa cells.

4) The authors could give a better overview of the parameters tested and better quantify their relevance relative to each other. The recommendations in the discussion are not specific enough. For example, how could one derive a “consensus” pathway signature?

Reviewer #2: The authors assess parameters used in pathway enrichment analysis using 5 publicly available MS-based metabolomics datasets. While those dealing with these tools have surely identified inconsistencies in results according to the tools and parameters used, the exercise of testing the boundaries and consequences in results of misuse of the tools is interestingly quantified by the authors. In addition, the section on recommendations on the best practice for using over-representation analysis (ORA) in the metabolomics field is of value.

The manuscript is well-written but requires improvement in certain sections.

Title/introduction – It is worth mentioning that ORA is also known as metabolite enrichment analysis, which might even be the more common name within the metabolomics community.

Methods

L501 – for dataset MTBLS135 the text mentions that the sample type is plasma, while Table 1 mentions tissue. So here it is important to rectify and harmonize. In addition, the files of the uploaded dataset mention ‘serum’ and not plasma. This might sound like a detail, but one should be precise, as the two sample types (serum and plasma) are not interchangeable.

L502 – dataset MTBLS136: I could not retrieve any data files in the Metabolights repository for this study! Supplementary Materials of the associated publication do not contain the metabolomics data itself per sample. So I could also not confirm the number of samples (controls and estrogen-users).

L506 – as for all the other datasets, it is important to mention the number of samples for the last dataset, Fuhrer et al. And was the negative mode subdataset used, or the positive mode, or both? In the results, 2 subsets from this particular study are then mentioned, so this needs to be clarified in the Methods.

Table 1 – where was the total number of metabolites mapping to KEGG compounds extracted from? Analysis within this manuscript or extracted from the original datasets?

L525 – metabolite ID conversion

This is a stress point of identification and according to the algorithms used, it can over-identify and thus overestimate metabolite coverage or if too conservative, it can assign only a part of possible metabolites.

For example: when one measures an amino acid, will it be immediately assigned to L-amino acid? An amino acid can also be a D-amino acid in a biological environment; however, this type of assignment is hardly assessed and possibly not even feasible to know using regular LC/CE-MS techniques (one would need to use chiral chromatography, for example). And in the likely event of not knowing, will it be assigned to D/L-amino acid or assumed to be L-amino acid?

Another example are acids and salts and ions (for example: glutamic acid vs glutamate vs sodium glutamate (or any other salt)): will these be assigned to the same metabolite ID or to different ones?

As different metabolite ID convertors (tool in MetaboAnalyst, tool in BioCyc, etc.) were used, it is likely that these will produce different results!! This aspect deserves some explanation and words of caution in the manuscript. Will the IDs be back-converted to the same list of IDs when using convertors from other databases? This would be good to check.
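The back-conversion check suggested above can be sketched as a simple round trip (name to ID, then ID back to name). This is only an illustration under stated assumptions: the function name `round_trip_mismatches`, the two lookup tables, and the identifiers `ID_A` and `ID_B` are hypothetical and do not correspond to the converters in MetaboAnalyst or BioCyc, nor to real database IDs.

```python
def round_trip_mismatches(names, name_to_id, id_to_name):
    """Return names that do not map back to themselves after name -> ID -> name.

    Each mismatch is reported as name -> (assigned ID, back-converted name).
    A name with no ID at all is reported as (None, None).
    """
    mismatches = {}
    for name in names:
        cid = name_to_id.get(name)
        back = id_to_name.get(cid) if cid is not None else None
        if back != name:
            mismatches[name] = (cid, back)
    return mismatches


# Hypothetical converter A: collapses acid, ion and salt forms onto one ID.
name_to_id = {
    "glutamic acid": "ID_A",
    "glutamate": "ID_A",
    "sodium glutamate": "ID_A",
    "alanine": "ID_B",
}
# Hypothetical converter B: returns a single preferred name per ID.
id_to_name = {"ID_A": "glutamate", "ID_B": "alanine"}

measured = ["glutamic acid", "glutamate", "sodium glutamate", "alanine", "unknown feature"]
mismatches = round_trip_mismatches(measured, name_to_id, id_to_name)
for name, (cid, back) in mismatches.items():
    print(f"{name!r}: assigned {cid}, back-converted to {back!r}")
```

In this toy case the acid and salt forms are silently merged into one entry, and the unmapped feature is dropped, which is the kind of discrepancy the comment asks the authors to report.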

L567 – metabolite misidentification

Results

L169 – NMR is not relevant in this study, as none of the studies chosen have used it, so please remove it.

L293 – if the authors want to mention MSI levels 2-4, then they need to explain what these are, as the readers might not know…

Fig5 – A and B figures are actually quite similar. So it might be worth mentioning in the text that the misidentification is probably from molecular formula to metabolite and not so much from mass to molecular formula. Some words on the similarities / differences between these two graphs are worth adding.

L334 - 349 – one needs to be careful with these types of statements. Reversed phase is used in combination with ion pairing for detecting polar metabolites, of a similar nature to the ones that are detected by HILIC. HILIC can also detect a lot of apolar metabolites, because it can act in a mixed-mode type of chromatography. In addition, GC-MS with a prior derivatisation step in the sample preparation has been used a lot for detecting polar metabolites! So given that there is a lot of variety in analytical and sample prep methods for metabolomics, this whole section should be rephrased and adapted.

The authors should stick to polarity of compounds to make their point, irrespective of the technique used, as clearly the reality is not this simple: it does not only depend on chromatography!! In fact, one of the datasets does not use chromatography but capillary electrophoresis!!

Then none of the datasets aimed at lipid metabolism; this would then lead to completely different results. So this whole section is very circumstantial and simply not informative.

Discussion

L395 – mis-identification is abundant in all analytical platforms!

L397 – not relevant to mention NMR as it was not used in this study. To add to this: maybe NMR provides less coverage but maybe better identification…?

Reviewer #3: Wieder and colleagues performed an interesting study on the application of ORA to metabolomics data. The paper is well-written and proposes, for the first time, the guidelines to perform ORA analysis in metabolomics. I especially enjoyed reading the pathway comparison part, it is a nice addition to the paper. However, some of the observations or conclusions were somewhat trivial to me. Still, I find the paper suitable for publication and I suggest the following changes to improve the paper:

- The authors state: "To perform ORA, three essential inputs are required: a collection of pathways (or custom metabolite sets), a list of metabolites of interest, and a background or reference set." By definition, all annotatable metabolites in untargeted metabolomics are all those in the collection of pathways. How do all annotatable metabolites and all metabolites in the pathway differ?

- Pg 12, section "increasing the number...". It is not needed to do all that to demonstrate this trivial aspect. It is expected. The authors could perform a similar approach but instead of considering all pathways, considering only those pathways that have at least 2 (or 3 if data allows it) DA, and then randomly add new DA to the pathways to see how the overall ranking fluctuates. Otherwise, adding DA by p-value is arbitrary and, considering the nature of untargeted metabolomics data, these observations are expected.

- "Pathway sets can be obtained freely from several databases...". KEGG is partially commercial so should not be included. For BioCyc, I would like to know how the authors obtained that information as I believe it's partially commercial. MetExplore uses other databases so it should be removed as well. Is Ingenuity still in business?

- Pg 26: "Suggested recommendations...". The paper discusses the ambiguity of the composition of the background set in untargeted metabolomics, but the recommendations are not clear on how this background set should be built in untargeted metabolomics. It would be worthwhile to break down the first recommendation into untargeted and targeted.

- Discussion: how do (or could) topology-based or FCS methods naturally overcome some of the ORA limitations, or introduce different biases? Could the recommendations be instead: do not use ORA, but FCS/Topology-based? What could the limitations of FCS/Topology-based in untargeted metabolomics be? I believe a brief discussion about this is necessary.

- Pg 7, line 139: "consisting of all compounds annotated to at least one KEGG pathway", could you define this better?
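
As an illustration of the three ORA inputs discussed in these comments (a pathway, a list of metabolites of interest, and a background set), a minimal sketch of ORA as a right-tailed hypergeometric test follows. It is not the authors' implementation: the function name `ora_p_value`, the compound labels, and the set sizes are hypothetical toy values chosen only to show how the choice of background changes the p-value.

```python
from math import comb


def ora_p_value(pathway, interest, background):
    """Right-tailed hypergeometric p-value for a single pathway.

    Only compounds present in the background set are counted, so the
    same pathway and the same hits can give different p-values under
    different backgrounds.
    """
    background = set(background)
    pathway = set(pathway) & background
    interest = set(interest) & background
    N = len(background)          # size of the background set
    K = len(pathway)             # pathway compounds within the background
    n = len(interest)            # metabolites of interest within the background
    k = len(pathway & interest)  # overlap: hits in the pathway
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy inputs (hypothetical labels, not real database identifiers).
pathway = {"C1", "C2", "C3", "C4"}
interest = {"C1", "C2", "C5"}

# Assay-specific background: the 10 compounds actually measured.
assay_background = {f"C{i}" for i in range(1, 11)}
# Generic background: every compound annotated to at least one pathway.
generic_background = {f"C{i}" for i in range(1, 101)}

p_assay = ora_p_value(pathway, interest, assay_background)      # 40/120, about 0.333
p_generic = ora_p_value(pathway, interest, generic_background)  # 580/161700, about 0.0036
print(p_assay, p_generic)
```

With identical hits, the generic background yields a much smaller p-value than the assay-specific one in this toy case, which is why the definition of the background set matters for the recommendations.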

Minor:

- Change: Firstly -> first, secondly -> second

- Pg 5 line 95: p-value, P should be capitalized.

- Pg 14 line 225. "Pathway database is key" I suggest using a more informative sentence.

- I did not find the supplementary materials.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sofia Moco

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: Response_to_reviewers_v1.pdf
Decision Letter - Kiran Raosaheb Patil, Editor

Dear Dr Ebbels,

Thank you very much for submitting your manuscript "Pathway analysis in metabolomics: pitfalls and best practice for the use of over-representation analysis" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the editorial recommendation below.

Before formal acceptance, I would like to suggest a change in the title: replacing "pitfalls and best practice" by "recommendations". The reason being that the term "best" in the computational context often implies optimisation / rigorous analytical basis. I would therefore like to encourage you to consider this change (and consistent changes in the rest of the manuscript along these lines).

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Kiran Raosaheb Patil, Ph.D.

Deputy Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

As summarised below, the reviewers are satisfied with the response and the changes to the manuscript. Before formal acceptance, I would like to suggest a change in the title: replacing "pitfalls and best practice" by "recommendations". The reason being that the term "best" in the computational context often implies optimisation / rigorous analytical basis. I would therefore like to encourage you to consider this change (and consistent changes in the rest of the manuscript along these lines).

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I thank the authors for addressing all of my points, and I have no further comments.

Reviewer #2: The authors improved the study by addressing the reviewers' concerns to a level that in my opinion makes this manuscript worthy of publication.

Reviewer #3: The authors have addressed all my concerns.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: None

Reviewer #2: None

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Sofia Moco

Reviewer #3: No

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Revision 2

Attachments
Attachment
Submitted filename: Letter_to_editor_v2.docx
Decision Letter - Kiran Raosaheb Patil, Editor

Dear Dr Ebbels,

We are pleased to inform you that your manuscript 'Pathway analysis in metabolomics: recommendations for the use of over-representation analysis' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Kiran Raosaheb Patil, Ph.D.

Deputy Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

***********************************************************

Formally Accepted
Acceptance Letter - Kiran Raosaheb Patil, Editor

PCOMPBIOL-D-21-00895R2

Pathway analysis in metabolomics: recommendations for the use of over-representation analysis

Dear Dr Ebbels,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Katalin Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.