Peer Review History
| Original Submission: January 17, 2024 |
Dear Dr Jeganathan,

Thank you very much for submitting your manuscript "Microbiome intervention analysis using transfer functions and mirror statistics" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time.
Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Niranjan Nagarajan
Academic Editor
PLOS Computational Biology

Pedro Mendes
Section Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In this article, Sankaran and Jeganathan propose mirror statistics to control FDR when estimating the effect of interventions on microbial abundance using gradient boosting models. This is an interesting and novel statistical solution to a common scientific question. This will be a nice contribution to the literature if the authors can provide additional results and discussion addressing potential limitations of their approach. My major concerns relate to violations of the assumptions required for FDR control with mirror statistics. I have more minor comments, especially related to how they discuss fido and MDSINE2 as comparisons.

FDR control via mirror statistics is only guaranteed when key assumptions are satisfied. As the authors state, one assumption is that the distribution of mirror statistics is symmetric about zero under the null. Another assumption, which the authors do not state, is that the mirror statistics are weakly correlated (Assumption 2.2 in Dai et al.). I am concerned that, in many practical settings, both assumptions may be violated in the authors' applications. As I expand upon below, the authors should provide a more thorough study and discussion of when their method works and when it does not. So long as the authors provide an accurate picture of the limitations of their method, regardless of the results, I think this manuscript will be useful to the field. That said, in its current form these potential limitations are not adequately explored.

Starting with the correlation assumption, there are two well-known sources of strong correlations in microbiome data.
One is the phylogenetic relatedness of taxa, which will almost certainly lead to non-trivial correlations in the proposed mirror statistic. Second is the spurious correlations that can be induced into the data through the arbitrary sequencing depth of the measurement process. Regarding this second source, while the authors state that the data y is normalized, it is non-trivial (and I suspect not possible) to ensure that a chosen normalization will decorrelate the mirror statistics.

To address this comment, I encourage the authors to expand their discussion of these assumptions in the manuscript and expand their simulation studies in two ways to provide at least some minimal degree of empirical results as to the robustness of their proposed approach to violations of these assumptions. First, the authors could incorporate varying degrees of phylogenetic structure between the taxa (e.g., phylogenetically patterned correlations in errors or in other confounding effects that might be present under the null). I would encourage the authors to explore how FDR might change as a function of that phylogenetic correlation (e.g., we expect more phylogenetic correlations when analyses are performed at the sequence variant level than at higher taxonomic/phylogenetic levels). Second, the authors could, and likely should, expand their simulation to incorporate the arbitrary sequencing depth artifacts present in microbiome data. The counts y_i are not independent as the simulation suggests. I suspect that if the negative binomial were replaced with a multinomial (or the counts y from the negative binomial were resampled via a multivariate hypergeometric to an arbitrary depth) then the authors' model will see elevated rates of false positives. These problems will likely become most noticeable when the intervention causes the total microbial load to shift (see McGovern et al., 2023).

Regarding the symmetry assumption, it is less clear how the authors could explore model violations.
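[Editor's note: the depth-resampling experiment suggested in this review could be sketched as follows. This is an illustrative simulation only; the sample sizes, negative binomial parameters, and sequencing depth are all invented for the sketch and do not come from the manuscript.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented dimensions and parameters: "absolute" abundances for 50 taxa in
# 20 samples, drawn negative binomial as a stand-in for the simulated truth.
n_samples, n_taxa = 20, 50
abs_counts = rng.negative_binomial(n=5, p=0.1, size=(n_samples, n_taxa))

# Resample each sample to one fixed, arbitrary sequencing depth. After this
# step the row totals carry no information about total microbial load, and
# the taxon counts within a sample are negatively correlated by construction.
depth = 10_000
rel_counts = np.vstack([
    rng.multinomial(depth, row / row.sum()) for row in abs_counts
])
```

Because every row is renormalized to the same total, any shift in total microbial load caused by an intervention is invisible in the resampled counts, which is exactly the artifact the review describes.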
At present the authors simply state that they assume the mirror statistics are symmetric about zero under the null. Notably, this seems to rest on an assumption that the errors \(\epsilon\) are symmetric about zero. As it is currently written, I do not understand why such an assumption would make sense. For context, Dai et al. make a strong argument that, under a wide range of settings, this assumption is likely true for certain mirror statistics based on linear model or graphical model estimates. The gradient boosting methods proposed are non-trivial, and I do not think that prior arguments by Dai can be applied here. This needs further discussion and study. Don't a number of panels in Figure 6 suggest that this assumption may be violated in practice?

Clarifications are needed to accurately portray the current status quo. To clearly state up front, the authors do not need to change their simulations but simply to add discussion about the limits of how they are using the fido and MDSINE2 tools as comparisons. It is reasonable to use fido and MDSINE2 as a comparison here, as there are few potential alternatives and both are used for similar purposes in the literature.

Starting with fido: fido is not designed to perform this task and, as the main author of that package (Justin Silverman), I would not recommend it be used in that way. In fact, in our original paper we explicitly state that this is a method for log-ratio estimation. In other words, fido is designed to estimate linear and non-linear trends in the relative abundances (or log-ratio abundances) of taxa, not their absolute abundance. It is only recently (Nixon et al. 2023, and McGovern et al. 2023) that we have started bridging the gap between estimation of log-ratios and absolute abundances. In short, when you give fido data with absolute abundance information (e.g., negative binomial data rather than multinomial data), fido basically just throws it away, as it expects the data to have arbitrary scale.
As a result, the authors should report how they are comparing estimates of log-ratios to their ground truth, which is basically changes in absolute abundances. Are they using CLR coordinates for this comparison? Others do this, but we (and others) have repeatedly expressed concern with this practice (Nixon et al. 2023, and McGovern et al. 2023, or just about any article by Pawlowsky-Glahn or Egozcue).

Other non-trivial features that should be reported include the choice of kernel in fido. I am going to guess the authors are just using an RBF kernel in the basset model in fido. This is a common choice, but I expect that other kernels might have better performance (again, fine to leave as is and just explain choices). Both MDSINE2 and fido are Bayesian models. The choice of prior is non-trivial and should be reported. Note the default priors in fido come with an explicit warning to users urging them to think more deeply about their problem and design appropriate priors that reflect biological knowledge. For MDSINE2, the model was designed to also use qPCR measurements to bridge the gap between relative and absolute estimation. How are the authors then using MDSINE2 here without simulating qPCR data?

Finally, as a smaller point, it seems wrong to say that a limitation of fido is its computational speed (page 3) when later (page 9) you say that fido was comparable, and even in one instance faster than, mbtransfer.

I think introducing mirror statistics to the field is important, as I think there are multiple potential uses for them. That said, the authors barely introduce them or explain their novelty. I think adding a paragraph somewhere providing a bit more background knowledge to the reader would likely improve the impact of the authors' work.

On page 3 the PDF seems to be malformed: the sentence starting with "Nevertheless, these methods" is oddly bolded and not on any particular line. Same with the word "profiles" a few lines down.
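[Editor's note: as background for readers unfamiliar with mirror statistics, the data-splitting selection rule of Dai et al. can be sketched as follows. The mirror statistics here are simulated toy values, not drawn from the manuscript.]

```python
import numpy as np

def mirror_fdr_select(m, q=0.1):
    """Pick the smallest cutoff t whose estimated false discovery
    proportion, #{j: m_j <= -t} / max(#{j: m_j >= t}, 1), is at most q,
    then select every feature with m_j >= t."""
    for t in np.sort(np.abs(m[m != 0])):
        fdp = (m <= -t).sum() / max((m >= t).sum(), 1)
        if fdp <= q:
            return np.where(m >= t)[0], t
    return np.array([], dtype=int), np.inf

# Toy example: 90 null features (symmetric about zero) and 10 strong signals.
rng = np.random.default_rng(1)
m = np.concatenate([rng.normal(0, 1, 90), rng.normal(6, 1, 10)])
selected, tau = mirror_fdr_select(m, q=0.1)
```

Under the symmetry assumption, the count of statistics below -t estimates the number of null features above t, so the ratio is an estimate of the false discovery proportion at cutoff t; this is why asymmetry of the null distribution, as the review raises, directly threatens FDR control.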
On line 67 you say the errors need to be symmetric, but later you say they need to be symmetric and centered about zero. I think the "about zero" likely needs to be added on that line, as this is important and implies absence of systematic bias in the estimator.

"timepoints that are at least max(P,Q) apart": I didn't understand what the authors are trying to say with this phrase (page 5).

Reviewer #2: The manuscript by Sankaran & Jeganathan introduces a new model, mbtransfer, based on transfer functions for modeling microbial community dynamics in response to external perturbations. The model allows the user to fit and "forecast" the temporal dynamics of the microbiome. The method also implements mirror statistics to select significantly perturbed taxa. Overall, this is a well-written paper and would be useful for modeling microbial dynamics in a continuous fashion, especially when the response to an external perturbation has a delayed effect. An R package and well-documented tutorials on the case studies were provided. I was able to follow the tutorial easily.

I have some minor comments that I would like the authors to clarify:

1. Page 4: "we extract non-overlapping taxonomic and intervention histories". What does "non-overlapping" mean here?

2. Page 11: "the out-of-sample forecasts showed a clear correlation with the truth". It would be more rigorous to show the statistics here.

3. Page 11: "Across taxa, we found that the first and third quartiles of the counterfactual differences tended to agree". It is unclear to me what the authors mean here and why it suggests that "the model did not detect interactions between the intervention effects and microbial composition".

4. For mirror statistics, is it possible to compare them with a simple baseline statistic testing the abundance before and after the perturbation?
Reviewer #3: Uploaded as attachment

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Justin D Silverman
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
| Revision 1 |
Dear Dr Jeganathan,

We are pleased to inform you that your manuscript 'mbtransfer: Microbiome intervention analysis using transfer functions and mirror statistics' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be coordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Niranjan Nagarajan
Academic Editor
PLOS Computational Biology

Pedro Mendes
Section Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed my comments appropriately. In particular, I think the discussion of mirror statistics (the added background) adds substantially.
I think the presentation of their method is also much improved in the revision and provides readers with a greater ability to judge the benefits and limitations of the proposed approach.

Reviewer #2: The authors have addressed my comments.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Justin D Silverman
Reviewer #2: No
| Formally Accepted |
PCOMPBIOL-D-24-00098R1
mbtransfer: Microbiome intervention analysis using transfer functions and mirror statistics

Dear Dr Jeganathan,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofia Freund
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.