Peer Review History

Original Submission: October 18, 2023
Decision Letter - Sergei Maslov, Editor, Zhaolei Zhang, Editor

Dear Dr. Razo-Mejia,

Thank you very much for submitting your manuscript "Bayesian inference of relative fitness on high-throughput pooled competition assays" for consideration at PLOS Computational Biology.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

Two major issues raised by several reviewers are that the algorithm has not been tested on experimental data and has not been compared to currently available algorithms. Addressing at least one of these issues is required for publication.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Sergei Maslov

Academic Editor

PLOS Computational Biology

Zhaolei Zhang

Section Editor

PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review is uploaded as an attachment

Reviewer #2: By tracking lineage frequencies via DNA barcodes in competitive cultures, it is possible to measure microbial fitness and phenotypic diversity at large scale. However, many biological and non-biological sources of noise significantly affect the relationship between barcodes and phenotypes (fitness). Accordingly, the authors of this manuscript developed a Bayesian model-based pipeline that takes some of these uncertainties into account for the experimental setup described above. Additionally, the model was applied to analyze simulated multi-environment and replicated (by batch, or by barcodes of the same genotype) experiments. As a whole, the manuscript has a solid theoretical basis and should be of general interest to the field. I believe that additional work is necessary before it can be published. My concerns are listed below.
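For context on what such a pipeline improves upon: the naive point estimate in these assays is typically the per-cycle log ratio of a barcode's frequency, normalized against a neutral reference. A minimal sketch with made-up counts (the layout and numbers below are hypothetical, not taken from the manuscript):

```python
import numpy as np

# Hypothetical barcode read counts over three time points
# (rows: lineages; columns: growth-dilution cycles). The last row is a
# neutral reference lineage used to normalize out mean population growth.
counts = np.array([
    [120, 180, 260],   # barcode 1
    [100,  95,  90],   # barcode 2
    [500, 480, 460],   # neutral reference
])

# Frequencies at each time point.
freqs = counts / counts.sum(axis=0)

# Naive relative fitness: per-cycle log-frequency slope of each lineage,
# minus the slope of the neutral reference.
log_slopes = np.diff(np.log(freqs), axis=1)
rel_fitness = log_slopes - log_slopes[-1]

print(rel_fitness.mean(axis=1))  # average over cycles, one value per lineage
```

Such an estimator yields a single number per lineage with no principled uncertainty, which is precisely what a noise-aware Bayesian model addresses.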

== MAJOR ==

1. The manuscript contains somewhat too many technical details, which may distract readers from the main line of argument. As an example, I do not believe it is necessary to comment on frequentist confidence intervals versus Bayesian credible regions starting at line 247. However, I lack a strong enough statistical background to provide a more objective or comprehensive assessment, or to say that all the technical details are trivial or unnecessary. Other reviewers with expertise in that area may be able to provide better suggestions. Nonetheless, moving some details to the supplementary material to better emphasize the main logic seems reasonable.

2. There is no description of the detailed simulation procedure in either the main text or the SI. This would make reproducing the results difficult. The reader must also understand the factors that were taken into consideration during the simulation in order to determine (i) whether the presented results are relevant to the reader's own experimental setting, and (ii) whether the performance assessment based on the simulation is simply expected, given that the simulation entails exactly the same types of noise as those considered in the inference model. I understand from the manuscript that (ii) is exactly the case. I am not saying this approach is wrong, but it certainly needs clarification (an illustrative sketch of such a simulation procedure follows these major comments). Even better, the simulation could sometimes include additional noise or perturbations that the inference model is unable to account for, so that one can determine whether the inference is robust against those specific additional noises or perturbations.

3. There is a disconnect between the results and actual experimental data. In all tests, inferences are drawn from simulated data. I understand that this gives the authors the ground-truth fitness. Nevertheless, every simulation is based on simplifications or assumptions that may not hold in reality. It is difficult to judge the relevance of the method without actually relating the model to real data. I would like to offer two more specific comments in this regard.

3.1. The authors seem to assume a specific composition of the competing population (e.g., most strains are not barcoded), as well as a specific distribution of mutational fitness effects (s) (most barcodes are slightly more adaptive than the wild-type). In my opinion, both assumptions are frequently violated. For example, in https://www.science.org/doi/10.1126/science.aae0568, https://www.nature.com/articles/nature17995 and https://academic.oup.com/mbe/article/39/5/msac086/6575838, all genotypes are barcoded. In the commonly used "deep mutational scanning" assays, where only proximal mutations are tested, most mutants (containing only a few simple mutations) have fitness very similar to that of the wild-type. Can the model accurately estimate their fitness? How significant are their differences from the wild-type?

3.2. There should be some analysis based on actual experimental data (such as those in the three papers mentioned above). A straightforward test would be to determine whether the Bayesian-inferred fitness shows better between-replicate correlations than the more naively estimated fitness reported in those papers. I am certain that the authors can come up with more analyses relating the actual experimental data to the accuracy of their methodology.
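To make major comment 2 concrete, this is the level of description I have in mind; the noise structure below (Poisson demographic noise during growth plus multinomial read sampling at sequencing) is an illustrative assumption, not the authors' actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_barcodes, n_cycles, total_reads = 50, 4, 100_000

# Ground-truth relative fitness values, drawn here purely for illustration.
s_true = rng.normal(loc=0.0, scale=0.1, size=n_barcodes)

# Initial cell counts per barcode.
cells = np.full(n_barcodes, 1_000)

read_counts = []
for _ in range(n_cycles):
    # Growth with demographic (Poisson) noise: expected fold change exp(s)
    # relative to a neutral lineage growing at the mean rate.
    cells = rng.poisson(cells * np.exp(s_true))
    # Sequencing: a finite pool of reads drawn multinomially from the
    # (unobserved) true barcode frequencies.
    read_counts.append(rng.multinomial(total_reads, cells / cells.sum()))

read_counts = np.array(read_counts).T  # barcodes x time points
```

A simulation documented at this level would also make it easy to inject noise the model does not account for, e.g., a per-cycle batch effect added to s_true, to test robustness as suggested above.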

==Minor==

1. The schematic diagram in Figure 1 could use considerably more detail, especially regarding the sources of uncertainty (at least those that have been considered in the model).

2. Figure 2D and E. This is related to major comment 2. Please explain how the “ground truth” is chosen/defined and used in the simulation. Also, why is “68%” used? The number just seems weird.

3. Figure 6F. It would look more professional to connect the two lines that reach y = 1.0 early on to the top right corner of the plot area (ECDF = 1 across that range; a plotting sketch follows these minor comments).

4. Line 310: “This loses the subtle differences due to biotic and abiotic batch effects, effectively halving the data that goes into our inference problem”. I do not understand this; please elaborate.

5. I am unable to see the name of the particular section whenever the author refers to one (for example, line 53, "... See Section ?? for details...", and also line 56, etc.). I have tried two PCs (both using Acrobat on Windows 11). Are there invisible symbols due to the PDF conversion?
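Regarding minor comment 3 above, the fix is purely cosmetic; a minimal matplotlib sketch with hypothetical ECDF values:

```python
import numpy as np
import matplotlib.pyplot as plt

# A hypothetical ECDF that reaches 1.0 well before the right edge of the plot.
x = np.array([0.0, 0.1, 0.2, 0.35, 0.5])
ecdf = np.array([0.2, 0.5, 0.8, 1.0, 1.0])

fig, ax = plt.subplots()
(line,) = ax.step(x, ecdf, where="post")

# Extend the curve horizontally at ECDF = 1 to the right edge of the axes
# so the line visibly reaches the top right corner of the plot area.
x_max = 1.0
ax.hlines(1.0, x[-1], x_max, colors=line.get_color())
ax.set_xlim(0.0, x_max)
ax.set_ylim(0.0, 1.05)
plt.show()
```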

Reviewer #3: In Razo-Mejia et al., the authors describe a Bayesian framework for the analysis of barcode fitness assays. The authors show how general the framework can be and test the validity of the inference on synthetic data. I really enjoyed the read, and seeing this type of inference approach finally come to light is a blessing to the community. I particularly enjoyed the section dealing with hyperparameters and how they can be used for interesting assays. While this is not the first time barcode fitness inference has taken a Bayesian spin, it is the first to do so with a true Bayesian perspective on many different parameters. Unfortunately, I am not familiar with Julia (and do not have it installed), so I cannot evaluate the software within the allotted review timeframe, as the documentation is fairly lengthy.

The only major issue I have is that I am unable to judge how much this framework improves the inference compared to the general approaches the community has undertaken. How poor are the current approaches implemented by the community? Or is it less about the fitness estimates in the simple case (single environment, assuming replicates share the same fitness) and more about the fact that the approach can be expanded? Or is it just the credible intervals on fitness values (augmented by priors)? My understanding is that almost all the ad hoc approaches currently taken by the community (by the labs of Levy, Petrov, Sherlock, Desai, Dunham, and Gresham, or even earlier in the genomic era through the barcoded yeast deletion collection) are quite adequate for their purposes. It would have been interesting, and borderline necessary, to compare at least the simple approaches to this one. If it is simply a difference between confidence intervals and credible intervals, then it would be ideal if the authors showed distinct differences in their outputs, notwithstanding the philosophical differences between them, as the differences are usually fairly minimal when priors are not strong.

I have a few minor comments:

1) There are a few missing references to specific sections (e.g., line 53 and line 56, but there are dozens throughout).

2) I am very familiar with the experimental setup, but I think readers will not appreciate why the unlabeled reference strain should make up ~90% of the population in these assays, as described in lines 69-75. Many barcode assays in the literature have not done this, and as far as I know, no issues have been reported. From my recollection, this is done because of frequency-dependent selection on some key lineages, but there is no evidence that this design resolves that problem (and frequency-dependent selection is not solved by any inference approach).

3) Is the simulation really sufficient for this work? Without going overboard and testing every single scenario, it seems overly simple for the power of the proposed approach. I see drift being implemented through the Poisson noise, but is the Gaussian noise really adequate for the read frequency if one does not sequence to extremely high coverage (see the numerical illustration after these comments)? I also wonder whether systematic noise sources influence the framework at all: for example, a prolonged lag phase leading to uncertainty in the number of generations per transfer, exponential jackpotting during sequencing that may be poorly estimated from the neutral lineages, day-to-day variation, etc. The main issue is that I feel all the older approaches can adequately infer fitness for the simulation that was performed.

4) I think the aside debating the fundamental differences between confidence intervals and credible intervals is a bit misplaced. If the authors want to keep it, then the description of frequentist confidence intervals should be made more precise by emphasizing the repeated construction of CIs: “frequentist CIs represent the rangeS (emphasis on the plural) of values where X% of ranges …”, since upon repeating the experiment the confidence intervals would differ. As written (with the singular “range”), it may imply a fixed range of values across repetitions (rather than a fixed construction method), and the notion of a “repetition” containing the true population parameter would not make sense (a small coverage simulation also follows these comments).
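To illustrate the coverage concern in comment 3 numerically (the barcode frequency and sequencing depths below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
true_freq = 1e-4  # a rare barcode

for reads in (10_000, 10_000_000):
    # Exact sampling model: binomial draws of the barcode's read count.
    obs_freq = rng.binomial(reads, true_freq, size=100_000) / reads
    # At low coverage the observed frequency is discrete and zero-inflated,
    # so a Gaussian noise model on frequencies fits poorly; at high coverage
    # the normal approximation becomes reasonable.
    print(reads, (obs_freq == 0).mean(), obs_freq.mean(), obs_freq.std())
```

At 10,000 reads, a lineage at frequency 1e-4 is observed as zero reads roughly 37% of the time, so its observed frequency is zero-inflated rather than approximately normal.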
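And to illustrate the repeated-construction reading of frequentist CIs in comment 4: each repetition of the experiment yields a different interval, and it is the long-run collection of intervals that achieves the nominal coverage. A small simulation with assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, sigma, n, n_experiments = 1.0, 0.5, 100, 10_000

hits = 0
for _ in range(n_experiments):
    # Each repetition of the experiment produces a *different* interval...
    sample = rng.normal(mu_true, sigma, size=n)
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    hits += (lo <= mu_true <= hi)

# ...and roughly 95% of those intervals contain the true parameter.
print(hits / n_experiments)  # ~0.95
```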

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Attachments
Submitted filename: ploscompbio_review_11122023.pdf
Revision 1

Attachments
Submitted filename: ploscompbio_reviews.pdf
Decision Letter - Sergei Maslov, Editor, Zhaolei Zhang, Editor

Dear Dr. Razo-Mejia,

We are pleased to inform you that your manuscript 'Bayesian inference of relative fitness on high-throughput pooled competition assays' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Sergei Maslov

Academic Editor

PLOS Computational Biology

Zhaolei Zhang

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Thanks to the authors for a very thorough and thoughtful response and revision. The analysis of the Kinsler dataset greatly adds to the paper, and I also appreciate their clarification of the flexibility of the model to handle non-logistic growth. I have no further comments.

Reviewer #2: In general, I am satisfied with the authors' response, especially after they applied their method to empirical datasets. My congratulations go out to the authors for their nice work.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Formally Accepted
Acceptance Letter - Sergei Maslov, Editor, Zhaolei Zhang, Editor

PCOMPBIOL-D-23-01682R1

Bayesian inference of relative fitness on high-throughput pooled competition assays

Dear Dr Razo-Mejia,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Lilla Horvath

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom | ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.