A Bayesian approach to modeling phytoplankton population dynamics from size distribution time series

Jann Paul Mattern; Kristof Glauninger; Gregory L. Britten; John R. Casey; Sangwon Hyun; Zhen Wu; E. Virginia Armbrust; Zaid Harchaoui; François Ribalet

doi:10.1371/journal.pcbi.1009733

Peer Review History

Original Submission: May 27, 2021
23 Sep 2021 Decision Letter - Jason M. Haugh, Editor, Inna Lavrik, Editor

Dear Mr. Glauninger,

Thank you very much for submitting your manuscript "A flexible Bayesian approach to estimating size-structured matrix population models" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Inna Lavrik
Associate Editor
PLOS Computational Biology

Jason Haugh
Deputy Editor
PLOS Computational Biology

*********************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: This paper explores and investigates several different formulations of a size-structured matrix population model to accurately estimate growth rate from hourly cell size distributions. Different model formulations are checked against laboratory data of the marine cyanobacterium Prochlorococcus to infer how well each model version estimates growth rate and other physiologically relevant parameters. The expansion of model formulations presented here is a valuable and worthwhile effort; if it is possible to capture additional physiological parameters of a phytoplankton population from cell-size distributions, the data would be quite informative. The authors attempt to expand and improve upon earlier models whose underlying formulas for transition matrices were not able to be linked directly to the cellular parameters of interest here (carbon fixation, carbon loss, and cell division).

However, identifiability issues raise serious concerns about whether the model formulations and Bayesian approach are able to obtain valid inference:

1) A Bayesian approach requires that the evidence, P({n}), is constant in order to use the proportionality listed after equation 6, line 418. How can you ensure this partition function is constant?

2) Instability was encountered when attempting variational inference, which may indicate that the procedure is not producing valid estimates of the posterior using the likelihood and prior models. Authors do not provide posterior distributions for all parameters (only two are shown in Figure 6) nor how these estimates correspond to assumed priors (e.g. for parameters E_k and rho_max given in Table 2); the presentation is not clear on whether posteriors are reasonable with respect to expert-choice priors. The presentation would be better if the reader were more directly convinced that the likelihood with the chosen priors does in fact result in Stan sampling a realistic (rather than just an algebraically emergent) posterior. Perhaps the authors could include both the prior and posterior distributions (on the same plot) of a sensitive or difficult to constrain parameter from Table 2. To this end, please show:

- Maximum likelihood estimates of parameters to show that priors and posterior estimates are reasonable; MLE should give similar values to mean posterior.
- MC sampler chains (i.e., as in Reference 38, Figure 4) for the most difficult to constrain parameter to affirm ‘burn-in’, feasible models, and successful exploration of posterior.
- Plots of posterior probability for all parameters.

3) Please more fully discuss risk of over-parameterization. The paper itself ends with the admission that size-distribution data cannot constrain all the parameters. It is entirely possible that no model is able to accurately partition cell size changes between division, cell growth and carbon loss, so it is unclear how the authors are accounting for this and refining their model structures. It is also unclear why so many models are presented when some are very likely over-parameterized, especially the free-size dependence versions. The paper would benefit from a streamlining and selection of models, with over-parameterized models noted in the SI. Please also discuss benefits and risks of incorporating more concrete cell biology within the model. For example, a time delay on cell division or no carbon loss during night would likely provide simpler model formulations.
In addition, I have the following specific comments:

Title: The title as written is misleading; ‘flexible’, as written, addresses the Bayesian approach utilized, which appears to be standard. The title also suggests that the paper is about inference techniques for size-structured matrix population models (a very large class of models!), which is not the case. The flexibility that the authors are mentioning is presumably for model construction (not the inference procedure). The models implemented here are also highly specific to high-resolution, time series flow cytometry data of phytoplankton. The title should reflect this; please revise.

Line 23-24: Awkward/misplaced sentence; in-situ growth rate estimates are not really obscured by heterotrophic biomass or detritus; it is because these rates cannot be obtained from abundance or carbon estimates alone (which are a composite of growth and loss). Please clarify.

Line 58-61: Authors are incorrect with regard to how previous models were evaluated. While goodness of fit is checked, model formulations in some papers were benchmarked against laboratory division rate data. Please revise. A more objective presentation of earlier studies is also needed across the manuscript; authors of previous model papers are presumably well-aware of their model limitations, and physiological measurements weren’t often the goal in these past studies. Risk of over-parameterization is only one of many reasons why other versions were not explored. Please do not assume intent unless explicitly stated in these earlier manuscripts.

Line 91: Ending sentence here is not entirely correct; there is no connection to the larger marine carbon cycle in this paper. The rate parameters perhaps may offer this, but the authors do not extrapolate to any larger cycles in this manuscript.

Lines 100-104: I’m not sure I entirely follow the arguments and the emphasis placed on cell size in relationship to hourly division rate. Is it correct to compare plots of hourly division rate to mean cell size? Division produces an increase of small cells, such that a plot between mean cell size and division rate would not necessarily show a correlation (no large cells are expected with higher division rates, and after division, which must have happened to get a rate, cell size is no longer correlated to the process). Furthermore, as the authors have count data in each size class, could the authors not present an analysis of how cell sizes shift in comparison to hourly division rate? For example, do the largest cells decrease in abundance immediately after dusk (as suggested by Fig. 2), whereas medium cells decrease in abundance more towards the middle of the night? And is this all accompanied by corresponding increases in smaller cells? This would probably yield better insight into the timing of division and guidance for model division formulas. I agree that the process of cell division is not likely instantaneous and complete cell fission will likely take a few hours.

Discussion: The paper examines performance of each model version, but at the end, the reader is left wondering what perhaps is the best formulation going forward or where exactly the work should be going. Could the authors add additional recommendations or concrete decisions to guide readers?

Methods: Please provide example functional curves for cell growth, cell division and cell loss; these are useful for reader visualization and how different parameters affect each function.

Methods: Perhaps I missed it, but explicit code to call Stan with model formulations does not appear available.

Reviewer #2: The work presented in this manuscript represents an important advance in the use of demographic models for understanding the ecology of marine phytoplankton. It should eventually be published. There are, however, a few issues that need to be addressed in the current version of the manuscript.

1. One contribution of this paper is the application of Bayesian methods for parameter estimation to a size-structured matrix population model for marine phytoplankton. The application of Bayesian methods for matrix models is not particularly new. A second is the extension of the model to allow for the shrinking of cells (as a result of respiration). This second part is new. The introduction of shrinking into the model introduces new parameters. I would like to see some evidence that these parameters can be estimated accurately from simulated data (where the true values are known). My intuition is that there are many combinations of the parameters that would produce the same sequence of size distributions, and, as a result, identification may be an issue. The authors touch on this in the section that begins on line 258.

2. There is no free lunch: one must pay for the Bayesian approach by the specification of prior distributions on the parameters. In my opinion the authors do not discuss this cost sufficiently in the discussion. What does one do when there is no "prior knowledge?" How sensitive are the posterior distributions to the priors? This second question is important, and could (should?) be addressed with the simulation studies, but should also be addressed in estimation of the parameters for the laboratory data.

3. In a number of places in the manuscript (e.g., lines 58-61, 346-348), the authors claim that previous work was flawed because that previous work used a measure of goodness of fit to observed size distributions "as a proxy for overall model performance". At least for references [18], [19], [23] and [24] this was not the case. Instead, they judged model performance by how well the model could estimate division rate (the object of inference) compared with a "gold standard" method (dilution experiments) by calculating concordance.
******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No: The code that must have been used to construct the model formulations and call the Stan libraries is not provided.

Reviewer #2: Yes

******

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1009733.r001
Revision 1
1 Dec 2021 Author Response

Attachment: Submitted filename: response_to_reviewers.pdf

https://doi.org/10.1371/journal.pcbi.1009733.r002
8 Dec 2021 Decision Letter - Jason M. Haugh, Editor, Inna Lavrik, Editor

Dear Mr. Glauninger,

We are pleased to inform you that your manuscript 'A Bayesian approach to modeling phytoplankton population dynamics from size distribution time series' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Inna Lavrik
Associate Editor
PLOS Computational Biology

Jason Haugh
Deputy Editor
PLOS Computational Biology

***********************************************************

https://doi.org/10.1371/journal.pcbi.1009733.r003
Formally Accepted
12 Jan 2022 Acceptance Letter - Jason M. Haugh, Editor, Inna Lavrik, Editor

PCOMPBIOL-D-21-00986R1

A Bayesian approach to modeling phytoplankton population dynamics from size distribution time series

Dear Dr Ribalet,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Agnes Pap
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1009733.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.