Peer Review History
| Original Submission, June 17, 2021 |
|---|
|
Dear Professor Oldiges,

Thank you very much for submitting your manuscript "Bayesian calibration, process modeling and uncertainty quantification in biotechnology" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation. When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time.
Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Dina Schneidman
Software Editor
PLOS Computational Biology

***********************

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: Helleckes et al. describe a computational framework for the Bayesian analysis of calibration and process models. The framework is implemented in Python, is open-source and extensively documented, and the code is written in an object-oriented manner. The software is tested using continuous integration and coverage analysis. Overall, I want to commend the authors for the quality of the presented software. The manuscript is well written and easy to understand, but may benefit from some restructuring that I will describe below. The authors evaluate the framework on multiple examples of biotechnological processes, including experimental data, where they analyze calibration models individually as well as combinations of calibration and process models.

Major Comments

1) While most of the text is well written and easy to follow, there are some instances where it is difficult or impossible to understand the text without looking at the code or figures. Examples: (i) Figure 3: The description of what was done is only available in the figure legend; this also needs to be described in the text. (ii) 4.1.1: It would be helpful to have a mathematical description of the calibration model (provide formulas). (iii) Figure 10: It is hard to understand what was actually done and the reader has to guess based on the figure. Describe what was done first, then the results.

2) The (unpublished) Python implementation of PESTO, pyPESTO (https://github.com/ICB-DCM/pyPESTO), implements a similar feature set as murefi/calibr8, including providing an interface to PyMC3 for Bayesian analysis. Still, pyPESTO implements less modularity between process model and calibration model to enable use of adjoint sensitivity analysis.
Similarly, the recently proposed PEtab format (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008646) permits a similarly flexible specification of calibration models and multi-experiment process models (templating) in COPASI/Data2Dynamics/pyPESTO. Accordingly, I think it would be important to (i) explicitly describe differences to what is possible with PEtab and (ii) demonstrate the greater flexibility afforded by the modularity between calibr8 and murefi.

Regarding (i), I would also encourage the authors to implement support for the PEtab format and potentially propose an extension of the format based on the identified differences (I don't think these points are necessary for the scope of this review, though).

Regarding (ii), I think the use of a hyperprior in Section 4.2.3 is an interesting example, but I think it would be more convincing if the nesting were in the calibration model component and if the authors could demonstrate that joint analysis of the process + calibration model is important to obtain accurate credibility intervals for parameters (by comparing with stepwise sequential analysis, using synthetic data if necessary).

Am I correct in my understanding that the toolbox cannot be applied to mixed-effect modeling? If I am mistaken, this would be quite convincing and a demonstration could replace point (ii). A third way of addressing (ii) would be to demonstrate that the combination of calibr8/murefi/sunode enables the use of adjoint sensitivities (see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005331) with complex noise models, but I expect that this would be a lot of work.

3) The description in 3.2.6 seems to imply that murefi supports sensitivity analysis, which would be necessary for the computation of the posterior gradient. Looking at the code, this does not seem to be supported, though it is theoretically possible with sunode.
It would be helpful if the authors could more extensively describe the support of gradient computation in calibr8/murefi.

Minor Comments

a) I believe that the abstract should mention that the authors apply their methods to ODE models.
b) l244: The text just mentions calibr8, which is only introduced later in the text.
c) I have a hard time understanding the issue that is supposed to be illustrated by Figure 3C. Could the authors describe the issue in more detail? Is there some statistical test to show the issue (and is such an analysis implemented in calibr8)?
d) I find it hard to believe that gradient-based optimizers did not perform well for section 3.2.5. Did the authors try using the least-squares algorithm with a logarithmic transformation of process parameters?
e) The authors may want to cite https://doi.org/10.1093/bioinformatics/btw703 in section 2.4.
f) I think "frequentist" is more commonly used than "likelihoodist" (assuming that's what the authors wanted to say).

Reviewer #2: This proposal is focused on the description of a Python package for the use of Bayesian theory for the quantification of uncertainty from high-throughput cultivation experiments. The paper is technically sound and well written, and all the code is available on a Git server. Instructions for installation in a Python environment are clearly given. This proposal has been submitted as a "software" article. As such, the software must be either widely used within the scientific community or have the promise of wide adoption by a broad community of users. However, as currently written, the target audience is quite limited. Specific comments are appended below.

- The background needs to be more precisely defined. The field of expertise of the authors is bioprocess engineering.
However, as stated in the introduction, the software could be used by any researcher handling large sets of microbial (or even cell culture) kinetics data, extending the target audience to the broad community of researchers involved in fields like systems and synthetic biology. Some parts of the text should then be reformulated according to this comment.

- The term "microbioreactor" is used at several stages of the manuscript. This term is actually misleading, because the authors are using a commercial minibioreactor platform based on the use of deepwell plates. For readers who are not familiar with the technology, the use of the term "micro" could give the impression that the authors are using a microfluidic cultivation device. This comment is also related to the one above, in the sense that the technical terms in the text should be reformulated to target a broader audience of potential users (another example: I can imagine that only a limited number of specialists know what a flowerplate is).

- At this stage, the applicability of the software is limited to a cultivation device commercialized under the name "Biolector" and involving the use of 48-well deepwell plates. Additionally, the Biolector can be coupled to a robotic liquid handling platform for off-line sampling. The authors have to discuss the potential extension of their software to other microplate-based devices.

- Bayes' theory is typically easy to explain, but that is not the case in this paper. Section 2.1 should be expanded by considering an example.

- About the applications, only two are given in the manuscript. Another, very important, application is the measurement of the activity of gene circuits based on fluorescent reporters. For this application, it is also important to relate the signal to biomass in order to obtain specific values. How could this application be integrated in the package?

- ODE models can be integrated in the calibration procedure.
Is there any limit on the number of equations/parameters that can be handled?

Reviewer #3: In the manuscript "Bayesian calibration, process modeling and uncertainty quantification in biotechnology", Laura Marie Helleckes, Michael Osthege, and coworkers present two Python software packages, calibr8 and murefi, for enabling more reproducible and automated calibration of mathematical models in biotechnology. The authors show the capabilities of these packages on several examples with real datasets collected by themselves.

The topic of parameter inference for mathematical models of biological systems has become very relevant in recent years. Inference can be performed from a frequentist perspective, finding the maximum likelihood estimate using various optimization techniques, or by a Bayesian approach, which relies on Bayes' theorem and is often carried out through Markov chain Monte Carlo sampling. One of the key problems is that "handmade" computational pipelines are frequently built for each specific problem, which compromises their reproducibility and reusability. Efforts in the community to build automated pipelines exist; however, these can be complex and require high expertise. In this manuscript, the authors tackle this problem by introducing two new software tools for model construction and calibration, which substantially ease the process and facilitate usage by non-experts, in particular focusing on, but not limited to, biotechnology applications. Likewise, the key contribution is the development of two reusable open-source toolboxes/libraries.

Overall, the manuscript is well written. I appreciate that the authors provide exemplary notebooks/code in the respective toolboxes, but I have to admit that I did not have the time to test them.

===== Major =====

As far as I can see, the only available loglikelihood implemented in calibr8 assumes a Student's t distribution.
This is a bit contradictory with the statement the authors make in L.67-69 regarding the need for more flexible frameworks. Maybe the authors could include in their toolbox additional distribution assumptions such as Laplace, Gaussian and/or log-normal, to give users this flexibility. Moreover, I do not see how the user could easily implement a different custom noise model; in case this is possible, it would be great to add a tutorial.

A very well-known standard format to encode ODE models is the SBML format (https://doi.org/10.1093/bioinformatics/btg015), which is supported by many existing toolboxes for modeling and parameter estimation. Including support for it in the toolboxes presented here would substantially increase the audience and potential new users of the tool. I strongly encourage the authors to add support for SBML models, even though not only ODE models are used within these toolboxes.

Following on from my previous comment: Is there a model validator? What I mean is whether there are sanity checks for user-defined models in calibr8 and murefi, e.g. positivity of the modeled species. This would make the user experience even better, since then even the level of expertise required could be lowered. I am wondering if the authors thought of this option, and whether it would be possible to include it (or at least comment on this).

- L.88: "examples are Data2Dynamics [14] or PESTO [15], ... However, both tools are implemented in MATLAB and are thus incompatible with data analysis workflows that leverage the rich ecosystem of scientific Python libraries." I would like to make the authors aware that the toolbox PESTO has been translated into Python, going under the name of pyPESTO (https://github.com/ICB-DCM/pyPESTO). I would encourage the authors to include it in their manuscript, since it has been released since January 2019, and therefore to adapt the comparison between toolboxes.

- L.177: The statement is correct.
However, parameter uncertainties can also be quantified from a frequentist perspective using optimization, via the so-called profile likelihood method (see https://doi.org/10.1093/bioinformatics/btp358). I am not aware whether this is known in the field of biotechnology, but it is definitely something to mention in a manuscript regarding parameter estimation and uncertainty quantification. In case this is not frequently used in this field, it could also be a novelty to add to the study (although this is not necessary).

- L.205: Could some citation be added to this statement?
- L.226-227: Could some citation be added to this statement?
- L.244: Please reformulate the sentence; I could not understand which restriction is meant here.

===== Minor =====

- L.112: Without losing generality, the authors could list some specific examples of other research fields.
- Figure 1: I suggest using the same label and line style in the two subplots for the Normal case.
- L.142: "From a known list of parameters …" The word "known" here could lead to misunderstanding, since the parameters may actually be unknown and need to be estimated. But I understand that when simulating they are actually "known". Maybe this sentence could be rephrased (in case a better formulation can be found).
- L.158: Please clarify that each individual pair y_{obs} and y_{pred} are the same length.
- Figure 3: Please indicate in a figure legend what the blue dots and green lines are. Having this on top of the caption description will facilitate the understanding of what is depicted. Same for the black dashed line.
- Figure 7: Change the color scheme to be colorblind-proof.
- L.540: To the list of known samplers, please add "emcee", which is also a very popular Python sampler (https://doi.org/10.1086/670067).
- Figure 12: Please increase the separation between A and B. This helps to identify the right Y axis in A.
- General: Revise the text for typos.
- General: Please use consistent font sizes in the figures; for some, they are really small.
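Reviewer #3's point about noise-model flexibility is easy to motivate with a generic sketch. The snippet below is plain SciPy, not calibr8's API (none of calibr8's actual classes or helpers appear here); it fits a linear calibration-style model under both a Normal and a Student-t likelihood on data with one contaminated observation:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Synthetic calibration-style data: linear trend, Normal noise, one gross outlier.
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=x.size)
y[5] += 10.0  # a single contaminated observation

def negloglik_factory(logpdf):
    """Build a negative loglikelihood for a linear model under a given noise family."""
    def negloglik(theta):
        slope, intercept, log_scale = theta
        mu = slope * x + intercept
        # Optimize log_scale so the scale parameter stays positive.
        return -np.sum(logpdf(y, mu, np.exp(log_scale)))
    return negloglik

nll_normal = negloglik_factory(lambda yy, mu, s: stats.norm.logpdf(yy, loc=mu, scale=s))
# Student-t with 3 degrees of freedom: heavy tails absorb the outlier.
nll_t = negloglik_factory(lambda yy, mu, s: stats.t.logpdf(yy, df=3, loc=mu, scale=s))

fit_normal = minimize(nll_normal, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
fit_t = minimize(nll_t, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")

print("Normal slope:   ", round(fit_normal.x[0], 3))
print("Student-t slope:", round(fit_t.x[0], 3))
```

Because the Student-t density has heavy tails, the contaminated point barely moves its maximum-likelihood estimate, whereas the Normal fit tends to be pulled toward the outlier; swapping in `stats.laplace.logpdf` in the same factory would give the Laplace alternative the reviewer mentions.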
**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Frank Delvigne
Reviewer #3: Yes: Elba Raimundez

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols |
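Reviewer #2's request above, that Section 2.1 introduce Bayes' theorem through an example, could be met with something as small as a conjugate Beta-Binomial update. The numbers below are invented for illustration and do not come from the manuscript:

```python
# Prior belief about a success probability p, e.g. the chance that a clone
# passes a screening assay: Beta(2, 2), weakly centered on 0.5.
a_prior, b_prior = 2.0, 2.0

# Observed data: 7 successes in 10 trials.
successes, trials = 7, 10

# With a conjugate prior, Bayes' theorem reduces to simple parameter updates:
# posterior = Beta(a_prior + successes, b_prior + failures)
a_post = a_prior + successes
b_post = b_prior + (trials - successes)

# Posterior mean: a / (a + b) = 9 / 14
posterior_mean = a_post / (a_post + b_post)
print(f"Posterior mean of p: {posterior_mean:.3f}")
```

The same mechanics (prior times likelihood, normalized) underlie the MCMC-based inference in calibr8/murefi; the conjugate case is simply the rare situation where the posterior is available in closed form.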
| Revision 1 |
|
Dear Professor Oldiges,

We are pleased to inform you that your manuscript 'Bayesian calibration, process modeling and uncertainty quantification in biotechnology' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please also address reviewer 3's comments in this submission. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be coordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Dina Schneidman
Software Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have adequately addressed my concerns. Still, the authors may want to consider using qq-plots to visualize differences in data/theoretical quantiles/percentiles in Figure 3.

Reviewer #3: The authors have addressed the reviewer comments. The manuscript is improved.
I reviewed this manuscript a second time with closer attention to details within the text, from which I have only a few minor comments.

- L.17: Specify the names of the packages as done in the author summary.
- L.22 and L.24: The abbreviations can be introduced in the main body instead of in the abstract.
- L.82: "many methods assume" is very generic; maybe list a few examples of such methods.
- L.137: Check for consistent style with or without italics for "Normal"; same for "Student-t" (and other words which I may have missed).
- L.124: Introduce the abbreviation for MLE here.
- L.175: List some examples.
- L.185: Previous chapter? Or previous subsection? Maybe avoid unclear references and point directly to the desired section, e.g. "introduced in Section X.X".
- L.186: Abbreviation already introduced.
- L.188: E.g., here "Normal" is without italics (check for style consistency in the whole text); same for "Bayesian inference" (see L.193 and L.124).
- L.213: Indeed the choice is arbitrary; however, 99%, 95% and 90% are predominantly used, so maybe this information can be added.
- Figure 3: Likelihood bands reported in the caption differ from those in the figure legend (A); please check which are the correct ones.
- General: Check for consistent writing style of specific words in italics vs. non-italics.
- L.301: ODE is already introduced as an abbreviation.
- L.566: Which prior assumptions?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: None
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: No |
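Reviewer #1's remaining suggestion, using qq-plots to compare data and theoretical quantiles for Figure 3, can be prototyped with `scipy.stats.probplot`, which returns the theoretical and ordered empirical quantiles plus a straight-line fit. The residuals below are synthetic, not the manuscript's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Heavy-tailed residuals, as a Student-t noise model would produce.
residuals = rng.standard_t(df=3, size=500)

# Quantile-quantile comparison against a Normal reference distribution.
(theoretical_q, ordered_sample), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# For a perfectly Normal sample the qq points fall on a straight line (r near 1);
# heavy tails bend the curve at the extremes and lower the correlation r.
print(f"qq-line correlation: {r:.3f}")
```

Plotting `ordered_sample` against `theoretical_q` (e.g. with matplotlib) gives the qq-plot itself; the scalar `r` offers a quick numeric summary of how closely the residuals follow the reference distribution.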
| Formally Accepted |
|
PCOMPBIOL-D-21-01126R1
Bayesian calibration, process modeling and uncertainty quantification in biotechnology

Dear Dr Oldiges,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure that errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Livia Horvath
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.