Estimating the distribution of parameters in differential equations with repeated cross-sectional data

Hyeontae Jo; Sung Woong Cho; Hyung Ju Hwang

doi:10.1371/journal.pcbi.1012696

Peer Review History

Original Submission May 28, 2024
20 Aug 2024 Decision Letter - Stacey D. Finley, Editor, Peter Kim, Editor

Dear Prof. Hwang,

Thank you very much for submitting your manuscript "Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.

When revising the manuscript, please address the comments of the reviewers, especially those of Reviewer 3. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Peter Kim
Guest Editor
PLOS Computational Biology

Stacey Finley
Section Editor
PLOS Computational Biology

*********************

We would like to reconsider this article after major revisions. When revising the manuscript, please address the comments of all the reviewers, especially those of Reviewer 3.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review Report on PCOMPBIOL-D-24-00903, “Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data”

The manuscript proposes a novel parameter estimation method for when the given observations are repeated cross-sectional (RCS) data. The problem of model fitting using RCS is prevalent in biomedical research, and hence the manuscript addresses an important methodological problem. The proposed model is presented without a rigorous theoretical justification, but the authors provide a wide range of examples and simulation studies to demonstrate that their methods work well for common mathematical biology problems. I have identified only a few minor issues to raise:

1. The authors could use some copy editing to improve the writing. Currently, there are too many awkward sentences throughout the manuscript.

2. The presented method is very similar to the approximate Bayesian computation (ABC) approach. Proper citations to the literature on ABC would be useful.

3. In Figures 3b and 3c, while the different modes are correctly identified by the proposed algorithm, the weights for the modes differ from the true weights.
For example, in Figure 3c, EPD puts more weight on the parameter value of 2 even when there is no noise. Some comments or analysis on why this happens are needed.

4. When discussing GP-based calibration, please include appropriate references such as Kennedy and O’Hagan (2001) in JRSSB.

References

Kennedy, M. C., & O'Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(3), 425-464.

Reviewer #2: This paper introduces a novel method for parameter estimation in differential equations using repeated cross-sectional (RCS) data. Traditional methods for parameter estimation in differential equations often lead to data information loss, especially when dealing with RCS data. The authors present the Estimation of Parameter Distribution (EPD) method, which aims to accurately capture the distribution of parameters without losing data information. EPD involves generating synthetic time trajectories from RCS data, estimating parameters by minimizing the discrepancy between these trajectories and the true solution of the differential equation, and selecting parameters based on the scale of discrepancy. The authors evaluate EPD's performance across several models and apply it to real-world datasets, showing its advantage in capturing the shape of parameter distributions and addressing heterogeneity within systems.

The introduction of the EPD method seems novel and addresses the limitations of traditional parameter estimation methods. By focusing on RCS data, the authors present a solution that retains data information and estimates parameter distributions. The manuscript evaluates the EPD method across a few models, including exponential growth, logistic population models, and target cell-limited models with delayed virus production. The application of EPD to a few real-world datasets, such as amyloid beta accumulation and viral load, shows the practical utility of the method.
The manuscript is also well-organized, and the results are clearly presented. Several points could be worth including for elaboration and discussion:

1. The EPD method involves generating a large number of synthetic trajectories and estimating parameters for each, which can be computationally intensive. The manuscript does not provide a detailed discussion of the computational cost and potential strategies to mitigate it.

2. The determination of the scaling factor C in the accept probability function seems critical for the EPD method's accuracy. However, the manuscript did not provide an analysis of how different values of C impact the results.

3. The authors mention the use of a logistic transformation in the accept probability calculation. The manuscript could benefit from a discussion on alternative transformations and their potential effects on the parameter estimation process.

4. The EPD method shows promise in the models evaluated, but its applicability to other types of differential equations and systems is not explored.

Reviewer #3: This manuscript seeks to address a problem in parameter estimation when working with discrete time series data, where individual measurements are available, but not individual histories. The authors have proposed a novel method to estimate parameters in dynamical systems models of these data. However, I cannot recommend it for publication on this journal, primarily because the method proposed needs to be more fully developed to make a convincing case. I also have several concerns/suggestions for the authors that I hope will be helpful:

1. A major concern is that the authors only compare their method in one case to GP method. However, if the goal is to arrive at probability distributions of model parameters, then the method should also be compared with other Bayesian Estimation methods eg Metropolis Hastings.
For which, priors could be multimodal, possibly circumventing the issues faced by GP in capturing true multimodal distributions.

2. The test cases, eg Fig 1, 2, etc are very artificial. I understand that these are demonstrations that the method proposed can successfully recover the true multimodal distributions on model parameters, but equally, a modeler applying classic methods to these data would notice that the data is not from a single but from multimodal distributions on model parameters and presumably account for that before curve fitting.

3. The fits in Figure 4, panel (c) right box show a number of parameter combinations that were estimated but are very far from true parameter estimates. Given that the model is the logistic model, and hence very simple, it is concerning that the proposed method can have a high number of false positives that are such big outliers.

4. The parameter estimates in Fig 8 and 9 are very scattered. Is it reasonable to expect that the biological data being fitted can admit such diverse parameter values? This points to issues of parameter identifiability, and something that a modeler would attempt to address before undertaking the estimation itself. Also, line 171 on Page 11 seems to have a typo, the parameter names do not match those on Figure 9.

5. On line 7, the authors say "data is collected over time measuring the same 7 variables with different samples or populations at each time point", yet in methods, line 220, they say "Each Yi includes J observed data points at time ti". This is confusing and needs to be clarified. Were the number of observations constant at each time point or did these change? If they change, then sampling from the data points at each time ti can introduce a bias because sampling from fewer observations will over emphasize their importance relative to data at time points with more observations.

6. How big should N be (line 226), ie, how many artificial trajectories should be generated from the data?
How does this depend on the number of data points, the number of model parameters, the variance in the data etc etc?

7. Equation (3), why this objective function? What if we minimize simply sum of squares? Or employ some other minimization scheme?

8. Is there a theoretical basis for the definition of a_n (line 249)? The authors themselves point to challenges with selecting 'c' which seems ad hoc at present. Is there any guidance on how to choose c? What about other forms for a_n?

9. What about estimating multiple data time courses simultaneously? So if a model has 2+ equations and we had measurements of 2+ variables, then how would this method be amended, or how would it perform?

10. The authors also hint at the method being computationally intensive and with the simplicity of the models they considered here, this can be potentially a bottleneck that may be undesirable.

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: None

******

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

https://doi.org/10.1371/journal.pcbi.1012696.r001
Revision 1
18 Oct 2024 Author Response

Attachment, submitted filename: EPD_PLOS_comp_bio_Review response.pdf

https://doi.org/10.1371/journal.pcbi.1012696.r002
4 Dec 2024 Decision Letter - Stacey D. Finley, Editor, Peter Kim, Editor

Dear Prof. Hwang,

We are pleased to inform you that your manuscript 'Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Peter Kim
Guest Editor
PLOS Computational Biology

Stacey Finley
Section Editor
PLOS Computational Biology

Feilim Mac Gabhann
Editor-in-Chief
PLOS Computational Biology

Jason Papin
Editor-in-Chief
PLOS Computational Biology

*********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: The authors have addressed my comments.
Reviewer #3: I am satisfied with the revisions

******

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #2: None
Reviewer #3: None

******

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No
Reviewer #3: No

https://doi.org/10.1371/journal.pcbi.1012696.r003
Formally Accepted
16 Dec 2024 Acceptance Letter - Stacey D. Finley, Editor, Peter Kim, Editor

PCOMPBIOL-D-24-00903R1

Estimating the Distribution of Parameters in Differential Equations with Repeated Cross-Sectional Data

Dear Dr Hwang,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Dorothy Lannert
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

https://doi.org/10.1371/journal.pcbi.1012696.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.