Peer Review History

Original Submission: September 24, 2019
Decision Letter - Thomas Lengauer, Editor, Stacey Finley, Editor

Dear Dr Faeder,

Thank you very much for submitting your manuscript 'Parallel Tempering with Lasso for Model Reduction in Systems Biology' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers.

The reviewers appreciated the attention to an important problem, but raised several concerns about the manuscript as it currently stands. In particular, the reviewers have questions about details related to the mathematical approach for parameter fitting, including convergence criteria and chain swapping. In addition, more attention should be paid to presentation of the figures, parameter values, and which data to include in the main text compared to supplement.

While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, where possible, to show clearly where changes have been made to their manuscript, e.g., by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, see here.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Stacey Finley, Ph.D.

Associate Editor

PLOS Computational Biology

Thomas Lengauer

Methods Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript, Gupta, Lee, and Faeder describe a method for identifying key kinetic modules associated with more complicated mechanistic models using a Bayesian parameter estimation approach coupled with Lasso regularization to reduce the number of parameters. In terms of the review criteria:

Originality - yes, the work appears to be original as it incorporates Lasso regularization. Of note, Klinke and Finley used a Bayesian parameter estimation approach coupled with time-scale separation for model reduction (Klinke and Finley Biotech Prog 2012).

Significant biological and/or methodological insight - methods that enable one to extract some biological insight from a more complicated mechanistic model are helpful. This one uses Lasso regularization as a new aspect.

Rigorous methodology -

- Couple of concerns:

1. The authors choose to incorporate a Lasso regularization approach to minimize the number of parameters in the model. Given that parameters in systems biology models may not be tightly constrained and may be correlated with one another, Lasso selects only one parameter from a set of correlated parameters, and that choice is somewhat arbitrary. If Lasso had selected a different parameter from the correlated set, would the conclusions be different?

2. Given the common presence of parameter non-identifiability in these types of models, convergence criteria applied to the parameters do not really make sense, whereas applied to the model predictions they do. Once the chains have converged, they are sampling the posterior distribution. It appears that the authors ran the MCMC chains only until they converged and then used the unconverged segments; that is not correct.

3. The point of swapping chains is unclear to me. In this implementation, the proposed step depends only on the current point. Swapping chains artificially inflates within-chain variance and reduces between-chain variance, such that the Gelman-Rubin PSRF would be reduced, suggesting convergence earlier than warranted. Of note, the majority of the run time is spent solving the ODEs, so finding a way to speed up the ODE solves, especially for stiff systems, would be worthwhile.
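The PSRF effect described in point 3 can be sketched numerically. This is a hypothetical illustration, not the authors' implementation: `psrf` is a minimal version of the Gelman-Rubin statistic, and the "mixed" chains crudely mimic the homogenizing effect of uninformed swaps by redistributing the same samples across chains.

```python
# Hypothetical illustration (not the authors' code): a minimal
# Gelman-Rubin PSRF, showing that reassigning the same samples across
# chains drives the statistic toward 1 without any real convergence.
import numpy as np

def psrf(chains):
    """Potential scale reduction factor for an (m, n) array of m chains."""
    _, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
# Three chains stuck at separate modes: clearly not mixed.
chains = rng.normal([[0.0], [1.0], [2.0]], 0.1, size=(3, 500))
# The same samples, randomly reassigned to chains.
mixed = rng.permutation(chains.reshape(-1)).reshape(3, 500)

r_stuck, r_mixed = psrf(chains), psrf(mixed)
```

Here `r_stuck` is large while `r_mixed` is close to 1, even though the mixed "chains" carry exactly the same information; whether Metropolis-accepted PT swaps actually bias the diagnostic in the authors' setup is the question the reviewer raises.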

Importance to researchers in the field - Yes, this may be helpful to researchers in the field to reduce the complexity of models based on the data available.

Other minor comments:

Unclear how PT is better than adaptive MCMC methods, which are much faster than plain MCMC.

It is unclear how noise properties impact the results for the toy model. Experimentally, the noise distribution may be constant (i.e., independent of the true value) or may be inversely proportional to the true value (RNA-seq data behaves this way).
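Reviewer #1's first concern, that Lasso selects arbitrarily among correlated parameters, can be illustrated with a toy regression. This is a hypothetical example using a minimal coordinate-descent Lasso, not the authors' PTLasso code: with duplicated predictors the Lasso optimum is not unique, and an implementation detail (here, the coordinate update order) decides which parameter is kept.

```python
# Toy illustration (hypothetical; not the authors' PTLasso code).
import numpy as np

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = y - X @ w + X[:, j] * w[j]   # residual with w[j] removed
            rho = X[:, j] @ resid / n
            z = X[:, j] @ X[:, j] / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / z  # soft-threshold
    return w

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x])               # two perfectly correlated predictors
y = 2.0 * x + 0.1 * rng.normal(size=200)  # both columns are equally "real"

coef = lasso_cd(X, y, alpha=0.1)                         # keeps column 0
coef_swapped = lasso_cd(X[:, ::-1], y, alpha=0.1)[::-1]  # keeps column 1
```

Both solutions fit the data identically, yet each zeroes out a different parameter, which is the arbitrariness the reviewer describes.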

Reviewer #2: This paper presents a novel computational method (PTLasso) for identifying reduced mathematical models of signaling networks. By searching for reduced models that can fit temporal experimental data, the method yields information on critical pathway components, modules, or parameters that are necessary to explain observed data. In my opinion, this paper addresses major needs in the area of quantitative systems biology/mathematical biology: first, by seeking reduced models it addresses the issue of overfitting (which is a particular challenge in cell signaling, where models based on prior knowledge are becoming increasingly complex, even while very limited information about parameter values is available); second, it addresses the issue of how to interpret model fits (by revealing which aspects of a model are actually necessary to reproduce experimental observations). Overall I believe this paper makes a significant contribution to cell signaling and other areas of quantitative biology, which should be published and is a good fit for PLOS Comp. Bio. The methodological approach is solid.

However, there are a number of issues with the presentation that should be addressed before publication, detailed below.

1) Throughout the paper, the mathematical models are presented only graphically. If I understand correctly, all models are treated as ODEs with added Gaussian noise. I think it would be helpful to include in the Supplement the explicit equations for all models. This would aid reproducibility and help prevent potential confusion, for example:

a) I was initially confused about the usage of both capital Ks and lower-case ks to denote kinetic rate parameters, since capital Ks are typically associated with equilibrium constants (i.e., ratios of kinetic parameters). I would recommend using only lower-case ks (e.g., in Figures 1 and 2), but regardless, showing the full models would clarify any potential confusion about what the parameters mean.

b) In the caption of Figure 3 model description, it reads “Solid lines show species conversions and dashed lines show influences”. I think I know what is meant by this in terms of mass-action kinetics, but I’m not sure it would be universally understood. Again, the full equations would prevent any confusion.

2) In the captions of Figs. 1-3, the parameter values are reported without any units. The units should be clarified (I assume they are per-second rate constants, since the simulated time trajectories are presented in seconds).

3) It would be helpful if the x-axis could be labeled with numeric values for the parameter plots (e.g., Fig 1C, Fig 2C and D, etc.). With the plots unlabeled, it’s very difficult to figure out where the prior was centered (the value of mu) and how that value compares to the true parameter values. I realize this is a challenge because of space limitations. At least, it would be helpful to clarify the chosen values of mu and b directly in the figure and/or figure caption, along with the values of the true parameters for comparison.

4) Throughout the paper, it is unclear what is meant by the number of samples. For example, in Figure 1B, the curves represent “4e3 samples” and in Figure 1C, the distributions show “4e5 PT samples” (is it the same number for both PT and PTLasso?). Why is the number of samples shown in 1B and 1C different? I had a hard time understanding (1) what one sample means in terms of the MCMC procedure described. Does one sample equal one run of many iterations? If so, is it subject to the convergence criterion described in Methods? (2) What determines the number of samples used, which differs across models?

5) The results for NFkB signaling seem central to the paper, since the authors here show an application to real-world experimental data. I think the presentation of Figure 4 and corresponding section of Results could be improved:

a) It’s unclear to me why Fig. S7 is in the supplement, and not included in the main text (perhaps as part of Figure 4, or as a stand-alone figure). It seems like an important result that demonstrates the utility of the method: that PTLasso identifies the A20 network module as being necessary to reproduce the nNFkB trajectories under continuous stimulation, whereas it is NOT necessary to reproduce measurements from short-pulse stimulated cells. From a biological standpoint, it suggests that cells may engage negative feedback loops (or other types of modules) only under certain stimulus conditions. This is very interesting!

b) The presentation in Fig. 4 is confusing, because in 4C results for both PT and PTLasso are shown, whereas in 4B only “PT Fits” are mentioned, but I believe PTLasso is implied.

c) In Fig. 4, only the PTLasso penalty parameter distributions are shown, which seems inconsistent in presentation with the rest of the paper, in which distributions of parameters from both PT and PTLasso are directly compared. (And since PT serves as the baseline of comparison throughout the paper, to support the utility of PTLasso). I guess the challenge is that there are so many parameters in the NFkB signaling network. Still, it would be nice to see a visual representation of the output of PT versus PTLasso, especially since it is emphasized that the PT and PTLasso fits are equally good (4C). I was particularly interested in whether the PT fits also hint at some modularity, or do the PT parameters tend to involve all modules? In other words, is PTLasso really necessary to reach the conclusion that the A20 module is not necessary, or can PT alone reach this conclusion, perhaps indirectly? I wonder if there’s a way to show parameter ensembles, somehow lumped by module. Another solution would be to directly show the PT and PTLasso parameter distributions in the Supplement, where space is not limiting (and group the parameters by module). I think this is already done for PTLasso in Fig. S5, where the A20 module parameters tend to be centered around low values (-20 to -30). Do the corresponding plots for PT show higher values and/or broader distributions of A20-module parameters? This would support the idea that PTLasso is really necessary over PT to eliminate the A20 module.

d) I don’t think Fig. S5 is ever referred to in the Main Text (I couldn’t find it). If I understand, it shows that PTLasso recovers best fit parameter values that are in line with previous estimates. This seems worth describing in Main Text. However, it needs to be clarified which NFkB trajectory these parameters correspond to: continuous or short-pulse stimulation.

6) This might be a minor detail, but in line 144, the authors neglect the normalization constant 1/2b. If I’m not mistaken, this means that the weight of the prior (relative to the likelihood) in the energy function is changing in a way that depends on the hyperparameter b. Is this justified?

7) There are some typos in line 113: missing vector notation on theta, and a missing period.

8) Could the authors clarify the relationship between equations 1 and 2? Is there an explicit equality between p(Y given theta) and L(theta)?
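Returning to point 6: the observation can be made explicit, assuming the prior is the standard Laplace density with location mu and scale b (as the 1/2b factor suggests):

```latex
p(\theta \mid \mu, b) = \frac{1}{2b}\,\exp\!\left(-\frac{|\theta-\mu|}{b}\right)
\quad\Longrightarrow\quad
-\log p(\theta \mid \mu, b) = \log(2b) + \frac{|\theta-\mu|}{b}.
```

If b is fixed, the log(2b) term is a constant offset to the energy and dropping it is harmless; but if b is tuned or sampled, omitting it changes how strongly the prior is weighted relative to the likelihood as b varies, which appears to be the concern raised.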

Reviewer #3: The paper by Gupta et al addresses an important problem in computational biology - the calibration of parameters in ODE kinetic models and model reduction. The authors extended their past work on parallel tempering and Bayesian MCMC chains to include Lasso regularization. Regularization is very important in large models, and the Lasso framework they use merges very well with the parallel tempering algorithm. Overall I recommend that the paper be published by PLOS Comp Bio. I have a few suggestions that I think can improve the paper without too much effort, and I recommend that these be suggested to the authors so they can decide which to implement.

I did not understand how the authors dealt with parameter units. Parameter values are on arbitrary scales; some could be orders of magnitude different from others. In their MCMC runs, parameter scaling matters in the multivariate proposal distribution (Eq. 4), especially if parameter covariance is taken into account. One suggestion (see Yao et al MSB 2016) is to normalize all values relative to a reference value using the log of the fold difference. There are other ways of addressing this issue, and perhaps the authors already do and I missed it, but if they don't, I think they should.
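The suggested normalization could be sketched as follows (hypothetical reference values, not taken from the paper): sample in log10 fold-change space relative to reference parameter values, so a symmetric proposal step means the same multiplicative change for every parameter.

```python
# Sketch of log-fold-change normalization (hypothetical reference values).
import numpy as np

k_ref = np.array([1e-3, 5.0, 2e4])   # reference rate constants spanning ~7 decades

def to_log_fold(k):
    """Raw parameter values -> log10 fold change relative to k_ref."""
    return np.log10(k / k_ref)

def from_log_fold(theta):
    """log10 fold change -> raw parameter values."""
    return k_ref * 10.0 ** theta

theta = to_log_fold(np.array([2e-3, 4.0, 1e4]))
# A proposal step of width 0.5 in theta-space perturbs each parameter by
# up to ~3-fold, regardless of whether its scale is 1e-3 or 2e4.
```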

The authors used synthetic data to test their methodology. I think this is a great approach. However, the way they implemented the addition of noise in the synthetic data is problematic. The addition of Gaussian white noise is very unrealistic: the majority of variability in biological measurements is due not to instruments, where the idea of Gaussian noise is reasonable, but to biological variability. An alternative I would suggest is to add parameter variability (which could be Gaussian), run multiple simulations of the ODEs to generate the synthetic data, and see how good model calibration is under these conditions.
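The suggestion could be sketched using a hypothetical one-parameter decay model, x(t) = x0 * exp(-k * t), in place of the paper's ODEs:

```python
# Sketch of the two noise models (hypothetical decay model, not from the paper).
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 10.0, 50)
x0, k_true = 1.0, 0.5

# Instrument noise: a single deterministic trajectory plus additive
# Gaussian noise of constant width.
instrument = x0 * np.exp(-k_true * t) + 0.05 * rng.normal(size=(20, t.size))

# Biological variability: each of 20 replicates gets its own rate
# constant (log-normal spread around k_true), and the model is solved
# per replicate.
ks = k_true * np.exp(0.2 * rng.normal(size=20))
biological = x0 * np.exp(-np.outer(ks, t))

# Under parameter variability the replicate spread is zero at t = 0 and
# tracks the dynamics, unlike the constant-width Gaussian band.
```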

In the NFkB model, the authors used both hard and soft constraints. It would be great to know whether these were essential, i.e., can the system “work” and produce good fits even if these constraints are violated?

The authors explain how they tune the hyperparameters related to the prior, but I did not see any discussion of the other hyperparameters: those of the proposal distribution, the tempering rate, and the likelihood function (i.e., the sigma of the residual). Are these less important, or should they also be tuned?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Revision 1

Attachments
Submitted filename: cover_letter_resubmission_PCB.pdf
Decision Letter - Thomas Lengauer, Editor, Stacey Finley, Editor

Dear Dr. Faeder,

We are pleased to inform you that your manuscript 'Parallel Tempering with Lasso for Model Reduction in Systems Biology' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted, you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch within two working days with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Stacey Finley, Ph.D.

Associate Editor

PLOS Computational Biology

Thomas Lengauer

Methods Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The revision seems to largely address the prior critiques.

Reviewer #2: The authors have addressed all my concerns. I think this is a great paper and it is ready to be published.

Reviewer #3: The authors addressed all my concerns. I recommend the paper for immediate publication in PLOS Comp Bio.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: None

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Formally Accepted
Acceptance Letter - Thomas Lengauer, Editor, Stacey Finley, Editor

PCOMPBIOL-D-19-01641R1

Parallel Tempering with Lasso for Model Reduction in Systems Biology

Dear Dr Faeder,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.