Peer Review History

Original Submission - June 18, 2025

Attachments
Attachment
Submitted filename: Response to Reviewers.pdf
Decision Letter - Marc Birtwistle, Editor

PCOMPBIOL-D-25-01206

Verification and reproducible curation of the BioModels repository

PLOS Computational Biology

Dear Dr. Smith,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Oct 03, 2025, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Marc Birtwistle

Section Editor

PLOS Computational Biology

Journal Requirements:

If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

1) Please ensure that the CRediT author contributions listed for every co-author are completed accurately and in full.

At this stage, contributions are required for the following authors: Lucian P Smith, Rahuman S. Malik-Sheriff, Tung V. N. Nguyen, Henning Hermjakob, Jonathan Karr, Bilal Shaikh, Logan Drescher, Ion I. Moraru, James C. Schaff, Eran Agmon, Alexander A. Patrie, Michael L. Blinov, Joseph L. Hellerstein, Elebeoba E. May, David P. Nickerson, John H. Gennari, and Herbert M. Sauro. Please ensure that the full contributions of each author are acknowledged in the "Add/Edit/Remove Authors" section of our submission form.

The list of CRediT author contributions may be found here: https://journals.plos.org/ploscompbiol/s/authorship#loc-author-contributions

2) We ask that a manuscript source file be provided at revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

4) We note that your Data Availability Statement is currently as follows: "All relevant data are referenced within the manuscript." Please confirm at this time whether or not your submission contains all raw data required to replicate the results of your study. Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods (https://journals.plos.org/plosone/s/data-availability#loc-minimal-data-set-definition).

For example, authors should submit the following data: 

1) The values behind the means, standard deviations and other measures reported;

2) The values used to build graphs;

3) The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study.

If your submission does not contain these data, please either upload them as Supporting Information files or deposit them to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories.

If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. If data are owned by a third party, please indicate how others may request data access.

5) Please amend your detailed Financial Disclosure statement. This is published with the article. It must therefore be completed in full sentences and contain the exact wording you wish to be published.

1) State the initials, alongside each funding source, of each author to receive each grant. For example: "This work was supported by the National Institutes of Health (####### to AM; ###### to CJ) and the National Science Foundation (###### to AM)."

2) State what role the funders took in the study. If the funders had no role in your study, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

3) If any authors received a salary from any of your funders, please state which authors and which funders.

If you did not receive any funding for this study, please simply state: “The authors received no specific funding for this work.”

6) Please send a completed 'Competing Interests' statement, including any COIs declared by your co-authors. If you have no competing interests to declare, please state "The authors have declared that no competing interests exist". Otherwise please declare all competing interests beginning with the statement "I have read the journal's policy and the authors of this manuscript have the following competing interests"

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript from Smith and others claims to have generated a revised BioModels database collection that contains verified and reproducible models, whereas before the database contained many models that were not. The primary definition they use is that each model in SBML form should be able to be paired with a SED-ML document specifying simulations, and then simulated by at least two different tools to obtain the same time course information. A main issue is that this definition of a reproducible model seems to fall short of what many would consider reproducibility: that the model in the database reproduces the original claims and/or simulation data given in the original publication. The fact that a paired SBML model and SED-ML document might give different results with different simulators seems much more likely due to the simulators having issues, and/or the wrappers used to convert the information to the format each simulator needs. In fact, in one of the authors' few examples (Fig. 3), this is the case. Thus, the manuscript as written does not seem to support the very broad claims being made. That being said, the set of wrappers written to interface a common SBML+SED-ML file set to a variety of simulation tools could be a useful contribution to the modeling community, and in this reviewer's opinion that is one of the novelties of this work. However, the fidelity of these wrappers is uncertain, given the extremely surprising lack of agreement between many simulators, all of which seem to be using standard ODE solvers. More focus on why so much disagreement is seen would be very important; the disagreement itself is quite disturbing. Specific comments on the paper are given below; a minimal sketch of the two-simulator comparison at issue follows them.

• If separate simulation engines do not produce the same result, the problem may not lie with the model but with the software, or with the numerical methods. Therefore, this criterion may be too aggressive. In fact, the result highlighted in Figure 3 suggests this is the case. Why would someone want to simulate a model with multiple ODE solvers to prove reproducibility?

• Reference to some of the debate that went on with regards to metabolic modeling, and discussion in the introduction, would be quite helpful.

o 10.15252/msb.20156548

o 10.1038/ncomms5893

o 10.15252/msb.20156157

• The introduction is generic; more detail on the problems that arise when models are not reproducible, and on how people have already tried to solve them, would be appropriate.

• The introduction to some extent conflates open source (accessibility) with reproducibility; this distinction deserves some nuance.

• A figure showing the EMBL protocol for model curation versus the improvements in this work would be helpful to the reader.

• In the introduction, the section on BioSimulators is informative, but how it relates to the described work and how it is used could be better described.

• The possibility that the wrapper code developed to enable SED-ML to be interpreted by different solvers plays a role in the observed errors was not adequately considered. That being said, the development of these wrappers seems to be a large part of the novelty of this work, but the way the manuscript is presented does not capture this very well.

• Line 185, the paragraph about SED-ML updates, was unclear. If the update happened back in 2021, why is it relevant to this 2025 paper?

• Line 210: what problems, and how would they be fixed? More broadly in this section, how do secondary files fit into the manuscript's main point of computational reproducibility? Relatedly, line 214: it was unclear what the translation programs did or why they were important.

• The usefulness of the template SED-ML files for 579 entries with respect to reproducing a model's reported results was unclear, yet this seems to be the major point of the paper. SED-ML files that proved reproducibility of results would be more impactful.

• It was unclear how many of the 1055 models are reproducible in terms of the claims and/or results of the publications.

• Why was no attention paid to the non-ODE models?

• Figure 4 is hard to interpret: why are there six different bars but also five different colors in each bar? One wonders as well what the significance of the template versus full SED-ML comparison is. Lastly, there seems to be an unexpectedly large amount of mismatch if the figure is being interpreted correctly, which is very surprising and deserves more analysis. This figure would benefit from changes and/or a much more detailed legend.

• The paragraph at line 381: the work that went into changing and updating the simulators seems significant, but it is not deeply described or enumerated in the paper. More of this would be relevant in this reviewer's opinion.

• The fact that this study does not compare simulation results to original results undermines the claim of general reproducibility, and some of the argued significance of the work. This seems to be driven by a strict reliance on SED-ML as a component of reproducibility, which could be overly demanding. How the BioModels “increase in reproducibility has in turn increased the value of the repository to modelers going forward” (line 478) was not completely convincing.

• Table 1 contains perhaps the most surprising result: that a large number of simulators differ. Given that these use standard ODE solvers, this is alarming, but the reasons why are unclear.

Reviewer #2: This work represents a valuable addition to both BioModels and the systems biology community.

I found the terminology usage around "validation" somewhat inconsistent. In the BioSimulators paragraph, the term appears to refer to validating the SED-ML/SBML format, which differs from your earlier definition where validation involves determining the biological accuracy of a model. While I appreciate your explicit definitions of terms in the Verification of Models paragraph, I believe you should also have provided a definition of 'translation' as its meaning was unclear in the context of 'translation programs'.

The verification results are particularly interesting. It would have been valuable to include a similar analysis using the SBML test-suite, which is widely used by simulation engine developers to verify their implementations. Your results highlight persistent issues with different applications correctly simulating SBML. The fact that very few models verified across all five simulation packages indicates significant work remains to be done—as you rightly acknowledge. I anticipate that extending your template SED-ML files will likely reveal additional issues.

Minor issues:

- The curation of BioModels has historically involved broader collaboration than just the EBI. Initially, researchers from the University of Hertfordshire and Caltech were also involved.

- Regarding standards chronology, SED-ML was not strictly the second standard to emerge, as SBGN predates it.

- The sentence on line 83, page 4 ("However, many such simulators...") appears redundant and could be removed.

- On line 373, page 11, there is a repetition of the word 'only' that should be corrected.

Reviewer #3: This manuscript represents a valuable contribution to model reproducibility and reuse. The authors looked at *all* curated ODE models in the BioModels database (1,055 models) and generated a SED-ML description of a simulation protocol for each. They then ran simulations using five different ODE solvers/simulation engines, namely COPASI, Tellurium, VCell, PySCeS, and Amici. Sometimes, this failed. And when there was run success, there was sometimes a consistency problem. The authors tried to fix these problems. After this effort, they were able to get a 98% run success rate (1,031/1,055) and a “verification success” flag for 928 or 932 models (88%). Verification success was declared when two or more simulators (given the same simulation instructions) generated indistinguishable results. This report demonstrates some of the challenges of reproducing modeling/simulation results. I think it has major strengths, which are listed below. I think this report will be of interest to many in the biological modeling community. The weaknesses/limitations of the study that I note below do not require major manuscript revisions. Mostly, these points should just be appropriately recognized in the Discussion section.

Strengths

1) I’m not aware of another effort that considered all the curated models in the database. I think other related studies have focused on a selected subset. The comprehensive scope is perhaps unique.

2) I think the authors’ efforts make an impact on reproducibility. They were able to get 98% of the models running (on at least two simulators) and a verification success rate of 88%. Now other researchers will benefit from this work.

3) The authors are sharing resources freely, such as APIs.

Weaknesses

4) There was verification failure for 12% of the models, even after curation. The causes are discussed (on ~2 pages) but perhaps the analysis could go further? Could the authors say more about why they think so many models have this problem?

5) In this study, there was a focus on ODE models. There are other types of models. Could the authors say more about how their efforts might be extended to encompass other model types? What challenges would have to be overcome to consider logic/Boolean models, PDEs, agent-based models, etc.?

6) The “template” SED-ML files provided by the authors are not for reproducing published figures. This is a disappointment, but it’s understandable. Could the authors say more about why they did not go further? What are the challenges to reproducing results in figures? How could these be overcome? Can model developers do more to promote reproducibility?

7) The manuscript does not spell out the audit trail for fixes. Could the authors better explain the extent of logging and how other researchers can access available documentation of fixes?

8) There was no attempt to verify whether models capture underlying biology. We really want that type of effort, right? How can we get there? Could the authors say more about this point?

9) Success results are reported inconsistently, 932/1055 in some places, and 928/1055 elsewhere. This inconsistency should be fixed.

10) Verification success rate was 94% for 578 “template” protocols but only 85% for 453 “full” protocols. Are more complex SED-ML files a problem? Do the simulators evaluated not fully support SED-ML features?

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure resubmission:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: Response_to_Reviewers_auresp_1.pdf
Decision Letter - Marc Birtwistle, Editor

Dear Dr. Smith,

We are pleased to inform you that your manuscript 'Verification and reproducible curation of the BioModels repository' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Marc Birtwistle

Section Editor

PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Overall the authors have done an adequate job addressing the previous concerns of this Reviewer. Only some minor considerations remain that could be addressed without another round of revision.

1. Verification has a few different definitions throughout the manuscript, and this could lead to confusion. For example, page 3 says “verification to mean that separate simulation engines produce the same results when running the same computational experiment,” while page 5 says “We use the definition of verification from the Los Alamos National Laboratory white paper [23], which defines verification as ‘...the process of determining that a model implementation accurately represents the developer’s conceptual description of the model and the solution to the model,’” and this is further clarified later.

2. On page 7 they state: “As Figure 2 shows, as a first step, model repositories such as BioModels should include SED-ML to describe the steps needed to replicate the results.” This seems overstated; a suggestion that including files such as SED-ML in BioModels and similar databases would help with reproducibility seems more appropriate.

Reviewer #2: Thank you for addressing my comments and those of my fellow reviewers. The manuscript certainly flows better.

Reviewer #3: The authors have substantively and adequately addressed my concerns.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Formally Accepted
Acceptance Letter - Marc Birtwistle, Editor

PCOMPBIOL-D-25-01206R1

Verification and reproducible curation of the BioModels repository

Dear Dr. Smith,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom | ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.