Abstract
Improving interpretability and reusability has become paramount for modeling and simulation studies. Provenance, which encompasses information about the entities, activities, and agents involved in producing a model, experiment, or data, is pivotal in achieving this goal. However, capturing provenance in simulation studies presents a tremendous challenge due to the diverse software systems employed by modelers and the various entities and activities to be considered. Existing methods only automatically capture partial provenance from individual software systems, leaving gaps in the overall story of a simulation study. To address this limitation, we introduce SIMPROV, a lightweight method that can record the provenance of complete simulation studies by monitoring the modeler in their familiar yet heterogeneous work environment, posing as few restrictions as possible. The approach emphasizes a clear separation of concerns between provenance capturers, which collect data from the diverse software systems used, and a provenance builder, which assembles this information into a coherent provenance graph. Furthermore, we provide a web interface that enables modelers to enhance and explore their provenance graphs. We showcase the practicality of SIMPROV through two cell biological case studies.
Citation: Ruscheinski A, Wolpers A, Henning P, Wilsdorf P, Uhrmacher AM (2025) SIMPROV: Provenance capturing for simulation studies. PLoS One 20(7): e0327607. https://doi.org/10.1371/journal.pone.0327607
Editor: Ivan Zyrianoff, Alma Mater Studiorum Universita di Bologna: Universita degli Studi di Bologna, ITALY
Received: March 11, 2025; Accepted: June 18, 2025; Published: July 8, 2025
Copyright: © 2025 Ruscheinski et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All mentioned software can be found at https://github.com/orgs/MosiSimProv/repositories. The complete documentation of the software can be found at https://simprov.readthedocs.io/en/latest/index.html. A software artifact containing all relevant files, including the specifications of the rules and provenance pattern, as well as provenance graphs of the case studies can be found at https://doi.org/10.5281/zenodo.15388206.
Funding: This research was supported by the German Research Foundation (DFG), grant 320435134 (GrEASE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Motivation
Established reporting and documentation guidelines in systems biology list the information required for interpreting and reusing simulation models [1] and simulation experiments [2]. The former includes assumptions and decisions made during model construction [3] and semantic annotations for relating model parts to real-world entities [4]. Reporting guidelines such as TRACE [5] and STRESS [6] direct the attention towards the entire simulation study and document, e.g., calibration and validation steps that have been conducted. Similarly to the COMBINE archive [7], they bundle heterogeneous information sources that are relevant for reproducing and interpreting the results of a simulation study. However, neither captures the processes by which activities, information sources, and the generated artifacts are interrelated. Simulation studies comprise iterative processes of collecting information about the system of interest, specifying and refining the simulation model, checking the expected behavior of the simulation model, conducting simulation experiments, and analyzing the results. Thus, these documentation guidelines leave significant gaps in the overall story of a simulation study, even though the documentation of all steps has been identified as crucial for credible simulation models [8] and for justifying the decisions made during the simulation study [3].
Provenance embraces any information describing the production process of a product of interest [9]. In particular, it refers to information about entities, activities, and people involved in producing a piece of data or thing [10]. Thus, provenance moves the simulation study processes into the focus of interest. It puts the different entities of a simulation study, e.g., assumptions, requirements, data, simulation experiment specifications [7], and different versions [11] and components [12] of simulation models, into a specific context [13], i.e., how they have been used and generated during the simulation study. Thus, provenance can enrich existing collections of artifacts such as COMBINE archives [7]. With provenance, not only the final products but also the intermediate products and their interrelations become accessible, complementing existing simulation model documentation formats [14–16]. Moreover, provenance has been found useful for relating different simulation studies to form a family of models [17], and for documenting the process of pursuing different hypotheses to finally arrive at a simulation model that fits the findings of different wet-lab studies [18] (see Fig 1). Thus, provenance appears crucial for improving transparency, reproducibility, and reuse of simulation models and data from simulation studies [4,5,19–21], and for automatically executing simulation studies and the simulation experiments therein [22,23].
Fig 1. This includes the incorporation of references (R) in the refinement, followed by testing the model’s ability to reproduce wet-lab data (WD) through a simulation experiment (E), in which the resulting simulation data (SD) is then analyzed using a Python script (S) to obtain the validation result (VR).
However, manually recording the provenance of a simulation study will, in most cases, be unrealistic due to the effort implied; it also bears the danger that incomplete or incorrect information enters the documentation [24]. The obvious solution is to automatically gather, in a structured manner, as much provenance information as possible while conducting the simulation study.
Workflows are typically equipped with means of automatically recording provenance [25]. However, simulation studies belong to a class of processes that are not well supported by established workflow management systems, as these knowledge-intensive processes require substantial flexibility at both design time and run time [26]. Declarative workflows offer the required flexibility and support [27] and have been used for conducting simulation studies [28] and for the automatic recording of provenance information [29]. However, to benefit from the automatic documentation of workflow systems, the modeling and simulation study has to be conducted within them, and they need to support the specific combination of tools relevant to the study at hand [8].
Here, we will opt for provenance documentation that avoids the additional workflow layer. Instead, our approach provides a modular architecture that records the modeler’s activities in each tool used for the simulation study separately via provenance capturers and then centrally collects, interprets, and aggregates the provenance information in a provenance builder.
In the following, we will first give an overview of the W3C provenance standard PROV-DM and describe how it can represent and store provenance information about a simulation study. Afterward, we will introduce the architecture of our lightweight and modular provenance-capturing approach and the methods involved, and describe its implementation in the web-based tool SIMPROV. Next, we will demonstrate SIMPROV in two cell biological case studies. In the first case study, we capture the development and extension of an mRNA delivery model [30] within a new modeling and simulation tool for rule-based modeling with dynamic compartments [31]. In the second case study, we used Tellurium [32] to replay crucial steps of a simulation study that analyzed endocytosis processes of the Wnt signaling pathway [18] and compared the generated provenance graph to the one that was manually generated and included in the original publication. We close the paper with a discussion, conclusions, and future work.
Provenance of simulation studies
Recording the provenance of any process requires a provenance model that prescribes how the recorded provenance information is formatted. Beyond the format, the choice of model also defines what information is needed to form this structure: it dictates the minimum information required and what optional information can be included [33,34].
PROV-DM extension for simulation studies
Our provenance model is based on the W3C provenance standard, PROV-DM [35], as PROV-DM has previously been extended to develop a provenance model for entire simulation studies [15,17,36,37]. In PROV-DM, the provenance information forms a directed acyclic graph in which the nodes represent entities, activities, and agents. The edges are used to describe their dependencies, i.e., the following relations:
- usage-relations (entity ← activity) that describe which entities have been used in the activity,
- generation-relations (activity ← entity) that describe which entities have been generated by an activity, and
- association-relations (agent ← activity) that indicate an agent’s responsibility for an activity that took place.
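To make the three relations above concrete, the following sketch shows how they appear in the PROV-JSON serialization [77] that SIMPROV also exports; the identifiers and type labels are illustrative only.

{
  "entity": {
    "ex:model": { "prov:type": "Simulation Model" },
    "ex:data":  { "prov:type": "Simulation Data" }
  },
  "activity": { "ex:run1": { "prov:type": "Executing Simulation Experiment" } },
  "agent":    { "ex:tool": { "prov:type": "prov:SoftwareAgent" } },
  "used":              { "_:u1": { "prov:activity": "ex:run1", "prov:entity": "ex:model" } },
  "wasGeneratedBy":    { "_:g1": { "prov:entity": "ex:data",  "prov:activity": "ex:run1" } },
  "wasAssociatedWith": { "_:a1": { "prov:activity": "ex:run1", "prov:agent": "ex:tool" } }
}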
We extended PROV-DM with entity types representing a study’s intermediate and final products and with activity types describing what activities are executed by a modeler [15,17,36]; this extension forms the provenance model applied by SIMPROV. What activities are executed by a modeler and what entities are part of these activities throughout a simulation study has been identified in various life cycles [29,38,39] and reporting guidelines [5,6] of simulation studies. According to those, a simulation study starts with a research question about a specific aspect of a real-world system that shall be answered by modeling and simulation. Next, a conceptual model (which is often said to include the research question [40]) is built that contains all context that informs and guides the model-building process, including requirements, assumptions, a qualitative model, and further data and information [38,41]. Based on the conceptual model, the simulation model can be implemented, e.g., via a domain-specific modeling language, which can then be executed as part of a simulation experiment to produce simulation data. Until the research question can be answered based on this data, the simulation model is successively refined, and the new model versions are calibrated, validated, and analyzed by simulation experiments, as represented by activities. Additionally, various postprocessing steps of the experiment outputs may be necessary, producing entities such as scripts or visualizations. According to PROV-DM, agents refer to anything that bears responsibility for generated entities and activities that happened. Therefore, we propose representing the software systems used by the modeler during these activities as agents. These may be the simulators used to run the model or the software tools used by the modeler to analyze and visualize the experiment results. Treating software as agents (rather than as metadata of other entities, such as simulation experiment specifications) underlines the importance of software in simulation studies for reproducing simulation results.
Table 1 provides an overview of the central entities in a simulation study. In the following, we discuss the formats in which they are typically provided and their role in simulation studies.
The conceptual model.
The importance of the conceptual model in designing a simulation model and conducting a simulation study is widely acknowledged [38,39,41]. Although the interpretation of the conceptual model varies [40], one widely accepted definition is that the conceptual model is a collection of all types of resources that inform the construction of the simulation model [38,39]. Thus, the conceptual model refers to early-stage products of a simulation study [42]. In our data model, we include the following entities as, e.g., listed by [17,40].
Research questions define the questions or objectives that the study seeks to address. They determine the focus in constructing the model and which experiments have to be executed to reach answers. Thus, it is essential to include the research questions in recording provenance, as also highlighted by reporting guidelines [5].
Requirements define the simulation model’s output behavior and further substantiate the research question. Throughout the simulation study, the model is checked and refined to ensure that the requirements are satisfied via a variety of calibration and validation activities [40]. Therefore, requirements play a crucial role in motivating the specification and execution of simulation experiments as part of provenance. They may be encoded by a reference to data that shall be reproduced [40], by a logic formula (such as Metric-Interval Temporal Logic [43]), or in a domain-specific language for requirement specification (e.g., the Biological Property Specification Language [44]).
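For illustration (the formula and threshold are hypothetical, not taken from the case studies), a requirement that a protein concentration must stay above 50 units throughout the first 100 time units could be stated in MITL as G_[0,100](Protein ≥ 50), where G is the time-bounded “globally” operator.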
Assumptions describe the scope of a model and what simplifications were made throughout the simulation study to reduce the level of complexity. Therefore, documenting assumptions is needed to correctly build the model and to interpret the simulation study’s results. Assumptions can be formulated using arbitrary formulas with a defined mathematical syntax, or free text. Different types of assumptions may be annotated with an ontology, such as the Systems Biology Ontology (SBO) [45], to indicate which part of the model contains which kinds of assumptions [17]. For example, the assumption that the concentration of a specific model entity remains constant during the simulation refers to SBO term 362 (concentration conservation law) [17].
The Qualitative Model is an early, often more abstract version of the simulation model focusing on causal relationships. Qualitative models can be specified using various formats, such as informal sketches and textual descriptions of varying complexity, that can help design the model. Qualitative formalisms with a visual (diagrammatic) representation may also be used, such as causal loop diagrams [46] or the Systems Biology Graphical Notation [47].
The final element of the conceptual model is data used as input for variables and parameters or used for calibration or validation.
All these elements might be subject to further substantiation by references to scientific papers, websites, or other source types.
Simulation models.
Based on the entities in the conceptual model, a simulation model is specified [38,39]. A (simulation) model M for a system S and an experiment E is anything to which E can be applied in order to answer questions about S [48]. So, in contrast to the conceptual model, the simulation model is executable by a simulation experiment via a simulator. A simulation model may be implemented using a modeling formalism like (stochastic) Petri nets [49], or using domain-specific languages [50] like rule-based languages, e.g., the BioNetGen language [51], the Systems Biology Markup Language (SBML) [52], or CellML [53]. In addition, general-purpose programming languages (e.g., Python or Julia) may be used to implement the model.
Simulation experiments.
A simulation experiment is the process of extracting data from a system by exerting it through its inputs [48]. Simulation experiments may be specified using domain-specific languages for experiment design, such as SESSL [54] or SED-ML [55], based on meta-models [23,96], or in a general-purpose language, such as Python or R, with the option to use dedicated libraries for experiment design [56]. Various types of simulation experiments are specified and executed with different goals throughout a simulation study, e.g., to calibrate, validate, and analyze the simulation model [39,57]. For instance, a modeler may aim to use sensitivity analysis to probe a model’s robustness [58], compare simulation data of related models for cross-validation [59], or check existing hypotheses regarding a biochemical mechanism using parameter scans [18].
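As an illustration, a minimal parameter scan in Python with Tellurium could look as follows; the Antimony model and the parameter values are hypothetical, chosen only to show the pattern.

import tellurium as te

# Load a small (hypothetical) reaction model written in Antimony.
model = te.loada("""
    S -> P; k * S
    S = 100; P = 0; k = 0.1
""")

# Parameter scan: rerun the simulation for several rate constants.
for k in [0.05, 0.1, 0.2]:
    model.resetToOrigin()                # restore the initial state between runs
    model.k = k                          # set the scanned parameter
    result = model.simulate(0, 50, 100)  # start time, end time, number of points
    print(k, result["[P]"][-1])          # final product concentration per run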
Simulation data.
Executing simulation experiments produces simulation data that can then, e.g., be compared to reference data from the conceptual model for calibration or validation. Furthermore, simulation data may be reused in other simulation studies and thus become part of the conceptual model (in those subsequent studies). The format of simulation data depends on the tool running the simulation experiment and the simulator used.
Postprocessing.
Several postprocessing steps may be required depending on the answers the modeler wishes to extract from simulation data. This includes specifying and executing scripts to transform data and produce visualizations.
Scripts are additional executable files that are crucial for extracting relevant insights from given simulation data. Scripts may be specified in any general-purpose programming language, in a domain-specific language for creating visualizations [60], or in a scientific workflow format [61].
Visualizations: The appropriate type of visualization depends on the experiment executed, the data produced, and the information the modeler wants to convey. Thus, in general, any visual media format may be used.
Fig 2 shows a provenance graph in the graphical notation of PROV-DM, illustrating various types of entities, activities, agents, and dependencies discussed above.
Fig 2. The activities comprise the executions of the experiments (Executing Experiment) and the visualization of results (Visualizing Data). The entities comprise the two simulation models (SM1, SM2), two simulation experiments (E1, E2), the simulation data generated by the experiments (SD1, SD2), and the Python script (S) for generating a visualization of the simulation data (V). The agents are the modeling and simulation tool (TEL) and the programming environment (PY).
Provenance patterns
Based on the extension of PROV-DM for simulation studies, we formulated provenance patterns [22] that define which combinations of provenance entities, dependencies, and agents are valid for an activity type.
For example, the activity Specifying Experiment always uses a simulation model entity but may optionally use an assumption entity and a requirement entity (see Fig 3). A Specifying Experiment activity always produces a simulation experiment entity. In addition, we defined required and optional attributes for the entities and agents as part of the provenance patterns.
Fig 3. The activity always uses a simulation model (SM). Optionally (denoted in gray), it can also use an already existing simulation experiment, an assumption (A), and/or a requirement (R). The attributes required or optional (gray) for the entities and agents are annotated next to the dashed lines. So, if an activity uses a requirement, the requirement must have attribute values for its name and file path; its specification can be blank.
Based on the life cycles of simulation studies and guidelines discussed above, we specified patterns for the activities Specifying Reference, Specifying Research Question, Specifying Requirement, Specifying Assumption, Specifying Simulation Model, Specifying Simulation Experiment, Executing Simulation Experiment, and Analyzing Simulation Data.
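SIMPROV specifies these patterns declaratively in YAML (see the Implementation section). As an illustrative sketch of the pattern from Fig 3, a specification could look roughly like the following; the field names are ours, and the concrete schema is described in the SIMPROV documentation.

activity: Specifying Simulation Experiment
uses:
  - type: Simulation Model                 # required
  - type: Simulation Experiment            # optional earlier experiment version
    optional: true
  - type: Assumption
    optional: true
  - type: Requirement
    optional: true
    attributes:
      required: [name, file_path]
      optional: [specification]
generates:
  - type: Simulation Experiment
    attributes:
      required: [name, file_path, content]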
Concept
To collect provenance information from a varying set of individual tools, we propose a modular architecture that can be adapted to the modeler’s needs for different tool setups. We adopt Eek and Miller’s idea [62] to capture provenance from individual tools locally before combining all captured information centrally. They proposed an Agent-Based Provenance Architecture [62] introducing various types of agents to capture provenance from distributed, heterogeneous software applications. Here, we propose an architecture tailored to record the provenance of simulation studies.
Our approach is highly flexible and can be easily adapted to the individual modeler’s needs. It consists of two module types: provenance capturers and a (central) provenance builder (see Fig 4). The provenance capturers are responsible for detecting and collecting information about the modeler’s activities from the different software systems or tools. As modelers use a wide range of systems and tools in varying combinations, they can combine any number of capturers—one capturer for each system or tool. Whenever a capturer detects that the modeler performed an activity using the capturer’s software system or tool, it collects all relevant information, including which entities and agents are involved in the activity, and communicates it to the provenance builder.
Fig 4. Provenance capturers record information on the tools used by the modeler. The provenance builder collects and combines the captured information into a provenance graph.
The provenance builder processes the incoming events and checks whether the new provenance activity, with its entities and the associated agents, matches the provenance model (including the provenance patterns) used by the simulation study. Here, this is the provenance model discussed above; however, other provenance models can also be used with our approach, in which case only the provenance capturers have to be adapted accordingly. If the new activity, including its entities and agents, matches the activity’s provenance pattern, it is chained with the already captured provenance graph.
Provenance capturers
A provenance capturer has to accomplish four tasks:
- Detecting the modeler’s activity in a software system,
- Collecting information about the activity,
- Identifying the activity and related entity types, and
- Sending the information to the provenance builder.
Detecting the modeler’s activity & collecting information.
Given the diverse software architectures used in various systems, the methods by which a provenance capturer detects and collects information about the modeler’s activity must be determined case by case, depending on the type of software observed by the respective capturer. Previous studies employ numerous approaches to capturing provenance from computational tasks, ranging from observing the modeler’s operating system at the level of system calls [63] to providing an annotation syntax that modelers can include in their code [64]. Recording system calls is completely automatic, but it also captures information irrelevant to the simulation study that is complicated to eliminate. On the other hand, asking the modeler to manually annotate their code requires continuous additional effort on their part, which we aim to minimize. We therefore recommend methods that strike a compromise between recording targeted information and user involvement. We argue that two methods, using plugin interfaces and wrapping library functions, suffice to capture provenance from most types of software.
Using plugin interfaces: We suggest using the readily available plugin interfaces of editors and IDEs, such as PyCharm [65], Visual Studio Code [66], or Notepad++ [67]. These plugin interfaces typically define functions executed whenever the underlying software system detects a user interaction, such as clicking a button or saving a file. In this case, the provenance capturer consists of a custom function that introspects the information about the user interaction to collect information about the modeler’s activity, which is then used to identify the activity and the associated entities and agents.
Wrapping library functions: When using software libraries, such as Hybrid Automata [68], Matplotlib [69], Tellurium [70], or SESSL [54], the modeler calls library functions to execute tasks like running a simulation or analyzing the simulation data. In this context, the modeler’s activity can be detected by monitoring the called functions. Thus, we can implement a provenance capturer via wrapper functions that call the original function and introspect the function arguments and results to collect information about the activity.
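A minimal sketch of this wrapping idea in Python follows; the event fields are illustrative, and send_event is a placeholder whose transport (an HTTP POST to the provenance builder) is shown in the Implementation section.

import functools

def send_event(event: dict) -> None:
    """Deliver an event to the provenance builder (transport omitted here)."""
    ...

def capture(activity_type: str):
    """Wrap a library function so that each call is reported as an activity."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)      # call the original function
            send_event({                        # introspect the call and report it
                "type": activity_type,
                "function": func.__name__,
                "arguments": [repr(a) for a in args],
                "keywords": {k: repr(v) for k, v in kwargs.items()},
            })
            return result
        return wrapper
    return decorator

# e.g., report every call of a (hypothetical) simulate() as an execution:
# simulate = capture("Executing Simulation Experiment")(simulate)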
Further, one-stop simulation systems, such as COPASI [71] or NetLogo [72], which provide an IDE for specifying simulation models and use a simulation library for running simulation experiments with them, might require combining both capturing approaches.
Identifying the activity and entities.
Independently of whether provenance has been captured based on an IDE or libraries, we need to identify the activity and entity types.
Entities: First, the provenance capturer identifies the type of all related entities. Ideally, the capturer can deduce an entity’s type from the file’s content, its name (extension), or other conventions that make the file’s entity type explicit. However, this is not always possible, and the modeler has to make the entity type explicit—for example, by adhering to naming conventions. Which of these approaches, or which combinations, can be used differs between the entity types and depends on the modeler’s work practices.
Naturally, the interpretation of a simulation experiment or simulation model specification is facilitated by accessible syntax and semantics, e.g., in terms of meta-models about simulation experiments [23] or a formalization of a model language [73]. This also applies to formally specified behavioral requirements to be checked automatically [44,74]. In addition, such structures enable the provenance capturer to recognize other entities (from the conceptual model) used by the simulation model or experiment. For example, an experiment might import a behavioral requirement as a logic formula to check if the formula holds for a simulation model (the produced simulation data) in a certain configuration. As the behavioral requirement is directly imported, its involvement in the experiment execution can be inferred by inspecting the experiment file for imports.
Now, suppose the behavioral requirement was only available in text form. In that case, the experiment cannot import it directly. Hence, the provenance capturer will have no way of knowing that a behavioral requirement was involved with the execution of the experiment. Moreover, the capturer will not be able to distinguish the behavioral requirement from an assumption or qualitative model, for example, as all these entity types can come in the form of text which is hard to interpret automatically.
These problems exist not only for files of the entity type requirement but for all elements of the conceptual model. Table 1 shows that no entity type from the conceptual model can be identified universally and distinctively just from their file format (e.g., both research questions and assumptions are commonly specified in plain text files). In addition, most entities in the conceptual model lack not only a formalized but also a standardized syntax for seamless integration into files used by other entities. For instance, even if a requirement is specified using a formal approach, such as a logical formula, it may not be directly usable by the simulation experiment, depending on the compatibility between the used requirement checking approach and the specification language. Thus, the provenance capturer cannot automatically detect that these entities were used in the creation or execution of said files.
Therefore, we still need the modeler to make this information explicit—in the form of naming conventions in file names or annotations in files, for example.
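A capturer can combine automatic deduction with such conventions; the following Python sketch illustrates the idea, where the suffix and prefix conventions are examples a modeler might agree on, not prescribed by SIMPROV.

from pathlib import Path

SUFFIX_TYPES = {".mlr3": "Simulation Model"}   # unambiguous file format
PREFIX_TYPES = {                               # naming convention for text files
    "rq_": "Research Question",
    "req_": "Requirement",
    "asm_": "Assumption",
}

def identify_entity_type(path: str) -> str | None:
    """Deduce an entity type from the file extension or a file-name prefix."""
    p = Path(path)
    if p.suffix in SUFFIX_TYPES:
        return SUFFIX_TYPES[p.suffix]
    for prefix, entity_type in PREFIX_TYPES.items():
        if p.name.startswith(prefix):
            return entity_type
    return None  # ambiguous: the modeler must make the type explicit

print(identify_entity_type("asm_constant_concentration.md"))  # -> Assumption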
Activity: Based on the identified entities and the action performed by the modeler, the provenance capturer infers the activity’s type. The entity’s type has been inferred from the entity’s file name or file introspection as described above. The activity performed by the modeler can be identified from the function triggering the provenance capturer: when a file is saved, the activity is specifying, and when a file is executed, the activity is executing.
Sending the information.
Finally, the provenance capturer encodes the provenance activity and entities in an event and sends it to the provenance builder. The event’s type corresponds to the activity type inferred in the preceding step. It includes information about all related agents and entities, including their attributes, such as file path, name, version, and content.
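Since events consist of arbitrary key-value pairs (see the Implementation section), their exact shape is up to the capturer. An illustrative event for a model-editing activity, with field names of our choosing, might look like this:

{
  "type": "Specifying Simulation Model",
  "agent": { "name": "Visual Studio Code", "version": "1.85" },
  "entities": [
    { "entity_type": "Simulation Model", "role": "generated",
      "name": "lipoplex_model", "file_path": "models/lipoplex_model.mlr3",
      "version": 2, "content": "..." },
    { "entity_type": "Assumption", "role": "used",
      "name": "asm_uniform_distribution",
      "file_path": "conceptual/asm_uniform_distribution.md" }
  ]
}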
Provenance builder
As soon as the provenance builder receives a new event, it extracts the provenance activity with its entities and agents from the event. The provenance builder then has to accomplish two tasks:
- Ensuring the consistency of the provenance information as defined by the provenance model and
- Chaining the extracted provenance activities into a single provenance graph.
Ensuring consistency.
After unpacking the provenance activity with its entities and agents from the event, the provenance builder checks whether the newly created provenance is complete and valid according to our provenance model. Using the defined provenance patterns, the provenance builder can not only check whether all required entities are attached to the activity but also make sure that all attributes, such as file paths and names, are available for each entity and agent.
If an attribute value is missing or invalid entities/agents are assigned to an activity, e.g., due to a lack of information in the original event, the activity is discarded and added to an error log that the user can inspect. Otherwise, the provenance activity (with its entities, agents, and dependencies) is chained with the already captured provenance graph.
Chaining into a single provenance graph.
Finally, the identified activity with its dependencies and entities is inserted into the provenance graph. The activity can be inserted directly because the activity has just been created and will not already exist within the graph. However, an entity can already be part of the provenance graph because the product the entity represents has been used or produced by a previous activity. Consequently, before inserting an entity, the application must check whether the entity to be inserted is already part of the graph. If this is the case, the entity is not inserted again, but the dependency is added between the latest version of the existing entity and the new activity. This is possible because whenever an entity is edited a new version of it is created as part of the recording.
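The chaining logic can be sketched in Python with networkx as follows; the node-naming and attribute scheme is ours, for illustration only.

import networkx as nx

def chain_activity(g: nx.DiGraph, activity: str,
                   used: list[str], generated: list[str]) -> None:
    """Insert a new activity and connect it to (versions of) entity files."""
    g.add_node(activity, kind="activity")  # activities are always new nodes
    for path in used:
        # reuse the latest version of an entity that is already in the graph
        versions = [n for n, d in g.nodes(data=True) if d.get("path") == path]
        node = max(versions, key=lambda n: g.nodes[n]["version"], default=None)
        if node is None:
            node = f"{path}@1"
            g.add_node(node, kind="entity", path=path, version=1)
        g.add_edge(activity, node, relation="used")
    for path in generated:
        # every edit/generation yields a fresh version of the entity
        latest = max((g.nodes[n]["version"] for n, d in g.nodes(data=True)
                      if d.get("path") == path), default=0)
        node = f"{path}@{latest + 1}"
        g.add_node(node, kind="entity", path=path, version=latest + 1)
        g.add_edge(node, activity, relation="wasGeneratedBy")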
Implementation
We implemented SIMPROV adopting a client-server architecture in which provenance capturers function as clients and a provenance builder operates as the server (see Fig 4). As the provenance capturers have to be developed individually for the specific software system or tool used by a modeler, we will explain our provenance capturers’ implementation in the case studies below. Independently of the individual capturer’s implementation, though, all provenance capturers encode their captured information as an event consisting of arbitrary key-value pairs in JSON [75] and send it via a POST request to the REST API of the provenance builder.
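For example, a capturer written in Python could deliver an event with the requests library; the endpoint URL below is an assumption, since the actual address depends on where the provenance builder runs.

import requests

event = {"type": "Specifying Assumption", "entities": []}  # cf. the event sketch above
requests.post("http://localhost:5000/events", json=event)  # REST API of the builder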
The provenance builder, in contrast, stays the same independently of the individual modeler’s choices. It processes incoming events by extracting the captured information and converting it into objects representing provenance activities, entities, and agents connected with dependencies as defined in PROV-DM. This new excerpt of a provenance graph is then matched to the activity type’s provenance pattern that is declaratively specified using YAML [76]. For all recorded activities that could not be successfully matched with the respective provenance pattern, we keep an error log collecting the respective events for manual inspection.
Also, we equipped the provenance builder with a web interface allowing the modeler to explore the captured provenance graph visually, introspect the error log, and download the stored provenance information for further processing or publishing [77] (see Fig 5). For this, a Node.js app was implemented that communicates with the provenance builder via its REST API. In summary, the app allows the modeler to
- visually explore the provenance graph rendered using Cytoscape [78],
- download the provenance information in the PROV-JSON format [77] for reuse in other applications,
- fill missing entity/agent attributes or add new dependencies between activities and entities/agents in case some information was not extracted by the rules, and
- access the error log and events for debugging purposes.
Moreover, we also integrated various filtering approaches to reduce the number of nodes and edges shown in the graph, e.g., a custom abstraction-based filtering approach [29] that groups sequences of the same activity into a single activity, or a mechanism using the transitive reduction [79] of the provenance graph to remove edges.
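The transitive reduction is available in standard graph libraries; a minimal example with networkx:

import networkx as nx

g = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "C")])  # A -> C is implied via B
reduced = nx.transitive_reduction(g)   # keeps reachability, drops redundant edges
print(sorted(reduced.edges()))         # [('A', 'B'), ('B', 'C')]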
Case studies
For the case studies, we reimplemented two models. First, we transferred and extended an mRNA delivery model from COPASI [71] to ML-Rules 3 [31], a simulation tool to efficiently simulate multi-level models including dynamic compartments. Based on this case study, we will explain the process of capturing the provenance graph in detail. Second, we took a Wnt/β-catenin signaling model originally implemented in ML-Rules 2 [73] and reimplemented it in Tellurium [32]. The original paper [18] includes a manually drawn provenance graph, which we will compare with the automatically captured SIMPROV graph.
Capturing provenance from Visual Studio Code and Python
For both of the following case studies, the modeler uses Visual Studio Code as an editor for specifying
- the conceptual model, consisting of the research question, literature references, requirements, and assumptions, using Markdown [81],
- the simulation models using the rule-based multilevel modeling language ML-Rules [31] or Tellurium, and
- the simulation experiments and scripts for visualizing and analyzing simulation data using Python 3.10 (https://www.python.org/).
Note that SIMPROV is designed to work with any software or tool. So, the following only reflects how SIMPROV can work for the combination of software used in our case studies. One might even choose to implement the provenance capturer for the same software differently if it better matches the software’s internal workings.
As the modeler conducts the whole simulation study with Visual Studio Code, we implemented only one provenance capturer. The capturer employs two capturing methods in combination: using a plugin interface and providing the Python utility library simprovhelper.
The plugin makes use of Visual Studio Code’s command palette. The command palette offers a list of commands that work similarly to macros. A command can, for example, create a new file and ask the user to enter a name for it. Our capturer plugin provides commands to create files for each kind of provenance entity explained above (i.e., requirements, simulation models, simulation experiments, etc.). Depending on the entity type, the modeler is only asked to enter a name for the new file. For some entity types, the modeler might additionally be asked to select already existing entities from the conceptual model that are used to specify the new provenance product. For example, for the specification of a simulation model, assumptions and requirements might be used to indicate how the simulated mechanisms should be implemented. As a result, the capturer knows from an entity’s creation its entity type, the file’s location, the file name, and what other entities from the conceptual model were used in its specification. After a new file has been created, the capturer creates a provenance event notifying the provenance builder that the respective entity has been created. In addition, the provenance capturer then tracks the entity’s file to detect when it is edited. When a tracked file is edited, the capturer will send an event of the type specifying <entity type>, including the old and new versions of the file, to the provenance builder.
The Python utility library simprovhelper provides functions that extract information (like the script’s content, used libraries, their versions, etc.) from scripts, experiment specifications, and their outputs. When a modeler creates a simulation experiment from the command palette, the new file will already call a function from the simprovhelper library to collect all information about the experiment’s execution and contain a comment explaining how to use that function. The modeler can then implement their simulation experiment as they are used to. Before running the experiment for the first time, they only have to add the simulation model’s path and the path to the simulation experiment’s output data as parameters to the library function (as stated in the explanation in the comment). Once the modeler executes their experiment, the capturer will detect this activity through the library function and collect all information associated with running an experiment. Then, as part of the library function, the capturer sends the gathered information to the provenance builder as a provenance event executing <entity type>.
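An instrumented experiment script could then look roughly like this; track_experiment is our placeholder name, and the actual simprovhelper API may differ.

import simprovhelper

# Placeholder call: tells the capturer which files this run reads and writes,
# so the execution can be linked to the right provenance entities.
simprovhelper.track_experiment(
    model_path="models/lipoplex_model.mlr3",
    output_path="data/dose_response.csv",
)

# ... the modeler's usual experiment code follows, reading the model file
# and writing the simulation data to output_path ...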
These two steps of instrumenting Visual Studio Code with a provenance capturer are necessary only once for both of the following case studies and can also be reused for future studies. As long as we continue to use Visual Studio Code and Python with the same provenance model, we can keep this instrumentation and do not need to install or create a capturer repeatedly.
mRNA delivery model
Understanding and exploiting the process of delivering mRNA into cells has been widely identified as one of the key challenges for developing vaccinations, protein replacement therapies, and treatments for genetic diseases [82–84]. One potential method involves the utilization of lipoplexes [85], i.e., small lipid spheres that shuttle into the cell and unpack the encapsulated mRNA.
In [30], the authors developed a simulation model that accurately predicts key features of the delivery process via lipoplexes, e.g., the likelihood of synthesizing a target protein in relation to the dose of lipoplexes placed in the cells’ environment. However, the simulation showed clustering in the absolute height of the target-protein expression curves (see Fig 9, left), which was not observed in wet-lab experiments. The authors identified the use of a fixed amount of mRNA particles per lipoplex, originating from a limitation of the simulation software, as a likely cause of this behavior.
Fig 8. The modification on the right-hand side led to an error and an empty simulation data file that could not be visualized.
In [31], we developed an efficient simulator for a variant of ML-Rules [80], i.e., ML-Rules 3. We used the lipoplex model implemented in ML-Rules 3 to showcase the features and the performance of the new simulator, as well as the impact of considering dynamic compartments on the simulation results. In the following, we will show how SIMPROV captured the provenance information while reimplementing the original model in ML-Rules 3, which allows the use of a varying number of mRNA per lipoplex.
Captured provenance graph.
The captured provenance graph of the simulation study consists of 153 nodes and 342 edges, comprising all details about the simulation study’s entities, activities, and agents. This fine-grained provenance information allows for tracing and introspecting all study products throughout the simulation study, significantly contributing to the desired reproducibility. However, it also introduces a challenge in providing an overview of the different phases within a simulation study.
The reduced provenance graph, as generated by SIMPROV, provides an overview of the different phases by aggregating chains of the same provenance activities into a single activity, resulting in a more coarse-grained provenance graph (see Fig 6).
From a bird’s-eye view, we can identify three phases: A) the specification of the conceptual model, B) the stepwise re-implementation of the original simulation model in ML-Rules, and C) the adaptation of the simulation model followed by comparative visualization of the simulation results of both models. In the following, we give an overview of each of the phases.
A) Conceptual modeling.
The simulation study started with the development of the conceptual model (see Fig 7). For this, we began with the specification of references referring to a publication about the delivery process via lipoplexes and the source of the original simulation model. Based on these references, the research question was specified, i.e., re-implement the original simulation model and analyze the models’ behavior for varying amounts of mRNA particles per lipoplex. Further, we used the references to deduce crucial assumptions for the simulation model, i.e., the uniform distribution of all simulated particles and the stochastic behavior of the system. Finally, we specified the requirement that our simulation model be able to reproduce the results shown in the original publication.
From the modeler’s perspective, SIMPROV captured the provenance information of the entire conceptual modeling process of the simulation study. However, the modeler had to manually specify the relation between the different conceptual aspects as part of their specification, e.g., using a reference for specifying an assumption, as these relations could not be inferred.
B) Re-implementation of the original model.
After developing the conceptual model, we started re-implementing the original simulation model following an incremental approach in ML-Rules (see Fig 8). In each iteration, our simulation model was extended by another reaction rule, e.g., a rule to describe the attachment procedure of lipoplexes to the cell. The newly added reaction rule was tested, e.g., whether and how often the reaction rule fired and whether the dynamics of the reaction rule and the model appeared plausible. For this, an initial simulation experiment was developed and subsequently adapted to generate the necessary simulation data to be visualized. The iterations continued until all mechanisms of the original model were integrated as reaction rules and the original simulation data were reproduced (see Fig 9 left).
In contrast to the specification of the conceptual aspects, all relevant provenance information from this phase is directly captured from Python scripts responsible for generating and visualizing the simulation data. Thus, capturing the provenance information of this phase integrates seamlessly into the modelers’ workflow.
C) Adaptation of the model.
In the simulation study’s final phase, the model was adapted to support a varying number of mRNA particles per lipoplex, which was not possible in the original model. For this, we copied the ML-Rules model (of B) and removed the fixed initial solution from the specification. As a consequence, we now rely on the simulation experiment to dynamically generate an initial solution with a varying number of mRNA particles, i.e., by sampling the diameters of the lipoplexes from a normal distribution and calculating the number of mRNA particles based on the equation given by [86], and to inject this solution into the model specification before running the simulation.
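A sketch of how such an initial solution can be generated in the experiment script follows; the volume-proportional relation below is illustrative only, whereas the case study uses the actual equation from [86].

import numpy as np

rng = np.random.default_rng(42)

def mrna_counts(n_lipoplexes: int, mean_d: float = 100.0, sd_d: float = 20.0) -> list[int]:
    """Sample lipoplex diameters and derive a varying mRNA count per lipoplex."""
    diameters = rng.normal(mean_d, sd_d, size=n_lipoplexes)
    # Illustrative assumption: the mRNA count scales with lipoplex volume (d^3).
    return [max(1, round((d / mean_d) ** 3 * 350)) for d in diameters]

counts = mrna_counts(100)  # one entry per lipoplex in the initial solution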
We collected the simulation data from 100 simulation runs per model and visualized and analyzed the resulting simulation data (see Fig 9 and Table 2). In the visualization and the table, we see a significant reduction of the clustering in the different trajectories of our adapted simulation model.
Similar to the previous phase, all the provenance information from the simulation experiments and the visualization of the results is directly captured (see Fig 10). However, the cloning and adaptation of the original simulation model initially appear as a Specifying Simulation Model activity generating a new simulation model. The relation to the old simulation model cannot be inferred using the rules. Thus, the modeler manually added a dependency (green arrow) to make the relation between both models explicit.
Wnt/β-catenin model
The Wnt/β-catenin signaling pathway is essential during embryonic development and adult tissue homeostasis [87]. One part of this pathway is the internalization of the LRP6 receptor. In [18], four different mechanisms of internalization were modeled and compared to wet-lab data. The model-building process is documented in a provenance graph. We reimplemented the first two models (M1 and M2) in Tellurium to compare the manually drawn provenance graph to the provenance graph produced by SIMPROV.
No additional changes were necessary to use Tellurium with SIMPROV, since the already existing utility library can be reused to report the provenance information from the simulation experiments to the provenance builder. This also illustrates how a working environment for simulation can be adapted to work with SIMPROV to support automatic documentation of provenance. Since the capturing process was described in detail in the previous section, we focus in this section on comparing the two graphs.
Manually vs. automatically generated provenance graphs.
Fig 11 shows the manually drawn provenance graph and Fig 12 the automatically generated SIMPROV graph. There are two main differences between the two graphs. First, model M1 is based on five references in the original provenance graph, while the SIMPROV graph also lists assumptions, requirements, and a research question. However, these additional nodes are optional. The second difference is in the generation of data. In the original graph, a simulation experiment and data (related to figures) are generated by an activity that uses the simulation model (and possibly other sources); this activity aggregates the specification and execution activities that are distinguished in the provenance graph produced by SIMPROV.
Fig 9. The colors indicate the number of lipoplexes that unpack their mRNA in the cell.
Fig 10. The green arrow indicates a dependency manually added by the modeler to describe the relation between the re-implemented original model in ML-Rules 3 and the adapted simulation model in ML-Rules 3.
Despite these minor differences, both graphs are very similar in visualizing which references are used to build the different models and which data/figures are plotted from these models.
Moreover, we contacted the corresponding author of the original paper [18] and asked for an estimate of the time needed to create the original provenance graph. According to the author, identifying the relevant entities and their dependencies and drawing the graph took approximately 10 hours after writing the paper. In contrast, our provenance graph was directly captured throughout the simulation study, and we took about 5 minutes to arrange the graph as shown in Fig 12.
Discussion
In developing SIMPROV, we set out to capture the provenance of entire simulation studies with minimal user involvement. Now, we will discuss the range of the automatically captured information, the degree of user interaction necessary, and the effort required to support new tools.
What can be deduced from the modeler’s activities?
The provenance information that our approach can deduce highly depends on the implementations of the provenance capturers and how the modeler works. In the above case studies, SIMPROV with the Visual Studio Code capturer can fully automatically detect and record when a file is edited and when an experiment is executed by using plugin interfaces and wrapping library functions. When the modeler edits a file, both the old version of the file and the new one are recorded with the detected activity without any additional information from the modeler. Similarly, once the modeler executes a simulation experiment, the capturer records the activity, the executed experiment, and the resulting simulation data automatically. The provenance capturer can easily identify entity and activity types because the modeler chooses a command from Visual Studio’s command palette that creates a file specifically for the chosen entity type. And, finally, the commands prompt the modeler to choose which entities from the conceptual model are used to specify simulation models and experiments. So, all entities and activities described above can be recorded with SIMPROV.
What has to be specified by interaction with the modeler?
However, to identify and record some of the activities, we require the modeler to provide information that cannot be collected otherwise. The modeler has to tell the provenance capturer which entity types the files belong to and which entities from the conceptual model are used to specify simulation models and experiments. In the following, we look into why we require these inputs from the modeler and how the information could be gained alternatively.
Since there are many different forms that the entities of simulation studies can take, no entity can always be identified unambiguously purely based on its form. Table 1 shows that some entities can occur in forms that are unique for the entity. For example, only simulation models can come in the form of simulation modeling formalisms. However, they can just as well be specified using a general-purpose programming language that could also be used to specify simulation experiments or any other entity. Therefore, the entity types of products generally cannot be discerned from the file type alone. For that reason, we require the modeler to name the entities explicitly by choosing the correct command (like in the case study above) or following a naming convention supported by the respective provenance capturer.
In addition, the modeler could use other tools to specify different products as long as each tool is equipped with a corresponding capturer. For example, they might use Visual Studio Code only to specify the simulation model and experiments. Then, they could use a wiki to document the conceptual model [40] which they instrumented with a capturer that sends events to the provenance builder whenever a wiki page is edited. In that case, their wiki structure would make the entity types of the respective products explicit, as well as the necessary metadata. Similarly, a documentation of the simulation study (e.g., Tellurium notebooks) could be instrumented with a capturer to automatically collect provenance (cf. instrumentation of Jupyter notebooks for experiment generation [88]). All in all, the modeler can use whatever tools they prefer, as long as the products’ entity types are made explicit and the tools are equipped with capturers.
But even if all the entities can be captured, it is difficult to automatically record which elements of the conceptual model were used to specify other products. For example, the modeler might read a description of a mechanism in a paper and then include that mechanism in their simulation model. Since there is no direct link from the simulation model file to the paper file, the dependency between the specify simulation model activity and the paper’s information entity has to be added manually by the modeler. Therefore, in our case study, our Visual Studio Code capturer allows the modeler to select which elements from the conceptual model are used in an entity’s creation. However, this reliance on user annotations can introduce the very inconsistencies in the provenance information we initially sought to prevent. Future capturers could address this issue by storing the information provided by the modeler externally or by disabling the editing of user annotations to prevent the modeler from accidentally introducing inconsistencies.
Again, it might be possible to fully automatically capture the dependencies between activities and entities from the conceptual model in some cases. Take the example from above, where an experiment imports an assumption as a logic formula to check if the formula holds for a simulation model in a certain configuration. In this case, the assumption’s involvement in the experiment specification and execution can be inferred by inspecting the experiment file for imports. However, this only works for entities that are directly imported by another entity.
How much implementation effort is needed?
To use SIMPROV in a simulation study, a modeler only has to start the provenance builder and use software systems that already integrate provenance capturers. After a simulation study, a JSON representation of the provenance information [77] can be downloaded for further processing or publishing.
However, SIMPROV also comes with a trade-off, as additional effort is required to implement the provenance capturers and to specify the provenance patterns for the provenance builder. Examples of this were shown in the case studies for Visual Studio Code and Tellurium. Yet, since modelers tend to stick to their familiar modeling and analysis environment, once everything is implemented, no additional effort will be required for their future simulation studies.
What insights do the captured provenance graphs provide?
Currently, SIMPROV improves the readability of the provenance graphs by aggregating chains of similar activities. The reduced provenance graphs convey information about the phases of conceptual modeling, simulation model specification, and model analysis via simulation experiments.
Yet, the captured provenance graph could be processed further than in our current implementation to reveal even more insights into the simulation study. As shown in [29], different kinds of reductions may be applied on demand to generate distinct provenance views emphasizing different aspects of a conducted simulation study.
Moreover, it might also prove useful to capture and present the intentions behind the activities. For instance, experiments may be conducted for the purpose of calibration or validation [17]. Thus, the activity type Execute Simulation Experiment could be split into two activity types, Execute Calibration Experiment and Execute Validation Experiment. To identify these automatically, the types of entities used and generated by experiment executions could be taken into account. The definitions of what types of entities are used and generated by experiments with which intentions are available in [22].
Conclusion and future directions
A 2021 study showed that 49% of models in systems biology are not reproducible [24]. The main reasons were insufficient, incomplete, or ambiguous reporting. Thus, in this paper, we introduced SIMPROV as a lightweight method for automatically capturing provenance information during an entire simulation study with minimal user involvement. In the captured provenance graphs, not only the final products of the simulation study are reported, but also intermediate activities, products (including requirements and assumptions), and their interrelations. This explicit story of the simulation study can complement existing model documentation and facilitate the interpretation and reusability of results.
In our method, provenance capturers are responsible for detecting and collecting information about the modeler’s activity from the different software systems used in the simulation study. A provenance builder accepts this information and extracts and validates the provenance activities based on declarative specifications of provenance patterns. Finally, each activity is chained into a provenance graph, which can be further explored visually via a web interface. Further, we demonstrated how our software system can be applied and extended to illuminate the different phases of a simulation study using two cell biological case studies.
The starting point for our approach has been that developing simulation models is an iterative and knowledge-intensive process involving the activities of domain experts [89]. This also implied that some parts of the above process could not be automatically captured, e.g., which part of a publication was used in developing the simulation model. However, new methods challenge this traditional development of simulation models. In the last decade, significant advances have been made toward learning entire simulation models from data. The reactionet lasso, a structure learning approach, takes advantage of information-rich single-cell data, a tractable problem formulation, and partial knowledge about networks [90]. Other approaches are based on SINDy (Sparse Identification of Nonlinear Dynamics) [91] to estimate parsimonious biochemical reaction networks from time series [92,93]. Still others combine information extraction from literature, clustering, simulation, and formal analysis to support the automated assembly, testing, and selection of context-specific models [94,95]. All these approaches have in common that everything used to develop the simulation model has to be available and will thus also be accessible to the respective capturers. Consequently, these new developments will increase the part of simulation studies that becomes amenable to automatic capturing.
References
- 1. Le Novère N, Finney A, Hucka M, Bhalla US, Campagne F, Collado-Vides J, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol. 2005;23(12):1509–15. pmid:16333295
- 2. Waltemath D, Adams R, Beard DA, Bergmann FT, Bhalla US, Britten R, et al. Minimum Information About a Simulation Experiment (MIASE). PLoS Comput Biol. 2011;7(4):e1001122. pmid:21552546
- 3. Porubsky VL, Goldberg AP, Rampadarath AK, Nickerson DP, Karr JR, Sauro HM. Best practices for making reproducible biochemical models. Cell Syst. 2020;11(2):109–20. pmid:32853539
- 4. Neal ML, König M, Nickerson D, Mısırlı G, Kalbasi R, Dräger A, et al. Harmonizing semantic annotations for computational models in biology. Brief Bioinform. 2019;20(2):540–50. pmid:30462164
- 5. Grimm V, Augusiak J, Focks A, Frank BM, Gabsi F, Johnston AS, et al. Towards better modelling and decision support: documenting model development, testing, and analysis using TRACE. Ecol Model. 2014;280:129–39.
- 6. Monks T, Currie CSM, Onggo BS, Robinson S, Kunc M, Taylor SJE. Strengthening the reporting of empirical simulation studies: Introducing the STRESS guidelines. J Simulat. 2018;13(1):55–67.
- 7. Bergmann FT, Adams R, Moodie S, Cooper J, Glont M, Golebiewski M, et al. COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics. 2014;15(1):369. pmid:25494900
- 8. Eriksson O, Bhalla US, Blackwell KT, Crook SM, Keller D, Kramer A, et al. Combining hypothesis- and data-driven neuroscience modeling in FAIR workflows. Elife. 2022;11:e69013. pmid:35792600
- 9. Herschel M, Diestelkämper R, Ben Lahmar H. A survey on provenance: what for? What form? What from? VLDB J. 2017;26:881–906.
- 10. Groth P, Moreau L. PROV-overview. An overview of the PROV family of documents. World Wide Web Consortium; 2013.
- 11. Scharm M, Gebhardt T, Touré V, Bagnacani A, Salehzadeh-Yazdi A, Wolkenhauer O, et al. Evolution of computational models in BioModels database and the physiome model repository. BMC Syst Biol. 2018;12(1):53. pmid:29650016
- 12. Krause F, Uhlendorf J, Lubitz T, Schulz M, Klipp E, Liebermeister W. Annotation and merging of SBML models with semanticSBML. Bioinformatics. 2010;26(3):421–2. pmid:19933161
- 13. Uhrmacher AM, Frazier P, Hähnle R, Klügl F, Lorig F, Ludäscher B, et al. Context, composition, automation, and communication: the C2AC roadmap for modeling and simulation. ACM Trans Model Comput Simulat. 2024;34(4):1–51.
- 14. Ruscheinski A, Uhrmacher A. Provenance in modeling and simulation studies—Bridging gaps. In: 2017 Winter Simulation Conference (WSC). IEEE; 2017. p. 872–83.
- 15. Reinhardt O, Ruscheinski A, Uhrmacher AM. ODD+P: complementing the ODD protocol with provenance information. In: 2018 Winter Simulation Conference (WSC). 2018. p. 727–38.
- 16. Wilsdorf P, Reinhardt O, Prike T, Hinsch M, Bijak J, Uhrmacher AM. Simulation studies of social systems - telling the story based on provenance patterns. Roy Soc Open Sci. 2024.
- 17. Budde K, Smith J, Wilsdorf P, Haack F, Uhrmacher AM. Relating simulation studies by provenance-developing a family of Wnt signaling models. PLoS Comput Biol. 2021;17(8):e1009227. pmid:34351901
- 18. Haack F, Budde K, Uhrmacher AM. Exploring the mechanistic and temporal regulation of LRP6 endocytosis in canonical WNT signaling. J Cell Sci. 2020;133(15):jcs243675. pmid:32661084
- 19. Taylor SJ, Eldabi T, Monks T, Rabe M, Uhrmacher AM. Crisis, what crisis–does reproducibility in modeling & simulation really matter? In: 2018 Winter Simulation Conference (WSC). 2018. p. 749–62.
- 20. Prike T. Open science, replicability, and transparency in modelling. In: Towards Bayesian model-based demography. Springer; 2022. p. 175–83.
- 21. Niarakis A, Waltemath D, Glazier J, Schreiber F, Keating SM, Nickerson D, et al. Addressing barriers in comprehensiveness, accessibility, reusability, interoperability and reproducibility of computational models in systems biology. Brief Bioinform. 2022;23(4):bbac212. pmid:35671510
- 22. Wilsdorf P, Wolpers A, Hilton J, Haack F, Uhrmacher A. Automatic reuse, adaption, and execution of simulation experiments via provenance patterns. ACM Trans Model Comput Simul. 2023;33(1–2):1–27.
- 23. Wilsdorf P, Heller J, Budde K, Zimmermann J, Warnke T, Haubelt C, et al. A model-driven approach for conducting simulation experiments. Appl Sci. 2022;12(16):7977.
- 24. Tiwari K, Kananathan S, Roberts MG, Meyer JP, Sharif Shohan MU, Xavier A, et al. Reproducibility in systems biology modelling. Mol Syst Biol. 2021;17(2):e9982. pmid:33620773
- 25. Davidson S, Boulakia S, Eyal A, Ludascher B, McPhillips T, Bowers S. Provenance in scientific workflow systems. IEEE Data Eng Bull. 2007;30(4):44–50.
- 26. Di Ciccio C, Marrella A, Russo A. Knowledge-intensive processes: characteristics, requirements and analysis of contemporary approaches. J Data Semantics. 2015;4:29–57.
- 27. van Der Aalst WM, Pesic M, Schonenberg H. Declarative workflows: balancing between flexibility and support. Comput Sci-Res Develop. 2009;23:99–113.
- 28. Ruscheinski A, Warnke T, Uhrmacher AM. Artifact-based workflows for supporting simulation studies. IEEE Trans Knowl Data Eng. 2019;32(6):1064–78.
- 29. Ruscheinski A, Wilsdorf P, Dombrowsky M, Uhrmacher AM. Capturing and reporting provenance information of simulation studies based on an artifact-based workflow approach. In: Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 2019. p. 185–96.
- 30. Ligon TS, Leonhardt C, Rädler JO. Multi-level kinetic model of mRNA delivery via transfection of lipoplexes. PLoS One. 2014;9(9):e107148. pmid:25237886
- 31. Köster T, Henning P, Warnke T, Uhrmacher A. Expressive modeling and fast simulation for dynamic compartments. Cold Spring Harbor Laboratory; 2024. https://doi.org/10.1101/2024.04.02.587672
- 32. Choi K, Medley JK, König M, Stocking K, Smith L, Gu S, et al. Tellurium: An extensible python-based modeling environment for systems and synthetic biology. Biosystems. 2018;171:74–9. pmid:30053414
- 33. Suh YK, Lee KY. A survey of simulation provenance systems: modeling, capturing, querying, visualization, and advanced utilization. Human-centric Comput Inf Sci. 2018;8(1):27.
- 34. Davidson SB, Freire J. Provenance and scientific workflows. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 2008. p. 1345–50. https://doi.org/10.1145/1376616.1376772
- 35. Belhajjame K, B’Far R, Cheney J, Coppens S, Cresswell S, Gil Y. PROV-DM: the PROV data model. W3C; 2013.
- 36. Ruscheinski A, Gjorgevikj D, Dombrowsky M, Budde K, Uhrmacher AM. Towards a PROV ontology for simulation models. In: Belhajjame K, Gehani A, Alper P, editors. Provenance and annotation of data and processes. Cham: Springer; 2018. p. 192–5.
- 37. Schroder M, Raben H, Kruger F, Ruscheinski A, van Rienen U, Uhrmacher A, et al. PROVenance patterns in numerical modelling and finite element simulation processes of bio-electric systems. Annu Int Conf IEEE Eng Med Biol Soc. 2019;2019:3377–82. pmid:31946605
- 38. Balci O. A life cycle for modeling and simulation. Simulation. 2012;88(7):870–83.
- 39. Sargent RG. Verification and validation of simulation models. In: Proceedings of the 2010 Winter Simulation Conference. 2010. p. 166–83. https://ieeexplore.ieee.org/abstract/document/5679166
- 40. Wilsdorf P, Haack F, Uhrmacher AM. Conceptual models in simulation studies: making it explicit. In: 2020 Winter Simulation Conference (WSC). 2020. p. 2353–64. https://ieeexplore.ieee.org/document/9383984/?arnumber=9383984
- 41. Robinson S. Conceptual modelling for simulation Part I: definition and requirements. J Oper Res Soc. 2008;59(3):278–90.
- 42. Bock C, Dandashi F, Friedenthal S, Harrison N, Jenkins S, McGinnis L. Conceptual modeling. In: Research challenges in modeling and simulation for engineering complex systems. 2017. p. 23–44.
- 43. Peng D, Warnke T, Haack F, Uhrmacher AM. Reusing simulation experiment specifications to support developing models by successive extension. Simulat Model Pract Theory. 2016;68:33–53.
- 44. Mitra ED, Suderman R, Colvin J, Ionkov A, Hu A, Sauro HM, et al. PyBioNetFit and the biological property specification language. iScience. 2019;19:1012–36. pmid:31522114
- 45. Courtot M, Juty N, Knüpfer C, Waltemath D, Zhukova A, Dräger A, et al. Controlled vocabularies and semantics in systems biology. Mol Syst Biol. 2011;7:543. pmid:22027554
- 46. Crielaard L, Uleman JF, Châtel BDL, Epskamp S, Sloot PMA, Quax R. Refining the causal loop diagram: a tutorial for maximizing the contribution of domain expertise in computational system dynamics modeling. Psychol Methods. 2024;29(1):169–201. pmid:35549316
- 47. Rougny A, Touré V, Moodie S, Balaur I, Czauderna T, Borlinghaus H, et al. Systems biology graphical notation: process description language level 1 version 2.0. J Integr Bioinform. 2019;16(2):20190022. pmid:31199769
- 48. Cellier FE, Greifeneder J. Continuous system modeling. Springer; 2013.
- 49. Balbo G. Introduction to stochastic Petri nets. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer; 2001. p. 84–155. https://doi.org/10.1007/3-540-44667-2_3
- 50. Fowler M. Domain-specific languages. Pearson Education; 2010.
- 51. Harris LA, Hogg JS, Tapia J-J, Sekar JAP, Gupta S, Korsunsky I, et al. BioNetGen 2.2: advances in rule-based modeling. Bioinformatics. 2016;32(21):3366–8. pmid:27402907
- 52. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003;19(4):524–31. pmid:12611808
- 53. Lloyd CM, Halstead MDB, Nielsen PF. CellML: its future, present and past. Prog Biophys Mol Biol. 2004;85(2–3):433–50. pmid:15142756
- 54. Ewald R, Uhrmacher AM. SESSL: a domain-specific language for simulation experiments. ACM Trans Model Comput Simulat. 2014;24(2):1–25.
- 55. Waltemath D, Adams R, Bergmann FT, Hucka M, Kolpakov F, Miller AK, et al. Reproducible computational biology experiments with SED-ML–the Simulation Experiment Description Markup Language. BMC Syst Biol. 2011;5:198. pmid:22172142
- 56. Salecker J, Sciaini M, Meyer KM, Wiegand K. The nlrx r package: a next‐generation framework for reproducible NetLogo model analyses. Methods Ecol Evol. 2019;10(11):1854–63.
- 57. Sanchez SM, Sanchez PJ, Wan H. Work smarter, not harder: a tutorial on designing and conducting simulation experiments. In: 2020 Winter Simulation Conference (WSC). 2020. p. 1128–42.
- 58. Edeling W, Arabnejad H, Sinclair R, Suleimenova D, Gopalakrishnan K, Bosak B, et al. The impact of uncertainty on predictions of the CovidSim epidemiological code. Nat Comput Sci. 2021;1(2):128–35. pmid:38217226
- 59. Haack F, Lemcke H, Ewald R, Rharass T, Uhrmacher AM. Spatio-temporal model of endogenous ROS and raft-dependent WNT/beta-catenin signaling driving cell fate commitment in human neural progenitor cells. PLoS Comput Biol. 2015;11(3):e1004106. pmid:25793621
- 60. Kam HR, Lee S-H, Park T, Kim C-H. RViz: a toolkit for real domain data visualization. Telecommun Syst. 2015;60(2):337–45.
- 61. Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S. Kepler: an extensible system for design and execution of scientific workflows. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management. 2004. p. 423–4.
- 62. Eek RE, Miller DD. Agent-based provenance architecture. In: 2011-MILCOM 2011 Military Communications Conference. 2011. p. 1499–505.
- 63. Muniswamy-Reddy KK, Braun U, Holland DA, Macko P, Maclean D, Margo D. Layering in provenance systems. In: Proceedings of the 2009 USENIX Annual Technical Conference. USENIX Association; 2009.
- 64. McPhillips T, Song T, Kolisnik T, Aulenbach S, Belhajjame K, Bocinsky K. YesWorkflow: a user-oriented, language-independent tool for recovering workflow information from scripts. arXiv preprint 2015. http://arxiv.org/abs/1502.02403
- 65. PyCharm website. [cited 2024 Feb 3]. https://www.jetbrains.com/pycharm/
- 66. Visual Studio Code website. [cited 2024 Feb 3]. https://code.visualstudio.com/
- 67. Notepad++ website. [cited 2024 Feb 3]. https://notepad-plus-plus.org/
- 68. Bravo RR, Baratchart E, West J, Schenck RO, Miller AK, Gallaher J, et al. Hybrid automata library: a flexible platform for hybrid modeling with real-time visualization. PLoS Comput Biol. 2020;16(3):e1007635. pmid:32155140
- 69. Barrett P, Hunter J, Miller JT, Hsu JC, Greenfield P. matplotlib–a portable python plotting package. In: Astronomical data analysis software and systems XIV. 2005. p. 91.
- 70. Choi K, Medley JK, König M, Stocking K, Smith L, Gu S, et al. Tellurium: an extensible python-based modeling environment for systems and synthetic biology. Biosystems. 2018;171:74–9. pmid:30053414
- 71. Hoops S, Sahle S, Gauges R, Lee C, Pahle J, Simus N, et al. COPASI–a COmplex PAthway SImulator. Bioinformatics. 2006;22(24):3067–74. pmid:17032683
- 72. Tisue S, Wilensky U. NetLogo: a simple environment for modeling complexity. In: International Conference on Complex Systems. Boston, MA. 2004. p. 16–21.
- 73. Helms T, Warnke T, Maus C, Uhrmacher AM. Semantics and efficient simulation algorithms of an expressive multilevel modeling language. ACM Trans Model Comput Simulat. 2017;27(2):1–25.
- 74. Agha G, Palmskog K. A survey of statistical model checking. ACM Trans Model Comput Simulat. 2018;28(1):1–39.
- 75. Bray T. The JavaScript Object Notation (JSON) Data Interchange Format. 2014.
- 76. Ben-Kiki O, Evans C, Ingerson B. YAML Ain’t Markup Language (YAML™) version 1.1 working draft. YAML. 2009;5:11.
- 77. Huynh TD, Jewell MO, Sezavar Keshavarz A, Michaelides DT, Yang H, Moreau L. The prov-json serialization. 2013. https://www.w3.org/submissions/prov-json/
- 78. Franz M, Lopes CT, Huck G, Dong Y, Sumer O, Bader GD. Cytoscape.js: a graph theory library for visualisation and analysis. Bioinformatics. 2016;32(2):309–11. pmid:26415722
- 79. Aho AV, Garey MR, Ullman JD. The transitive reduction of a directed graph. SIAM J Comput. 1972;1(2):131–7.
- 80. Maus C, Rybacki S, Uhrmacher AM. Rule-based multi-level modeling of cell biological systems. BMC Syst Biol. 2011;5:166. pmid:22005019
- 81. Cone M. The Markdown Guide. 2022. https://www.markdownguide.org/
- 82. Kowalski PS, Rudra A, Miao L, Anderson DG. Delivering the messenger: advances in technologies for therapeutic mRNA delivery. Mol Ther. 2019;27(4):710–28. pmid:30846391
- 83. Sahin U, Karikó K, Türeci Ö. mRNA-based therapeutics--developing a new class of drugs. Nat Rev Drug Discov. 2014;13(10):759–80. pmid:25233993
- 84. Chaudhary N, Weissman D, Whitehead KA. mRNA vaccines for infectious diseases: principles, delivery and clinical translation. Nat Rev Drug Discov. 2021;20(11):817–38. pmid:34433919
- 85. Zhang W, Jiang Y, He Y, Boucetta H, Wu J, Chen Z, et al. Lipid carriers for mRNA delivery. Acta Pharm Sin B. 2023;13(10):4105–26. pmid:37799378
- 86. Leonhardt C, Schwake G, Stögbauer TR, Rappl S, Kuhr J-T, Ligon TS, et al. Single-cell mRNA transfection studies: delivery, kinetics and statistics by numbers. Nanomedicine. 2014;10(4):679–88. pmid:24333584
- 87. Liu J, Xiao Q, Xiao J, Niu C, Li Y, Zhang X, et al. Wnt/β-catenin signalling: function, biological mechanisms, and therapeutic opportunities. Signal Transduct Target Ther. 2022;7(1):3. pmid:34980884
- 88. Wilsdorf P, Kirchhübel AW, Uhrmacher AM. NBSIMGEN: Jupyter notebook extension for generating simulation experiments. In: 2023 Winter Simulation Conference (WSC). 2023. p. 2884–95.
- 89. Shannon RE. Introduction to the art and science of simulation. In: 1998 Winter Simulation Conference Proceedings (Cat. No. 98CH36274). 1998. p. 7–14.
- 90. Klimovskaia A, Ganscha S, Claassen M. Sparse regression based structure learning of stochastic reaction networks from single cell snapshot time series. PLoS Comput Biol. 2016;12(12):e1005234. pmid:27923064
- 91. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc Natl Acad Sci U S A. 2016;113(15):3932–7. pmid:27035946
- 92. Hoffmann M, Fröhner C, Noé F. Reactive SINDy: discovering governing reactions from concentration data. J Chem Phys. 2019;150(2):025101. pmid:30646700
- 93. Burrage PM, Weerasinghe HN, Burrage K. Using a library of chemical reactions to fit systems of ordinary differential equations to agent-based models: a machine learning approach. Numer Algorithms. 2024:1–15.
- 94. Butchy AA, Arazkhani N, Telmer CA, Miskov-Zivanov N. Automating knowledge-driven model recommendation: methodology, evaluation, and key challenges. IEEE Trans Comput Biol Bioinform. 2025.
- 95. Ahmed Y, Telmer CA, Zhou G, Miskov-Zivanov N. Context-aware knowledge selection and reliable model recommendation with ACCORDION. Front Syst Biol. 2024;4.
- 96. Teran-Somohano A, Smith AE, Ledet J, Yilmaz L, Oğuztüzün H. A model-driven engineering approach to simulation experiment design and execution. In: 2015 Winter Simulation Conference (WSC). IEEE; 2015. p. 2632–43.