
The authors have declared that no competing interests exist.

Conceived and designed the experiments: TRM BGO FJB. Performed the experiments: TRM. Analyzed the data: TRM BGO FJB. Contributed reagents/materials/analysis tools: BGO. Wrote the paper: TRM BGO FJB. Developed StochPy: TRM BGO FJB.

Single-cell and single-molecule measurements indicate the importance of stochastic phenomena in cell biology. Stochasticity creates spontaneous differences in the copy numbers of key macromolecules and the timing of reaction events between genetically identical cells. Mathematical models are indispensable for the study of phenotypic stochasticity in cellular decision-making and cell survival. There is a demand for versatile stochastic modeling environments with extensive, preprogrammed statistics functions and plotting capabilities that hide the mathematics from novice users and offer low-level programming access to experienced users. Here we present StochPy (

Experiments at the level of single cells indicate large cell-to-cell variability in copy numbers of molecules

Stochastic simulation software is used in a variety of modeling methodologies e.g. ordinary differential equations

StochPy provides various SSAs for the simulation of stochastic dynamics and supports model definition in either plain text or the Systems Biology Markup Language (SBML)

The StochPy software has been designed around three core principles.

To satisfy these principles StochPy has been developed as a console application using the Python language, taking advantage of its pure object-oriented nature, portability, extensive standard library, and ability to seamlessly glue together scientific libraries written in compiled languages. For instance, Matplotlib

Combining these functionalities with those provided by the many available Python scientific libraries allows for easy extension of StochPy as well as its use as a library in other simulation software. The StochPy software has already been incorporated as a plug-in library for the systems biology simulator software PySCeS

The following SSAs are implemented in StochPy: The direct, first reaction, next reaction, and optimized tau-leaping methods

Model definition is by way of the human readable/writable PySCeS model description language (MDL)

The StochPy software provides interfaces to other widely used simulation tools: CAIN and StochKit2. Through these integrated interfaces, modelers are provided with a choice of SSA implementations that differ in speed and simulation output. For example, using StochKit2 via StochPy allows models defined in SBML up to L2V4 or in PySCeS MDL to be simulated by StochKit2’s fast solvers. The output can then be analyzed directly in StochPy, without the need to install any additional software (by default StochKit2 uses MATLAB to provide this functionality).

A typical StochPy modeling session starts by creating a StochPy model object from a (default) input model. Alternatively, different user-defined models (in SBML or PySCeS MDL) can be loaded into the model object. Once a model is loaded, various simulation parameters can be set, e.g. the number of simulation steps and the number of simulation trajectories. Subsequently, kinetic parameter values and species amounts can be modified interactively, and simulations can be performed by calling the available analysis methods for model objects. As model objects are fully encapsulated, multiple models can be instantiated from the same (or different) input files at the same time. A short modeling session within Python could look as follows:

>>> import stochpy
>>> smod = stochpy.SSA()
>>> smod.Model('dsmts-001-01.xml')
>>> smod.DoStochSim(end=1000, mode='steps')
>>> smod.PlotSpeciesTimeSeries()

Here, we initiate the model object smod for the default input model, load a different model depicted in SBML into the model object smod, generate one time trajectory of the master equation (1000 steps), and plot the corresponding (discrete) species time series data.
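To make concrete what generating one trajectory with a fixed number of steps involves, the following is a minimal sketch of the direct method in plain Python. It is not StochPy's implementation: the function name `direct_method`, its arguments, and the immigration-death example with its rate constants are assumptions chosen purely for illustration.

```python
import math
import random


def direct_method(x0, stoichiometry, propensity_fn, n_steps, seed=None):
    """Illustrative direct-method SSA (hypothetical helper, not StochPy code).

    Returns explicit output: one entry per reaction event, holding the event
    time, the state after the event, and the index of the reaction that fired.
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    times, states, fired = [t], [tuple(x)], [None]
    for _ in range(n_steps):
        a = propensity_fn(x)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next event is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        # Reaction j fires with probability a[j] / a0.
        threshold, cumulative = rng.random() * a0, 0.0
        for j, aj in enumerate(a):
            cumulative += aj
            if threshold < cumulative:
                break
        x = [xi + d for xi, d in zip(x, stoichiometry[j])]
        times.append(t)
        states.append(tuple(x))
        fired.append(j)
    return times, states, fired


# Immigration-death example with hypothetical rate constants:
# reaction 0 is synthesis (0 -> mRNA), reaction 1 is degradation (mRNA -> 0).
k_syn, k_deg = 10.0, 0.2
times, states, fired = direct_method(
    x0=[0],
    stoichiometry=[(1,), (-1,)],
    propensity_fn=lambda x: [k_syn, k_deg * x[0]],
    n_steps=1000,
    seed=1,
)
print(len(times) - 1, "reaction events up to t =", times[-1])
```

Every reaction event produces one output row, which is the explicit output format discussed in the following sections.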

In the following sections we discuss the potential uses of explicit output in systems biology, highlight StochPy’s capabilities by modeling different biological systems, and benchmark StochPy against other widely used stochastic tools. All simulations were done with StochPy’s implementation of the direct method. Only a single command, a high-level function such as PlotSpeciesTimeSeries(), is necessary to create most of the shown (sub)-figures. Annotated scripts and input files used to generate modeling results are available as

StochPy returns explicit output rather than discretized output (hereafter fixed-interval output). By fixed-interval output we mean that the state of the system, i.e. the copy numbers of all molecules, is reported at fixed time intervals and not at the times at which single reaction events occur. Mathematically speaking, molecular reaction systems are modeled by continuous-time, discrete-state Markov chains; fixed-interval data storage approaches simulate these systems in continuous time but store the output at fixed time intervals. Returning fixed-interval output likely derives from stochastic modeling practices in mathematical statistics. In systems biology, the requirements for stochastic simulation are often different. Access to exact simulation times allows for the straightforward calculation of event waiting times, species and propensity distributions, and correlation times. As these quantities are in principle observable in single-cell experiments, they should be calculable with simulation software.
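The difference between the two output types can be sketched on a small hand-made trajectory. The helper names below (`time_weighted_mean`, `sample_fixed_intervals`, `waiting_times`) and the trajectory itself are hypothetical and are not part of StochPy; the sketch only assumes that explicit output consists of event times, the copy number after each event, and the index of the reaction that fired.

```python
from bisect import bisect_right


def time_weighted_mean(times, copies, t_end):
    """Mean copy number from explicit output, weighting each state by its dwell time."""
    total = 0.0
    for i, x in enumerate(copies):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += x * (t_next - times[i])
    return total / t_end


def sample_fixed_intervals(times, copies, t_end, n_intervals):
    """Fixed-interval output: the state in force at each of n_intervals + 1 grid points."""
    grid = [i * t_end / n_intervals for i in range(n_intervals + 1)]
    return [copies[bisect_right(times, t) - 1] for t in grid]


def waiting_times(times, fired, reaction):
    """Times between consecutive firings of one reaction."""
    events = [t for t, j in zip(times, fired) if j == reaction]
    return [t2 - t1 for t1, t2 in zip(events, events[1:])]


# Hand-made explicit output: reaction 0 adds a molecule, reaction 1 removes one.
times = [0.0, 1.0, 1.5, 4.0]
copies = [0, 1, 2, 1]
fired = [None, 0, 0, 1]

exact_mean = time_weighted_mean(times, copies, t_end=5.0)
samples = sample_fixed_intervals(times, copies, t_end=5.0, n_intervals=5)
print("time-weighted mean:", exact_mean)
print("fixed-interval mean:", sum(samples) / len(samples))
print("waiting times of reaction 0:", waiting_times(times, fired, 0))
```

With only a few grid points, the fixed-interval mean differs from the time-weighted mean, and the event at t = 1.5 falls between grid points, so its waiting time cannot be recovered from the sampled states alone.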

An example of explicit simulation output of StochPy is shown in a table. It reports the number of molecules of each molecular species and the reaction propensities at each time point when a reaction occurs. The time differences between consecutive rows indicate waiting times between reaction events. In the last column, the waiting times for reaction 4,

In

Hundred stochastic simulations until t = 60.000 min (^{−1}, ^{−1}, ^{−1}, and ^{−1}. (A) Accuracy of mean and standard deviation estimates as function of the number of fixed intervals. (B) Simulation time with fixed-interval output increases with the number of fixed intervals. Fixed-interval simulations were done with the StochPy interface to StochKit2 and include the time to calculate the associated probability distributions. (C) The stationary mRNA distribution for

A complication with fixed-interval storage is that the user does not know beforehand what the relevant size and number of fixed intervals should be; additional analysis, such as data bootstrapping, is then needed to assess the accuracy of the calculation. For instance,

This means that the speed of a fixed-interval algorithm is not set only by the end time and the programming language, as is the case for an exact output approach, but also by the chosen number of fixed intervals. Note that deciding on the right number of fixed intervals can only be done by trial and error. Generally, more than

In this section, we modeled the immigration-death model (^{−1} and the degradation rate constant ^{−1}.

Illustration of several plotting options in StochPy. Colored lines represent StochPy output and black the analytical solutions. (A) species time-series data. (B) propensities time-series data. (C) species distribution. (D) propensities distribution. (E) auto-correlation for different

Distributions can give us more insight into the size of fluctuations.
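For the immigration-death model, the stationary copy number distribution is Poisson with mean equal to the synthesis rate constant divided by the degradation rate constant, which gives a convenient check on a simulated distribution. The sketch below is a stand-alone illustration and does not use StochPy; the function names and the rate constants are hypothetical choices.

```python
import math
import random
from collections import defaultdict


def simulate_immigration_death(k_syn, k_deg, x0, t_end, seed=None):
    """Direct-method simulation of 0 -> X (rate k_syn) and X -> 0 (rate k_deg * x)."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, copies = [t], [x]
    while True:
        a_syn = k_syn
        a0 = a_syn + k_deg * x
        t += -math.log(1.0 - rng.random()) / a0
        if t >= t_end:
            break
        x += 1 if rng.random() * a0 < a_syn else -1
        times.append(t)
        copies.append(x)
    return times, copies


def time_weighted_distribution(times, copies, t_end):
    """Fraction of time spent at each copy number (uses the exact event times)."""
    dwell = defaultdict(float)
    for i, x in enumerate(copies):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        dwell[x] += t_next - times[i]
    return {x: d / t_end for x, d in sorted(dwell.items())}


def poisson_pmf(n, mean):
    """Poisson probability mass, evaluated in log space for numerical stability."""
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


# Hypothetical rate constants giving a stationary mean of k_syn / k_deg = 20.
k_syn, k_deg, t_end = 2.0, 0.1, 5000.0
times, copies = simulate_immigration_death(k_syn, k_deg, x0=20, t_end=t_end, seed=3)
dist = time_weighted_distribution(times, copies, t_end)
sim_mean = sum(x * p for x, p in dist.items())
print("simulated mean:", sim_mean, "analytical mean:", k_syn / k_deg)
for x in (15, 20, 25):
    print(x, dist.get(x, 0.0), poisson_pmf(x, k_syn / k_deg))
```

Weighting each state by its dwell time is only possible because the exact event times are stored.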

In this section, we consider a model of mRNA synthesis by a gene that switches spontaneously between an inactive (OFF) state and an active (ON) state (

In

StochPy plots of simulating stochastic gene expression. (A) long lifetimes of both the ON and OFF state. (B) bursty transcription. (C) short lifetimes of both the ON and OFF state. (D) non-bursty transcription.

StochPy plots of simulating stochastic gene expression with StochPy simulations (step, markers, colored) and analytical solutions (solid, black). (A) probability distribution of the mRNA copy numbers. (B) probability distribution of the mRNA synthesis event waiting times.
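The waiting times between mRNA synthesis events can be illustrated with a stripped-down version of the ON/OFF gene model. The sketch below is not StochPy code and makes several assumptions: it tracks only the gene state and the synthesis events (mRNA degradation is omitted because it does not affect synthesis waiting times), and the function name and the rate constants `k_on`, `k_off`, and `k_syn` are hypothetical. Because the gene is ON immediately after every synthesis event, the long-run mean waiting time equals (k_on + k_off) / (k_syn · k_on), which the simulation should approach.

```python
import math
import random


def synthesis_waiting_times(k_on, k_off, k_syn, n_events, seed=None):
    """Waiting times between successive mRNA synthesis events of a switching gene."""
    rng = random.Random(seed)
    t, gene_on = 0.0, True
    last_synthesis, waits = None, []
    while len(waits) < n_events:
        if gene_on:
            # While ON, the gene can either switch OFF or synthesize one mRNA.
            a0 = k_off + k_syn
            t += -math.log(1.0 - rng.random()) / a0
            if rng.random() * a0 < k_syn:
                if last_synthesis is not None:
                    waits.append(t - last_synthesis)
                last_synthesis = t
            else:
                gene_on = False
        else:
            # While OFF, the only possible event is switching back ON.
            t += -math.log(1.0 - rng.random()) / k_on
            gene_on = True
    return waits


# Hypothetical rate constants with long-lived ON and OFF states (bursty synthesis).
k_on, k_off, k_syn = 0.05, 0.05, 1.0
waits = synthesis_waiting_times(k_on, k_off, k_syn, n_events=20000, seed=7)
mean_wait = sum(waits) / len(waits)
expected = (k_on + k_off) / (k_syn * k_on)
print("simulated mean waiting time:", mean_wait, "expected:", expected)
```

A histogram of `waits` would show many short waiting times within bursts and a long tail caused by OFF periods.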

Next, we consider a completely different model that describes single-molecule enzymology (

The simulation results are shown in

StochPy plots for single-molecule enzyme activity simulations with StochPy simulations (step, markers, blue) and analytical solutions (solid, black) (A–B) time-series data of

To demonstrate the flexibility of StochPy we briefly illustrate how simple it is to extend a stochastic model of a gene expression network with explicit cell division events, even though this is not a standard functionality of StochPy. In this model, protein synthesis occurs from mRNA and mRNA synthesis depends on the presence of active transcription factors. This model consists of nine reactions where one reaction is not described by mass-action kinetics, which would make this system already hard to simulate for some software packages.
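The idea of adding explicit division events to a stochastic simulation can be sketched with a toy model. The code below is not the nine-reaction model described here and does not use StochPy. It assumes a single species that is synthesized at a constant rate, interdivision times drawn from a gamma distribution (shape 1.0 and scale 60.0 are used as illustrative defaults), and binomial partitioning of molecules at division with probability 0.5; the partitioning rule and all names are assumptions made for illustration.

```python
import math
import random


def binomial_partition(n, rng, p=0.5):
    """Number of molecules kept by one daughter cell (each kept with probability p)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_with_division(k_syn, t_end, shape=1.0, scale=60.0, seed=None):
    """Constant-rate synthesis interrupted by explicit cell division events."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    next_division = rng.gammavariate(shape, scale)
    times, copies, divisions = [t], [x], []
    while True:
        # Candidate time of the next synthesis event (exponential waiting time).
        t_reaction = t - math.log(1.0 - rng.random()) / k_syn
        t_event = min(t_reaction, next_division)
        if t_event >= t_end:
            break
        t = t_event
        if t_reaction < next_division:
            x += 1
        else:
            # Division comes first: partition the molecules and discard the
            # candidate reaction time, which is valid because exponential
            # waiting times are memoryless.
            x = binomial_partition(x, rng)
            divisions.append(t)
            next_division = t + rng.gammavariate(shape, scale)
        times.append(t)
        copies.append(x)
    return times, copies, divisions


# Hypothetical synthesis rate constant; time units follow the division period.
times, copies, divisions = simulate_with_division(k_syn=1.0, t_end=2000.0, seed=11)
print(len(divisions), "divisions; copy number at end:", copies[-1])
```

Between divisions the copy number only increases, and each division removes roughly half of the molecules, which produces the sawtooth-like time series typical of explicit division models.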

We modeled cell division in both an explicit and implicit manner (see

The differences between modeling cell division explicitly and implicitly are illustrated in

StochPy plots of simulating stochastic gene expression. Cell division periods are Gamma-distributed with scale parameter 60.0 and shape parameter 1.0. Implicit and explicit time series of transcription factor copy numbers (A and D), mRNA copy numbers (B and E), and protein copy numbers (C and F). Distributions of protein copy numbers for modeling cell division explicitly and implicitly (G). The model is further described in

The StochPy software features and speed performance were benchmarked against widely used, existing stochastic software to make a fair and broad comparison of available tools for stochastic simulations.

In an extensive search for available stochastic simulators, CAIN, COPASI, Facile-EasyStoch, GillespieSSA, and StochKit2 were identified as those with the closest functionality to StochPy. Here, we will discuss and compare these tools against StochPy through a feature comparison. A comprehensive feature comparison is provided in

Feature | CAIN | COPASI | EasyStoch | GillespieSSA | StochKit2 | StochPy |

- Exact SSA | • | • | • | • | • | • |

- Inexact SSA | • | • | • | • | • | |

- SBML support | ○^{1} | • | ○^{2} | • | ||

- Human interpretable input | • | • | ||||

- Stochastic test suite | • | |||||

- Extrinsic noise | • | |||||

- Explicit output | • | • | ○^{3} | • | ||

- Fixed-interval output | • | • | • | • | • | |

- Auto-correlations | • | |||||

- Histogram distance | • | • | ||||

- Propensities | • | • | ||||

- Moments | • | |||||

- Waiting times | • | |||||

- Plotting facilities | • | • | • | ○^{4} | • | |

- Data exportation | • | • | • | • | • | |

- GUI | • | • | ||||

- Flexible environment | • | • |

○: Feature is partially present or requires additional dependencies.

The direct solver of StochPy was benchmarked against the direct solvers of two widely used and high-performance stochastic tools i.e. CAIN and StochKit2 (both implemented in C++) for various stochastic models, the results of which are shown in

Simulation Type | CAIN | CAIN (API) | StochKit2 | |

Small | 0.7–0.10 | 0.24–0.10 | 0.24–0.07 | 1.0–0.31 |

0.5–0.10 | N/A | 96–0.16 | 1.0–0.45 | |

0.04–0.07 | 0.04–0.06 | 0.03–0.18 | 0.28–0.33 | |

1.9–1.9 | N/A | 1.7–0.18 | 1.0–0.3 | |

3.2–3.7 | N/A | 1.5–0.18 | 1.0–0.3 | |

N/A | N/A | N/A | 1.0–1.0 | |

Large | 0.14 | 0.28 | 0.09 | 0.56 |

XL | 0.15 | 0.31 | 0.11 | 0.66 |

XXL | 0.24 | 0.37 | 0.11 | 0.93 |

StochPy with interfaces to CAIN and StochKit2. Simulation time includes time to parse results into StochPy.

CAIN cannot parse events, so the user must specify them in the GUI.

Optimal theoretical result without including time to merge the output of all sequential simulations.

To fairly compare StochPy’s performance against these other tools, the number of fixed intervals was set equal to the number of time steps in the stochastic simulation. Note that determining the minimal number of fixed intervals necessary to perform a particular analysis requires doing multiple simulations (as shown in

The first conclusion from this benchmark is that no single solver was the fastest in every case. Secondly, StochPy is the only stochastic simulator that was able to correctly simulate all stochastic models tested in this benchmark. This is in contrast to the CAIN API, which accepts only models consisting of mass-action kinetics.

Thirdly, for different numbers of simulation time steps, significant differences in simulation time were found between solvers. For relatively short simulations, StochKit2’s performance is reduced, as it requires substantial time to compile models with, e.g., events and non-mass-action kinetics. This effect is negligible for relatively long simulations. Both the CAIN and StochKit2 solvers outperformed StochPy’s direct solver for most tested models if simulations were done for a relatively large number of time steps (with exceptions such as modeling events with CAIN).

Fourthly, StochPy’s performance increased relative to that of both CAIN and StochKit2 when larger models were considered. For instance, CAIN needed about 4 minutes to parse the largest model tested (parsing time was omitted from the benchmark), while both StochKit2 and StochPy were able to parse this model within seconds. For relatively long simulations of models with many species, StochKit2’s solvers were about 10 times faster than those of StochPy, which is expected because our software is written in Python rather than C++.

Since we also offer access to the CAIN and StochKit2 solvers directly from StochPy, we also tested the speeds of CAIN and StochKit2 in this mode of operation. Using these solvers through StochPy appears slower than using the native applications, but the additional time consists only of parsing the simulation output for post-simulation analysis. This can take a significant amount of time for large data sets.

StochPy provides access to multiple SSAs, SSA implementations, and simulation tools and, as discussed above, there is no ‘one size fits all’ approach. We therefore provide a decision tree to help prospective modelers select the method that best suits their model (see

Both fixed-interval and explicit output have their advantages and disadvantages. The decision whether to use fixed-interval or explicit output depends on the type of analysis.

Stochastic modeling in systems biology demands a certain level of flexibility in simulation, management of stochastic models, and the handling of simulation data. Depending on the size of the system of interest and its degree of time-scale separation, the different SSAs each have their particular (dis-)advantages. The differences in simulation time between stochastic simulation packages are often due to the fixed-interval reporting of simulation data versus the use of explicit output. When fixed-interval solvers are required to achieve the accuracy of explicit solvers, the differences in simulation time shrink considerably and ultimately boil down to differences in the programming languages. In systems biology applications, the pure simulation data rather than the fixed-interval simulation data is often of interest. The pure simulation data allows for the accurate determination of various probability measures associated with time and copy number.

We presented StochPy as a versatile modeling package for stochastic simulation of molecular control networks inside living cells that provides solvers which return explicit stochastic simulation output. Its integration with Python’s scientific libraries and PySCeS makes it an easily extensible and user-friendly stochastic simulation package. We highlighted this by implementing interfaces to the solvers of both CAIN and StochKit2, which return only fixed-interval output; such output can be useful for obtaining insight into time series and moments. The high-level statistical and plotting functions of StochPy allow for quick and interactive model interrogation at the command line. Python’s scripting capabilities allow for more complicated and in-depth analysis of stochastic models and meet many of the demands of systems biology.


We thank Anne Schwabe (VU University Amsterdam) for helpful discussions about stochastic simulations.