Modular assembly of dynamic models in systems biology

It is widely acknowledged that the construction of large-scale dynamic models in systems biology requires complex modelling problems to be broken up into more manageable pieces. To this end, both modelling and software frameworks are required to enable modular modelling. While there has been consistent progress in the development of software tools to enhance model reusability, there has been a relative lack of consideration for how underlying biophysical principles can be applied to this space. Bond graphs combine the aspects of both modularity and physics-based modelling. In this paper, we argue that bond graphs are compatible with recent developments in modularity and abstraction in systems biology, and are thus a desirable framework for constructing large-scale models. We use two examples to illustrate the utility of bond graphs in this context: a model of a mitogen-activated protein kinase (MAPK) cascade to illustrate the reusability of modules and a model of glycolysis to illustrate the ability to modify the model granularity.


Reviewer 1
This is a manuscript that essentially summarizes the bond graph approach of network thermodynamics and proposes it to be a suitable method to modularize the development of large-scale models (such as whole-cell models). Overall I find the manuscript to be very clear and accessible and overall I think this will interest the readership of the journal.

Major comments
1. My only major concern regards the glycolysis example used, where they chose to keep AMP, ADP, ATP, NAD and NADH constant. These conserved species are recycled and are too important to ignore by keeping them constant. It would have been much more interesting if they had included their recycling reactions (like alcohol dehydrogenase for NAD/NADH and an ATPase + an adenilate kinase for ATP/ADP/AMP. This would be much more in line with what they did in the MAPK example. However, I do understand that this is not essential for the purpose of their argument (the utility of bond graph approach), but it would be a whole lot more interesting. Importantly, doing that would allow a better energetic analysis of glycolysis...
As noted by the reviewer, the purpose of the example is to illustrate the utility of the approach rather than to produce a detailed model of metabolism. We see the addition of recycling reactions as future work where the model of glycolysis could be regarded as a validated module that could be connected to other reactions to form more comprehensive models. We have added some sentences to the final paragraph of the Results to explain this (page 28).
After incorporating data on thermodynamic constants, this model could be regarded as a validated module available for coupling with other reactions to form more comprehensive models. For example, recycling reactions such as the adenylate kinase and alcohol dehydrogenases could be added to study variations in the concentrations of ATP, ADP, AMP, NAD and NADH.
In our response to Comment 3 from Reviewer 2, we note that it is possible to construct bond graph models using existing kinetic models as a template, although correcting thermodynamic inconsistencies can be time-consuming (pages [29][30]).
An ongoing challenge to creating truly large-scale bond graph models is increasing the number of models available to modellers. Ideally, one would look to convert existing kinetic models in the literature into bond graphs. However, while bond graphs are readily converted into kinetic models, the conversion in the other direction is not always possible as existing kinetic models often use irreversible reactions or fail to satisfy detailed balance [1][2][3]. Thus, to maintain the advantages of using bond graphs, an avenue of future work would be to automatically generate bond graphs versions of existing biomodels, such as in the BioModels repository [4] or the Physiome Model Repository [5], and where necessary correcting thermodynamic inconsistencies while minimally changing model behaviour. In cases where the energetics of a process are considered negligible (for example gating in ion channel and neural control in locomotion), one may choose to embed the kinetics of such processes inside control elements of a bond graph [6]. This allows a kinetic model to be directly added to a bond graph as a "black box" in which the energetics underlying its dynamic behaviour are ignored.
However, we anticipate that the ongoing Physiome Project will produce software tools and models to aid the wider uptake of bond graphs by the community. We have added a paragraph in the discussion to address this (page 31).
In writing this paper, we hope to motivate the uptake of network thermodynamic models (and bond graph models in particular) by the systems biology community.
One of the objectives of the Physiome Project is to embed bond graphs within the CellML modelling language [7]. This will facilitate the development of more user-friendly tools, allowing the conversion of existing models into bond graphs and subsequently coupling them. As the bond graph methodology gradually becomes more widely adopted by the community, we anticipate that existing mathematical and computational methods in the systems biology space will be used in conjunction with the bond graph approach, allowing biomodels to be integrated in a physically consistent manner.

Minor comments
There are a number of minor points that nevertheless I think they could easily address and which would make the manuscript much more useful for readers: 1. Thermodynamics To be fair to readers it is important to be accurate in citing prior work. In this regard I bring attention to your sentence in the Introduction: "While the whole-cell modelling community has emphasised the need to use physically measurable parameters, it is only recently that the importance of thermodynamics in these models has been acknowledged [5]." the importance of thermodynamics in this type of models has been invoked prior to this by several authors, such as your references [32] and [33], , ,  and very much in this large-scale kinetic model context by  (disclaimer: I am a co-author of the latter and I am not writing this seeking a citation; rather I feel that a reader should not be mislead that your ref. [5] was the first time thermodynamics had been thought of in this context).
We have updated this sentence to more accurately represent progress in this space (page 3).
In particular, the laws of thermodynamics have been invoked in the context of metabolic modelling, allowing modellers to estimate the energetic favourability of reactions [8], to constrain fluxes in constraint-based models [9,10] and to improve parameter estimation in large-scale kinetic models [11][12][13] and whole-cell models [14].
2. Functional modularity: I thought it could be pertinent to mention the work of Del Vecchio et al (2008) in this context.
We have now updated this sentence to include this reference (page 5).
The role of functional modularity is pivotal to systems and synthetic biology, particularly in reducing loading effects between engineered genetic circuits [15]. However, functional modularity can only be analysed and designed through the lens of computational modularity [16].
3. Figure 2 is very useful, but it would be better to make 2B actually use the SBGN standard. I note that it would require only one change to make it compliant: the arrows should not point into the reaction nodes, only into the species nodes. Then you could say that this representation follows the standard (rather than it is similar to).
We have updated Figure 2B as suggested and revised a sentence to state that it now follows the SBGN standard (pages 9-10).
This representation follows the Systems Biology Graphical Notation (SBGN) standard [17]. 4. The MAPK are not only significant clinically for being involved in many cancers, but also in inflammation (P38 pathway).
We have updated the sentence to mention inflammation as well (page 16).
MAPK cascades in human cells include the ERK-MAPK, c-Jun N-terminal kinase (JNK) and p38 MAPK pathways, which are of clinical relevance as they are implicated in both inflammation and cancer [18,19]. 5. Fig. 4, I note that in panel B you represent the phosphatase in the direction which seems to be the inverse of the biological flow. Normally phosphatases go from XP → X rather than X → XP (thermodynamically this direction is very unfavorable unless in the presence of a massive and unphysiological concentration of Pi... It would be much better if you would represent 4B in the physiological direction (page 17).
We have updated Figure 4B so that the phosphatase reaction is depicted in the physiological direction (page 17).
References that I cited which are not in the manuscript:

Reviewer 2
The manuscript by Michael Pan et al describes a physics-based approach for modularity in biological modeling. They describe modular construction of models based on bond graphs.
The first part of the manuscript described bond graphs, defining reaction network (RN) through thermodynamic (TD) notations, expressing all kinetic rate constants through TD parameters. Mathematically it means introducing a larger number of parameters that are energy-based and independent, contrary to dependent parameters in a reaction network.
Visually it corresponds to decomposing the RN (usually represented as a bipartite graph of species and reaction nodes) into 4-partite graph (separate nodes for species, reactions, mass-conservation check-points and energy conservation check-points). This part is clearly written and very useful.
The second part describes using bond graphs to construct and combine modules into larger biomedical models. This is a nice and comprehensive description of modular approach, but the authors have not compared their approach with SBML hierarchical model composition package. It is described at http://co.mbine.org/standards/sbml/level-3/version-1/comp. Although SBML hierarchical package is too technical and not written for a general audience, it introduces the same concepts of black-box versus white-box encapsulation and "ports" as interfaces between a module and its containing model. A modular SBML-based approach is implemented in iBioSim (https://async.ece.utah.edu/tools/ibiosim/).
Thus, I would appreciate if authors could address the following comments: 1. It would be useful to have more input on what in Figures 4, 6 and 7 is different from a regular modular composition of RNs using ports.
We have added a paragraph in the discussion (page 29) discussing the similarities and differences between composing bond graph modules and reaction network modules.
Due to their modular nature, bond graphs are compatible with existing approaches to coupling reaction networks, as implemented in the SBML hierarchical package and PySB [20,21]. These approaches are similar in that component definitions can be defined by either outer modules (black-box modularity) or be merged with other components (white-box modularity). This means that bond graphs could potentially be built on top of existing software infrastructure for model annotation, composition and simulation [22][23][24]. However, the bond graph approach expands on existing approaches in a few ways. Firstly, bond graphs explicitly account for the transfer of power, ensuring that models are thermodynamically consistent. Secondly, unlike in typical kinetic models, species components contain their own independent equations and parameters, allowing conflicts between models to be resolved. Finally, the approach is generalisable to multi-physics systems such as mechanobiology and electrophysiology. To use Figure 4 as an example, one could in principle generate the full bond graph by replacing the phosphorylation loop modules with the bond graph in panel C, and then replacing the kinase and phosphatase modules with the bond graphs in panels A and B respectively.
We have made this point clearer when defining the MAPK cascade in Figure 4 (page 18).
We note that while this representation is abstracted, it is still a fully functional bond graph satisfying thermodynamic consistency. One could in principle flatten the bond graph by iteratively replacing modules with their definitions in Fig.4 A-C. Indeed, this approach can be used to algorithmically derive the equations of a bond graph model. 3. Is all TD machinery internal for each module and not exposed? Can we "mix and match" by defining some modules in terms of TD and some in terms of RN?
While thermodynamic parameters (κ and K) can be used to uniquely calculate kinetic parameters (k + and k − ), converting in the other direction is not always possible. Thus, while bond graphs can be converted into reaction networks, reaction networks cannot always be converted into bond graphs. One could convert bond graphs to reaction networks to allow the two formalisms to couple, but the advantages of using bond graphs would be lost in this case. We have added a paragraph in the discussion to address this point (pages [29][30]).
An ongoing challenge to creating truly large-scale bond graph models is increasing the number of models available to modellers. Ideally, one would look to convert existing kinetic models in the literature into bond graphs. However, while bond graphs are readily converted into kinetic models, the conversion in the other direction is not always possible as existing kinetic models often use irreversible reactions or fail to satisfy detailed balance [1][2][3]. Thus, to maintain the advantages of using bond graphs, an avenue of future work would be to automatically generate bond graphs versions of existing biomodels, such as in the BioModels repository [4] or the Physiome Model Repository [5], and where necessary correcting thermodynamic inconsistencies while minimally changing model behaviour. In cases where the energetics of a process are considered negligible (for example gating in ion channel and neural control in locomotion), one may choose to embed the kinetics of such processes inside control elements of a bond graph [6]. This allows a kinetic model to be directly added to a bond graph as a "black box" in which the energetics underlying its dynamic behaviour are ignored.

4.
Can benchmarking rate laws and effects of perturbations be computed in a regular RN simulator (COPASI, Tellurium, PySB, etc) after a simple change of parameters?
Yes, this is possible in all three of these simulators. However, we note that it is important to keep the bond graph as the core representation though since the above tools require the user to specify species constants multiple times. We have added a paragraph in the discussion to address this comment (page 30, final paragraph).
While we used the Python package BondGraphTools for the benchmarking and simulation in this paper, it would also be relatively straightforward to use typical systems biology simulators (for example COPASI [22], Tellurium [23] or PySB [21]) after a change of parameters. However, we note that these programs do not allow species equations to be specified. Thus, we advocate for retaining a bond graph representation of the model as a core representation that can be later modified should the need arise.

5.
Compare their modular approach with SBML hierarchical composition.
Since SBML hierarchical composition uses reaction network ports, we trust that this issue has been addressed in comment 1.
We have also added a sentence in the Methods to mention that the SBML hierarchical composition package is implemented in iBioSim (page 6).
SBML also supports both white and black box modularity through its hierarchical package [20], which is implemented in the graphical tool iBioSim [25].
6. Discuss whether bond graphs can be described as an extension to SBML standard.
We have added some sentences to the discussion regarding this (page 31).
We are working on exporters to convert bond graphs into more commonly used formats such as SBML and CellML, which will aid in model reproducibility and give users access to a wide range of simulators such as COPASI and OpenCOR [22,24]. A longer-term goal is to define bond graphs in SBML and CellML. CellML describes the mathematics of physical systems, thus bond graphs are generally straightforwardly defined in this standard. However, to implement the approach in SBML, one would need to extend the standard so that species have their own equations and parameters, potentially through a separate package.

Reviewer 3
This is a nicely written paper on the use of bond-graphs to build composable systems biology models. I only have a couple of minor points to make and for the authors to clarify in the paper: 1. I assume the authors know that SBML supports a white/black box approach to composition in the hierarchical package: https://pubmed.ncbi.nlm.nih.gov/26528566/ This should be cited, perhaps on page 6 As mentioned in Comment 5 from Reviewer 2, we have added a sentence on this page to mention the package (page 6).
SBML also supports both white and black box modularity through its hierarchical package [20], which is implemented in the graphical tool iBioSim [25].

2.
One important question is whether bond-graph representations (I think they are stored as json files?) could be converted to SBML given the ubiquity of simulators for this format. One could in fact imagine the bondgraph description being a higher level of description that can be converted to a simpler SBML file where the bond graph representation is the model that is edited and improved. The SBML this acts as an intermediary. Or do the authors envisage adding new annotations to SBML (or a package) to extend SBML to store bondgraph models?
We agree that the SBML standard is an ideal intermediary between the bond graph representation and simulation software. As discussed in Comment 6 from Reviewer 2, we have added a paragraph in the discussion to explain the relationship between standard modelling languages such as SBML and CellML (pages 30-31).
While we used the Python package BondGraphTools for the benchmarking and simulation in this paper, it would also be relatively straightforward to use typical systems biology simulators (for example COPASI [22], Tellurium [23] or PySB [21]) after a change of parameters. However, we note that these programs do not allow species equations to be specified. Thus, we advocate for retaining a bond graph representation of the model as a core representation that can be later modified should the need arise. We are working on exporters to convert bond graphs into more commonly used formats such as SBML and CellML to aid model reproducibility. A longer-term goal is to define bond graphs in SBML and CellML. CellML describes the mathematics of physical systems, thus bond graphs are generally straightforwardly defined in this standard. However, to implement the approach in SBML, one would need to extend the standard so that species have their own equations and parameters, potentially through a separate package.
3. One the major disadvantages of the bondgraph approach is its verbosity (plus the terminology is not as well-known as the more simple kinetic formalisms). The same problem arises with Vivarium but this tool has also made the fatal mistake of mixing specification with implementation. Although perhaps not applicable to this manuscript, software tooling will be a major impediment to the widespread use of bond graphs. I looked at bondgraphstools and there might be a case of adding a higher layer of abstraction because currently is seems quite laborious to build even small models, I also wonder whether in its current state it could scale to whole-cell models (the same applies to vivarium) although the composition feature would help. There might need to be additional abstraction levels.
We agree that the existing bond graph software could benefit from further levels of abstraction in the construction of large models, which is an area of active research. While we have taken steps to ease the construction of bond graph models of reaction networks, there is still further work to be done in this area, in particular using rule-based approaches such as BioNetGen, PySB and ANC. We have edited some sentences in the discussion to outline further work for abstracting the construction of bond graph models (page 30).
Embedding bond graphs within a programmatic environment enables the construction of models using higher-level descriptions, allowing modellers to define models using a relatively parsimonious set of commands. Previous work has shown that bond graphs can be constructed from a stoichiometric matrix [26] and a series of reactions [27]. Nonetheless, further work is needed to help automate model construction, in particular making use of rule-based approaches for constructing models of highly complex interactions between proteins, ligands and receptors [28,29].
We appreciate that as with most software there is a learning curve and we have added a basic example to the Supplementary Material (Appendix C of S1 Text) to help aid uptake of this software by a systems biologist. This includes the construction of a bond graph from a reaction network.
4. Page 10, the authors mention K as the thermodynamic parameter, what is that exactly? I see that they cite the appendix for further info but a mention in the main text as to what K is would be useful.
We have now moved some of the relevant discussion from the appendix to the main text (pages 10-11).
Each chemical species (open circles in Fig. 2C) is associated with a chemical potential µ. In dilute systems at constant temperature and pressure, this quantity is related to abundance x by where x [mol] is the amount of the species, K [mol −1 ] is the thermodynamic parameter for that species, R = 8.314 JK −1 mol −1 is the ideal gas constant and T [K] is the absolute temperature. The parameter K is related to the standard free energy of the species; one can also write Eq. 1 as where c [M] is the concentration of species, V [L] is the volume of the compartment and µ 0 [J/mol] is the standard chemical potential taken at a concentration of c 0 (for simplicity, c 0 is often taken to be 1M). By equating Eqs. 1 and 2, K is related to µ 0 through the equation We have updated this sentence to cite previous work on this topic, including the reference suggested by the reviewer (page 20).
When viewed as an amplifier, negative feedback can increase the range of usable input concentrations at the expense of reduced gain [16,30,31].
6. Page 25 second paragraph it says that the mass-action model reached a different steady state. I don't think that can be true because at steady state the mass-action and Michalis formalisms as exactly the same. It must be that the bondgraph model is someone different. Differences in time scales I can understand but not differences in the steady state.
We first note that all of the models move to different steady states under the perturbation, since the perturbations are on external species and therefore prolonged. While the mass action and Michaelis-Menten models have the same steady state without the perturbation, it is not guaranteed that they move to the same steady state once perturbed.
In this particular case, changing the concentrations of the boundary species caused changes in boundary fluxes. Since all of the rate laws behave differently under the perturbed metabolite concentrations, this causes differences in steady states and would occur regardless of whether the model was a bond graph or kinetic model.
We have revised a sentence to clarify this point (page 26).
While all models have the same steady state without the perturbation, they settle to different steady states under prolonged perturbations due to differences in how the rate laws react to changes in the concentration of boundary species. Accordingly, we also quantified the magnitude of the shift in steady state using the Euclidean norm (middle row).
14. Mason  Over the past few decades, advances in both data generation and computational 2 resources have enabled the construction of large-scale kinetic models in systems biology, 3 including whole-cell models that represent every known biomolecule in the cell [1]. An questions about biology may be addressed in a holistic and systematic manner [3,4]. 9 The first comprehensive whole-cell model was developed in the Mycoplasma 10 genitalium [1] and there are ongoing efforts to develop whole-cell models of Escherichia 11 coli [5] and human cells [6]. However, it has been acknowledged that the highly manual 12 practices used in the development of the initial model of M. genitalium are unlikely to 13 scale up to more complex organisms. The biomodelling community has identified 14 several potential roadblocks to whole-cell modelling, including the lack of sufficient 15 biological knowledge and data, model incompatibility, inadequate model development 16 tools, inadequate model formats and parameter uncertainty [4,6]. 17 This paper addresses the issue of approaching model development in a modular 18 manner. Typical requirements for such model development strategies involve reusing 19 and integrating submodels together into more comprehensive models, and swapping 20 between alternative models of the same system for benchmarking and comparison [7][8][9]. 21 There have been several software-related developments in the systems biology 22 community focussing on improving the reusability of models, some of which are 23 beginning to be used in whole-cell modelling [10]. However, ensuring the reusability of 24 fully integrated cell models remains a challenge [11]. While adequate software 25 frameworks are essential to the modular development of large-scale kinetic models, an 26 understanding of the physics of biological systems is also necessary to address issues in 27 model compatibility and provenance. In particular, the laws of thermodynamics have 28 been invoked in the context of metabolic modelling, allowing modellers to estimate the 29 energetic favourability of reactions [12], to constrain fluxes in constraint-based 30 models [13,14] and to improve parameter estimation in large-scale kinetic 31 models [15][16][17] and whole-cell models [5]. This paper argues that the concepts of 32 module interconnection from physics and thermodynamics are consistent with current 33 model development practices in systems biology, and we suggest the use of bond graphs 34 (from the discipline of engineering) as a framework for unifying developments from both 35 software development and biological thermodynamics. 36 We begin by defining modularity in systems biology and arguing that an improved

Modularity in systems biology
Due to their complexity, large-scale models in systems biology need to be constructed 48 by dividing the problem into manageable submodels. Early notions of modularity in 49 biomodelling were borrowed from principles in engineering and software 50 development [18,19]. In those disciplines, modules can be defined as parts of a system 51 that (a) retain their own identity and are often developed and operated independently, 52 but interact with other parts of the system and (b) hide the details of their 53 implementation from the rest of the system, except through pre-defined interfaces [9]. 54 Using this notion of modularity, the parts of a module that are available for connection 55 and communication are said to be "exposed". 56 However, the definition above -which is also known as "black-box" modularity -is 57 not conducive to the incremental accumulation of knowledge that occurs in biology. 58 Advancements in our understanding of biology may force modellers to interface with 59 previously hidden components within existing models [7,9]. It is becoming increasingly 60 apparent that modules in biological modelling need to be more flexible than engineering 61 modules. As a result, the notion of modularity in systems biology is far less clear than 62 in the established disciplines of engineering and software development. In recent years, 63 systems biology has favoured the use of a "white-box" approach to modularity in which 64 modules do not completely hide the details of their implementation, but instead allow individual variables and components to be exposed as required [20]. 66 Broadly speaking, notions of modularity used in systems biology can be categorised 67 into computational modularity, the ability for models to communicate and interact with 68 each other in a physically consistent manner; and functional (or behavioural ) modularity, 69 (2) allowing model development and unit testing to be done on individual submodels; (3) separating the description of the model from its implementation; (4) allowing models to be iteratively updated with a record of how the equations and parameters were derived; and (5) allowing repeated motifs to be abstracted into reusable structures.
the ability of modules to be isolated from the effects of other modules. This paper will 70 focus on computational modularity. The role of functional modularity is pivotal to 71 systems and synthetic biology, particularly in reducing loading effects between 72 engineered genetic circuits [21]. However, functional modularity can only be analysed 73 and designed through the lens of computational modularity [22]. 74 If handled correctly, modular model development can provide a number of benefits 75 to modellers (see Fig. 1), including: 76 1. Enabling large-scale models to be built from smaller submodules that 77 communicate through clear and unambiguous interfaces. 78 2. Providing a framework for models to be developed, tested and validated in 79 isolation before incorporating them into larger models.   simulation protocols can themselves be specified using the simulation experiment 99 description markup language (SED-ML) [26]. standardised, machine-readable ontological terms [9]. This enables a strategy 103 where software can automatically compose separately developed models 104 together [9,20]. SemGen and semanticSBML are two software tools that 105 implement biological-level modularity [20,27]. SBML also supports both white 106 and black box modularity through its hierarchical package [28], which is 107 implemented in the graphical tool iBioSim [29]. 108 3. Programmatic approaches. The "programmatic approach" to modelling was 109 developed to allow models to integrate together in a flexible manner, but also to 110 address the relative inflexibility of standard modelling languages. In the 111 programmatic approach, models are treated as declarative programming objects 112 rather than mathematical equations [7]. Using this approach, models can be approach are little b [7] and PySB [30]. More recently, the BondGraphTools 118 package has been developed to introduce thermodynamics into such an 119 approach [31].

120
Note that these approaches are not independent of each other, and many modelling 121 frameworks use several of these approaches [32,33].  2. It remains difficult to resolve points of conflict between models, such as conflicts 127 between parameters and assumptions. 128 3. There is limited scope for dealing with multi-physics systems that arise in 129 electrophysiology and mechanochemistry.

130
Resolving these issues requires the conservation laws of physics to be embedded within 131 computational modules. Network Thermodynamics, using bond graphs, is a modelling 132 framework that fits with the requirement of developing physically consistent models, 133 while retaining compatibility with existing approaches. Thermodynamics, as a method for incorporating the laws of thermodynamics into 139 theoretical models of living systems [34,35]. This work followed in the tradition in 140 physics and engineering that if you "get the physics right", "the rest is 141 mathematics" [36,37]. Bond graph models are defined by combining constitutive There has been a long history of thermodynamic modelling for biochemical reaction 156 networks [5,13,15]. In this section, we introduce bond graphs as an intuitive method for 157 embedding such approaches within a modular framework.

158
An explicit graphical representation of biochemical systems 159 We first use simple example to illustrate the how the structure of a bond graph encodes 160 differential equations [35,41,42]. Consider the enzyme-catalysed reaction in Fig. 2A, 161 noting that all chemical reactions are thermodynamically reversible. Assuming that the 162 reactions follow the law of mass action, the system can be described using the 163 differential equations where the fluxes through the reactions v 1 and v 2 are given by In the above equations, the rate parameters k + 1 , k − 1 , k + 2 and k − 2 are defined in Fig. 2A, 166 and x E , x C , x S , x P are the concentrations (or molar amounts) of E, C, S and P 167 respectively.

168
Because it is often useful to consider rate laws independently from the stoichiometry 169 of the system, systems biologists may favour an expanded representation of the network 170 as shown in Fig. 2B, where the reactions and species reside in their own components.

173
The bond graph representation in Fig. 2C is a further expansion of the diagram in : where A f (A r ) is the forward (reverse) affinity, or the sum of chemical potentials within 197 the reactants (products). We note that while we have used K and κ as our parameters, 198 these values can also be expressed in terms of energetic quantities such as the free 199 energy of formation (see Appendix A.1 in S1 Text).

200
Therefore, the species in Fig. 2C encode the relationships and the reactions encode the relationships To obtain the correct fluxes for each reaction, the chemical potentials µ of the 204 species need to be correctly mapped onto the reaction affinities A f and A r . Because 205 reactions 1 and 2 are connected directly to µ C in Fig. 2C, it is clear that However, conservation of chemical potential (energy per mole) needs to be considered 207 when determining A f 1 and A r 2 . These affinities are related to the species potentials 208 through the relationships The above energy conservation equations are encoded within triangular ( ) 210 components (Fig. 2C), which constrain the model such that the sum of potentials of the 211 edges directed into the triangles is equal to those directed outwards. Note also that Kirchhoff's current law in electrical circuits.

221
Once Eqs. 1,7-10 are combined, one can derive the differential equations This thermodynamic formulation has the same form as the kinetic formulation 223 (Eqs. 1-2) with the parameters redefined as While the thermodynamic formulation contains more parameters (6)  parameters to avoid thermodynamically inconsistent model behaviour [15,44]. More 230 recently, the approach has been suggested for whole-cell modelling as a method for 231 resolving points of conflict between data [5].

232
Thus, a strength of bond graph models in this context is that the differential 233 equations of a biological network can be directly derived from the network structure, 234 which paves the way for the modular construction of such models.

235
A remark on notation: Traditionally, the bond graph representation uses a textual 236 notation for components rather than the graphical notation used in this paper.  Enzyme-catalysed reactions can be described by rate laws (Fig. 3B). The simplest of 252 these is the law of mass action in Eq. 6 ( Fig. 3B; white box). This can be substituted 253 for more complex kinetics, for example, the reversible Michaelis-Menten equation more complex mass action model [47,48].

261
In some cases, the full dynamics of the enzymatic reaction need to be considered.

262
The advantage of a modular representation is that groups of reactions can be of clinical relevance as they are implicated in both inflammation and cancer [52,53].

291
While the kinetic properties may differ between pathways, a strength of taking a 292 modular approach is that one can use similar network structures to account for many 293 MAPK cascades.

294
Core model 295 Here we construct a model of the Mos/MAPK cascade in a modular fashion using bond 296 graphs. The core MAPK cascade model we considered was based on a study by Huang 297 and Ferrell [50]. We chose this model in particular as it accounts for the elementary 298 mass-action between enzyme states, which is essential when dealing with systems with 299 coupled enzymatic reactions [54].    (Fig. 5C).

352
While this approach took a black-box approach to modularity in this example, recent 353 developments in bond graph modelling have enabled a more flexible white-box approach 354 to modularity [55]. In Appendix B, we outline how this model could be constructed using such an approach where each module is itself a simulatable model.

Incorporation of feedback 357
An advantage of using a graphical and modular representation is that it is relatively easy 358 to make incremental changes to models. We demonstrate this by modifying the model 359 in Fig. 4D -which we will now refer to as the "core" model -to include the effects of 360 feedback. Both positive and negative feedback loops exist in MAPK cascades, but these 361 often operate on molecules upstream of the cascade represented here [56]. To keep the 362 model simple, we incorporate the effects of feedback with a generic mechanism assuming 363 that the feedback operates on the input molecule MAP4K, and that the feedback occurs 364 due to a phosphorylation event that either activates or inactivates the kinase.    are not thermodynamically consistent. The bond graph approach builds on existing 389 work by using thermodynamically independent parameters to ensure that rate laws are 390 thermodynamically consistent [15,44,48]. In this section, we use glycolysis as an 391 example to demonstrate the ability of bond graphs to swap out rate laws for one 392 another and to benchmark the performance of alternative rate laws.

393
One can create a model of glycolysis by using the stoichiometry to define a high-level 394 reaction structure with swappable modules for each enzyme, and then choose an 395 appropriate rate law for each enzyme, depending on the fit to data (Fig. 7). Indeed, this 396 approach was taken by Gawthrop et al. [48].

397
A detailed rate law, used by Mason and Covert [5], is given by the equation whereκ is a rate parameter, S is the set of all reactants, P is the set of all products and 399 R b,z is the binding parameter associated with the substrate z. In the case of multiple For reactions where all reactants and products have a stoichiometry of one, the rate law 403 is identical to convenience kinetics [44]. However, when multiple stoichiometries are 404 involved (for example, in the enzyme pps, Fig. 7A), this contains additional parameters. 405 For this reason, we will refer to this rate law as the "generalised kinetics" (GK) rate law. 406 In many cases, it can be helpful to substitute complex rate laws with simpler ones, 407 for example, to ease parameter estimation or to make mathematical analyses more 408 tractable [61].  where the blue modules are free to be swapped depending on the rate law. Species without circles are considered to be external to the system, and in cases where they occur more than once, they are connected by equal potential components (•, omitted for compactness) to ensure mass conservation. (B) For illustrative purposes, we show the rate laws for the pgk enzyme. (C) The enzyme can be modelled using the mass action (top), Michaelis-Menten (middle) or generalised kinetics (bottom) rate laws. The notation for the mass action and Michaelis-Menten components are defined in Fig. 3B. Note that since generalised kinetics rate laws depend on the chemical potentials of all substrates (and not just their sums), they cannot be decomposed into smaller modules.   Table. 416 In order to obtain nonzero steady-state flows through the system, we assume that external species, or "chemostats" in bond graph terminology [47].

420
Transient perturbations 421 We firstly tested each of the models by perturbing the concentrations of each of the 422 internal species, causing transient shifts away from the reference steady state (Fig. 8). 423 The responses of the simplified models were firstly compared against the original 424 generalised kinetics model by calculating the response time, which was defined as the 425 time required for the system to return to within 5% of its maximum deviation. Distance 426 from the reference steady state was calculated using the Euclidean norm  We also tested the response of the models to prolonged perturbations with external 448 species, which caused the models to move to different steady states (Fig. 9). Once again, 449 the models were compared using the response time (top row) and concentration of PEP 450 (bottom row). While all models have the same steady state without the perturbation, 451 they settle to different steady states under prolonged perturbations due to differences in 452 how the rate laws react to changes in the concentration of boundary species.

453
Accordingly, we also quantified the magnitude of the shift in steady state using the 454 Euclidean norm (middle row). results from previous studies [62,63]. Comparisons between the generalised kinetics and 467 Michaelis-Menten models illustrate that while quantitative differences arise from result in novel insights and predictions [55].

477
The glycolysis pathway contains two points (fba/fbp and pyk /pps) at which carbon 478 species are cycled by two enzymes while dissipating energy. These futile cycles (or 479 "Cyclic Flow Modulation" [64]) are critical points of control, allowing the system to 480 switch between glycolysis and gluconeogenesis [65,66]. Much of this regulation is 481 performed by allosteric regulation, which is not accounted for in this model. pgi + pfk + fba + tpi + 2gap + 2pgk + 2gpm + 2eno + 2pyk.
Thus we can calculate the overall affinity of the pathway to be The total pathway affinity predicted by the model is higher than the experimentally 498 measured values [68]. Furthermore, all reactions contribute significantly to the overall 499 affinity, which differs from experimental measurements finding that the pfk and pyk are 500 the predominant contributors to overall affinity, with the other reactions near 501 equilibrium [68]. We note that for this particular model, the parameters were derived in 502 the absence of standard free energies of formation [5]. Thus, a natural improvement to 503 the model would be to use these values to parameterise models [69], which would likely 504 improve the fit to experimental data. readily converted into kinetic models, the conversion in the other direction is not always 552 possible as existing kinetic models often use irreversible reactions or fail to satisfy 553 detailed balance [47,72,73]. Thus, to maintain the advantages of using bond graphs, an 554 avenue of future work would be to automatically generate bond graphs versions of 555 existing biomodels, such as in the BioModels repository [74] or the Physiome Model elements of a bond graph [76]. This allows a kinetic model to be directly added to a 561 bond graph as a "black box" in which the energetics underlying its dynamic behaviour 562 are ignored.

563
Because bond graphs are inherently a modular and declarative representation, they 564 are well suited for taking advantage of developments in programmatic modelling where 565 models are constructed through a series of instructions from the software [7,30]. Indeed, 566 the models in this paper were constructed using the BondGraphTools Python 567 package [31], a highly flexible and automatable approach for model construction that 568 mirrors the approach of existing packages used within the systems biology community 569 [30,71]. Embedding bond graphs within a programmatic environment enables the 570 construction of models using higher-level descriptions, allowing modellers define models 571 using a relatively parsimonious set of commands. Previous work has shown that bond 572 graphs can be constructed from a stoichiometric matrix [48] and a series of 573 reactions [31]. Nonetheless, further work is needed to help automate model construction, 574 in particular making use of rule-based approaches for constructing models of highly 575 complex interactions between proteins, ligands and receptors [77,78].

576
While we used the Python package BondGraphTools for the benchmarking and 577 simulation in this paper, it would also be relatively straightforward to use typical 578 systems biology simulators (for example COPASI [25], Tellurium [71]  potentially through a separate package.

589
In writing this paper, we hope to motivate the uptake of network thermodynamic 590 models (and bond graph models in particular) by the systems biology community. One 591 of the objectives of the Physiome Project is to embed bond graphs within the CellML 592 modelling language [79]. This will facilitate the development of more user-friendly tools, 593 allowing the conversion of existing models into bond graphs and subsequently coupling 594 them. As the bond graph methodology gradually becomes more widely adopted by the 595 community, we anticipate that existing mathematical and computational methods in the 596 systems biology space will be used in conjunction with the bond graph approach, 597 allowing biomodels to be integrated in a physically consistent manner.

598
In conjunction with the programmatic approach, bond graphs provide a useful 599 framework for updating models and recording their provenance. In the development of 600  where κ ij is the rate parameter for reaction j of enzyme i and we set T = 310 K.
With the value of K XP determined, this procedure can be applied to the next cycle along the pathway and propagated throuout the entire cascade to find all thermodynamic parameters.
A comparison of the bond graph model with the kinetic model is shown in Figure S1. While the two models differ in the reversibility of the catalytic reaction, these differences turn out to be negligible under physiological values of ∆G ATP,hyd .
The same method was used to determine parameters for the feedback loops, but the κ parameters for the phosphatase enzymes were scaled up by a factor of 10. This was to allow the feedback to occur within the range of input concentrations where ultrasensitivity is typically observed.

A.3 Glycolysis model
The models of glycolysis were based on the E. coli model developed by Mason and Covert [2]. The generalised kinetics model used parameters directly from Mason and Covert, with parameters converted using the methods in § A.1. Out of the multiple parameterisations of the model, we chose the one with the shortest response time (bottom row of Fig. 5 of their paper, with penalty weight of 100), which appeared to be the most realistic given that the typical response time for glycolysis is on the order of seconds [5]. The dilution effects of growth were assumed to be negligible.
We then chose parameters for the mass action and Michaelis-Menten models to match important behaviours of the more complex generalised kinetics model. Since the species components are identical between all models, their thermodynamic parameters were unchanged between the models.
For the mass action model, we chose the reaction parameters κ such that the fluxes at the reference steady state would be identical to the generalised kinetics model. Specifically, the reaction parameter κ for each reaction was determined by rearranging Eq. 6 of the Main Text: where v was obtained by simulating the generalised kinetics model to steady state and the reaction affinities A f and A r was inferred from the chemical potentials of each species.
For the Michaelis-Menten model, we chose parameters to match the dynamic behaviour of the system under reference chemostat concentrations. The Michaelis-Menten equation contains only two binding parameters: one for the forward complex and one for the reverse complex. We chose these binding parameters to match the binding properties of the carbon species. For all reactions apart from fba (which we will deal with later as a separate case) there is only a single carbon species in the substrates and products; we denote these by s * and p * respectively.
Additionally, we can define which allows us to rewrite the generalised kinetics equation in the form of the Michaelis-Menten equation.
v =κ By comparing constants to Eq. 13 of the Main Text, we can determine the parameters for Michaelis-Menten kinetics under reference chemostat concentrations.
For the reaction fba, the above approach is not possible as there are two carbon products. The generalised kinetics and Michaelis-Menten rate laws for fba can be written as (S17b) Thus the two rate laws only differ functionally only in the presence of the binding terms for DHAP and GAP in the denominator. Because the two rate laws are identical when the terms involving µ DHAP and µ GAP are neglected, we setκ MM = κ GK and R b0 = R b,F16P . The final parameter R b1 was chosen to match v fba,ss , the flux of the reaction at steady state under reference conditions:

B A white-box modularity approach to the MAPK cascade
In the traditional bond graph approach to hierarchical modelling, modules are specified through the use of fixed ports that are unable to be changed after the development of a model. However, it is increasingly acknowledged that white-box modularity is important to constructing large-scale models in systems biology. This approach requires both the internal details of a model to be externally accessible and for connections between components to be modified as needed. Here we illustrate that bond graphs are compatible with white-box modularity by defining the MAPK cascade model in terms of white-box modules.
To help understand white-box modularity in the context of bond graphs, we first consider the merging of kinase and phosphatase models into a model of a phosphorylation cycle ( Figure S2A-C). In a white-box approach, the kinase and phosphatase modules are specified as closed systems with no external connections ( Figure S2A-B). The coupled phosphorylation loop model can then be defined as the composition of the constituent kinase and phosphatase models, together with a set of "merging rules" that define the components that are shared between the two models ( Figure S2, top panel). Here we denote the unification of n components using the notation (Component 1, Component 2, . . . , Component n) → Merged component (S19) When two or more components within different models are merged together, the original components are disconnected and removed from their original models, being replaced by external ports. These ports are then connected to a new shared component through a mass conservation law. We use forward slashes to represent the hierarchical structure of the components within modules; for example, "Kinase/X" refers to the X component within the Kinase module. Therefore, in the case of the phosphorylation loop, the shared components correspond to the unphosphorylated substrate X and phosphorylated substrate XP that occur in both the kinase and phosphatase modules ( Figure S2C). The red and blue colour code indicates the components and connections that are modified to merge the components together.
When two or more non-identical components are merged together, the modeller must decide which of the original components to use. This is an important feature of the bond graph approach; inconsistencies between models are flagged to the modeller to act on. To simplify the analysis of this example, we assume that the components being merged are identical. Dealing with parameter inconsistencies between models is the subject of future work.
Using this notion of model composition, the full model of the MAPK pathway in Figure 4D could equivalently be defined using the white-box definition in Figure S2D. The merging rules are grouped as follows: • Rules 1-6 connect identical species between each of the phosphorylation loops • Rules 7-9 merge the sources of ATP, ADP and Pi used in each of the loops • Rules 10-14 promote some species to the highest level of the model hierarchy for convenience As with other approaches to white-box modularity, this approach provides the modeller with additional flexibility, since models can be individually simulated at each level of the model hierarchy. From the perspective of software engineering, attributes can be assigned to components at each level of the modelling hierarchy. This could help allow a submodel's parameters to be inherited by a parent model, or to document a change to the parameter of a model during the merging process [6]. While the merging rules were manually specified here, it is envisioned that the components corresponding to each species could be labelled using annotations so that models can be merged automatically.

C Introduction to basic BondGraphTools functionality
This section outlines the construction of a simple bond graph model using Bond-GraphTools, to illustrate basic functionality and also to outline how model construction can be abstracted for biochemical models. Installation instructions can be found on the readthedocs page (https://bondgraphtools.readthedocs.io/ en/latest/).
In this section, we consider the simple biochemical reaction A + B ⇌ C. The bond graph for this system is shown in Figure S3, where the bonds have been numbered in blue for ease of reference.
A model of the system can be constructed using the code below: import BondGraphTools as bgt from BondGraphTools.actions import new, add, connect from BondGraphTools import draw, simulate After adding and connecting the components, we can define the parameters for each parameter as below. In this case, we set all species and reaction parameters to unity. With the full bond graph model defined, BondGraphTools provides functions for analysing the models.
The draw function renders the bond graph as an image, with the output in Figure S4. The constitutive_relations method prints out the differential equations of the bond graph. The simulate function will run a dynamic simulation of the model. The resulting solution with initial conditions (x A , x B , x C ) = (1, 2, 3) is plotted in Figure S5. To reduce the verbosity of this model description, one could also use a reaction network to build the bond graph. The above model could also be constructed using the code below. More resources for using BondGraphTools for biochemical modelling can be found at https://github.com/uomsystemsbiology/BGT_BiochemicalNetworkTutorials.