
Wrote the paper: JFO PSS. Conceived research: JFO PSS. Designed research: JFO. Performed research: JFO VS.

The authors have declared that no competing interests exist.

Much of the complexity of biochemical networks comes from the information-processing abilities of allosteric proteins, be they receptors, ion-channels, signalling molecules or transcription factors. An allosteric protein can be uniquely regulated by each combination of input molecules that it binds. This “regulatory complexity” causes a combinatorial increase in the number of parameters required to fit experimental data as the number of protein interactions increases. It therefore challenges the creation, updating, and re-use of biochemical models. Here, we propose a rule-based modelling framework that exploits the intrinsic modularity of protein structure to address regulatory complexity. Rather than treating proteins as “black boxes”, we model their hierarchical structure and represent their internal dynamics as conformational changes. By modelling the regulation of allosteric proteins through these conformational changes, we often decrease the number of parameters required to fit data, and so reduce over-fitting and improve the predictive power of a model. Our method is thermodynamically grounded, imposes detailed balance, and also includes molecular cross-talk and the background activity of enzymes. We use our Allosteric Network Compiler to examine how allostery can facilitate macromolecular assembly and how competitive ligands can change the observed cooperativity of an allosteric protein. We also develop a parsimonious model of G protein-coupled receptors that explains functional selectivity and can predict the rank order of potency of agonists acting through a receptor. Our methodology should provide a basis for scalable, modular and executable modelling of biochemical networks in systems and synthetic biology.

The complexity of biochemical networks challenges our ability to create quantitative and predictive models of cellular responses to extracellular changes. In these networks, the regulation of allosteric receptors and proteins by multiple drugs or endogenous ligands introduces “regulatory complexity” because a large number of parameters is required to describe such interactions. Protein interactions also give rise to “combinatorial complexity” by generating large numbers of protein complexes and covalent modification states. To address these twin problems, we propose a modelling framework that combines a modular description of protein structure and function with a rule-based description of protein interactions. We define the input-output function of an allosteric protein through its thermodynamic properties and structural components. We show that our “biomolecule-centric” methodology, in contrast to

A goal of biology is to understand the structure and function of the biochemical networks that underpin cellular decision-making. One organizing principle is that these networks are inherently modular

Efforts to tackle complexity in biochemical networks should also exploit the modularity of protein structure. Protein structure is hierarchical, and a given protein often has domains also present in other proteins or repeated subunits. For example, many signalling proteins contain SH2 or PDZ domains, and many receptors, ion channels and enzymes are multimers. In genetic networks, transcription factors are also often multimers or have a common DNA-binding domain, such as a zinc finger or homeobox. The re-use of protein domains is both a simplifying and confounding feature: once a domain has been characterized, that characterization can be used again, but it is also necessary to model molecular cross-talk between signalling pathways that contain proteins with similar structures.

^{37} states

Rule-based modelling addresses combinatorial complexity and allows biologists to specify the regulatory logic of a system

Rule-based formalisms can describe complex biochemical systems, but inherently offer little guidance on avoiding a number of methodological problems. First, using rules to specify the regulatory logic of a system does not address the system's regulatory complexity. Consider G protein-coupled receptors (GPCRs), which allosterically couple an extracellular ligand-binding site to an intracellular G protein-binding site

Second, a module should have a well-described function and be easily re-used and “portable” between systems, but most rule-based formalisms are not inherently modular. Modellers typically treat proteins as “black boxes” and define interactions using biochemical equations. In such “interaction-centric” approaches, the regulation of proteins is encoded by rules with

Finally, models generated by rule-based methods should be thermodynamically correct. In biochemical networks, there are often sets of reversible reactions that connect into a closed loop, forming a thermodynamic cycle. In many of these cycles no free energy is consumed: for example, when proteins bind multiple ligands, when ligands bind several conformations of a protein, or when ion channels bind multiple agonists and have closed, open, and desensitized states. Thermodynamics constrains the equilibrium constants of all the reactions involved in such a cycle: taken in the same direction around the cycle, their product must be unity. Equilibrium constants therefore cannot be assigned independently. A thermodynamically correct methodology should ensure that a model satisfies this constraint, ideally by construction.
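The constraint can be checked numerically. The sketch below uses a hypothetical divalent protein A that binds ligands X and Y, so that the four binding reactions form a closed cycle; the constants K1 to K4, the function names and the values are illustrative assumptions, not parameters from this work. Once any three association constants are chosen, detailed balance fixes the fourth.

```python
import math

# Hypothetical divalent protein A binding ligands X and Y. The four binding
# reactions form a closed thermodynamic cycle:
#   A  + X <-> AX   (K1)      AX + Y <-> AXY  (K2)
#   A  + Y <-> AY   (K3)      AY + X <-> AXY  (K4)


def cycle_product(k1: float, k2: float, k3: float, k4: float) -> float:
    """Product of equilibrium constants around A -> AX -> AXY -> AY -> A.

    The last two steps run backwards, so they contribute 1/K4 and 1/K3.
    """
    return k1 * k2 / (k4 * k3)


def closing_constant(k1: float, k2: float, k3: float) -> float:
    """K4 implied by detailed balance once K1, K2 and K3 are chosen."""
    return k1 * k2 / k3


k1, k2, k3 = 2.0, 50.0, 5.0  # illustrative values, arbitrary units
k4 = closing_constant(k1, k2, k3)
print(k4)  # 20.0
print(math.isclose(cycle_product(k1, k2, k3, k4), 1.0))  # True
```

Assigning K4 independently, for example K4 = 10, would give a cycle product of 2 and a model that violates detailed balance.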

Here, we present a modular and scalable modelling methodology that alleviates the regulatory as well as the combinatorial complexity of biochemical networks. We first describe our modelling framework, which uses a thermodynamically grounded treatment of allostery in which ligands distinguish only the conformational state of allosteric proteins. We also introduce a rule-based modelling tool that implements our methodology: the Allosteric Network Compiler (ANC). We use ANC to examine how allostery can make macromolecular assembly more efficacious. We then show how our modelling framework describes common mechanisms of allostery by mapping the regulatory properties of a protein onto conformational changes in the protein itself and demonstrate how we can ease the analysis of multiple ligands interacting through an allosteric protein. Next, we discuss how our approach reduces regulatory complexity and thereby increases a model's modularity. Finally, we use our framework to develop a model of G protein-coupled receptors whose regulatory complexity scales with L+G instead of L×G, where L and G are the numbers of ligands and G proteins, and which consequently has greater predictive power. While our major goal is to introduce a new modular modelling methodology rather than its implementation, we have made ANC and the models we discuss available at:

Our method is based on the Monod-Wyman-Changeux (MWC) paradigm of allostery

Thus, an allosteric protein can be seen as a modular and dynamic computational device, and we can define the input and output of each allosteric component. The input is a “modifier”, a molecule that binds to and locally perturbs the structure of the component; the output is the fraction of time the component spends in each conformation when the allosteric transition is at equilibrium (see
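For a two-state component, this input-output function can be sketched in a few lines. The sketch assumes that K_RT = [T]/[R] for the unmodified component, that affinities are association constants (K = kf/kb), and that each bound modifier multiplies the allosteric equilibrium constant by its regulatory factor Γ = K_T/K_R, which follows from closing the thermodynamic cycle between binding and conformational change. The function names are illustrative; the numbers are those quoted for the divalent protein A.

```python
import math
from typing import Sequence


def regulatory_factor(k_R: float, k_T: float) -> float:
    """Differential affinity Gamma = K_T / K_R of a modifier for T versus R."""
    return k_T / k_R


def fraction_T(k_RT: float, bound_gammas: Sequence[float]) -> float:
    """Equilibrium fraction of time a two-state component spends in T.

    k_RT is [T]/[R] for the unmodified component; each bound modifier
    rescales it by its regulatory factor (detailed balance around the cycle).
    """
    k_eff = k_RT * math.prod(bound_gammas)  # prod of an empty sequence is 1
    return k_eff / (1.0 + k_eff)


# Values quoted for the divalent protein A (arbitrary units)
k_RT = 1e-3
gamma_x = regulatory_factor(0.1, 10.0)    # K_RX, K_TX
gamma_y = regulatory_factor(0.01, 100.0)  # K_RY, K_TY
for bound in ([], [gamma_x], [gamma_x, gamma_y]):
    print(len(bound), round(fraction_T(k_RT, bound), 4))
```

The output, the fraction of time in T, moves from almost zero for the free component to almost one when both X and Y are bound, because each modifier contributes its own factor Γ independently.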

An ANC model consists of a set of modular structures and interaction rules. Using our rule-based approach (Tables 1–5 of

(_{Y1} or Γ_{Y2}. Each of these interactions is also parameterized by a distinct Φ-value. (_{X} with X when A_{X} is in the _{Y} and Y, and we define the affinities K_{RX} and K_{TX} implied by the rates (in gray, e.g. K_{RX} = kf_{RX}/kb_{RX}). A covalent modification rule for the kinase K acting on an unphosphorylated (open dot) downstream target Y follows the Michaelis-Menten mechanism for enzyme-substrate interactions and yields a phosphorylated substrate (filled dot). (_{RT} is the allosteric equilibrium constant, while the regulatory factors Γ_{X} and Γ_{Y} are the differential affinities of the ligands for each conformation of A and are calculated by ANC using the rate constants given in the rules (e.g. Γ_{X} = K_{TX}/K_{RX}). The reaction network is converted into ordinary differential equations by _{Y} vs. X, with A_{TOT} = 1, Y_{TOT} = 1, K_{RT} = 10^{−3}, K_{RX} = 0.1, K_{TX} = 10, K_{RY} = 0.01, K_{TY} = 100, arbitrary units).

The overall modelling process for a divalent adaptor protein and two ligands is illustrated in

The generic model of a divalent allosteric protein shown in

In the model, the binding of X and Y to A is cooperative: binding of X to A changes the affinity of A for Y by a factor θ, and likewise binding of Y to A changes the affinity of A for X by the same factor θ. By coarse-graining over the conformations of A, we find

$$\theta = \frac{(1 + K_{RT})(1 + K_{RT}\Gamma_{X}\Gamma_{Y})}{(1 + K_{RT}\Gamma_{X})(1 + K_{RT}\Gamma_{Y})},$$

where K_{RT} is the allosteric equilibrium constant and Γ_{X} (or Γ_{Y}) is the differential affinity of X (or Y) for the two conformations of A. The cooperativity increases as the degree of bias (Γ_{X} and Γ_{Y}) that X and Y exert on the conformational transitions of A increases. We can also define the apparent affinity of X (and analogously of Y) for this coarse-grained A:

$$K_{X} = K_{RX}\,\frac{1 + K_{RT}\Gamma_{X}}{1 + K_{RT}}.$$

The affinity K_{X} of X to A and the affinity θK_{X} of X to A when A is bound by Y are measurable.
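The coarse-grained quantities can be evaluated directly. The sketch below assumes association constants, K_RT = [T]/[R] and Γ = K_T/K_R, under which coarse-graining over R and T gives θ = (1 + K_RT)(1 + K_RTΓ_XΓ_Y)/[(1 + K_RTΓ_X)(1 + K_RTΓ_Y)] and K_X = K_RX(1 + K_RTΓ_X)/(1 + K_RT). It uses the parameter values quoted for the divalent example and checks that the affinity of X for Y-bound A equals θK_X; the function names are illustrative.

```python
import math


def apparent_affinity(k_R: float, k_RT: float, gamma: float) -> float:
    """Ligand affinity for A averaged over R and T: K_R (1 + K_RT*Gamma) / (1 + K_RT)."""
    return k_R * (1.0 + k_RT * gamma) / (1.0 + k_RT)


def cooperativity(k_RT: float, gamma_x: float, gamma_y: float) -> float:
    """theta: factor by which bound Y changes the affinity of A for X (and vice versa)."""
    return ((1.0 + k_RT) * (1.0 + k_RT * gamma_x * gamma_y)
            / ((1.0 + k_RT * gamma_x) * (1.0 + k_RT * gamma_y)))


# Parameter values of the divalent example (arbitrary units)
k_RT, k_RX, k_TX, k_RY, k_TY = 1e-3, 0.1, 10.0, 0.01, 100.0
gamma_x, gamma_y = k_TX / k_RX, k_TY / k_RY

theta = cooperativity(k_RT, gamma_x, gamma_y)
k_x = apparent_affinity(k_RX, k_RT, gamma_x)

# Affinity of X for A when Y is bound, computed directly from the four states
k_x_given_y = (k_RX * k_RY + k_RT * k_TX * k_TY) / (k_RY + k_RT * k_TY)
assert math.isclose(k_x_given_y, theta * k_x)  # equals theta * K_X
print(round(theta, 1))  # 82.8
```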

(_{RT} and the affinities of X and Y to each conformation of A (K_{RX}, K_{RY}, K_{TX}, K_{TY}) were chosen to yield a desired value of θ with K_{X} = K_{Y} = 1. With K_{X}, K_{Y} and the concentrations of X and Y held constant, the efficacy of assembly depends only on the cooperativity parameter θ. (_{RT} on one axis and Γ_{X} and Γ_{Y} (assumed equal) on the other. Increasing Γ_{X} and Γ_{Y} always increases cooperativity; however, θ is maximal at an intermediate value of K_{RT}.

Counter-intuitively, an excess of some components of a macromolecular complex can inhibit formation of the complex

Here, we show that allostery can mitigate the prozone effect, at least for a divalent allosteric protein. We consider the divalent structure A to represent a linking protein with X and Y being the remaining parts of a complex. In ^{4}, and the maximal amount of complex formed increases by a factor of 5.7 (Figure 11 of

That the efficacy of macromolecular assembly depends strongly on the value of the cooperativity parameter θ suggests that assembly could be modulated by changing θ. For fixed Γ_{X} and Γ_{Y}, θ is maximal when K_{RT} = (Γ_{X}Γ_{Y})^{−1/2}, and thus assembly of the XAY trimer could in principle be controlled through the binding of a cofactor or a covalent modification that changes the allosteric equilibrium constant of A from a value far from its optimum to a value near the optimum (or vice versa).
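The location of the optimum follows from differentiating ln θ, using the coarse-grained expression θ = (1 + K_RT)(1 + K_RTΓ_XΓ_Y)/[(1 + K_RTΓ_X)(1 + K_RTΓ_Y)], with K_RT = [T]/[R] and Γ = K_T/K_R. Writing s = (Γ_XΓ_Y)^{1/2}:

```latex
\frac{\partial \ln\theta}{\partial K_{RT}}
  = \frac{1}{1+K_{RT}} + \frac{\Gamma_X\Gamma_Y}{1+K_{RT}\Gamma_X\Gamma_Y}
  - \frac{\Gamma_X}{1+K_{RT}\Gamma_X} - \frac{\Gamma_Y}{1+K_{RT}\Gamma_Y}

% At K_{RT} = 1/s the first two terms sum to s:
\frac{s}{s+1} + \frac{s^2}{s+1} = s

% and, because \Gamma_X\Gamma_Y = s^2, so do the last two:
\frac{s\,\Gamma_X}{s+\Gamma_X} + \frac{s\,\Gamma_Y}{s+\Gamma_Y}
  = s\,\frac{s(\Gamma_X+\Gamma_Y) + 2s^2}{2s^2 + s(\Gamma_X+\Gamma_Y)} = s

% Hence the derivative vanishes, and the maximal cooperativity is
\theta_{\max} = \theta\big|_{K_{RT}=1/s} = \frac{s\,(1+s)^2}{(s+\Gamma_X)(s+\Gamma_Y)}
```

Setting the derivative to zero gives K_RT² = (Γ_XΓ_Y)^{−1} as the only positive root (when Γ_X, Γ_Y ≠ 1), and since θ → 1 both as K_RT → 0 and as K_RT → ∞, this stationary point is the maximum. For Γ_X = Γ_Y = Γ the maximum simplifies to (1 + Γ)²/(4Γ).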

There are two well-known mechanisms for generating cooperative behaviour in proteins: concerted and sequential allostery. In their seminal paper, Monod, Wyman and Changeux introduced a two-state model to explain cooperative interactions in oligomeric enzymes and proteins

ANC-structures can be used to implement these models of allosteric regulation. A concerted model of a generic, homotetrameric protein is shown in

(_{LB}. (_{S} and the effect of each modifier on the kinetics of coupled components is parameterized by Φ_{LB} and Φ_{S}. (_{Q}, and the reciprocal interaction is parameterized by Φ_{T}. (

An advantage of ANC is its ability to easily formulate and simulate mathematically complex models. For example, we will show that the cooperativity of an allosteric protein binding a ligand, such as a transcription factor binding an inducer, can be substantially changed through adding a competing ligand. Although a mathematical analysis of various allosteric models with two competing ligands exists
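To illustrate how a competitor can reshape a binding curve, the sketch below computes the equilibrium occupancy of a concerted (MWC) homotetramer by a ligand L in the presence of a competitor C and estimates an apparent Hill coefficient at half-saturation. It is a minimal stand-alone calculation, not ANC: it assumes equilibrium, free rather than total ligand concentrations, at most one ligand per subunit, and parameter values loosely following the concerted model described here (K_RT = 10^3, K_R = Γ^{−1/2}, K_T = Γ^{1/2}), with the competitor assumed, for illustration, to share the value Γ = 0.01 of L. Under these assumptions, adding the competitor at unit concentration lowers the apparent Hill coefficient from nearly 3 to close to 1.

```python
import math


def occupancy(L, C, k_RT, k_RL, k_TL, k_RC, k_TC, n=4):
    """Fractional occupancy by ligand L of a concerted (MWC) n-mer.

    Each subunit binds at most one molecule of L or of the competitor C;
    binding is independent given the quaternary state (R or T), and
    k_RT = [T]/[R] for the unligated protein. All constants are associations.
    """
    a_R = 1.0 + k_RL * L + k_RC * C  # binding polynomial of one R subunit
    a_T = 1.0 + k_TL * L + k_TC * C  # binding polynomial of one T subunit
    z = a_R ** n + k_RT * a_T ** n   # partition function of the n-mer
    return (k_RL * L * a_R ** (n - 1) + k_RT * k_TL * L * a_T ** (n - 1)) / z


def logit(y):
    return math.log(y / (1.0 - y))


def hill_coefficient(C, **params):
    """Slope of logit(occupancy) against ln L at half-saturation by L."""
    lo, hi = math.log(1e-8), math.log(1e8)
    for _ in range(100):  # bisection on ln L; occupancy increases with L
        mid = 0.5 * (lo + hi)
        if occupancy(math.exp(mid), C, **params) < 0.5:
            lo = mid
        else:
            hi = mid
    x, h = 0.5 * (lo + hi), 1e-4
    up = logit(occupancy(math.exp(x + h), C, **params))
    down = logit(occupancy(math.exp(x - h), C, **params))
    return (up - down) / (2.0 * h)


# K_R = Gamma**-0.5 and K_T = Gamma**0.5 with Gamma = 0.01 for both ligands
params = dict(k_RT=1e3, k_RL=10.0, k_TL=0.1, k_RC=10.0, k_TC=0.1)
n_alone = hill_coefficient(0.0, **params)
n_competed = hill_coefficient(1.0, **params)
print(round(n_alone, 2), round(n_competed, 2))
```

Because the competitor prefers R, it shifts the unligated equilibrium towards R before L binds; the protein then rarely needs to switch conformation as L binds, and most of the cooperativity is lost. A competitor that prefers T would instead shift the equilibrium the other way.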

The allosteric equilibrium of an unligated protein favours a state _{RT} = 10^{3}; for the sequential (tetrahedral) model K_{rt} = 0.1 and Γ_{S} = 10. Ligand affinities were set to K_{RLi} = K_{rLi} = (Γ_{i})^{−1/2} and K_{TLi} = K_{tLi} = (Γ_{i})^{1/2} with Γ_{0} = Γ_{1} = 0.01 (prefers _{2} = 1 and Γ_{3} = 100 (prefers

In addition to ligand binding, our methodology also describes other mechanisms for allosteric regulation that are ubiquitous in cellular signalling. Phosphorylation or other post-translational modifications, dimerization, receptor clustering and point mutations can also regulate or change protein function. Our thermodynamic framework (see

We can distinguish two types of parameters that affect modularity in different ways:

Our biomolecule-centric methodology minimizes regulatory complexity. For example, we analyzed a generic model of an ^{N}

Using our biomolecule-centric modelling framework, we can convert a non-modular model into a modular one. Such refactoring is also useful when a protein has more than two conformational states, unlike the core allosteric components in ANC-structures. To illustrate, we introduce a new model for the activation of G protein-coupled receptors (GPCRs). GPCRs are a common target for pharmaceutical drugs

Although several allosteric models have been proposed

A naive implementation of the cubic ternary complex model in our framework uses a divalent ANC-structure with a single allosteric component (

The mapping between the cubic (A) and quartic (B) models shows how the two models are related. (_{act} is the unligated allosteric equilibrium constant, K_{a} and K_{g} are ligand affinities to the reference (inactive) state, and α and β are ratios of affinities. We parenthesize the cooperativity parameters δ and γ to indicate that these parameters of the cubic ternary complex model have to be added as _{actG} and K_{actL} are the unligated allosteric equilibrium constants, Γ is the regulatory factor linking the _{a}′ and K_{g}′ are ligand affinities to the reference state.

To resolve this difficulty, we propose a sequential allosteric model of the GPCR with two coupled allosteric components: an extracellular allosteric component, which binds a ligand, and an intracellular allosteric component, which binds a G protein (

Our quartic ternary complex model can be projected onto the cubic model by defining coarse-grained variables that sum over the conformations of the extracellular allosteric component (

Our quartic model for the GPCR is more modular and parsimonious than the cubic model because it includes a structurally and biophysically plausible mechanism for how ligands and G proteins interact cooperatively with the GPCR. We encode the logic of these regulatory interactions in the protein's ANC-structure using intensive parameters, rather than in

The quartic model also has more predictive power than the cubic model and therefore can be more rigorously tested. For each pair of a ligand and a G protein, the cubic model requires the specification of two cooperativity parameters, δ and γ, specific to that pair. It is therefore limited in the predictions it can make. For example, for each new G protein added to the system, new cooperativity parameters must be determined for every previously characterized ligand before the new G protein's GPCR-mediated response to these ligands can be predicted. In contrast, the quartic model is completely characterized for the new target pathway by measuring four extensive parameters (one for each conformation of the GPCR), and we can then predict the GPCR-mediated response to all ligands. In particular, we can predict the rank order of potency of the ligands to activate the new pathway, a standard means to compare agonists in pharmacology, and detect functional selectivity
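The difference in scaling can be made concrete by counting equilibrium parameters. The sketch below is our own bookkeeping and makes explicit assumptions: the cubic model needs K_act, the pair (K_a, α) for each ligand, (K_g, β) for each G protein, and the pair-specific cooperativity parameters (δ, γ) for every ligand-G protein pair; the quartic model needs K_actL, K_actG and Γ plus four parameters per ligand (K_a′, α_t, α_a, α_at) and four per G protein (K_g′, β_t, β_a, β_at). Rate constants are ignored, and the function names are hypothetical.

```python
def cubic_parameter_count(n_ligands: int, n_gproteins: int) -> int:
    """K_act; (K_a, alpha) per ligand; (K_g, beta) per G protein;
    (delta, gamma) for every ligand-G protein pair."""
    return 1 + 2 * n_ligands + 2 * n_gproteins + 2 * n_ligands * n_gproteins


def quartic_parameter_count(n_ligands: int, n_gproteins: int) -> int:
    """K_actL, K_actG and Gamma; (K_a', alpha_t, alpha_a, alpha_at) per ligand;
    (K_g', beta_t, beta_a, beta_at) per G protein; no pair-specific terms."""
    return 3 + 4 * n_ligands + 4 * n_gproteins


for n_l, n_g in [(5, 2), (10, 10), (50, 10)]:
    print(n_l, n_g, cubic_parameter_count(n_l, n_g), quartic_parameter_count(n_l, n_g))
```

Under these assumptions, adding one G protein to a system with L characterized ligands costs 2 + 2L new parameters in the cubic model but only 4 in the quartic model.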

Like the cubic model, the quartic ternary complex model also explains functional selectivity, though this is not obvious considering that these models cannot be related through a simple projection when multiple ligands and G proteins interact with a single receptor. Indeed, in the quartic model δ and γ are not free parameters but are correlated because of their dependence on underlying rates. We therefore simulated the GPCR-mediated response to several ligands that cause (in)activation of two different G proteins (

(R_{sa}G+R_{ta}G+LR_{sa}G+LR_{ta}G) as a fraction of the total number of receptors, plotted against the concentration of ligand (arbitrary units). The concentrations of receptor and G protein are unity. Parameter values: K_{actL} = 1, K_{actG} = 0.05, Γ = 1; affinities for L1 are given by (K_{a}′, α_{t}, α_{a}, α_{at}) = (10, 0.1, 10, 1), for L2 by (1, 20, 20, 400), for L3 by (0.1, 10, 10, 0.01), for L4 by (100, 0.1, 0.4, 0.01) and for L5 by (20, 20, 0.05, 5); affinities for G1 are given by (K_{g}′, β_{t}, β_{a}, β_{at}) = (10, 0.1, 10, 1) and for G2 by (1, 10, 10, 100).

The quartic model is modular and therefore is easily extended to include additional signalling interactions such as the regulation of the receptor by allosteric ligands

Biochemical networks are complex yet modular: networks exhibit both combinatorial and regulatory complexity, but individual proteins have intrinsic functional properties that determine how they detect and process information. Complexity is also reduced because similar proteins or similar protein domains appear in many signalling pathways and often interact with similar protein partners. We propose a modelling methodology, embodied by ANC, that exploits the modularity of proteins to reduce the complexity of modelling biochemical networks. Given modular ANC-structures, which encode a protein's regulatory properties, adding new interactions to an ANC model usually requires substantially fewer parameters than in other rule-based models, particularly as the promiscuity of binding of proteins, and hence the complexity of the network, increases. ANC-structures are also portable because different signalling pathways are modelled by simply “re-wiring” proteins rather than through writing new

In our methodology, models are structured to minimize regulatory complexity both to avoid over-fitting data and because large numbers of biochemical parameters are difficult to measure

Our methodology reduces the regulatory complexity but increases the combinatorial complexity of a system because each conformation of an allosteric protein introduces a new state. Thus, a reduction in regulatory complexity incurs the computational cost of modelling additional species. Nevertheless, recent advances in rule-based modelling have introduced new methods that allow fast simulation of systems with large numbers of chemical species and reactions
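The size of this combinatorial cost can be estimated by enumerating microstates. The sketch below uses a hypothetical helper that lists the states of a protein with a given number of binary binding sites and two-state (R/T) allosteric components, for example a ligand-binding homotetramer modelled with one concerted component or with one component per subunit. It ignores the merging of symmetry-equivalent species that a compiler may perform, so the counts are upper bounds.

```python
from itertools import product


def enumerate_states(n_sites: int, n_components: int):
    """All (conformations, occupancies) pairs, ignoring symmetry.

    Each allosteric component is in conformation R or T; each binding
    site is empty (0) or occupied (1).
    """
    return [
        (conformations, occupancy)
        for conformations in product("RT", repeat=n_components)
        for occupancy in product((0, 1), repeat=n_sites)
    ]


print(len(enumerate_states(4, 0)))  # 16: no conformational states
print(len(enumerate_states(4, 1)))  # 32: one concerted component
print(len(enumerate_states(4, 4)))  # 256: one component per subunit
```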

We also take a first step towards integrating free-energy-based constraints into a rule-based modelling framework, adding to earlier work on imposing detailed balance in models of biochemical networks

Two other advantages of our modelling framework are significant. First, ANC-structures enable a coarse-grained hierarchical description of physical structure by requiring the specification of protein domains and, if desired, tertiary and quaternary structure, including oligomeric receptor clusters. ANC-structures can also model the internal geometry of a protein by describing those domains of the protein that interact allosterically and those that do not (

Our modelling framework encourages the modeller to develop a mechanism to explain the regulatory properties of a protein and hence to build models that have predictive power and so can be experimentally tested. For example, an ANC model of the activation of GPCRs suggests that the well-known cubic ternary complex model has implicitly coarse-grained some conformations of the GPCR. By including these conformations in an ANC-structure, our new quartic model reduces over-fitting and has the potential to predict the rank order of potency and efficacy of ligands acting through a GPCR. This model of the GPCR has two linked allosteric components, each with just two conformational states that interact independently with other molecules. These mechanistic assumptions do not, however, apply to the GPCR as a whole, which has four conformational states. Thus, while the two-state assumption may not hold for all proteins, other mechanistic models can be accommodated within our framework.

Having allostery at its centre, our framework can suggest simple mechanisms through which the cell might regulate and increase the efficacy of cellular processes. For example, the assembly of macromolecular complexes can be considerably undermined through the prozone effect when linker proteins are over-expressed

A challenge in designing synthetic biological systems is the need for predictive modelling tools. Here, ANC has several potential advantages. First, the modularity of ANC-structures allows models of synthetic systems to be straightforwardly extended: for example, as different synthetic subsystems are combined to generate more complex behaviour

Faced with the complexity of cellular signalling and genetic networks, researchers are developing new computational methods to quantitatively model and predict cellular behaviour despite that complexity. In this spirit, we have identified and discussed a distinct form of complexity, regulatory complexity, which arises from the allosteric regulation of proteins. Combining and extending established biophysical principles with more recent rule-based methods, we propose a modular and scalable methodology, exemplified by our Allosteric Network Compiler, to describe the complexity of cellular signalling. By emphasizing the allosteric control of proteins, we capture the inherent modularity of protein structure and function exploited by cells themselves. Our method is a general, principled and simplifying addition to any modelling framework.

To compute how multiple modifiers collectively bias the conformational equilibrium of an allosteric component, we use thermodynamics

We assume that all modifiers interact independently (non-cooperatively) with each conformational state of the protein component, with the energy of interaction to the

Equation (6) describes the input-output function of an allosteric component, which may embody a domain, a subunit, or an entire protein. The output _{i}_{i}_{i}

To compute how the kinetics of a component's allosteric transition are affected by the presence of modifiers, we first write the forward and backward rate constants for the unmodified component in terms of the free energy difference between the transition state (denoted †) and each conformational state _{i}.

We choose this parameterization because Φ_{i} = Φ_{j} implies the existence of a linear free energy relationship for two modifiers
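The effect of such a parameterization on the rates can be sketched as follows, assuming the usual linear free energy (Φ-value) convention in which a bound modifier with regulatory factor Γ scales the forward (R to T) rate by Γ^Φ and the backward rate by Γ^{Φ−1}. For any Φ between 0 and 1 the ratio of the rates, and hence the allosteric equilibrium, changes by exactly Γ, so detailed balance is preserved; Φ only decides whether the modifier acts mainly on the forward or on the backward transition. The function name and numbers are illustrative.

```python
import math
from typing import Tuple


def modified_rates(kf: float, kb: float, gamma: float, phi: float) -> Tuple[float, float]:
    """Forward (R -> T) and backward (T -> R) rates with one bound modifier.

    Assumed convention: kf' = kf * gamma**phi and kb' = kb * gamma**(phi - 1),
    so that kf'/kb' = gamma * kf/kb for every phi.
    """
    return kf * gamma ** phi, kb * gamma ** (phi - 1.0)


kf, kb, gamma = 1.0, 1000.0, 100.0  # K_RT = kf/kb = 1e-3 before modification
for phi in (0.0, 0.5, 1.0):
    kf_mod, kb_mod = modified_rates(kf, kb, gamma, phi)
    print(phi, kf_mod, kb_mod, kf_mod / kb_mod)  # ratio is 0.1 for every phi
```

With Φ = 0 the modifier slows only the T-to-R transition; with Φ = 1 it speeds up only the R-to-T transition; intermediate values split the effect between the two.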

We validated our overall methodological flow (

ANC possesses a number of features which ease modelling and simulation of biochemical networks. First, ANC allows users to parameterize a model so that parameter values can be changed after compilation. Also, ANC supports

Using Facile

The current implementation of ANC has three principal limitations. 1) The reaction network is enumerated, so ANC's performance may degrade significantly if the compiled network is large. 2) Only rules for binding and Michaelis-Menten interactions can be created. 3) While ANC supports unimolecular association and dissociation, detailed balance is enforced only for cycles comprising purely bimolecular associations.

Supplementary information. This file contains all supplementary information for the article.

(4.94 MB PDF)

We thank Vincent Danos, James Faeder, Catherine Lichten, Terry Hébert, David MacLean, Bruno Martins and Victor Neruda for useful discussions and critical readings of the manuscript.