Memory functions reveal structural properties of gene regulatory networks

Gene regulatory networks (GRNs) control cellular function and decision making during tissue development and homeostasis. Mathematical tools based on dynamical systems theory are often used to model these networks, but the size and complexity of these models mean that their behaviour is not always intuitive and the underlying mechanisms can be difficult to decipher. For this reason, methods that simplify and aid exploration of complex networks are necessary. To this end we develop a broadly applicable form of the Zwanzig-Mori projection. By first converting a thermodynamic state ensemble model of gene regulation into mass action reactions we derive a general method that produces a set of time evolution equations for a subset of components of a network. The influence of the rest of the network, the bulk, is captured by memory functions that describe how the subnetwork reacts to its own past state via components in the bulk. These memory functions provide probes of near-steady state dynamics, revealing information not easily accessible otherwise. We illustrate the method on a simple cross-repressive transcriptional motif to show that memory functions not only simplify the analysis of the subnetwork but also have a natural interpretation. We then apply the approach to a GRN from the vertebrate neural tube, a well characterised developmental transcriptional network composed of four interacting transcription factors. The memory functions reveal the function of specific links within the neural tube network and identify features of the regulatory structure that specifically increase the robustness of the network to initial conditions. Taken together, the study provides evidence that Zwanzig-Mori projections offer powerful and effective tools for simplifying and exploring the behaviour of GRNs.


Introduction
Biological systems are complex, comprising multiple interacting components. In many cases this complexity makes it difficult to identify underlying mechanisms and to understand the function of a system. Gene regulatory networks (GRNs) are an example of this problem [Levine and Davidson, 2005]. A GRN comprises the set of interacting genes responsible for the development, differentiation or homeostasis of a tissue and provides a formal system-level, causative explanation for gene regulation. In physical terms, a GRN consists of modular DNA sequences (cis-regulatory elements) that bind specific sets of transcriptional activators and repressors, which control the expression of associated genes. Some of the regulated genes are themselves transcriptional regulators. Thus, at the core of a GRN is a recursive set of regulatory links that forms a transcriptional network, the dynamics of which is responsible for the spatial and temporal patterns of gene expression.
Attempts have been made to map large transcriptional networks, yet even for relatively small networks, the number of links and the feedback within the system make intuitive understanding difficult to obtain. Various computational models have been developed to address this. Logical models, which describe regulatory interactions qualitatively, provide a flexible and simplifying formalism to explore and understand the behaviour of a network [Glass and Kauffman, 1973]. However, these approaches are unable to capture subtler features of a network that depend on specific aspects of the timing or concentration of components of the network. For this, continuous models based on, for example, ordinary differential equations (ODEs) are often employed [Mogilner et al., 2006, Craciun et al., 2006]. These models describe gene regulation in much greater detail, and we have well-developed mathematical theories that provide powerful tools to distill the dynamical details of such systems. These have been used successfully to gain insight into the operation of GRNs and suggest explanations for otherwise difficult to understand behaviours, including emergent phenomena such as self-organisation, oscillations, spatial patterning, and scaling of pattern size [Kondo and Miura, 2010, Novák and Tyson, 2008, Perkins et al., 2006, Umulis et al., 2010]. A disadvantage of this approach, however, is that ODE models often require a large number of parameters, including binding affinities, degradation rates, production rates, etc. While it is often possible to numerically analyse large systems, the power of analytical tools is in many cases lost due to lack of parameters or the complexity of the network.
Various techniques have been developed to reduce the complexity of models yet preserve the main features of the behaviour of the system [Okino and Mavrovouniotis, 1998, Feliu and Wiuf, 2013, Cardelli, 2014]. One such dimensionality reduction strategy, relevant to GRNs, is the construction of reduced models focused on a subnetwork [Alon, 2007]. Such a subnetwork describes in detail part of the network embedded within the rest of the system; we refer to this remainder as "bulk". Analysing the behaviour of the subnetwork and its interaction with the bulk can help reveal and rationalise the properties of the entire network. The relevance and applicability of these techniques depends on methods that efficiently and reliably simplify the bulk and describe how the subnetwork affects and is affected by the bulk.
The Zwanzig-Mori projection approach provides one means of achieving this aim. It was originally developed as a technique to allow the extraction of macroscopic equations from a microscopic description of the dynamics [Mori, 1965, Zwanzig, 1961]; a brief overview can be found in e.g. [Ritort and Sollich, 2002]. The method has since been used to separate a network into a subnetwork and bulk in ways that preserve the original temporal dynamics [Rubin et al., 2014, Rubin and Sollich, 2016]. Used in this way, the approach describes and tracks the concentration of components in the subnetwork in detail, while the activities of the species in the bulk are replaced with so-called 'memory functions'. These memory functions are derived from the detailed kinetic description of the remainder of the network and summarise how, by acting via the bulk, the past behaviour of the subnetwork influences its current state (Fig. 1). The resulting memory functions are functions of time difference, describing the amplitude of a signal returning from the bulk some specified time after the original signal left the subnetwork. Additionally, it is possible to separate each memory function in order to analyse through which bulk species the signal is flowing, thus providing a means to gain a more systematic understanding of a system.
Evolving from its original applications to critical dynamics and supercooled liquids near the glass transition [Götze and Sjögren, 1992], this approach has been applied to biochemical networks [Rubin et al., 2014], where it has been used to analyse the dynamics of signalling in the EGFR (epidermal growth factor receptor) network. It has since been developed further to include also enzymatic Michaelis-Menten reactions [Rubin and Sollich, 2016]. Here we apply the Zwanzig-Mori approach to the analysis of GRNs. We derive a general method that is applicable to any thermodynamic state ensemble representation of protein dynamics, including transcriptional networks. These thermodynamic equations mechanistically represent interactions between proteins via transcriptional regulation [Bintu et al., 2005] and can be expanded into mass action reactions to which the Zwanzig-Mori projection formalism can be applied. We test the method on a cross repressive motif, where we show that we are able to reproduce the full system's dynamics and obtain a simple intuitive interpretation of the memory functions. Finally, we use the approach on a GRN from the vertebrate neural tube, a well characterised developmental GRN composed of four interacting transcription factors [Cohen et al., 2014]. This reveals the importance of specific links within the network and identifies features that appear to be present primarily to increase the robustness of the network to initial conditions, rather than to maintain the steady states. Taken together, the study provides evidence that Zwanzig-Mori projections are a powerful and efficient tool to simplify and explore the behaviour of gene regulatory networks.

Expanding thermodynamic equations into mass action networks
Several classes of model have been developed to describe gene regulation. One such class consists of thermodynamic state ensemble models, in which equations are constructed that relate interactions between key components of the regulatory mechanism to gene expression [Shea and Ackers, 1985, Bintu et al., 2005, Sherman and Cohen, 2012]. In these models, all possible states of a regulated gene are enumerated. Each state consists of a specific set of transcription factors bound to a DNA cis-regulatory element, weighted by the affinity of the interactions. The rate at which a protein is produced is then represented as the ratio of the total weight of the subset of states that promote gene expression to the total weight of all states.

Figure 1: Sketch describing the concept of memory within a reaction network. A network is divided into bulk and subnetwork; the x axis represents time. Concentration changes in the subnetwork act as signals that leave the subnetwork and travel into the bulk. There they interact with other bulk species and return at a later timepoint via bulk-to-subnetwork interactions. The net effect of such interactions is thus that the subnetwork reacts to its own past. The precise influence of past subnetwork states is governed by memory functions that depend on the time difference, i.e. on how long ago the relevant signal left the subnetwork.

We define the vector x such that it contains the protein concentrations for each gene. We can then generically define the time evolution for any protein concentration, indexed by j, as

∂_t x_j = Σ_n α_{(j,n)} x_{(j,n)} − β_j x_j   (1)

with

x_{(j,n)} = w_{(j,n)} ∏_i x_i^{n_i} / Σ_{n'} w_{(j,n')} ∏_i x_i^{n'_i}   (2)

Here β_j is the decay rate of protein j. We use the summation over n to represent all possible binding conformations for the DNA that produces protein j. The term α_{(j,n)} represents the production rate of protein j from a particular DNA conformation n. The factor w_{(j,n)} is the specific affinity for the individual proteins bound to the DNA that produces species x_j. Finally, x_{(j,n)} is the concentration of DNA producing protein j and in conformation n. This is assigned in (2) as the weight of the conformation, dependent on the protein concentrations in a mass-action form, normalised by the total weight of all conformations. This ensures that x_{(j,n)} lies in the range [0, 1], so it is not an absolute concentration; any overall DNA concentration scale is therefore to be understood as incorporated into the protein production rates α_{(j,n)}. As made explicit in (2), the DNA conformation label n is a collection of integers n_i counting how many copies of protein i are bound to the DNA. The above thermodynamic equations (1,2), with their nonlinearities of arbitrary order, are not in a form to which the Zwanzig-Mori projection can be readily applied. For this reason we need to expand the network into one of unary and binary reactions. Fortunately, the way to achieve this is already suggested by (2): we represent each possible existing DNA conformation as an additional species whose concentration has its own dynamical evolution, such that in an appropriate limit of fast DNA binding and unbinding, it reaches the quasi-steady state (QSS) values of the DNA conformation concentrations (2).
The prefix "quasi" refers to the fact that this steady state is for given protein concentrations, which themselves change slowly in time. Fig. 2A illustrates this expansion procedure for a very simple regulation network, a cross repressive motif. In this motif, each gene is repressed through one binding site for the other factor. This system can be expanded to represent all possible DNA conformations, bound and unbound. This necessitates defining new reaction rates for binding and unbinding that are consistent with the affinities w from the thermodynamic description, which can be done most simply by imposing detailed balance with respect to the desired DNA conformation steady state.
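To make the thermodynamic form (1,2) concrete, the sketch below evaluates the normalised conformation occupancies and the resulting protein dynamics for a minimal cross-repressive pair, in which each gene's DNA is either free (weight 1) or bound by the other protein (weight w·x) and produces protein only when free. The single-binding-site setup and all parameter values are illustrative choices, not those used in the paper's figures.

```python
import numpy as np

def conformation_weights(x_other, w):
    # states of one gene's DNA: unbound (weight 1) and repressor-bound (weight w * x_other);
    # normalising gives the occupancies x_(j,n) of eq (2)
    weights = np.array([1.0, w * x_other])
    return weights / weights.sum()

def dxdt(x, alpha, w, beta):
    # eq (1): production from the unbound conformation only, minus linear decay
    x1, x2 = x
    p1 = conformation_weights(x2, w[0])[0]   # fraction of gene-1 DNA that is free
    p2 = conformation_weights(x1, w[1])[0]   # fraction of gene-2 DNA that is free
    return np.array([alpha[0] * p1 - beta[0] * x1,
                     alpha[1] * p2 - beta[1] * x2])
```

Because the occupancies are normalised weights, they always sum to one, and at zero protein concentration each gene produces at its maximal rate α.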
To be explicit, we define k^{p+}_{(j,n)} as the rate constant for protein p to bind to DNA coding for protein j and in binding conformation n, with a similar meaning for the unbinding rate constant k^{p−}_{(j,n)}. These rates will thus describe the interaction each DNA species has with the protein species that bind to it. With these rate constants we write down a master equation for the time evolution of each DNA concentration x_{(j,n)}:

∂_t x_{(j,n)} = Σ_p [ k^{p+}_{(j,n−e_p)} x_p x_{(j,n−e_p)} − k^{p+}_{(j,n)} x_p x_{(j,n)} + k^{p−}_{(j,n+e_p)} x_{(j,n+e_p)} − k^{p−}_{(j,n)} x_{(j,n)} ]   (3)

In this way we create a set of equations that has been extended by including the concentrations of DNA conformations, as shown graphically by the four outer nodes in the network in the centre of Fig. 2A. The first term on the r.h.s. of (3) represents binding of protein p to DNA conformation (j, n − e_p) to make DNA conformation (j, n). Here e_p is a unit vector whose i-th entry is δ_ip, i.e. a 1 at position p and zeros elsewhere. Similarly, the third term on the right of (3) describes unbinding of protein p. The other two terms capture how DNA in conformation (j, n) can be lost by these two processes. We next establish the relationship between the rate constants k in the mass action representation (3) and the affinities w of the thermodynamic form. The rate constants need to be chosen to ensure the QSS (2) of the DNA concentrations. The simplest assumption is detailed balance, which requires that in this steady state the net change in DNA concentration from any individual pair of binding and unbinding reactions between two conformations vanishes. Detailed balance means, in particular, that where there are multiple reaction paths from one DNA conformation to another, which might be distinguished by the order in which the various proteins bind, each path carries zero reaction flux rather than having a preferred forward or backward reaction direction.
Balancing in this way the net change from binding of protein p to DNA in conformation (j, n) with the reverse unbinding transition, for the QSS concentrations (2), gives

k^{p+}_{(j,n)} x_p x_{(j,n)} = k^{p−}_{(j,n+e_p)} x_{(j,n+e_p)}   (4)

The protein concentrations cancel from this, as they should, and we are left with a constraint on the rate constants:

k^{p+}_{(j,n)} w_{(j,n)} = k^{p−}_{(j,n+e_p)} w_{(j,n+e_p)}   (5)

To finish our construction, we need to ensure that the DNA concentrations reach the QSS values that we have imposed above with the detailed balance constraint (5). This requires that we make all DNA binding and unbinding reactions fast. Formally we introduce a fast rate factor γ and write all rates in the form k^{p±}_{(j,n)} = γ k̃^{p±}_{(j,n)}, with the understanding that we will take the limit γ → ∞ at fixed k̃. This construction is analogous to the one previously used for Michaelis-Menten reactions [Rubin and Sollich, 2016]. As there, we also need to ensure that the changes to protein concentrations from binding to DNA are negligible, as such effects are not described by the original thermodynamic equations (1). This is achieved by scaling all DNA concentrations as x_{(j,n)} = x̃_{(j,n)}/γ', where γ' is a second fast rate factor. The intuition for this construction is that the amount of protein bound to DNA is, at most, of the order of the total concentration of DNA, which vanishes for γ' → ∞. Finally, to compensate for the DNA concentration scaling, protein production rates have to be scaled as α_{(j,n)} = γ' α̃_{(j,n)}.
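The detailed balance construction can be checked numerically: choosing binding and unbinding rate constants whose ratio satisfies the affinity constraint (5) should make the master equation (3) relax to the thermodynamic weights of (2). A minimal sketch for a single binding site, with illustrative values for the affinity and the (held fixed) protein concentration:

```python
import numpy as np

# hypothetical one-site example: affinities w_(j,0) = 1 (free) and w_(j,1) = w (bound);
# constraint (5) then fixes the rate ratio k_plus / k_minus = w
w, x_p = 2.0, 0.7           # affinity and fixed protein concentration (illustrative)
k_minus = 1.0
k_plus = w * k_minus        # detailed balance choice

# relax the binding/unbinding master equation (3) to its steady state
D = np.array([1.0, 0.0])    # [free, bound] DNA, total normalised to 1
dt = 1e-3
for _ in range(20000):
    flux = k_plus * x_p * D[0] - k_minus * D[1]
    D += dt * np.array([-flux, flux])
```

At steady state the bound fraction reproduces the thermodynamic occupancy w·x_p / (1 + w·x_p), independently of the overall speed of the binding/unbinding reactions, which is exactly what the fast rate factor γ exploits.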
With the above rescalings, the limit γ → ∞ ensures that the "fast" species, i.e. the DNA concentrations, are always in QSS with respect to the "slow" protein concentrations. The limit γ' → ∞, on the other hand, means that the effects of binding to, and unbinding from, DNA can be neglected in the equations for the protein concentrations. These statements can be shown mathematically along the lines of the arguments in [Rubin and Sollich, 2016]. After dropping all tildes on rates and concentrations again (this is the notational convention we adopt for the rest of the paper), the time evolution of the concentrations in our expanded network is given by (1) for the protein equations, while for the DNA species one has

∂_t x_{(j,n)} = γ Σ_p [ k^{p+}_{(j,n−e_p)} x_p x_{(j,n−e_p)} − k^{p+}_{(j,n)} x_p x_{(j,n)} + k^{p−}_{(j,n+e_p)} x_{(j,n+e_p)} − k^{p−}_{(j,n)} x_{(j,n)} ]   (6)

Note that in the above argument we do not require a specific relation between γ and γ'. In reality the rate for producing a single protein from the appropriate DNA, which is of order γ', is lower than the DNA binding/unbinding rates, which are O(γ). However, quantifying this would be non-trivial, as the model described here simplifies many biological steps into each of the two elementary processes of protein production and binding/unbinding. In any event, for our argument to hold we only need both γ and γ' to be large, and for simplicity we then take γ' = γ, as was done in [Rubin and Sollich, 2016].

Figure 2: (A) […] are expanded into mass action reactions with an appropriate timescale separation (centre). This generates additional nodes that represent the possible DNA conformations for both proteins, with e.g. g1DNA/g2Prot indicating DNA for gene 1 with protein 2 bound to it. To the expanded network we can apply the projection approach, retaining only the concentration of protein 1 in the subnetwork. The effect of the rest of the network (the bulk) is captured via memory terms (right). (B) Comparison of the cross repressive motif described via the original thermodynamic equations and the expanded mass action equations with and without timescale separation. Already for a moderate fast rate factor of γ = 10 the mass action and thermodynamic time evolutions are visually indistinguishable. Two time courses are plotted in panels (B-D), for g1Prot & g2Prot. (C-D) Demonstration of the projection approach with g1Prot in subnetwork and g2Prot in bulk. The projected equations track the dynamics of the original thermodynamic equation with a reduced system that contains memory functions. However, one can observe in (C) that accuracy can be lost in the transient if the system is initiated too far away from the fixed steady state. (D) Starting nearer to the steady state (at t = 1) substantially increases the accuracy of the projected description. (E) Example of memory functions of protein 1 to itself, decaying with time difference. The linear memory function is positive as expected from the network (positive feedback loop), while the nonlinear term is negative to correct for range-limiting nonlinearities that the linear terms cannot capture.
Below we will refer to the DNA species in the system as "fast", as expressed mathematically by the factor γ on the r.h.s. of (6), while protein species are "slow". Note that the rate constant for protein production is also fast, of order γ', but that the resulting changes to protein concentrations are slow because of the low concentration of (protein-producing) DNA. Numerical simulations of the expanded mass-action network for our cross repressive motif network (Fig. 2B) show that already with a moderate value of γ = 10 the predicted time courses of protein concentrations are almost identical to those predicted by the thermodynamic equations. (Note that in these simulations we do include in the protein time evolution equations the terms arising for finite γ from DNA binding and unbinding.) As γ → ∞ the trajectories from the mass action equations and the thermodynamic equations become identical. The equations used for Fig. 2 are of the form of (1) & (2), with g1Prot repressed by g2Prot and conversely. Parameters used are α_1 = 1, α_2 = 1, w_{p1} = 1, w_{p2} = 2, w_1 = 2, w_2 = 2, β_1 = 1/2, β_2 = 1/2.
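The convergence of the expanded description to the thermodynamic one can be illustrated by integrating both side by side for a hypothetical cross-repressive pair: with DNA concentrations scaled by 1/γ and binding/unbinding rates scaled by γ (taking γ' = γ), the mass-action trajectories should approach the thermodynamic ones as γ grows. Parameters here are illustrative, not those of Fig. 2.

```python
import numpy as np

# illustrative cross-repressive pair: each gene's DNA is free or bound by the
# other protein and produces only when free
alpha = np.array([1.0, 1.0]); beta = np.array([0.5, 0.5]); w = np.array([2.0, 2.0])
x0 = np.array([0.2, 0.8])
T, dt = 10.0, 2e-4

def simulate_mass_action(gamma):
    # expanded network: DNA totals scaled by 1/gamma' with gamma' = gamma;
    # binding rate k+ = gamma * w, unbinding rate k- = gamma
    x = x0.copy()
    D = np.tile([1.0 / gamma, 0.0], (2, 1))          # [free, bound] DNA per gene
    for _ in range(int(T / dt)):
        for g in (0, 1):
            flux = gamma * (w[g] * x[1 - g] * D[g, 0] - D[g, 1])
            D[g] += dt * np.array([-flux, flux])
        x += dt * (gamma * alpha * D[:, 0] - beta * x)   # production scaled by gamma'
    return x

def simulate_thermo():
    # thermodynamic form (1,2) with one repressive binding site per gene
    x = x0.copy()
    for _ in range(int(T / dt)):
        x += dt * (alpha / (1.0 + w * x[::-1]) - beta * x)
    return x
```

For γ = 100 the two endpoint concentrations already agree to within a few percent, the residual difference being the expected O(1/γ) correction.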

Setup of projection method
We now use the Zwanzig-Mori projection method to extract from our expanded mass action network the dynamics for the slow variables that are in the subnetwork. We follow the methodology and notation of [Rubin and Sollich, 2016], generalising from the Michaelis-Menten enzyme kinetics case to our more general networks of proteins and DNA conformations. In this section we set up the framework, then work out the projection for the case of linearised dynamics around a steady state, before moving on to the nonlinear case in the following sections. We first represent the mass action reactions in matrix form. This is done by expanding the time evolution equations (1,6) around a steady state with concentrations y_i and y_{(j,n)}, defining δx_i = x_i − y_i and δx_{(j,n)} = x_{(j,n)} − y_{(j,n)}. The r.h.s. of (1,6) then contains only linear and quadratic terms in the δx; constant terms are absent because of the steady state condition. If we define a vector z that collects all δx variables and all their products, such as δx_1^2 and δx_1 δx_2, this means we can write the time evolution equations in the form [Rubin et al., 2014]

∂_t z_α = Σ_β z_β L_{βα}   (8)

for an appropriately defined matrix L with entries L_{βα}. This representation of the dynamics is exact except for the quadratic observables in z: their time evolution equations contain quadratic and cubic terms, as can be seen from e.g.

∂_t (δx_1 δx_2) = δx_1 ∂_t δx_2 + δx_2 ∂_t δx_1   (9)

where substituting the linear and quadratic terms of ∂_t δx_1 and ∂_t δx_2 generates terms that are cubic in the δx.
The latter are neglected in (8), in the spirit of an expansion to second order in the changes from steady state δx. These changes are therefore implicitly assumed to be small.

Projected equations for linearised dynamics
For the rest of this subsection we simplify further and ignore all quadratic observables in the time evolution equations (8), effectively linearising the dynamics around the steady state. We then have three types of observables in z: the (concentration deviations from steady state of the) subnetwork proteins, which we collectively denote x_s; the bulk proteins x_b that we choose not to track explicitly; and the DNA conformations x_a, which we also project out by assigning them to the bulk. To make the bulk-subnetwork split obvious we also sometimes use S = {s} and B = {b, a} to indicate the collection of all (linear) observables in the subnetwork and bulk, respectively. The L matrix for the linearised dynamics can then be divided into blocks according to the different observables in z:

L = ( L_{s,s}   0        L_{s,a} )
    ( 0         L_{b,b}  L_{b,a} )   (10)
    ( L_{a,s}   L_{a,b}  L_{a,a} )

The zero blocks arise from the fact that bulk proteins do not feature in the time evolution equations for subnetwork proteins (first column) and vice versa (second column). From L in the block form (10) one can calculate the terms in the (linear) projected dynamics of the subnetwork concentrations [Rubin et al., 2014], which has the general form:

∂_t δx_i(t) = Σ_{j=1}^{N_s} Ω_{ji} δx_j(t) + Σ_{j=1}^{N_s} ∫_0^t dt' M_{ji}(t − t') δx_j(t') + r_i(t)   (11)

Here N_s is the number of subnetwork proteins. The Ω_{ji} are the elements of the rate matrix, which is given simply by the subnetwork block of L,

Ω = L_{s,s}   (12)

The M_{ji}(Δt) are the memory functions, which describe how past concentration fluctuations δx_j(t') influence the current time evolution. They form the elements of a memory matrix, which can be expressed in terms of the blocks of L as [Rubin et al., 2014]:

M(Δt) = L_{S,B} e^{L_{B,B} Δt} L_{B,S}   (13)

The r_i(t) in (11) are so-called random forces. They arise from the uncertainty about the initial (time t = 0) state of the bulk concentrations, given that the past before that is unknown. For the linear dynamics they can be expressed in closed form [Rubin et al., 2014], but further characterisation would require knowledge of the statistics of the initial bulk fluctuations, which is not generally available.
We therefore disregard these terms, thus making (11) a closed system of equations for the time evolution of the subnetwork concentrations.
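For linear dynamics, the projected description (11-13) can be verified directly on a small example: if the bulk starts exactly at its steady state, the random force vanishes and integrating the memory equation should reproduce the subnetwork components of the full solution. The sketch below uses a random stable toy L rather than one derived from a GRN.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# toy linear system d(z)/dt = z L in the row-vector convention of (8),
# with 2 subnetwork and 3 bulk observables; shifted diagonal keeps it stable
ns, nb = 2, 3
L = rng.normal(size=(ns + nb, ns + nb)) - 4.0 * np.eye(ns + nb)
Lss, Lsb = L[:ns, :ns], L[:ns, ns:]
Lbs, Lbb = L[ns:, :ns], L[ns:, ns:]

Omega = Lss                                   # rate matrix, eq (12)
dt, T = 1e-3, 1.0
steps = int(T / dt)
Ms = [Lsb @ expm(Lbb * (k * dt)) @ Lbs for k in range(steps + 1)]  # memory matrix, eq (13)

# Euler integration of the projected equation (11) with r(t) = 0
xs = np.zeros((steps + 1, ns))
xs[0] = [1.0, -0.5]
for n in range(steps):
    mem = dt * sum(xs[m] @ Ms[n - m] for m in range(n + 1))  # quadrature of the memory integral
    xs[n + 1] = xs[n] + dt * (xs[n] @ Omega + mem)

# full dynamics with the bulk initialised at zero deviation (no random force)
z0 = np.concatenate([xs[0], np.zeros(nb)])
z_full = z0 @ expm(L * T)
```

The agreement is limited only by the discretisation of the time integrals; for linear dynamics with the bulk at steady state the projection itself is exact.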
The expression (13) for the memory function still explicitly involves the fast rate factor γ and the rate constants k̃^{p±}_{(j,n)} used in the construction of our expanded network. These parameters appear in the third column of (10), which encodes the time evolution equations for the DNA species. Our remaining task is to take the limit γ → ∞, with the aim of obtaining an expression for the memory function that only involves parameters of the original thermodynamic equations (1,2). To take this limit, one notes that all blocks in the third column of (10) are proportional to γ. As the memory function contains an exponential of L, it will then have contributions decaying for time differences of order 1/γ, arising from the DNA dynamics, as well as slow contributions decaying on timescales of order unity.
For γ → ∞ the fast memory function contributions become effectively instantaneous and add to the rate matrix, and only the slow contributions remain in the memory function. This separation into fast and slow pieces of the memory can be performed using the method in [Rubin and Sollich, 2016] because L has exactly the same division into fast and slow blocks as there. It leads to the simple result that the final rate matrix and (slow) memory function for γ → ∞ can be calculated from an effective L-matrix no longer involving the fast degrees of freedom. To state the expression for the effective L, we first isolate the slow piece of our original L:

L_{\a} = ( L_{s,s}  0       )
         ( 0        L_{b,b} )   (14)

The effective L-matrix that gives us the rate matrix and memory function for γ → ∞ is then of the form L_eff = L_{\a} + ΔL_{\a}, which written out reads:

L_eff = ( L_{s,s}  0       )  −  ( L_{s,a} ) (L_{a,a})^{-1} ( L_{a,s}  L_{a,b} )   (15)
        ( 0        L_{b,b} )     ( L_{b,a} )

Here the second term, including the minus sign, is the additional contribution ΔL_{\a} from the fast degrees of freedom. The matrix L_eff can be inserted directly into (12) & (13) to obtain the projected equations for linearised dynamics, with the fast rate limit already taken.
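The claim that the slow dynamics for γ → ∞ is governed by the Schur-complement form of L_eff in (15) can be checked numerically: the slow eigenvalues of a full L whose third block-column is scaled by a large γ should converge to the eigenvalues of L_eff. A sketch with a random toy matrix respecting the zero blocks of (10):

```python
import numpy as np

rng = np.random.default_rng(1)
ns, nb, na = 2, 2, 3
n_slow = ns + nb

# random L respecting the zero blocks of (10); diagonal shifts keep it stable
L = rng.normal(size=(n_slow + na, n_slow + na)) - 2.0 * np.eye(n_slow + na)
L[ns:n_slow, :ns] = 0.0                    # L_{b,s} = 0
L[:ns, ns:n_slow] = 0.0                    # L_{s,b} = 0
L[n_slow:, n_slow:] -= 3.0 * np.eye(na)    # strengthen stability of L_{a,a}

# effective slow matrix, eq (15): Schur complement over the fast (DNA) block
Laa = L[n_slow:, n_slow:]
Leff = (L[:n_slow, :n_slow]
        - L[:n_slow, n_slow:] @ np.linalg.inv(Laa) @ L[n_slow:, :n_slow])

# full matrix with the fast (third) block-column scaled by a large gamma
gamma = 1e6
Lfull = L.copy()
Lfull[:, n_slow:] *= gamma

ev_full = np.linalg.eigvals(Lfull)
slow_ev = np.sort_complex(ev_full[ev_full.real > -1e3])   # the O(1) eigenvalues
```

The remaining na eigenvalues of Lfull have real parts of order −γ; these are the fast modes whose memory contributions become instantaneous in the limit.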
We derive in the next two subsections that, remarkably, L_eff can be constructed directly from the original GRN equations without requiring reference to the extended network. The resulting method is summarised in the last paragraph of the subsection "Equivalence to linearisation", so readers wishing to skip the mathematical details of the derivation can jump straight there.

Fast binding limit as Quasi-Steady State elimination
As in [Rubin and Sollich, 2016], the result (15) has a simple interpretation: L_eff can be obtained by eliminating the fast degrees of freedom from the linearised time evolution equations using a QSS condition. Using the block form (10), these equations can be written as

∂_t δx_s = δx_s L_{s,s} + δx_a L_{a,s}   (16a)
∂_t δx_b = δx_b L_{b,b} + δx_a L_{a,b}   (16b)
∂_t δx_a = δx_s L_{s,a} + δx_b L_{b,a} + δx_a L_{a,a}   (16c)

where δx_s, δx_b and δx_a are row vectors collecting the subnetwork protein, bulk protein and DNA observables, respectively. All blocks of L appearing in (16c) are proportional to γ. This justifies elimination of the fast variables using a QSS condition for these fast (DNA) degrees of freedom, effectively setting the r.h.s. of (16c) to zero. The fast species concentrations are then expressed as:

δx_a = −(δx_s L_{s,a} + δx_b L_{b,a}) (L_{a,a})^{-1}   (17)

If we now substitute this back into the time evolution equations (16a,16b) we obtain:

∂_t δx_s = δx_s L_{s,s} − [δx_s L_{s,a} (L_{a,a})^{-1} L_{a,s}] − [δx_b L_{b,a} (L_{a,a})^{-1} L_{a,s}]   (18)

and analogously for ∂_t δx_b. Writing these time evolution equations in matrix form gives exactly the effective L-matrix L_eff defined in (15), as claimed. Two terms have been highlighted here by brackets for later comparison.

Equivalence to linearisation
So far we have shown that the rate matrix and memory functions for the linearised dynamics can be found by applying standard projection results to a set of time evolution equations for only the slow degrees of freedom. This reduced set of equations is obtained by first expanding the thermodynamic equations into a set of mass-action equations, linearising, and finally eliminating the fast variables using a QSS assumption. We next show that this procedure is equivalent to directly linearising the original thermodynamic equations. To verify this equivalence, it is useful to write the expanded mass-action equations in the generic form

∂_t x_s = R_s(x_s, x_b, x_a),   ∂_t x_b = R_b(x_s, x_b, x_a),   ∂_t x_a = γ R_a(x_s, x_b, x_a)   (19)

where x_s, x_b and x_a are generic components of the corresponding concentration vectors. This expanded description is constructed so that for γ → ∞, when the x_a can be replaced by their QSS values x*_a, one recovers the original thermodynamic equations. These can therefore be written as

∂_t x_s = R_s(x_s, x_b, x*_a)   (20)
∂_t x_b = R_b(x_s, x_b, x*_a)   (21)

where the dependence of the x*_a on x_s and x_b is determined implicitly by the QSS condition

R_a(x_s, x_b, x*_a) = 0   (22)

Expanding now the thermodynamic equations to linear order around a steady state one has

∂_t δx_s = Σ_{s'} (∂R_s/∂x_{s'}) δx_{s'}
          + Σ_a (∂R_s/∂x_a) [ Σ_{s'} (∂x*_a/∂x_{s'}) δx_{s'} ] + Σ_a (∂R_s/∂x_a) [ Σ_b (∂x*_a/∂x_b) δx_b ]   (23)

Here all derivatives are evaluated at the steady state, and the terms in the second line arise from the variation of the QSS values of the fast variables. The required coefficients of the type ∂x*_a/∂x_s can be found by differentiating (22) with respect to the relevant slow variable, here x_s:

∂R_a/∂x_s + Σ_{a'} (∂R_a/∂x_{a'}) (∂x*_{a'}/∂x_s) = 0   (24)

We can now show that (23) is identical to (18) derived above. To see this, we note that by linearising the expanded mass action equations (19) one can identify the entries of L: for example, L_{b,s} has entries ∂R_s/∂x_b, L_{s,a} has entries γ ∂R_a/∂x_s etc. The QSS coefficients arising from (24) can then be written in matrix form as ∂x*_a/∂x_s = −(L_{s,a} (L_{a,a})^{-1})_{sa}, with a similar expression for ∂x*_a/∂x_b. Inserting these expressions into (23) does indeed lead directly to (18), with matching terms indicated by the brackets. An exactly analogous argument demonstrates the equivalence for ∂_t δx_b.
We briefly summarise our method for obtaining linearised projected equations that describe the dynamics of a subnetwork within a larger GRN written in thermodynamic form. First expand the GRN equations to linear order around a steady state. Then construct from this the matrix L_eff and partition it into "s" and "b" blocks according to the chosen subnetwork-bulk split. Finally, determine the rate matrix and memory function matrix from (12,13). Although we needed to invoke an expanded network of binary reactions involving DNA conformations to derive that this is the correct method, its application only requires the original thermodynamic equations as input. The resulting projected equations are therefore, as they should be, independent of the details of the rates k̃^{p±}_{(j,n)} used in the construction of the expanded network.
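This recipe can be sketched end to end for a hypothetical two-gene cross-repressive system in thermodynamic form (illustrative parameters, not the paper's): find the steady state, build L_eff by numerical linearisation, split it into subnetwork and bulk blocks, and read off the rate and memory terms, which here reduce to scalars. Consistent with the discussion of the cross-repressive motif, the linear memory of a protein to itself through its repressor comes out positive and decays with the bulk protein's relaxation rate.

```python
import numpy as np

# hypothetical cross-repressive pair in thermodynamic form; illustrative parameters
alpha = np.array([1.0, 1.0]); beta = np.array([0.5, 0.5]); w = np.array([2.0, 2.0])

def R(x):
    # production from the unbound conformation only, minus decay
    return alpha / (1.0 + w * x[::-1]) - beta * x

# steady state by relaxation
x = np.array([0.5, 0.5])
for _ in range(100000):
    x += 2e-3 * R(x)

# numerical linearisation; in the row-vector convention Leff[i, j] = dR_j / dx_i
eps = 1e-6
Leff = np.array([[(R(x + eps * np.eye(2)[i])[j] - R(x)[j]) / eps for j in range(2)]
                 for i in range(2)])

# subnetwork = gene 1, bulk = gene 2: rate matrix and memory function are scalars here
Omega = Leff[0, 0]
def memory(dt):
    return Leff[0, 1] * np.exp(Leff[1, 1] * dt) * Leff[1, 0]   # eq (13) for 1x1 blocks
```

The memory function is a product of two negative cross-couplings (each gene represses the other), hence positive, and its decay rate is set by Leff[1, 1], the linearised relaxation rate of the bulk protein.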

Nonlinear elimination of bulk DNA
To capture more of the nonlinearity inherent in GRNs we now extend the approach described in the previous section to include terms that are quadratic in the deviations from a steady state. Once again we start from an L-matrix, now defined for linear and quadratic observables through (8). We partition this into blocks according to

L = ( L_{s̃,s̃}  L_{s̃,b̃}  L_{s̃,ã} )
    ( L_{b̃,s̃}  L_{b̃,b̃}  L_{b̃,ã} )   (25)
    ( L_{ã,s̃}  L_{ã,b̃}  L_{ã,ã} )

Here s̃ contains the "subnetwork only" observables {s} (linear) and {ss} (quadratic, like δx_s δx_s), while b̃ collects the slow bulk observables {b, sb, bb}. The fast bulk observables are gathered in ã, which contains {a, sa, ba, aa}. Note that with this partitioning of observables we have allocated all fast (DNA) species to the bulk. This is different from the approach in [Rubin and Sollich, 2016], where some fast (enzyme) species were retained in the subnetwork. In our case one could similarly keep in the subnetwork those DNA species that produce subnetwork proteins, but it turns out that this makes the final elimination of fast variables rather intricate, and so we do not pursue this option further. For our GRN equations, subnetwork and bulk protein species do not interact, so the blocks L_{s̃,b̃} and L_{b̃,s̃} are in fact zero. This restriction is not required for our treatment, however, and direct protein-protein interactions could be included in the formalism without modification. As in the case of the linearised dynamics, only the third column of (25) is fast, i.e. has entries proportional to γ (plus subleading terms of order unity arising from the time derivatives of slow-fast product observables such as ba). Again the memory function then contains fast and slow contributions, with the former being transformed into additional rate matrix terms when γ → ∞. The resulting rate matrix and slow memory functions can be obtained from an effective L-matrix involving only slow (linear and quadratic) observables. As before, this L_eff can be viewed as arising from a QSS elimination of the fast observables.
Finally, one can show that this explicit elimination of the fast observables is equivalent to directly expanding the original thermodynamic equations, here to quadratic order in concentration deviations from the steady state.
Except for the last step, this chain of argument is analogous to the linearised dynamics case, being based entirely on the identical structure of the partition of L into fast and slow blocks (compare (10) and (25)). The final reduction to a quadratic expansion of the thermodynamic equations is more subtle, because the projection approach treats quadratic observables as distinct and not directly tied to products of linear observables. We defer the details to Appendix A and note only that our derivation there both simplifies and generalises that in [Rubin and Sollich, 2016], from enzyme species that do not interact with each other to arbitrary interactions of the fast degrees of freedom.

Nonlinear projected equations
The projected equations for the nonlinear dynamics have a form analogous to (11) but now include the quadratic observables that we have retained, giving for the dynamics of the subnetwork protein concentrations ∂_t δx_i(t) = Σ_j Ω_ji δx_j(t) + Σ_{j≤k} Ω_(jk)i δx_j(t) δx_k(t) + ∫_0^t dt' [Σ_j M_ji(t − t') δx_j(t') + Σ_{j≤k} M_(jk)i(t − t') δx_j(t') δx_k(t')]. The rate matrix entries and memory functions are calculated from the general formulae (12, 13), applied to the L eff constructed as explained above. Because the subnetwork block S now contains linear observables {s} and quadratic observables {ss}, the matrices have the corresponding block structures. The coefficients Ω_ji are collected in the block Ω_s,s, for example, and the Ω_(jk)i in the block Ω_ss,s of the overall rate matrix Ω. The memory functions M_ji(∆t) and M_(jk)i(∆t) are similarly contained in blocks M_s,s and M_ss,s of the memory matrix M.

Results
We applied the Zwanzig-Mori projection method to a cross-repressive motif (Fig. 2A) to exemplify the properties of memory functions. We separated the system so that one species, g1Prot, is in the subnetwork and the other, g2Prot, is in the bulk. We expand the system into protein and DNA species, and proceed to place all DNA species along with g2Prot in the bulk. After applying the approach we describe in Methods we obtain memory functions in place of the bulk species. We have thus reduced the system to the dynamics of a single species with additional memory functions (Fig. 2A). We are able to replicate the original thermodynamic dynamics closely with our reduced set of equations (Fig. 2C). Given that we have performed an expansion around a steady state, the method is more precise close to the steady state, as can be observed in Fig. 2D. Next we analyse the memory functions themselves, which can be thought of as setting the strength of a signal returning from the bulk a specific time after the original signal entered the bulk from the subnetwork (Fig. 2E). The amplitude of the leading linear memory function, where the signal is δx_1(t'), is positive. This makes sense as, in the cross-repressive motif, a protein inhibits its own inhibitor, thus promoting its own production. The second order memory functions govern the effects of the signal δx_1^2(t'). They are, in this case, negative and act as a correction that captures nonlinearities beyond the linear memory. We observe that the decay in time of the memory functions is determined by the decay rate of the bulk protein, which makes intuitive sense as the memory function is a consequence of a protein repressing its own repressor. Hence, the effect only lasts as long as the repressor protein is sustained, which is determined by its decay rate. Overall, the memory functions provide compact and readable descriptions of how the bulk modifies and influences the activity of the subnetwork.
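For the linearised version of such a motif the memory picture can be made fully explicit. The sketch below, with hypothetical rate constants (d1, d2 degradation rates, r12, r21 repression strengths, none taken from the paper's model), eliminates the bulk species exactly and checks that the reduced single-species equation with a memory integral reproduces the two-species dynamics; the memory amplitude r12·r21 is positive (inhibitor of an inhibitor) and decays at the bulk degradation rate, as described in the text.

```python
import numpy as np

# Linearised cross-repression: deviations from steady state obey
#   d/dt dx1 = -d1*dx1 - r12*dx2,   d/dt dx2 = -r21*dx1 - d2*dx2.
# Eliminating the bulk species dx2 (with dx2(0) = 0) gives exactly
#   d/dt dx1 = -d1*dx1 + int_0^t M(t - t') dx1(t') dt',
#   M(dt') = r12 * r21 * exp(-d2 * dt'),
# i.e. a positive memory amplitude decaying at the bulk decay rate d2.

d1, d2, r12, r21 = 1.0, 2.0, 0.5, 0.8
dt, T = 1e-3, 5.0
N = int(T / dt)

# Full two-species dynamics (forward Euler).
x1f, x2f = np.empty(N + 1), np.empty(N + 1)
x1f[0], x2f[0] = 1.0, 0.0
for n in range(N):
    x1f[n + 1] = x1f[n] + dt * (-d1 * x1f[n] - r12 * x2f[n])
    x2f[n + 1] = x2f[n] + dt * (-r21 * x1f[n] - d2 * x2f[n])

# Reduced single-species dynamics with an explicit memory integral.
kern = r12 * r21 * np.exp(-d2 * dt * np.arange(N + 1))  # M(k*dt), amplitude kern[0]
x1m = np.empty(N + 1)
x1m[0] = 1.0
for n in range(N):
    mem = dt * kern[1:n + 1] @ x1m[:n][::-1]  # Riemann sum of the memory integral
    x1m[n + 1] = x1m[n] + dt * (-d1 * x1m[n] + mem)
```

The two trajectories agree to discretisation accuracy, illustrating that the memory term carries exactly the information that was held by the eliminated bulk species.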

Biological example application
Reducing part of a network into memory functions offers a way to simplify and analyse the effect of the factors in this bulk part on the remaining subnetwork. To illustrate this approach we chose to analyse a four gene network involved in the embryonic patterning of the vertebrate neural tube [Dessaud et al., 2008, Cohen et al., 2013]. The developing neural tube is a well characterised example of developmental pattern formation. In this tissue, the secreted molecule Sonic Hedgehog (Shh) forms a ventral to dorsal gradient [Roelink et al., 1995]. This in turn generates discrete domains of gene expression that define the progenitors of the distinct neuronal subtypes that comprise spinal cord circuitry (Fig. 3A,B). A transcriptional network has been identified that is controlled by graded Shh signaling and is responsible for specifying ventral progenitor domains [Briscoe et al., 2000, Cohen et al., 2013]. Genetic and molecular experiments have determined the activity of the components of this network (Fig. 3C) and documented the temporal dynamics of pattern formation in vitro and in vivo [Cohen et al., 2013, Balaskas et al., 2012, Cohen et al., 2014]. For the three most ventral domains of progenitors, four transcription factors are essential: Nkx2.2, Olig2, Pax6 and Irx3. The most ventral domain (p3 domain) is characterised by high Nkx2.2 and low levels of the other three proteins; the domain adjacent to this (pMN domain) has high levels of Olig2, medium levels of Pax6 and low levels of the other two proteins. Dorsal to this, the p2 domain expresses high levels of Pax6 and Irx3 and low levels of the other two proteins.
[Fig. 3 caption, partial: The model equations were taken, along with appropriate initial conditions, from [Cohen et al., 2014]. For all plots where the x axis represents neural tube position, zero corresponds to the most ventral point. (E) Full bifurcation diagram illustrating the multistable nature of the network; shown are steady state concentrations of the four molecular species against neural tube position, with unstable steady states marked dashed. Colours in (D, E) identify genes/proteins in the same way as in the labelling of the illustration (B) and of the network nodes in (C); this colour code is used throughout the paper unless otherwise noted.]
Mathematical models have been formulated based on the experimental data and these are able to replicate key aspects of cell patterning [Balaskas et al., 2012, Cohen et al., 2014] (Fig. 3D). We take advantage of these to develop a Zwanzig-Mori projection of the system. To this end, we chose Nkx2.2 and Olig2 to be the subnetwork species, given that they receive direct input from Shh (Fig. 3C), and replaced Irx3 and Pax6 with memory functions.

Linear memory analysis
We first examined the properties of the linear memory functions. These capture the most substantial contributions, close to the specific steady states, of the species in the bulk (Pax6 and Irx3) to the temporal changes in the activity of Nkx2.2 and Olig2. We note that the system is multistable (Fig. 3E) and different combinations of steady states are available at different positions along the dorsal-ventral axis (Shh gradient). We therefore analysed each possible steady state along the neural tube. In this case, as the bulk species and the subnetwork species mutually repress one another, they form positive feedback loops with the factors in the subnetwork. As a consequence, each of the linear memory functions is positive and decays exponentially on the timescale of protein degradation of the bulk species. Thus the relative contribution of each of the memory functions can be assessed by comparing their amplitudes, defined as the memory function value at zero time difference.
We calculated the amplitudes of the memory functions and determined the memory effects that Nkx2.2 and Olig2 receive over time. Furthermore, we used the method developed in [Rubin et al., 2014] to decompose the memory into contributions from memory signals passing through the two different bulk species (Fig. 4A-D). This allowed us to determine the importance of specific regulatory interactions between transcription factors (TFs) at every position of the neural tube model, and to assess their respective contributions. The results indicate that, for the most part, there is only a small memory amplitude of Nkx2.2 to the past of Olig2 passing through Pax6 (Fig. 4B; note the logarithmic y-axis): the memory of (past) Olig2 on Nkx2.2 is largely dominated by memory through Irx3. The only exception is for steady states that are not reached during normal neural tube patterning (lower pair of curves in the neural tube position range ≈ [0.2 . . . 0.45] in Fig. 4B). These steady states exist because the system is multistable, but the initial conditions that lead to them are incompatible with physiological conditions. We additionally observe in Fig. 4A that the memory of Nkx2.2 to itself is for the most part shared between Irx3 and Pax6, with no particular TF being dominant. There are again exceptions to this, but only in steady states that are biologically unreasonable. In the case of the memory of Olig2, it can only be influenced by Irx3, as is clear from the structure of the network in Fig. 3C; channel decomposition is therefore unnecessary. We observe that the memory effects of Olig2 to itself are strongest in the p3 domain, and the opposite is true (memory of Nkx2.2 on Olig2 is stronger) in the pMN domain. We observe in Fig. 4A-D that the order of magnitude of the memory amplitudes changes substantially with neural tube position. This is a consequence of the system becoming more or less sensitive to concentration fluctuations of a given species within the network.
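The channel decomposition used above can be illustrated with a small linear sketch. Assuming, for simplicity of illustration only, that the bulk species do not interact directly with each other (a diagonal bulk block; the method of [Rubin et al., 2014] handles the general case), the linear memory kernel splits exactly into one additive term per bulk species. All matrices and rates below are hypothetical stand-ins, not the neural tube parameters.

```python
import numpy as np

# Column convention d/dt x = A x, subnetwork block s, bulk block b.
# With x_b(0) = 0, eliminating the bulk gives the linear memory kernel
#   M(dt') = A_sb exp(A_bb dt') A_bs.
# For diagonal A_bb = diag(lam), this is a sum of per-bulk-species channels,
# analogous to the Pax6 and Irx3 channels discussed in the text.

A_sb = np.array([[-0.6, -0.3],   # subnetwork <- bulk couplings (repression)
                 [-0.4,  0.0]])  # second subnetwork species sees only bulk 1
A_bs = np.array([[-0.5, -0.7],   # bulk <- subnetwork couplings (repression)
                 [-0.2, -0.9]])
lam = np.array([-1.5, -0.8])     # bulk decay rates, set memory timescales

def channel(b, dtau):
    """Memory contribution routed through bulk species b at time difference dtau."""
    return np.exp(lam[b] * dtau) * np.outer(A_sb[:, b], A_bs[b, :])

def total_memory(dtau):
    return sum(channel(b, dtau) for b in range(len(lam)))

# Channel amplitudes (value at zero time difference), used to rank channels:
amp = {b: channel(b, 0.0) for b in range(2)}
```

Because each channel amplitude is a product of two repressive (negative) couplings, the diagonal entries come out positive, mirroring the positive-feedback argument made above.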
An example of this is Nkx2.2 memory to itself via Pax6 in the pMN domain (Fig. 4A), where dorsally the lower levels of Shh mean that the binding sites of Nkx2.2 are less occupied by the active form of Gli (Gli is the transcriptional effector downstream of Shh that binds to Nkx2.2 & Olig2, see [Cohen et al., 2014] for details on how this is implemented). Active Gli sets the rate of production of Nkx2.2 and thus has a direct effect on the amplitude of the memory functions. In addition to this, as one moves dorsally in the pMN domain, the steady state concentration of Pax6 is increasing (Fig. 3E, also observed experimentally). This increase causes the system to become insensitive to fluctuations of Pax6 proteins as Pax6 binding sites are typically already saturated. This phenomenon is at work also in the other memory function amplitudes that change across neural tube position as seen in Fig. 4A-D.

Nonlinear memory analysis
We next examined the second order memory functions. These provide corrections to the linear memory terms and encode information about how the bulk nodes drive the system towards a steady state when it starts further away from such a state.
As a first observation, it is apparent that once memory terms are included, different trajectories cross. This is visually most obvious with the full nonlinear memory terms (Fig. 5C,G) and consistent with the full dynamics (Fig. 5D,H). The crossings arise from the fact that, in the presence of memory, systems in the same state can follow different trajectories towards a steady state depending on their past. In the full dynamics, the additional information as to which path a system will take is contained in the bulk species, which effectively hold the information on the past of the system. In the case of the memory functions, the memory is not contained in network nodes outside of the subnetwork, but is represented explicitly via the memory terms, with the time difference dependence of the memory functions indicating, for example, the timescale of the memory. This offers a useful abstraction: while it is not easy to visualize the four-dimensional concentration space for the full dynamics, the projection approach reduces the system to two components with memory of their own past. This could be useful experimentally, as it is often only possible to assay a subset of the components of a system simultaneously.
From our previous analyses we saw from the linear memory functions that the link from Olig2 to Pax6 (see Fig. 3C), forming part of the "Pax6" channel for memory of (past) Olig2 on Nkx2.2, made only a very minor contribution to this memory, for all biologically relevant steady states. In the nonlinear memory functions, we can assess the importance of the link from Olig2 to Pax6 by considering the squared deviations of Olig2: the Pax6 channel contribution to this memory necessarily involves signals being propagated via this link. We find that the Pax6 channel makes a very small contribution to the nonlinear memory amplitude in comparison to the Irx3 channel. This is shown in Fig. 4E for the p3 domain but is true also for all other states. We additionally checked the time-dependence of the memory function: from Fig. 4F one sees that the relative contribution of the Pax6 channel to the nonlinear memory of (past) Olig2 squared on Nkx2.2 remains small at all time-differences. Together with our results above for linear memory functions, we can thus conclude that the interaction by which Olig2 represses Pax6 is dispensable for sustaining dorso-ventral patterning. We plot an example of nonlinear memory terms (Fig. 4G-H) to illustrate how considering the temporal dynamics of memory terms can reveal non-monotonicities and sign changes that mean memory amplitudes do not tell the full story of the effect of nonlinear memory. This plot offers insight into the specific contributions of the nonlinear memory terms to the dynamics leading to the p2 domain's state, which we describe below.
We next address how the inclusion of nonlinear memory functions in the projected equations affects the predictions for the dynamics. An example of this can be seen in Fig. 5A-D, where the system is approaching a high Olig2 state. This illustrates the situation in the pMN domain in Fig. 3D, at intermediate levels of Shh signalling. We show in Fig. 5A the behaviour of the system without memory, where the time evolution depends exclusively on the instantaneous concentrations of Nkx2.2 and Olig2. We can then determine how much the behaviour changes with the addition of linear and nonlinear memory, which capture the contributions from the bulk species (Fig. 5B,C) and so give better approximations to the full network dynamics (Fig. 5D). Specifically, we observe that the memory leads to Nkx2.2 reducing at a faster rate than Olig2 increases. A comparison of Figs. 5A and 5B shows that the linear memory encapsulates some of this trend. This is then further emphasised by the nonlinear memory, which captures additional bulk effects and brings the predictions closer to the qualitative behaviour seen in the full dynamics (Fig. 5D). As expected, the effects of including nonlinear memory increase as we consider the behaviour of the system further from its steady state.
The effects of including the memory terms can also be visualized by considering the drift vector, which contains the rates of change of the concentrations of Nkx2.2 and Olig2. Without memory, this only depends on the current concentrations. The system will pass quickly through regions where the drift is high and spend most of its time in regions where it is low. Once we include memory in the description of the dynamics of the subnetwork, the drift vector depends on the entire history of the system. But for small times, i.e. when the system has not moved too far from its initial condition, we can still express the drift in terms of the current concentrations as explained in Appendix B. The colour maps in Figs. S1A-C show the norm of the resulting effective drift at time 0.8. These confirm the effects seen in the trajectory plots: inclusion of the memory terms causes the low-drift region to shift to lower Nkx2.2 concentrations, i.e. the system will more rapidly reduce Nkx2.2 and then spend more time increasing Olig2 at small Nkx2.2.
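A rough sketch of this short-time effective drift, without reproducing Appendix B: assuming the trajectory has stayed near its current point, the memory integral can be collapsed by freezing the history at the current state, giving an effective drift (Ω + ∫_0^t M(s) ds) δx. The matrices and the exponential kernel below are hypothetical stand-ins for the Nkx2.2/Olig2 blocks.

```python
import numpy as np

# Effective drift for the linear part of the projected dynamics,
#   v(t) = Omega x(t) + integral_0^t M(t - t') x(t') dt',
# approximated for short times by freezing x over the history:
#   v_eff(x; t) ~= (Omega + integral_0^t M(s) ds) x.
# Here M(s) = B * exp(-lam * s); all values are illustrative.

Omega = np.array([[-1.0, -0.4],
                  [-0.3, -0.8]])
B = np.array([[0.5, 0.2],
              [0.1, 0.4]])
lam = 2.0

def drift_norm(x1, x2, t):
    """Norm of the effective drift at state (x1, x2) after elapsed time t."""
    A = Omega + B * (1.0 - np.exp(-lam * t)) / lam  # Omega + int_0^t M(s) ds
    return np.linalg.norm(A @ np.array([x1, x2]))

no_mem = drift_norm(1.0, 0.5, t=0.0)    # t = 0: no memory accumulated yet
with_mem = drift_norm(1.0, 0.5, t=0.8)  # t = 0.8, as in the colour maps
```

Evaluating `drift_norm` on a grid of concentrations, in the spirit of Figs. S1A-C, then shows how the accumulated memory reshapes the low-drift region relative to the memoryless drift.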
We do not show here similar plots for the approach to high Nkx2.2-steady states, which occur at high levels of Shh input (p3 domain). Here the dynamics are dominated by Nkx2.2, so memory effects are generally weak. Accordingly we find that even leaving out all memory terms from the projected equations gives a reasonable account of the full dynamics.
For lower Shh signalling levels, in the p2 domain, memory effects do contribute (Fig. 5E-H). The steady state here is one with low Nkx2.2 and low Olig2 concentrations. The memoryless and linear memory predictions for trajectories are quite close, with levels of Nkx2.2 and Olig2 decreasing on similar timescales towards the steady state. Nonetheless, the plots showing the norm of the effective drift indicate the trend that memory pushes the system more quickly towards low Nkx2.2 (Fig. S1D,E). Including the nonlinear memory then further enhances this tendency: similarly to the full thermodynamic model simulations, the levels of Nkx2.2 drop more rapidly than Olig2 on the approach to the steady state (Fig. 5G,H & Fig. S1F). The behaviour is not entirely that of the full thermodynamic model, yet as before, the nonlinear memory effects contribute qualitative information about the behaviour of the system further away from the steady state.
We can gain insight into what drives the distinct dynamics of Fig. 5C,G by analysing the respective nonlinear memory functions, specifically their amplitude (value at ∆t = 0) as shown in Fig. 6A-D. We note that in most cases the memory contributions arise via Irx3 (for memory of Olig2 this is in any case the only possibility). A clear exception to this is the case of nonlinear Nkx2.2 memory to itself (Fig. 6A), where the memory is dominated ventrally by the Pax6 channel and dorsally by the Irx3 channel. These memory functions also change with time difference, but it turns out that they decay on similar timescales so that the memory amplitude gives a reasonable characterization of their strength. We observe that Olig2 does not produce memory effects that are substantial enough to change the dynamics (Fig. 6B,D): the dominant nonlinear memory effects come from Nkx2.2 (Fig. 6A,C). We see further that Nkx2.2 represses both itself (negative nonlinear memory amplitude, Fig. 6A) and, even more strongly, Olig2 (Fig. 6C). These two terms combined mean that at high levels of Nkx2.2, the memory functions will impede Olig2 from increasing as quickly as it would in cases without nonlinear memory, while simultaneously decreasing the concentration of Nkx2.2. Both effects are consistent with what we observe in Fig. 5C,D.
Understanding the effect of nonlinear memory terms in Fig. 5G requires a more detailed analysis of the memory effects. The memory amplitudes show that Nkx2.2 is not subject to substantial memory effects (Fig. 6E,F) and the largest memory effects act on Olig2 (Fig. 6G,H). At first glance, the latter memory amplitudes suggest that Nkx2.2 substantially represses Olig2 while Olig2 activates itself to a lesser degree. However, on closer inspection of the temporal dependence of the memory functions, the repression by Nkx2.2 is a short pulse that for larger ∆t turns into an activation of Olig2 (Fig. 4G,H). Meanwhile, the Olig2 self-activation is a sustained signal that lasts much longer than the repression by Nkx2.2. When taken together, these two memory terms (combined with cross terms, not shown) promote activation of Olig2 from high levels of Nkx2.2 even after a substantial time difference has elapsed, of the order of the decay time of the linear memory functions. This is consistent with what we observe in Fig. 5G,H: the initial high levels of Nkx2.2 lead to Olig2 levels being sustained via the memory, while Nkx2.2 decays in a way almost unaffected by memory. Eventually Olig2 also decays, once enough time has passed for the nonlinear memory effects to fade away.

Exploration of network properties
The analysis of both linear and nonlinear memory functions in the previous section suggested that the repression of Pax6 by Olig2 is not critical for the dynamics of the steady states observed in neural tube patterning. We based this on the observation that the amplitude of memory that is transmitted through this link is relatively small compared to the other memory functions, for all biologically relevant steady states along the neural tube.
To test this prediction of the memory function analysis, we removed the repressive link from Olig2 to Pax6 in the full model. Consistent with our prediction of the relative insignificance of the link from Olig2 to Pax6, the resulting steady state concentrations of the simulations are similar to those in the original system (Fig. 7A). The only qualitative change is the increased level of Pax6 between neural tube positions 0.2 and 0.7, which results from the loss of the repressive link. But the spatial aspects of the domains, such as the positional sequence of genes and their qualitative concentrations, are otherwise unaltered.
Since the repressive link between Olig2 and Pax6 has been experimentally documented [Zhou and Anderson, 2002, Balaskas et al., 2012] but from our analysis is not required for the dynamics around the steady state, we next sought to understand what purpose it might serve during development. We performed simulations from varying initial conditions for both WT (Fig. 7B) and the system lacking the link (Fig. 7C). These show that the p3 domain is reached from only a small range of initial conditions following the removal of the repressive link between Olig2 and Pax6. To rationalise this, we performed a bifurcation analysis (Fig. 7D). This indicated that the removal of the Olig2 inhibition of Pax6 markedly increased the range of positions at which the Olig2 steady state is present, thus facilitating its invasion into the Nkx2.2 domain. In particular, under initial conditions of high Pax6 and Irx3, the pMN domain is induced ventrally. In the unperturbed network it is not possible to reach this steady state in the ventralmost part, because the system is monostable, allowing only the p3 domain to arise. On the other hand, in the absence of Olig2 inhibition of Pax6, the ventral region is bistable. Hence, depending on the initial conditions a pMN fate can be reached (Fig. 7D). Taken together, this analysis suggests that the repressive link from Olig2 to Pax6 contributes to the robustness of the system for patterning the neural tube.

Discussion
We have developed a generally applicable method for applying Zwanzig-Mori projections to transcriptional networks that allows the analysis of subnetworks extracted from a larger network. We demonstrate the approach on a simple genetic cross repressive motif and on a more complicated, but well-characterised, transcriptional network operating in the ventral part of the vertebrate neural tube. This showed how the method allows the function and importance of specific links within a network to be defined in an intuitive manner. We use these insights to identify structural features of the neural tube network that appear primarily to increase the robustness of the network to initial conditions, rather than maintain its steady state.
To apply the projection method, we took advantage of an argument of timescale separation for molecular mechanisms operating in transcriptional networks, in which fast processes (DNA-protein interactions) are assumed to be in QSS compared to the slow processes (changes in protein concentration). We also assume the effective concentration of DNA species to be small in comparison to that of protein species. We consider these to be reasonable assumptions because DNA-protein interactions occur at a much faster rate than the changes in protein concentrations produced by transcription and translation [Garcia et al., 2010]. Moreover, there are usually fewer than four DNA copies of each gene per cell, whereas the number of individual protein molecules is normally considerably higher. The production rate of individual proteins per DNA copy, i.e. the relevant rate constant, can then be large while the resulting relative changes in protein concentration are slow. These assumptions allowed us to construct an extended reaction network involving fast molecular (DNA) species and at most binary reactions. This construction is sufficiently general for it to be applicable to other systems with a timescale separation, as long as they satisfy the above conditions for the relevant concentrations and rate constants. We illustrated how one can apply projection methods to the extended network while retaining the influence of the bulk network in which it is embedded. Quantitative comparisons of the projected equations with simulations of the full system confirmed the accuracy of the approach.
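The timescale-separation argument can be checked numerically on a toy gene: a fast promoter occupancy relaxing at rate γ toward a binding function of the slow protein concentration. All rates and the binding function below are illustrative, not fitted to any GRN.

```python
import numpy as np

# Fast-slow toy model of one gene:
#   dx/dt = beta * y - delta * x          (slow protein concentration x)
#   dy/dt = gamma * (f(x) - y)            (fast promoter occupancy y)
# For large gamma, y is slaved to f(x) and the QSS-reduced equation
#   dx/dt = beta * f(x) - delta * x
# should closely track the full dynamics.

beta, delta, gamma = 2.0, 1.0, 200.0
f = lambda x: x / (1.0 + x)          # promoter occupancy at protein level x

dt, N = 1e-4, 50000                  # forward Euler to t = 5
x_full, y = 0.5, 0.0
x_qss = 0.5
for _ in range(N):
    x_full, y = (x_full + dt * (beta * y - delta * x_full),
                 y + dt * gamma * (f(x_full) - y))
    x_qss += dt * (beta * f(x_qss) - delta * x_qss)
```

After a brief transient of duration ~1/γ the two trajectories agree closely, which is the regime in which the fast DNA species can be safely eliminated in the projection.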
The projection technique can be applied to any GRN described using the thermodynamic formalism and we developed a simple prescription to formulate memory functions directly from the original GRN equations. This is suitable for any type of DNA-protein conformation, including regulatory interactions representing activation, repression, competitive and cooperative binding. Moreover, further protein-protein interactions could readily be incorporated into a system as these would be represented by simple first order (unary) or second order (binary) interactions. Thus, in addition to gene regulation, nonlinear protein-protein interactions mediated by enzymes such as those found in signal transduction pathways could be incorporated into the analysis [Rubin and Sollich, 2016]. It would also be possible to include Hill functions as long as these possess integer exponents [Thomas et al., 2012]. The size of the matrices involved in constructing the memory functions scales as the square of the number of bulk species, which in our experience means that networks with up to 200 nodes can be investigated without difficulty. Thus, the generality of the method and the ability to implement it algorithmically provide a comprehensive mathematical toolkit to simplify and analyse dynamical systems describing a range of cellular and molecular processes.
A feature of the Zwanzig-Mori projection method is that the memory functions describe all the behaviours of a system in the vicinity of a steady state. This means that there is no need to perform simulations to analyse memory functions or to test different positions in phase space as a steady state is approached. Moreover, by analysing the memory functions, the amplitudes and timescales of the interactions between the subnetwork and the bulk can be explored. These can reveal information that is not otherwise easily discernible from the parameters of the full dynamics or from the functional form of the equations. Each memory function represents the total of the contributions from a given subnetwork component that feeds back to the subnetwork over time after passing through the bulk. This total memory can be decomposed into channels that describe the flow of signals through specific components in the bulk. In this way, the channels that dominate a memory effect can be identified, providing insight into the dynamical mechanism responsible for achieving and maintaining a steady state.
An example of the type of insight provided by the projection method came from our analysis of the neural tube network. We focused on a subnetwork comprising Nkx2.2 and Olig2. Decomposition of the memory functions suggested that an experimentally documented repression of Pax6 by Olig2 [Zhou and Anderson, 2002] plays only a minor role in sustaining the steady state pattern. Consistent with this prediction, simulations in which this regulatory link had been eliminated in the full model showed qualitatively unchanged steady states. This raised the question of what purpose the regulation might serve and prompted us to explore the system dynamics away from the steady state. Simulations in the absence of the Olig2-Pax6 repression, compared to the full model, revealed a marked increase in sensitivity to initial conditions. Comparison of the bifurcation diagrams of the systems, with and without the Olig2-Pax6 link, indicated that the well-defined Nkx2.2 monostable region (p3 domain), which appears at high levels of Shh signalling in the full system, was replaced by a region of bistability for Nkx2.2 and Olig2 in the absence of the Olig2-Pax6 regulation. In this latter case the steady state induced by high levels of signal therefore depends on the initial concentrations of Pax6 and Irx3. This suggests that, while not necessary in order to maintain steady states, the Olig2-Pax6 regulation ensures access to the appropriate steady states irrespective of the initial levels of Pax6 and Irx3. As these may differ between cells or at different locations along the neural tube, the Olig2-Pax6 regulatory link might make the system less sensitive to these variations. This implies that the primary purpose of Olig2 repression of Pax6 is to increase the robustness of pattern formation.
While the use of steady states to construct projections provides a straightforward way to formulate memory functions, it also highlights a limitation of this Zwanzig-Mori method. Predictions become increasingly inaccurate the further the system is from the fixed point under consideration. Inclusion of higher order non-linear terms in the memory functions increases the accuracy. Nevertheless, there remains a limit beyond which memory functions, expanded around a fixed point, no longer provide an accurate approximation of the behaviour of the system. This is particularly evident if dynamical transients are important for the function of the system. Alternative implementations of the Zwanzig-Mori projection [Zwanzig, 1980, Chorin et al., 2000] have been developed that do not rely on expanding around a fixed steady state and we are currently adapting these for transcriptional networks. While these alternative methods may provide a solution to the specific limitations of our current Zwanzig-Mori projection method, the nonlinear memory functions they produce are much richer. Further work will be needed to fully understand the more complex information they encode.
One way to view projection methods is as dimensionality reduction techniques. From this perspective, they provide a principled way to visualise the behaviour of selected factors in multicomponent transcriptional networks. Tools for this purpose are likely to become increasingly important as larger networks are deciphered and analysed. Employing Zwanzig-Mori projections to reduce the four-dimensional neural tube network to two components, Nkx2.2 and Olig2, provided a demonstration of this. With this approach, we were able to plot 2D phase portraits indicating how the concentrations of the two factors change as they approach steady state. Notably in these visualisations, individual trajectories of Nkx2.2 or Olig2 cross one another. Such intersections would not occur in the original four dimensions of the system but, when projected into two dimensions, the individual trajectories are determined not only by the present state of the system but also by the past state. This is a consequence of the trajectories incorporating the memory terms that capture the effect of the bulk factors on the subnetwork. Furthermore, using our analysis, we determined which interactions in the bulk are responsible for the trajectories crossing in the phase portraits. In this way, the trajectories show how behaviour is determined not only by the local interactions within the subnetwork, but also indirectly by interactions with the surrounding network. This analysis is facilitated by being able to explicitly interrogate the dependence of memory functions on time difference, which gives a quantitative understanding of how past states of the network affect present ones.
Projection methods might also provide a way to analyse systems for which only partial information is available. Incorporating memory functions into a model of a subnetwork, so that the response of the known components more accurately reproduces experimentally measured behaviour, would provide a means to define the function of unknown factors. This could then be used to identify plausible network structures that generate these memory functions. Taken together therefore, the projection approach provides tools to simplify, visualise and explore the behaviour of large networks that would otherwise be difficult to analyse in their entirety.
A Appendix: Construction of L eff for projection with quadratic observables

A.1 Setup
In this appendix we discuss how the matrix L eff for projection with linear and quadratic observables can be obtained. It is derived by taking a fast rate (γ → ∞) limit of the projection results for an expanded network where concentrations evolve in time according to ∂_t x^l = R_l(x^l, x^a), γ^{−1} ∂_t x^a = R_a(x^l, x^a). These are equations (1, 6) from the main text, or more generically (19), but we have grouped together the subnetwork and bulk concentration vectors x^s and x^b into a single vector of slow variables x^l to keep the notation for the following discussion compact. The vector x^a contains the fast variables, which in the GRN context are concentrations of DNA conformations, while γ is a fast rate parameter as before.
In the limit of large $\gamma$, the fast variables are always in QSS with the slow ones, so that the expanded network reduces to the "thermodynamic" (in the GRN case) dynamics
$$\partial_t x^l = R^l(x^l, x^a) \qquad (A.3)$$
$$0 = R^a(x^l, x^a) \qquad (A.4)$$
where the second equation implicitly determines $x^a$ as a function of $x^l$. In the main text we marked this QSS value of $x^a$ with an asterisk; we omit this here for notational simplicity.
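To make the fast-rate limit concrete, the following minimal sketch (in Python, with invented rates not taken from the paper) integrates a two-variable fast-slow system at large $\gamma$ and compares it with the QSS-reduced slow equation:

```python
# Minimal fast-slow toy system (rates are invented for illustration):
#   dx_l/dt       = -x_l + x_a       (slow variable)
#   (1/g) dx_a/dt =  x_l - 2*x_a     (fast variable, rate g)
# QSS (g -> infinity): x_a = x_l/2, so the reduced slow equation is
#   dx_l/dt = -x_l/2

def simulate_full(g, x_l=1.0, x_a=0.0, dt=1e-5, t_end=1.0):
    # Forward Euler on the full two-variable system
    for _ in range(int(t_end / dt)):
        x_l, x_a = (x_l + dt * (-x_l + x_a),
                    x_a + dt * g * (x_l - 2.0 * x_a))
    return x_l

def simulate_reduced(x_l=1.0, dt=1e-5, t_end=1.0):
    # Forward Euler on the QSS-reduced slow equation
    for _ in range(int(t_end / dt)):
        x_l += dt * (-0.5 * x_l)
    return x_l

full, reduced = simulate_full(g=1000.0), simulate_reduced()
print(abs(full - reduced))  # discrepancy shrinks as g grows
```

The residual discrepancy is of order $1/\gamma$, which is exactly what the QSS approximation neglects.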

A.1.1 Fast rate limit from projection
In the projection approach, one expands around a fixed point to second order to write the equations of motion as
$$\partial_t x^{lT} = x^{lT} L^{ll} + x^{aT} L^{al} + x^{llT} L^{ll,l} + x^{laT} L^{la,l} + x^{aaT} L^{aa,l} \qquad (A.5)$$
$$\gamma^{-1}\partial_t x^{aT} = x^{lT} L^{la} + x^{aT} L^{aa} + x^{llT} L^{ll,a} + x^{laT} L^{la,a} + x^{aaT} L^{aa,a} \qquad (A.6)$$
All $x$ appearing here and below are deviations $\delta x$ from steady state; we drop the $\delta$ to lighten the notation. The $x^{ll}$ etc. are product variables (we assume the indices are ordered to avoid duplicate observables) and the $L$ matrices contain the appropriate derivatives of the "drift" functions $R$ at the fixed point, e.g. $L^{ll,a}$ has elements $L_{(ll'),a} = \partial_{x_l}\partial_{x_{l'}} R_a$ for $l < l'$ and $L_{(ll),a} = (1/2)\,\partial_{x_l}^2 R_a$ when the two indices are equal. From the above equations then follow the evolution equations for the slow product variables $x^{ll}$ and the fast products $x^{la}$ and $x^{aa}$; see (A.12, A.17) below. From the product rule, these equations only involve product variables on the r.h.s.; third order terms are in principle present but are discarded within the second order expansion. Collecting all variables into a vector $z$ that concatenates $x^l$, $x^{ll}$, $x^a$, $x^{la}$, $x^{aa}$ gives the equation of motion in the form $\partial_t z^T = z^T L$, where the block form of $L$ can be found in (25). As explained in the main text based on the reasoning in [Rubin and Sollich, 2016], one can write down expressions for the rate matrix and memory functions from the full matrix $L$ and then take the fast rate limit $\gamma \to \infty$. The result is that the rate matrix and (slow) memory function can be obtained from a matrix $L^{\rm eff}$ describing only the dynamics of the slow variables $x^l$ and $x^{ll}$. This $L^{\rm eff}$ is obtained by solving the QSS conditions for the fast variables $x^a$, $x^{la}$ and $x^{aa}$. These conditions are, within the projection approach, linear equations because product observables are treated as independent from linear observables.
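At the purely linear level, solving the QSS condition for $x^a$ and substituting into the slow equation yields $L^{\rm eff}$ as a Schur complement, $L^{ll} - L^{la}(L^{aa})^{-1}L^{al}$. A short numpy sketch with random toy matrices (all values arbitrary, not from the paper) confirms that this elimination reproduces the substituted dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)
n_l, n_a = 3, 2  # toy numbers of slow and fast species (arbitrary)

# Random stand-ins for the linearised rate matrix blocks; L_aa is shifted
# to be safely invertible (stable fast sector)
L_ll = rng.normal(size=(n_l, n_l))
L_la = rng.normal(size=(n_l, n_a))
L_al = rng.normal(size=(n_a, n_l))
L_aa = rng.normal(size=(n_a, n_a)) - 5.0 * np.eye(n_a)

# QSS of the fast sector, x_a^T = -x_l^T L_la (L_aa)^{-1}, substituted
# into the slow equation gives the Schur-complement form of L_eff
L_eff = L_ll - L_la @ np.linalg.inv(L_aa) @ L_al

# Check the substitution for an arbitrary slow state (row-vector convention)
x_l = rng.normal(size=n_l)
x_a = -x_l @ L_la @ np.linalg.inv(L_aa)
assert np.allclose(x_l @ L_ll + x_a @ L_al, x_l @ L_eff)
```

The full $L^{\rm eff}$ for linear plus quadratic observables has the same structure, applied to the enlarged set of slow variables $x^l$ and $x^{ll}$.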

A.1.2 Statement of the equivalence
In this Appendix we show that $L^{\rm eff}$ can equivalently be obtained directly, by expanding the slow equations to quadratic order in $x^l$. This is convenient in cases where these slow equations are the starting point of the analysis anyway and the fast species are necessary only to bring the system into the form of a network of unary and binary reactions to which the projection method of [Rubin et al., 2014] can be applied.

A.2.1 Direct expansion of slow equations
In the direct expansion one writes the dynamical equations as
$$\partial_t x^{lT} = x^{lT} L^{ll} + x^{aT} L^{al} + (x^l \circ x^l)^T L^{ll,l} + (x^l \circ x^a)^T L^{la,l} + (x^a \circ x^a)^T L^{aa,l} \qquad (A.7)$$
$$\gamma^{-1}\partial_t x^{aT} = x^{lT} L^{la} + x^{aT} L^{aa} + (x^l \circ x^l)^T L^{ll,a} + (x^l \circ x^a)^T L^{la,a} + (x^a \circ x^a)^T L^{aa,a} \qquad (A.8)$$
where the "circle" product indicates the actual products of the regular linear observables, with the same index ordering as in the projection approach (so that all the $L$-matrices are as before). One now needs to determine $x^a$ by setting the r.h.s. of (A.8) to zero, and substitute the result into (A.7). As we are only expanding to second order in $x^l$, it is enough to obtain $x^a$ to this order as well. Starting with the first order of (A.8) one obtains
$$x_0^{aT} = -x^{lT} L^{la} (L^{aa})^{-1} \qquad (A.9)$$
where $x_0^a$ will be a convenient shorthand. (Note that the coefficient matrix $-L^{la}(L^{aa})^{-1}$ is the one we worked out in the main text below (24), in a slightly more pedestrian fashion.) All second order terms in (A.8) can now be evaluated to the required accuracy by replacing $x^a$ by $x_0^a$. Solving for $x^a$ then gives
$$x^{aT} = x_0^{aT} - \left[(x^l \circ x^l)^T L^{ll,a} + (x^l \circ x_0^a)^T L^{la,a} + (x_0^a \circ x_0^a)^T L^{aa,a}\right](L^{aa})^{-1} \qquad (A.10)$$
Inserting into (A.7) and dropping terms of higher than quadratic order gives the required expansion of the slow equation,
$$\partial_t x^{lT} = x^{lT} L^{ll} + x_0^{aT} L^{al} + (x^l \circ x^l)^T L^{ll,l} + (x^l \circ x_0^a)^T L^{la,l} + (x_0^a \circ x_0^a)^T L^{aa,l} - \left[(x^l \circ x^l)^T L^{ll,a} + (x^l \circ x_0^a)^T L^{la,a} + (x_0^a \circ x_0^a)^T L^{aa,a}\right](L^{aa})^{-1} L^{al} \qquad (A.11)$$
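The iterative solution of the fast equation (first-order QSS value plus the quadratic correction of (A.10)) leaves a residual of third order only. This can be checked in a scalar toy model with arbitrary made-up coefficients:

```python
# Scalar toy version of the fast equation (A.8) at QSS (coefficients invented):
#   0 = x*L_la + y*L_aa + x^2*L_ll_a + x*y*L_la_a + y^2*L_aa_a
L_la, L_aa = 1.3, -2.0
L_ll_a, L_la_a, L_aa_a = 0.7, -0.4, 0.5

def residual(x, y):
    # r.h.s. of the fast equation; zero at the exact QSS value of y
    return x*L_la + y*L_aa + x**2*L_ll_a + x*y*L_la_a + y**2*L_aa_a

def y_qss(x):
    y0 = -x * L_la / L_aa                           # first order (scalar A.9)
    corr = x**2*L_ll_a + x*y0*L_la_a + y0**2*L_aa_a  # quadratic terms at y0
    return y0 - corr / L_aa                         # second order (scalar A.10)

# The residual should shrink like x^3 as x -> 0: only third-order error left
for x in (1e-1, 1e-2, 1e-3):
    print(x, abs(residual(x, y_qss(x))) / x**3)  # ratio roughly constant
```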

A.2.2 Projection approach
The projection approach has the same starting point as the direct expansion, with (A.5, A.6) being equivalent to (A.7, A.8). The difference is that the fast product variables are eliminated via separate conditions, namely the QSS requirements for these variables. So what we need to demonstrate is that this distinct elimination assigns to $x^{la}$ and $x^{aa}$ the same values as the direct approach, namely $(x^l \circ x_0^a)$ and $(x_0^a \circ x_0^a)$. Once this is shown, it follows that the linear fast variables $x^a$ are eliminated in the same way in the two approaches, because the same quadratic fast variables are substituted into the relevant equations (A.6, A.8). Thus all fast variables are eliminated in the same way from the equation of motion for the slow variables, to the quadratic order we consider here, proving the equivalence.

A.2.3 Elimination of $x^{la}$
Using the product rule of differentiation, the equations of motion for the $x^{la}$ are
$$\partial_t x^{laT} = \gamma((x^{lT} \circ x^{lT} L^{la})) + \gamma((x^{lT} \circ x^{aT} L^{aa})) + ((x^{lT} L^{ll} \circ x^{aT})) + ((x^{aT} L^{al} \circ x^{aT})) \qquad (A.12)$$
Here the double brackets on the right indicate that after the circle products are taken, the real products have to be replaced by product variables, to remain within the projection framework. We want to eliminate the $x^{la}$ from the condition that the r.h.s. vanishes. Fortunately, for large $\gamma$ the first two terms, which stem from the time evolution of $x^a$, dominate; the last two become negligible in comparison. Thus one has to solve
$$((x^{lT} \circ x^{lT} L^{la})) + ((x^{lT} \circ x^{aT} L^{aa})) = 0 \qquad (A.13)$$
From the structure of this one sees that the $x^l$ only act as "spectators", while for the second factors one has to solve the same equation as at linear order. The solution is therefore expected to be $x^{laT} = ((x^{lT} \circ x_0^{aT}))$, as we want to show. To see this in more detail we write out (A.13) in components:
$$\sum_{l'} x_{(ll')} L_{l'a} + \sum_{a'} x_{la'} L_{a'a} = 0 \qquad (A.14)$$
Here we have written $x_{(ll')}$ to indicate that the indices are to be taken as ordered, i.e. $x_{(ll')} = x_{ll'}$ if $l \le l'$ and $x_{(ll')} = x_{l'l}$ otherwise. The proposed solution is, in components,
$$x_{la} = -\sum_{l'a'} x_{(ll')} L_{l'a'} (L^{aa})^{-1}_{a'a} \qquad (A.15)$$
Substituting this into (A.14) turns the second term into
$$-\sum_{l'a'a''} x_{(ll')} L_{l'a'} (L^{aa})^{-1}_{a'a''} L_{a''a} = -\sum_{l'} x_{(ll')} L_{l'a} \qquad (A.16)$$
which cancels the first term, as required.

A.2.4 Elimination of $x^{aa}$

We proceed again using the product rule of differentiation, which gives as the equations of motion for the $x^{aa}$
$$\partial_t x^{aaT} = \gamma((x^{lT} L^{la} \circ x^{aT})) + \gamma((x^{aT} L^{aa} \circ x^{aT})) + \gamma((x^{aT} \circ x^{lT} L^{la})) + \gamma((x^{aT} \circ x^{aT} L^{aa})) \qquad (A.17)$$
Here all terms contribute for large $\gamma$, but one sees that in the first two the right factor of $x^a$ is again a "spectator", and similarly for the left factor in the last two terms. Accordingly one can show that the proposed solution, which is $x^{aaT} = ((x_0^{aT} \circ x_0^{aT}))$, ensures that each pair of terms vanishes separately. For explicit calculation it is again useful to write out components.
The $aa'$ component of the first two terms of (A.17), without the overall factor of $\gamma$, reads
$$\sum_{l_1} L_{l_1 a} x_{l_1 a'} + \sum_{a_1} L_{a_1 a} x_{a_1 a'} \qquad (A.18)$$
while the proposed QSS assignment of the $x^{aa}$ is, in components,
$$x_{a a'} = \sum_{l_1 l_2 a_1 a_2} x_{(l_1 l_2)} L_{l_1 a_1} (L^{aa})^{-1}_{a_1 a} L_{l_2 a_2} (L^{aa})^{-1}_{a_2 a'} \qquad (A.19)$$
With this assignment, the second term of (A.18) becomes
$$\sum_{l_1 l_2 a_1 a_2 a_3} x_{(l_1 l_2)} L_{l_1 a_1} (L^{aa})^{-1}_{a_1 a_3} L_{a_3 a} L_{l_2 a_2} (L^{aa})^{-1}_{a_2 a'} \qquad (A.20)$$
The factors $(L^{aa})^{-1}_{a_1 a_3} L_{a_3 a}$ in the second term again cancel after summing over $a_3$, reducing it to
$$\sum_{l_1 l_2 a_2} x_{(l_1 l_2)} L_{l_1 a} L_{l_2 a_2} (L^{aa})^{-1}_{a_2 a'} \qquad (A.21)$$
After a relabelling of summation indices this is, up to its sign, identical to the first term of (A.18) evaluated with the assignment $x_{l_1 a'} = (x^l \circ x_0^a)_{l_1 a'}$, which equals $-\sum_{l_1 l_2 a_2} x_{(l_1 l_2)} L_{l_1 a} L_{l_2 a_2} (L^{aa})^{-1}_{a_2 a'}$. This means that (A.18) vanishes, i.e. the first two terms on the r.h.s. of (A.17) cancel. Similarly the last two terms vanish, showing that (A.19) is the correct QSS assignment of the $x^{aa}$.
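The cancellation of the first two terms of (A.17) can also be checked numerically. In matrix form the substituted product variables are outer products of $x^l$ and $x_0^a$, and the two terms become $L^{laT}X^{la}$ and $L^{aaT}X^{aa}$. A minimal numpy sketch with random toy matrices (all values arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, n_a = 3, 2  # toy dimensions (arbitrary)

L_la = rng.normal(size=(n_l, n_a))
L_aa = rng.normal(size=(n_a, n_a)) - 5.0 * np.eye(n_a)  # invertible toy block
x_l = rng.normal(size=n_l)

# First-order QSS value of the fast variables, and the proposed
# product-variable assignments for x^{la} and x^{aa}
x_a0 = -x_l @ L_la @ np.linalg.inv(L_aa)
X_la = np.outer(x_l, x_a0)    # stands in for x^{la}
X_aa = np.outer(x_a0, x_a0)   # stands in for x^{aa}

# First two terms of (A.17), dropping the overall factor of gamma:
term1 = L_la.T @ X_la         # ((x^l L^{la} o x^a)) with x^{la} substituted
term2 = L_aa.T @ X_aa         # ((x^a L^{aa} o x^a)) with x^{aa} substituted
assert np.allclose(term1 + term2, 0.0)  # the pair cancels, as claimed
```

The same construction with the factors swapped verifies that the last two terms of (A.17) cancel as well.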

A.2.5 Equations of motion for $x^{ll}$
Above we have shown that the direct and the projection elimination procedures give the same equation of motion for $x^l$. The same can then also be checked straightforwardly for the product variables $(x^l \circ x^l)$ and their projection analogues $x^{ll}$. By analogy with (A.12), the latter evolve in time according to
$$\partial_t x^{llT} = ((x^{lT} \circ x^{lT} L^{ll})) + ((x^{lT} \circ x^{aT} L^{al})) + ((x^{lT} L^{ll} \circ x^{lT})) + ((x^{aT} L^{al} \circ x^{lT})) \qquad (A.22)$$
The real products obey the same equation, just written differently:
$$\partial_t (x^l \circ x^l)^T = (x^{lT} \circ x^{lT} L^{ll}) + (x^{lT} \circ x^{aT} L^{al}) + (x^{lT} L^{ll} \circ x^{lT}) + (x^{aT} L^{al} \circ x^{lT}) \qquad (A.23)$$
From both, the fast products $(x^l \circ x^a)$ (respectively $x^{la}$) and $(x^a \circ x^a)$ (respectively $x^{aa}$) then need to be eliminated. As we have already established that these eliminations are identical, the resulting equations for the $ll$-product variables must also be identical.

B Appendix: Calculation of effective drift in the presence of memory

B.1 First order time expansion
It is useful to be able to visualise the effects of memory terms in the projected equations in terms of an effective drift. This is possible at least perturbatively for short times, as we now show. We illustrate the method for the linearised dynamics, where the projected equation (11) without the random force reads in vector form
$$\partial_t x^{sT}(t) = x^{sT}(t)\,\Omega + \int_0^t dt'\, x^{sT}(t')\, M(t - t')$$
For small times $t$, the memory function can be treated as approximately constant, $M(t - t') \approx M(0)$, and the past states as approximately equal to the present one, $x^s(t') \approx x^s(t)$, so that we obtain in an expansion to first order in $t$:
$$\partial_t x^{sT}(t) \approx x^{sT}(t)\left[\Omega + t\,M(0)\right]$$
On the right hand side one has a function of the current concentrations only, as announced. Given that this is a linear expansion in $t$, we refrain from exploring the system too far away from $t = 0$. Higher order expansions in $t$ could be performed, but the first order is enough to demonstrate the qualitative contributions of the memory terms. We illustrate this in Fig
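To see how well a first-order effective drift of the form $\Omega + t M(0)$ tracks the full memory dynamics at short times, one can compare the two in a scalar toy model $dx/dt = \Omega x(t) + \int_0^t M(t - t')\,x(t')\,dt'$ with $M(\tau) = m e^{-\tau}$; all parameters below are invented for illustration:

```python
import math

# Scalar toy version of the projected equation (parameters invented):
#   dx/dt = W*x(t) + int_0^t M(t - t') x(t') dt',  with M(tau) = m*exp(-tau)
W, m = -1.0, 0.5

def M(tau):
    return m * math.exp(-tau)

dt, t_end = 1e-3, 0.2
n = int(t_end / dt)

# Full memory dynamics: Euler stepping, Riemann sum for the memory integral
x_full = [1.0]
for i in range(n):
    t = i * dt
    mem = dt * sum(M(t - j * dt) * x_full[j] for j in range(i + 1))
    x_full.append(x_full[i] + dt * (W * x_full[i] + mem))

# First-order effective drift: dx/dt ~ (W + t*M(0)) * x(t)
x_eff = [1.0]
for i in range(n):
    x_eff.append(x_eff[i] + dt * (W + i * dt * M(0)) * x_eff[i])

print(abs(x_full[-1] - x_eff[-1]))  # the two stay close at short times
```

The two solutions agree to second order in $t$; their difference grows only at third order, consistent with the expansion above.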