Circuits with broken fibration symmetries perform core logic computations in biological networks

We show that logic computational circuits in gene regulatory networks arise from a fibration symmetry breaking in the network structure. From this idea we implement a constructive procedure that reveals a hierarchy of genetic circuits, ubiquitous across species, that are surprising analogues to the emblematic circuits of solid-state electronics: starting from the transistor and progressing to ring oscillators, current-mirror circuits to toggle switches and flip-flops. These canonical variants serve fundamental operations of synchronization and clocks (in their symmetric states) and memory storage (in their broken symmetry states). These conclusions introduce a theoretically principled strategy to search for computational building blocks in biological networks, and present a systematic route to design synthetic biological circuits.

In what follows, we analyze in detail the FFL. First, we present the analytical solution for the discrete time model of the FFL, where we show that the FFL does not synchronize nor oscillates. After that, we reach the same conclusion by using a continuous variable approach.
Finally, we show the discrete time solution with the OR gate. Solutions for the FFL have been considered in the literature. Here we adapt those results to the particular models used in our studies to perform consistent comparisons with the solutions of the fiber dynamics obtained through the paper.
A. FFL discrete time model Starting from Eq. (1) in the main text, we define rescaled variables ψ t = y t /k y and ζ t = z t /k y , we rewrite Eq. (1) as where we set β = (1 − α), η = γ x /αk y , and λ = γ x γ y /αk y . Equation (1) defines an iterative map ψ t+1 = f (ψ t ) which provides a solution A closed form for ψ t depends on the value of x. For the sake of simplicity, consider x t = x constant in time. If x < k x , the solution is simple, and always decays as ψ t = ψ 0 e −t/τ , where we choose to write β t = e −t/τ such as τ −1 = − log(1 − α). On the other hand, if x > k x , the iterative map is f (ψ t ) = βψ t + η, leading to a solution that converges to η as ψ t = ψ 0 e −t/τ + η 1 − e −t/τ . Therefore, the solution to ψ t is given by: Similarly, the solution for ζ t depends on x, but it also depends on ψ 0 and η. For x < k x , it always decays to zero as ζ t = ζ 0 e −t/τ . When x > k x , the variable ζ t follows different behaviors.
From this discussion, we find that the rescaled variables ψ t and ζ t do not synchronize.
That is, they do not reach the same value at their fixed points: ψ t = ζ t when t → ∞, unless we use a specific set of parameters. Moreover, ψ t and ζ t also do not reach oscillatory states.
The same conclusion is extended to the original variables y t and z t .

B. FFL ODE
In order to show that our results presented in the main text are consistent with a continuous variable approach, now we focus our attention on the modelling of the FFL [4,20,21] by using ordinary differential equations (ODE). First, we write the ODE governing the dynamics of expression levels y(t) and z(t): For the sake of simplicity, we consider x(t) = x constant in time.
For the case of x < k x , Eqs. (6) become a set of homogeneous ODEs with solutions that decay exponentially: Where, ψ 0 and ζ 0 are the initial conditions for the rescaled functions.
When x > k x , we find Therefore, the solution for ψ(t) converges to η as t → ∞, similar to the solution of the discrete time equation presented in the main text.
On the other hand, the solution for ζ(t) also depends the values of ψ 0 and η. By carrying on a calculation similar to the one presented on the main text for the discrete time case, one finds: for t > t 1 , Here, ζ 1 = ζ(t)| t=t 1 .
Clearly, from the above solutions, the expression levels from genes Y and X do not synchronize, since y(t) = z(t) as t → ∞. In addition, y(t) and z(t) also do not reach oscillatory states, in accordance with the results of the discrete time model.

C. FFL with OR gate
We present the solution for the FFL [4,20,21] with an OR gate. We consider the coherent version cFFL where all regulations are activators. The discrete-time dynamics of expression levels y t and z t of genes Y and Z in the cFFL with a Boolean OR gate are given by the following pair of difference equations: where α, γ y , and γ z have the same definition as in the main text. By adopting the same rescaled variables, ψ t = y t /k y and ζ t = z t /k y , we rewrite Eq. (10) as: where we have β = (1 − α), η = γ x /αk y , λ x = γ x /αk y , and λ y = γ y /αk y . Again, without lack of generality, we consider x t = x constant in time. Clearly, the solution for ψ t is not affected by the OR gate, therefore, it depends only on the value of x and is given by: Again, τ = − log(1 − α). Thus, if x < k x , ψ t exponentially decays to zero. On the contrary, if x > k x , ψ t converges to η. The solution for ζ t displays different solutions, depending on the combination of x and the initial value ψ 0 .
For the case of ψ 0 > 1 and η > 1, the rescaled variable ζ t always converges to λ x + λ y as ζ t = ζ 0 e −t/τ + (λ x + λ y ) 1 − e −t/τ . However, if η < 1, this solution ceases to be valid at t * = τ log ((η − ψ 0 ) / (η − 1)) . Therefore, for this case ζ t converges to λ x as From the discussion above, it is clear that the rescaled variables ψ t and ζ t neither synchronize nor oscillate. Extending this results to the original variables y t and z t , we conclude that the expression levels of the genes Y and Z also do not synchronize: y t = z t when t → ∞.
Note that the Heaviside function θ(y t − k y ) represents the activator feedback on the autoregulation of the Y gene. We consider an AND gate for the interactions [4]. Analogous results can be obtained for OR gates or with an first-order ODE model. Writing down the set of equations for the rescaled variables ψ t = y t /k y and ζ t = z t /k y , we get: Here, we made use of λ = γ x γ y /αk y and β = (1 − α). We note that, since the second term on the right-hand side of both equations are equal, the dynamical variables ψ t and ζ t must synchronize, as well as y t and z t . Again, considering x t = x constant, for x < k x , the solutions for ψ t and ζ t are trivial: both variables decay exponentially as ψ t = ψ 0 e −t/τ and . This behavior is shown by the red solid line in S1 Fig. S1B with ψ 0 = 0.9.
In terms of the iterative map, the dynamics of the SAT-FFF for the rescaled variable ψ t with x > k x is: so we find This iterative map ψ t = f (ψ t ) provides different solutions depending on ψ 0 . Similar to the case of x < k x , if ψ 0 < 1, the solution decays to zero as ψ t = ψ 0 e −t/τ . However, if ψ 0 > 1, there are two possibilities, depending on the values of λ = γ x γ y /αk y .
For λ < 1, ψ t approaches 1 at a time t * given by Then, for t > t * , the solutions decay to zero as ψ t = e −(t−t * )/τ and ζ t = ζ t * e 1(t−t * )/τ . This behavior is presented on S1 Fig λ = 0.9. The rescaled variables ψ t and ζ t always synchronize, so do y t and z t . This can be proved by finding the difference t = ψ t − ζ t . For all the cases discussed above, t decays exponentially fast as t = (ψ 0 − ζ 0 ) e −t/τ . Now, we can use the solution with x t constant to qualitatively understand the SAT-FFF in general. An example of the SAT-FFF with non-constant x t is depicted on Fig. 1F in the main text. As shown, variable y t and z t do synchronize but with no internal oscillations. We feed an external oscillatory pattern of x t as a square wave. For x t < k x , both y t an z t decay exponentially. When x t > k x , they tend to saturate at γ x γ y /α. The SAT-FFF synchronizes at a fixed point.

B. SAT-FFF ODE
Here, we consider the ODE model of SAT-FFF to confirm results presented in Section II A. The dynamics of gene X is driven by outside sources, so we only consider the dynamics of genes Y and Z which are described by equations: Taking ψ(t) = y(t)/k y , ζ(t) = z(t)/k y and δ = γ x γ y /k y we transform Eq. (18) to: Without loss of generality, we consider the case of then the solution of Eq. (19) is given by: where ψ 0 and ζ 0 are the initial conditions. Now, let's consider the case x > k x . Equation (19) then transforms into: Due to the existence of isomorphic input trees between both genes, ψ(t) and ζ(t) synchronize and therefore y(t) and z(t) synchronize also, so we only consider dynamics of the It's easy to see that for ψ 0 < 1, Eq. (20) will be the solution of Eq. (22). When ψ 0 > 1 and δ/α > 1, the solution is given by: In the case ψ 0 > 1 and δ/α < 1, the dynamics of ψ is split into two parts. One part before ψ decays to 1 and the other one after ψ crossed 1. The time when ψ(t) crosses 1 is equal to: and the dynamics can be written as: To summarize, the solution is: This solution is analogous to the one obtained for the discrete time model in Section II A.

A. UNSAT-FFF ODE
The UNSAT-FFF circuit can be reduced to study the base of the circuit since both genes Y and Z synchronize their behaviour as shown in the main text. The base of this circuit is a negative autorregulation loop (Fig. 3) plus an external regulator given by X. This circuit has been synthetically implemented by Stricker et al. in Ref. [18] using a promoter that drives the expression in the absence of LacI (and acts as a negative feedback loop) or in the presence of IPTG, which acts as an activator. It was shown experimentally that this circuit leads to oscillatory behaviour in the expression profiles. This result was corroborated with a dynamical ODE model in [18] which here we adapt to study the case of the UNSAT-FFF with ODE. See also the review paper [53] for further reading.
Following the same approach as above, we consider gene x(t) = x constant in time and larger than x > k x , and rescale the expression of genes y(t) and z(t) as ψ(t) = y(t)/k y and Since genes y(t) and z(t) synchronize their activities, then only one equation needs to be considered, ψ(t).
The key to observe oscillations in a first-order ODE is to consider the delay in the signal  (6) in Supplementary Information in Ref. [18]). We consider delays in the negative feedback loop which is the key feature to explain the experimentally observed robust oscillation in this circuit [18].
Delays in a biological circuit arise from the combined processes of intermediate steps like transcription, translation, folding, multimerization and binding to DNA. This series of biological processes are lumped into a single arrow between two genes in the network representation of the circuit. In reality this arrow represents processes that should be modeled in detail. These biological processes can be approximately taken into account by a delay in the interaction term in the dynamical equations. The interaction term can be written where τ represents the delay caused by the fact that the process of self-repression is not instant. Therefore, the dynamics of ψ(t) are described by a first-order delay-differential equation (DDE) [18] of the form: where τ represents the delay caused by expression process.
We derive analytical solutions to this equation following a procedure outlined in [54] (Chapter V). We start by noting that initial conditions used to solve a DDE are not given Let's consider Eq. (27) with initial function f 0 (t) for t ∈ [−τ, 0]. Then for t ∈ [0, τ ] Eq.
Moving the degradation term to the left and multiplying by e αt we get: Re-writing the left part, we obtain: and integrating on the interval t 0 , we get: Considering that ψ is continuous at 0 (ψ(0) = f 0 (0)) and ψ(t) for t ∈ [0, τ ] is given by then, following the same procedure, we can derive the general formula for finding the solution ψ(t) on the interval [kτ, (k + 1)τ ], assuming that the solution on the previous interval [(k − 1)τ, kτ ] is given by f k−1 (t). We then need to solve the following iterative equation: The solution of this equation can be found by applying the integrating factor method integrating on t kτ . We obtain: is equal to zero, we get a solution that decays exponentially to zero. Likewise, when the Heaviside function is equal to 1, we get a solution that exponentially grows to δ α . In other words, the solution will grow until θ(1 − ψ(t − τ )) changes to zero (i.e., when ψ(t − τ ) > 1) and will decay until θ(1 − ψ(t − τ )) changes to one (i.e., when ψ(t − τ ) will cross 1 again, but from different side). Therefore, we get oscillations consisting of two exponential pieces.
One period of the oscillation is shown in S1 Fig. S2B. The solution on this interval is given by: which is the predicted behavior. Additionally, we note that this circuit functions as a capacitor charging and discharging in an RC circuit.

B. Period-amplitude relationship
As shown in the main text, the solution of the discrete-time Boolean interaction model for λ > 1 oscillates in time, as well as the DDE considered in the previous section. Next, we show that this oscillation has a characteristic amplitude and period. First, to compute the amplitude of oscillations A ψ for the rescaled variable ψ t , we recall that the iterative map ψ = f (ψ) satisfies the recursive equation: Thus, the amplitude of oscillations A ψ is given by which implies that To find the period T of the oscillations, we recall from Eq. (6) in the main text that the solution for the minimum value of ψ, ψ min < 1, evolves to its maximum value ψ max in T − 1 iterations as ψ max = e −(T −1)/τ ψ min + λ 1 − e −(T −1)/τ . Since ψ min = (1 − α)ψ max , due to the fact that ψ max > 1, we find where we used ψ max = 1. For example, using α = 0.2 and λ = 1.01, we find A ψ = 2.02 and T = 15, which agrees with the numerical simulation presented in S1 Fig. S3A.
Equation (39) allows to define a rescaled amplitude A = (λ − 1)/α, and a reduced period which corresponds to the period-amplitude relationship of the UNSAT-FFF. A plot of this relationship is shown in S1 Fig. S3B, where we plot (T − 1)/τ as a function of Coming back to the original variable y t = k y ψ t , we have that the amplitude of oscillations of y t , A = k y A ψ , is given by: S1 Figure S3C shows the period of oscillations T as a function λ for α = 0.60, α = 0.20, and α = 0.05.

C. UNSAT-FFF clock functionality
The clock functionality of the UNSAT-FFF can be understood by analyzing its response function, i.e. the relation between oscillations at the input and at the output of the circuit.
The amplitude A y , and period T of the oscillations are not independent like in the harmonic oscillator, but are related through a 'period-amplitude' relation expressed by Eq. (40) and S1 Fig. S3B. From Eq. (42), for α sufficiently small, which constrains the 'clock' (T ) of the circuit to the power (A y ). As a consequence, A y and T cannot be controlled arbitrarily, and this (A-T) constraint helps to stabilize the UNSAT-FFF response against disturbance in the input X. For example, for a given available power supply, the system is constrained to dissipate this power, and when the UNSAT-FFF oscillates, it is automatically set to operate on an extended time window (T large) at low amplitude A when a small expression level is required (A small) and vice-versa. Results for the clock functionality of the Fibonacci Fiber and n = 2 Fiber can be carried out in a similar manner. The idea is that Fibonacci fibers with longer and longer loops can carry more robust oscillatory patterns than simple autoregulation negative feedback loops.

IV. EXAMPLES OF SYMMETRY AND BROKEN SYMMETRY CIRCUITS
A. Symmetry circuits (fibers) from [24] Our findings show that simple sub-graphs ubiquitous on gene regulatory networks are analogous to symmetric electronic circuits which can work as clocks, revealing a hierarchy of symmetry circuits. Here, we describe these symmetric circuits in more detail following [24].
Supplementary File 1 presents the full list of circuits found across the regulatory networks of A. thaliana, M. tuberculosis, B. subtilis, E. coli, salmonella, yeast, mouse and humans.
Below we enumerate the set of symmetric fibers found in Ref. [24] in the genetic networks of these species.

Repressor regulator link (stub)
We start by an isolate repressor link. As depicted in Fig. 3A, this repressor regulator alone does not form an input tree, neither it forms a base. Since it works as a transistor, its logic representation is a NOT gate.

Repressor AR loop: |1, 0
In Fig. 3B, we show the repressor autoregulation loop. When a repressor link is found as an AR loop, we have a genetic network with one single loop (n = 1) and no external regulator ( = 0), which we denote symbolically by |1, 0 . Such simple network has an input tree that feeds its own expression levels. Also, it is equivalent to its own base. By the analysis of the corresponding logical circuit, one can observe that it naturally oscillates, since it is a NOT gate that feeds itself.  Starting from a Fibonacci Fiber, the addition of a second autoregulation on gene X results in a symmetric input tree. Besides, the genetic network of the n = 2 Fiber collapses into a base with a single gene with two autoregulations (Fig. 3E). From the corresponding logic circuit, one can conclude that it is possible to achieve synchronization between genes X, Y and Z. In Fig. 3E we show examples of the n = 2 Fiber from the regulatory networks of B.
subtilis (the first with the pair of genes lexA and rocR, and the second with genes hprT, tilS and ftsH). This results in fast oscillations which, in the particular application to integrated circuits in digital electronics, are undesired. Then, in digital electronics, the input S = 0 and R = 0 is said to be forbidden, since the NAND gates set both Q = 1 and Q = 1, which violates the logical state Q = not Q. In Fig. 4, we use the NAND gates, but the use of a NOR gates leads to similar conclusions.

B. Symmetry breaking circuits
Two biological realizations of the AR symmetry breaking class are shown in Fig. 4, both from human regulatory networks with genes NFKB1 and HOXA9 (upper), and the regulatory network of genes IRF4 and BCL6 (bottom). Gene NFKB1 further regulates two genes, BST1 and HAX1 as its outputs, but this regulation does not affect the functionality of the flip-flop.

V. DESCRIPTION OF DATASETS
Datasets are described in S1 Table 1. We use a set of datasets of transcriptional regulatory networks found in the literature. All datasets are freely available from online sources. In Supplementary File 1 we present a plot of all found circuits. In the case of symmetric circuits, same colored nodes indicate the genes in the fiber. The external regulators are colored with different colors. We also present all the symmetry broken circuits across all species as indicated in the file. The statistics, count and Z-scores of the circuits are presented in Table 1 and 2 in the main text. The file with all circuits can be found at https://bit.ly/2YM5x3H.

VI. ALGORITHM TO FIND FIBERS
To obtain the set of nodes in the graph that belong to a fiber, we use the algorithm described in detail by Kamei and Cock [41] and developed in Ref. [24] to obtain the 'minimal balanced coloring' of the graph (referred as balanced coloring for simplicity), where we color the network by assigning a different color to each fiber.
To understand what balanced coloring means in the context of graph theory, we need to define the concept of input set (which is a part of the input tree). In a directed graph, the input set of a given node is the set of nodes with edges pointing into that node. Thus, the input set is the first layer of the input tree. Next, we define the Input Set Color Vector  Finding graph balanced coloring is equivalent to finding node sets with isomorphic input trees. A brief explanation of that is the following. If two nodes have the same ISCVs, nodes of their input sets have same colors. Thus these nodes are said to belong to the same equivalence class [41]. Inductively, the same can be said of the input set of the nodes in the input set. This recurrent relation implies that two nodes that have the same input sets will have isomorphic input trees and will belong to the same fiber (for rigorous proof see chapter 4 in [42]).
The algorithm to find balanced coloring was described in detail by Kamei and Cock [41].
In their algorithm, all nodes of the graph have the same initial color and, through a series of operations, they are recolored until balanced coloring is reached. The detailed description of the algorithm is as follows: 1. Initially, all nodes are assigned the same color. 4. Otherwise, if coloring is unbalanced, each unique vector is assigned a new color and the graph is recolored accordingly.

Steps 2-4 are repeated until condition in
Step 3 is satisfied.
For example consider the FFF graph in S1 Fig. S4A. Initially, we assign the same color, white, to all nodes. Then, we assign a 1-dimensional vector to each node which counts the in-degree of each node (S1 Fig. S4B). Since ISCVs of X and Y (which have the same color) of this vector is related to a new color, for example, red and green (see S1 Fig. S4D). Then, the network is recolored accordingly, as depicted in S1 Fig. S4C. At this step, ISCV(Y) and ISCV(Z) are the same, and different from the ISCV(X), and both have the same color, also different from the color of X, therefore balanced coloring is reached and the algorithm stops. We provide an implementation of this algorithm at https://github.com/makselab/ fiberCodes.

VII. ALGORITHM TO FIND BROKEN SYMMETRY CIRCUITS
To count the number of occurrences of broken symmetry circuits in a given graph we count the number of appearances of induced subgraphs as defined in Refs. [43][44][45]. Subgraphs and induced subgraphs are graph-theoretical concepts introduced in the social and computer sciences as applications to graph matching and pattern recognition [44]. We allow symmetry breaking to come from any gene in the circuit. Let's consider a graph G = {V, E}, where V is the set of nodes and E is the set of links. For instance in S1 Fig. S5A E} is a graph such that V ⊂ V and E ⊂ E [43]. For example S1 Fig. S5B is a subgraph of the graph in S1 Fig. S5A with An induced subgraph G = {V , E } of a graph G = {V, E} is a subgraph with a set of nodes V ⊂ V and all links E ⊂ E such that their heads and tails are in V . For example the subgraph of S1 Fig. S5B is not induced graph of G since it is missing the link {Y → Y }.
That is, S1 Fig. S5B is not an induced subgraph (but it is just a subgraph of G), because one of the links {Y → Y } with endpoints in V is missing. However the graph of S1 Fig. S5C is an induced subgraph of G since V = {Y, Y } and E = {Y → Y , Y → Y } are included, but links {A → Y, Y → B, Y → C, D → Y } don't belong to G . The problem of finding broken symmetry circuits consists on three steps: (1) identify a base and create a replica symmetry circuit, (2) find subgraphs isomorphic to the replica symmetry circuit, and (3) remove subgraphs that are not induced.
The first step is to find subgraphs isomorphic to the replica symmetry circuit. Let's consider an example. The matrix in S1 Fig. S5D represents the adjacency matrix A of the symmetric part of the SR flip-flop (i.e., a toggle switch). That is, A is the replica symmetry part (Fig. 4, third row) of the broken symmetry circuit. Similarly, S1 Fig this for all possible subgraphs in the entire network. However, this task is computationally expensive. Even for circuits with 5 nodes, the computational time is N 5 , where N is the total number of nodes in G, which means that for big enough graphs the algorithm can take very long computational time. Different approaches to this problem have been widely studied and are nicely reviewed in Ref. [46]. Time costs can be cut if unprofitable paths are identified and skipped in the search space. One of the recent works in the field is the VF2 algorithm developed by Cordella et al. [47]. It is designed to deal with large graphs and uses state of the art techniques in order to reduce computational time. We use the algorithm implemented in a popular R package igraph [48] as a function subgraph isomorphisms(...).
We provide the analysis and plotting scripts allowing to reproduce our results at https: //github.com/makselab/CircuitFinder.
The second step is to remove all the subgraphs that are not induced or, simply speaking, have extra links between the genes in the broken symmetry circuit. We follow this procedure: take a node set identified above, find the induced subgraph of the complete graph with this node set and compare the adjacency matrix of the induced subgraph with the adjacency matrix of the circuit. If the matrices are different, then the circuit is removed. All remaining circuits are the broken symmetry flip-flops that we are looking for. Multi-links and self-loops are removed from the network prior to consideration. By applying the steps described above we get the full list of induced subgraphs that are isomorphic to the given circuit.