Fig 1.
A minimal GRN obtained through the CoReSym reduction method.
Reduction of the gene regulatory network (GRN) of E. coli. Left: the 879-node operon-GRN of E. coli. Only the main weakly connected component is shown; small disconnected pieces of the network are omitted, since they do not play a significant role in the network dynamics. Node size and font size are proportional to the out-degree of the node. Right: a representation of the minimal GRN obtained after applying our CoReSym reduction method. This simplified depiction of the core network illustrates the signal flow between its components (the bigger “nodes”), which are the strongly connected components of the network (SCC: a subgraph in which every node can be reached from every other node). The smaller gene nodes inside these SCCs form the computational core of the network. Colored nodes represent collapsed-fiber nodes. Bigger arrows represent the edges between the components. The three nodes outside the components are representatives of the controller nodes, which send signals to different SCCs. Interestingly, two parallel feedforward structures exist between the components: the central crp-fis SCC regulates the soxS SCC, and the two jointly regulate the pH SCC in one feedforward structure and the mara-rob SCC in the other.
Fig 2.
Canonical fiber building blocks.
These are the canonical fiber building blocks observed in the GRNs of E. coli and B. subtilis, with examples taken from E. coli. The networks can be seen as assemblies of 5 basic classes of fibration building blocks: (i) Trivial fibers. A number of external regulators identically regulate the genes in a fiber, which then show synchronous dynamics. Operons with only one promoter belong to this class; colored nodes represent the genes belonging to the operon (with more colored nodes in the fiber depending on the number of genes in the operon). (ii) The feedforward fiber (FFF) and its sub-classes, which differ in their number of external regulators. The feedforward fiber is defined by a feedforward motif with a self-loop in the synchronous set of genes, and by its number of external regulators. (iii) The Fibonacci fiber (FF). A more complex building block, defined by a fractal-dimension branching ratio that arises from the presence of a self-loop together with a feedback regulation from the fiber back to the regulator(s). The Fibonacci fibers observed in E. coli have a branching ratio between 1 and 2, placing this building block between the FFF fibers and the n=2 fibers. (iv) The n=2 fibers, defined by two self-loops in the synchronized genes. When this symmetry is broken, they form the memory and oscillatory logic circuits embedded in the SCCs. Finally, (v) composite fibers. Combining the previous 4 types of building blocks sequentially yields a composite fiber. An interesting consequence is the synchronization of genes that may be far apart from each other and do not share any regulation.
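The ordering of the building blocks by branching ratio can be illustrated by counting the nodes in successive layers of an input tree. The sketch below is not taken from the paper: the three toy circuits, the function name `layer_sizes`, and the choice to leave out external regulators are assumptions made for illustration only.

```python
def layer_sizes(inputs, root, depth):
    """Count the nodes in each layer of the input tree of `root`.

    `inputs` maps a gene to the list of genes that regulate it
    (a gene listed twice means two parallel edges).
    """
    counts = {root: 1}          # copies of each gene in the current layer
    sizes = [1]
    for _ in range(depth):
        nxt = {}
        for gene, copies in counts.items():
            for regulator in inputs.get(gene, ()):
                nxt[regulator] = nxt.get(regulator, 0) + copies
        counts = nxt
        sizes.append(sum(counts.values()))
    return sizes


# Hypothetical toy circuits (external regulators omitted for simplicity).
one_loop = {"x": ["x"]}                    # single self-loop
fibonacci = {"x": ["x", "y"], "y": ["x"]}  # self-loop plus feedback via y
two_loops = {"x": ["x", "x"]}              # two self-loops

print(layer_sizes(one_loop, "x", 4))    # [1, 1, 1, 1, 1]
print(layer_sizes(fibonacci, "x", 4))   # [1, 2, 3, 5, 8]
print(layer_sizes(two_loops, "x", 4))   # [1, 2, 4, 8, 16]

deep = layer_sizes(fibonacci, "x", 20)
print(deep[-1] / deep[-2])              # approaches the golden ratio
```

In this toy setting the layer sizes grow by a factor of 1 for the single self-loop, by a factor of 2 for two self-loops, and follow the Fibonacci sequence in between, which is consistent with a branching ratio between 1 and 2.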
Fig 3.
Fibration and kout-core decomposition.
(A) Graph G, a subgraph of the GRN of E. coli, shows a Fibonacci building block, classified as in [10, 11]. All three mappings are morphisms, since the images of the nodes in G1, G2 and G3 are connected only when the corresponding nodes in G are connected, respecting the incidences. The mapping on the left is a surjective fibration: all nodes with isomorphic input trees are collapsed into one (nodes 2 and 3 are collapsed into a single node) and all input trees are preserved, hence the lifting property is satisfied. The mapping into G2 is an injective fibration: the original graph is embedded in G2, making this map a morphism in which all input trees are preserved. Some nodes and edges are added, but without breaking the original input trees. The mapping into G3 is not a fibration, because the input tree of node 4 (shown in B) is not preserved in its image node in G3. The same problem occurs for the images of nodes 2 and 3: their input trees are not preserved, as the former input from node 4 is lost. Two edges of G3 cannot be uniquely lifted, since they would need to be lifted to a,f and c,g, respectively, for the mapping to be a morphism. In practical terms, since the input from node 4 is lost, graph G3 represents an entirely different dynamical system from graph G. If G represents a GRN, the images of genes 2 and 3 in G3 would have a different expression pattern than genes 2 and 3. (B) The input sets and (C) the input trees of the nodes in graph G. Because of its self-loop, the input set of node 2 is attached to node 2 again in every layer of the trees, and this process repeats ad infinitum. As a result, the input trees of nodes 1, 2 and 3 are infinite; however, since G has only 4 nodes, it suffices to verify the isomorphism up to the third layer of the trees, which shows that nodes 2 and 3 have isomorphic input trees. (E) Example of the k-core decomposition of graph G2 from (A). Even though one node on the outer shell (in yellow) does have one output, once two other nodes in the shell are removed it is left with no output and is removed as well. After this process, all the remaining nodes in the core have at least 1 output.
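Grouping nodes with isomorphic input trees can be sketched as an iterative color refinement, in which two nodes keep the same color only while they receive the same number of inputs of each color. The sketch below is an illustration under stated assumptions, not the implementation used in the paper: the edge list is a hypothetical 4-node graph (not graph G from the figure), edge types (activation or repression) are ignored, and the name `fiber_colors` is invented here.

```python
from collections import Counter


def fiber_colors(nodes, edges):
    """Return a dict node -> color; nodes sharing a color form one fiber.

    `edges` is a list of (source, target) pairs. Edge types are ignored,
    which is a simplification made for this sketch.
    """
    color = {v: 0 for v in nodes}
    while True:
        # Signature: own color plus the multiset of input colors.
        signature = {
            v: (color[v],
                tuple(sorted(Counter(color[u] for u, w in edges if w == v).items())))
            for v in nodes
        }
        ids = {}
        new_color = {v: ids.setdefault(signature[v], len(ids)) for v in nodes}
        # The partition can only get finer; stop when no class was split.
        if len(set(new_color.values())) == len(set(color.values())):
            return new_color
        color = new_color


# Hypothetical graph: node 2 has a self-loop and feeds back to node 1,
# node 4 is an external regulator of nodes 2 and 3.
toy_nodes = [1, 2, 3, 4]
toy_edges = [(1, 2), (1, 3), (2, 2), (2, 3), (2, 1), (4, 2), (4, 3)]
colors = fiber_colors(toy_nodes, toy_edges)
print(colors[2] == colors[3])   # True: nodes 2 and 3 share a fiber
```

In this toy graph nodes 2 and 3 receive identical inputs, so the refinement stabilizes after one round with three classes; dropping the edge from node 4 to node 3 would separate them.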
Fig 4.
Network reduction of the GRN of E. coli and B. subtilis.
For E. coli’s network, we start with the network from Fig 1, rearranged to show the outward flow of signals from the minimal network, with genes in the same fiber sharing the same color. SCCs are enclosed by ellipses. Gene names are shown only for genes that are part of the minimal network. Most of the genes belonging to fibers are located in the periphery (outer regions) of the network. Step one of the CoReSym procedure collapses each fiber into one representative node, resulting in the base network obtained from the minimal surjective fibration. Step two uses a k-core decomposition to remove all the dead-end paths ending at nodes with no output, resulting in the minimal network, with only 42 nodes for E. coli and 22 for B. subtilis. Both minimal networks have a master SCC that regulates the rest, connector nodes connecting different SCCs, and controller nodes sending inputs to the SCCs.
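The removal of dead-end paths in step two can be sketched as repeatedly deleting nodes that have no outgoing edge until none remain. This is a minimal illustration, not the authors' code: the function name `prune_dead_ends` and the toy edge list are hypothetical, and treating a self-loop as an output is an assumption of this sketch.

```python
def prune_dead_ends(nodes, edges):
    """Iteratively remove nodes with no outgoing edge.

    Returns the surviving nodes and edges. Self-loops are counted as
    outputs here, which is an assumption of this sketch.
    """
    nodes = set(nodes)
    edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    while True:
        has_output = {u for u, _ in edges}
        dead = nodes - has_output
        if not dead:
            return nodes, edges
        nodes -= dead
        # Removing a dead end may leave its regulators without outputs,
        # so the loop runs again on the reduced graph.
        edges = {(u, v) for u, v in edges if v not in dead}


# Hypothetical toy network: a <-> b is a cycle, e is a controller of a,
# and b -> c -> d is a dead-end path.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("e", "a")]
kept_nodes, kept_edges = prune_dead_ends(toy_nodes, toy_edges)
print(sorted(kept_nodes))   # ['a', 'b', 'e']
```

Node c has an output at the start but loses it once d is removed, so the whole path is peeled off, while the controller e survives because it still sends an input to the cycle.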
Table 1.
Gene counts of original and reduced GRNs.
The full genomes of E. coli and B. subtilis contain 4690 genes (according to RegulonDB [12]) and 6121 genes (obtained from SubtiWiki [45]), respectively. Of these, only 1843 and 2482 genes, respectively, express TFs with known interactions. The first reduction step in E. coli was performed by a trivial fibration (collapsing the operons, which are trivial fibers) before applying CoReSym. This could also be seen as part of Step 1, and as such is not needed for B. subtilis.
Table 2.
Statistics and fiber coverage of the two GRNs.
For E. coli, we start with the 879-node operon-GRN from Step 0.2 (see Table 1). For E. coli, Step 1 collapses the 416 nodes within fibers into 92 fiber-collapsed nodes (one for each fiber), giving the 555-node Base-GRN. For B. subtilis, the 2263 fibered nodes are collapsed into 302 fiber-collapsed nodes (one for each fiber), resulting in the 521-node Base-GRN. Step 2 removes all the nodes in the outer shell, including 82 of the fiber-collapsed nodes for E. coli and 290 for B. subtilis, thus leaving only the minimal GRN. The minimal networks are composed of the nodes in the SCCs and the connector nodes. The k-shells are actually much bigger, but we count the genes inside them after collapsing the fibers, which gives only 82 fiber-collapsed nodes (in the case of E. coli) rather than all the original nodes that belong to these fibers.
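The node counts quoted for Step 1 can be checked with simple bookkeeping: collapsing fibers removes the fibered nodes and adds back one node per fiber. In the sketch below the helper name `base_grn_size` is hypothetical, and the B. subtilis starting size of 2482 nodes is taken from the Table 1 caption on the assumption that it is the network entering Step 1.

```python
def base_grn_size(n_nodes, n_fibered, n_fibers):
    """Nodes left after collapsing each fiber into one representative."""
    return n_nodes - n_fibered + n_fibers


# E. coli: 879-node operon-GRN, 416 fibered nodes, 92 fibers.
ecoli_base = base_grn_size(879, 416, 92)
# B. subtilis: 2482 nodes (assumed starting size), 2263 fibered, 302 fibers.
bsub_base = base_grn_size(2482, 2263, 302)
print(ecoli_base, bsub_base)          # 555 521

# Fiber-collapsed nodes that survive Step 2 (outer-shell removal).
ecoli_core_fibers = 92 - 82
bsub_core_fibers = 302 - 290
print(ecoli_core_fibers, bsub_core_fibers)   # 10 12
```

Both results agree with the Base-GRN sizes stated in the caption, and the second pair gives the number of fiber-collapsed nodes that remain in each minimal network.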
Table 3.
List of gene circuits in E. coli and B. subtilis.
The circuits found in E. coli are described in detail in Fig 5 and discussed at length in Section C in S1 Text.
Fig 5.
Circuits in the GRN of E. coli.
(A) The minimal GRN of E. coli and the circuits embedded in it, shown with red links for the symmetry-breaking inputs to the toggle-switch fliz-csgd. The biggest SCC is mostly in charge of pH responses. Colored nodes represent fibers. (B) The two-node SCCs and (C) the soxS SCC and its circuits. For each circuit, the incoming signals that break the symmetry are shown.
Fig 6.
Some simple directed cycles in E. coli.
The networks shown are different cycles that cross through the logic circuits soxS-fur (left) and csgD-fliz (right). Arrow colors denote the overall sign (overall activation: blue; overall inhibition: red).
Fig 7.
Sketch of reduced GRN of B. subtilis.
The 4 SCCs are shown: the siga SCC, the sigk-gere SCC, the sigf-sigg SCC, and the lexa-rocr SCC. Signals flow between them as follows: the siga SCC is the hub that controls the other three SCCs and is fed information signals by the two controllers sens and sala, while the controller node spoiiid connects the sigk-gere SCC and the sigf-sigg SCC. A feedforward structure between the siga SCC and the sigk-gere SCC is visible.
Fig 8.
Statistically significant large-scale structures in E. coli and B. subtilis core GRNs.
The structure between SCCs is compared to the corresponding structure in randomized networks with the same in- and out-degrees for all nodes (and preserving the edge types). (A) Histogram of the distribution of the observed structures in the core of the random networks, with a sketch of the structure itself atop the histogram. Following the format of Fig 1, purple circles represent SCCs and arrows stand for edges between SCCs. (B) Structures observed in the cores of E. coli and B. subtilis, shown with the Z-scores of obtaining a structure with such a number of SCCs from the randomized networks.
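A comparison of this kind can be sketched with a degree-preserving edge-swap randomization followed by a Z-score of the observed SCC count against the randomized ensemble. The code below is an illustration under stated assumptions rather than the procedure used in the paper: it ignores edge types, assumes the input has no duplicate edges, allows swaps that create self-loops, counts only SCCs that contain a cycle, and all function names and the toy graph are hypothetical.

```python
import random
from statistics import mean, stdev


def count_sccs(nodes, edges):
    """Count SCCs that contain at least one cycle (toy-sized graphs only)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for v in succ.get(stack.pop(), ()):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    reach = {v: reachable(v) for v in nodes}
    return len({frozenset(u for u in reach[v] if v in reach[u])
                for v in nodes if v in reach[v]})


def swap_randomize(edges, n_swaps, rng):
    """Swap edge targets pairwise; every in- and out-degree is preserved."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if (a, d) in present or (c, b) in present:
            continue  # the swap would create a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def z_score(observed, null_values):
    """Z-score of `observed` against a null sample (NaN if no spread)."""
    spread = stdev(null_values)
    return (observed - mean(null_values)) / spread if spread > 0 else float("nan")


toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
rng = random.Random(0)
null = [count_sccs(toy_nodes, swap_randomize(toy_edges, 50, rng))
        for _ in range(200)]
print(z_score(count_sccs(toy_nodes, toy_edges), null))
```

The reachability-based SCC count is quadratic and only suitable for small examples; for networks of realistic size a linear-time algorithm such as Tarjan's would be the natural replacement.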