
The authors have declared that no competing interests exist.

Conceived and designed the experiments: ZS VP. Performed the experiments: ZS. Analyzed the data: ZS. Wrote the paper: ZS VP.

As scientific advances in perturbing biological systems and technological advances in data acquisition allow the large-scale quantitative analysis of biological function, the robustness of organisms to both transient environmental stresses and inter-generational genetic changes is a fundamental impediment to the identifiability of mathematical models of these functions. An approach to overcoming this impediment is to reduce the space of possible models to take into account both types of robustness. However, the relationship between the two is still controversial. This work uncovers a network characteristic, transient responsiveness, for a specific function that correlates environmental imperturbability and genetic robustness. We test this characteristic extensively for dynamic networks of ordinary differential equations ranging up to 30 interacting nodes and find that there is a power-law relating environmental imperturbability and genetic robustness that tends to linearity as the number of nodes increases. Using our methods, we refine the classification of known 3-node motifs in terms of their environmental and genetic robustness. We demonstrate our approach by applying it to the chemotaxis signaling network. In particular, we investigate plausible models for the role of CheV protein in biochemical adaptation via a phosphorylation pathway, testing modifications that could improve the robustness of the system to environmental and/or genetic perturbation.

Advances in the ways that living systems can be perturbed in order to study how they function, together with sharp reductions in the cost of computer resources, have allowed the collection of large amounts of data. The aim of biological system modeling is to analyze this data in order to pin down the precise interactions of molecules that underlie the observed functions. This is made difficult by two features of biological systems: (1) Living things do not show an appreciable loss of function across large ranges of environmental factors. (2) Their function is inherited from parent to child more or less unchanged in spite of random mutations in genetic sequences. We find that these two features are more correlated in a specific subset of networks and show how to use this observation to find networks in which these two features appear together. Working within this smaller space of networks may make it easier to find suitable underlying models from data.

Biological systems in general show various types and degrees of robustness to environmental changes, meaning that they continue to function even when changes in the environment occur. This imperturbability is often accompanied by robustness to genetic perturbations, meaning that progeny function even though their genotype is not identical to the parent genotype

It has been argued that the ability of an organism to withstand genetic mutations improves its ability to evolve

In this study, we develop a computational experiment to investigate the plausibility of this hypothesis, that there is a general correlation between environmental and genetic robustness, and provide a quantitative measure of the degree of correlation, if any. In more detail, we shall show that the presence of a specific dynamic network characteristic in networks is associated with a better correlation between genetic and environmental robustness than found in networks where it is absent. Rather than focusing on a particular system in a specific organism, we choose one function of interest: The ability to attain steady state output for constant input. If a network capable of carrying out this function is robust to external environmental perturbations, what is the probability that it is also robust to internal (e.g., genetic) disruption? To be specific, we define environmental robustness of a biological network as the ability to maintain an output in the face of input perturbations. Genetic robustness is defined as the ability of a biochemical system to maintain the same output in the face of genetic mutations represented as rate constant changes in the equations representing it. This representation of a mutation as a jump from one set of parameters to another is a standard assumption

For mathematical convenience, we restrict our discussion to Michaelis-Menten type networks, as they are more likely to reach a steady state under constant inputs than general networks without sigmoidal saturation. Such networks were also used in the analysis of three-node biochemically adaptable networks by Ma et al.
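As a minimal illustration of what a Michaelis-Menten type network looks like in practice, the following sketch integrates a hypothetical 3-node incoherent feedforward network to steady state. The rate-equation form follows the general style used for enzymatic networks of this kind; the topology, the parameter names (`k_IA`, `K_IA`, `F_A`, and so on), the parameter values, and the forward-Euler integrator are all assumptions chosen for illustration and are not the settings used in this study.

```python
def mm(x, k, K):
    """Michaelis-Menten saturation term k * x / (x + K)."""
    return k * x / (x + K)


def rhs(state, inp, p):
    """Rate equations for a hypothetical incoherent feedforward network.

    A is activated by the input; B and C are activated by A; C is
    deactivated by B. F_A and F_B are constant basal deactivating enzymes.
    Each node concentration is normalized so that active + inactive = 1.
    """
    a, b, c = state
    da = inp * mm(1 - a, p["k_IA"], p["K_IA"]) - p["F_A"] * mm(a, p["k_FA"], p["K_FA"])
    db = a * mm(1 - b, p["k_AB"], p["K_AB"]) - p["F_B"] * mm(b, p["k_FB"], p["K_FB"])
    dc = a * mm(1 - c, p["k_AC"], p["K_AC"]) - b * mm(c, p["k_BC"], p["K_BC"])
    return (da, db, dc)


def steady_state(inp, p, state=(0.1, 0.1, 0.1), dt=0.01, tol=1e-10, max_steps=200_000):
    """Forward-Euler integration until all rates fall below tol.

    Returns None if the network takes too long to reach equilibrium.
    """
    for _ in range(max_steps):
        d = rhs(state, inp, p)
        state = tuple(s + dt * ds for s, ds in zip(state, d))
        if max(abs(ds) for ds in d) < tol:
            return state
    return None


# Illustrative parameter set (assumed): all rate constants 1.0,
# all Michaelis constants 0.5, basal enzyme concentrations 0.5.
params = {name: 1.0 for name in ("k_IA", "k_FA", "k_AB", "k_FB", "k_AC", "k_BC")}
params.update({name: 0.5 for name in ("K_IA", "K_FA", "K_AB", "K_FB", "K_AC", "K_BC")})
params.update({"F_A": 0.5, "F_B": 0.5})

ss_before = steady_state(0.5, params)
ss_after = steady_state(0.6, params)
print("steady state at input 0.5:", ss_before)
print("steady state at input 0.6:", ss_after)
```

Comparing the output node (here C) between the two steady states gives a simple view of how strongly a constant-input steady state shifts when the input is perturbed.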

Defining a topology to be a graph of interactions independent of parameter values, we test a large number of random

There is a large literature on functional motifs that are necessary for a biological system to carry out specific tasks

Our approach can be used to select/reject plausible/improbable models of a system of interest. We demonstrate this via a comparative study of bacterial chemotaxis signaling systems. Chemotaxis is a process generally used by bacteria to sense changes in their chemical environment

In summary, we provide extensive evidence for a mathematical principle stating that, statistically speaking, dynamical systems that are biochemically adaptable are also genetically robust. We apply this knowledge to search for topological categories and subcategories within 3-node networks that show a particularly strong correlation and a linear relationship between their robustness to input and to parameter perturbations, and to shed more light on the chemotactic signaling pathways in bacteria. This method of searching for motifs can be extended to other functions and to bigger networks in order to find motifs that combine more complex functions necessitating larger numbers of nodes.

In the current work, we sample over 50,000 topologies each of 5-node, 10-node, 15-node, and 30-node networks, and over all 3^{9} possible topologies of 3-node networks. For each topology, we compute two values:

^{(1)} For the 3-node networks, the topologies are not generated randomly but sequentially, in order to test all 3^{9} possible combinations. ^{(2)} If the network takes too long to reach equilibrium or the Jacobian matrix

In previous work on biochemical adaptability, it was assumed that networks that quickly respond to input change are better adapted than those with slower response

The Pearson test is performed to decide whether a network (a particular choice of a set of parameters for a particular topology) shows a transient response to a step change in input or not. The starting point is at steady state under a constant input concentration

We define and derive (see

For topologies with more than 3 nodes, we sample at least 50,000 different ones of each size (5-, 10-, 15-, and 30-node topologies), while 3-node topologies are sampled exhaustively. The topologies with more than 3 nodes are sampled randomly as described in

We find that over the parameter space of a topology

The values of the Pearson correlations within TR and NP networks show no clear pattern. This is mainly due to the variability introduced by parameters whose robustness stays invariant and reducible topologies within

Sampling over all 3^{9} possible topologies, our results show only 4153 topologies have associated TR networks. Within these topologies we find a significant linear correlation before (

We first consider two known motifs, the incoherent feedforward motif (IFF) and the negative feedback loop motif (NFL) and examine their corresponding relations. IFF (^{−28}, type2b: r = 0.97 and p = 10^{−51}), type2b shows a much steeper slope (1.12 for type2b, 0.33 for type2a, t_{test} = 3.8 and p = 0.0002). This steeper slope may be advantageous for specific biological functions, though both types show strong correlation between the two types of robustness. In the presence of IFF, the two types show no correlation (p = 0.14 and 0.20 for type2a and type2b respectively).

The red, green, and gray arrows indicate deactivation, activation, or either activation or deactivation through a direct or indirect path, respectively. We test two known general motifs, the incoherent feedforward loop (IFF) and the negative feedback loop (NFL). (A) All possible IFF and NFL motifs. (B) All 8 possibilities (types) each for the NFL1 and NFL2 motifs. (C) NFL1 Type2 subtypes.

Topologies are divided according to the motifs they contain. All the regressions are significant (p<0.0001). (A) Linear regression is applied for each category. IFF_only (red): _{3} auto-deactivation, Type2b (green and black)⇒type 2 without X_{3} auto-deactivation. Type2a without IFF (red):

In this section, we answer the following questions: (1) What is the reason for the large variation around the regression lines in

To answer the first question, we speculated that since clearly each of the parameters in a topology will have different robustness values, we might be able to separate the parameters into different categories such that the regression along each category leads to different slope values. If we show this to be true, then as the number of possible categories increases, one expects larger variation in the value of

Consistent with the 5-, 10-, 15-, and 30-node analysis above, we remove topologies with a low fraction of TR networks (

The links of each topology (and thus their corresponding parameters) are divided into 7 categories:

The second question is related to whether within each topology the parameter subspace corresponding to input robustness is positively correlated with that corresponding to parameter robustness. If they are not correlated, then the two subspaces could be disjoint and the collective/coarse-grained correlation (i.e., the correlation between the

We follow the same procedure as above and separate the parameters into the 7 categories depicted in

Within each topology ^{2}

Within each topology

The main proteins/receptors involved in E. coli chemotaxis are CheA, CheW, CheB, CheR, CheZ, and CheY. E. coli uses an anticlockwise rotation of its flagella to move forward. A decrease or increase in the concentration of nutrients (chemo-attractants) or harmful chemicals (chemo-repellents), respectively, provokes a change to a clockwise rotation which causes the E. coli to tumble and thus change direction. This signal to the flagella is controlled by the chemotaxis protein CheY. A stimulus (i.e., a change in the chemical concentration in the environment) is sensed by periplasmic binding proteins which couple to CheA in the inner membrane with the help of CheW. An increase in chemo-attractant concentrations inhibits the phosphorylation of the receptor complex CheA-CheW (RC-P) (^{−14}).

(A) and (B) are the original networks of E. coli chemotactic adaptation as described in the literature

Chemotaxis in many other bacteria is more complex and involves more proteins. One such protein is CheV which generally contains a phosphorylatable domain ^{3} possible sets of signed directed edges as listed in

Index | a | b | c
1 | 1 | 1 | 1
2 | 1 | 1 | 0
3 | 1 | 1 | −1
4 | 1 | 0 | 1
5 | 1 | 0 | 0
6 | 1 | 0 | −1
7 | 1 | −1 | 1
8 | 1 | −1 | 0
9 | 1 | −1 | −1
10 | 0 | 1 | 1
11 | 0 | 1 | 0
12 | 0 | 1 | −1
13 | 0 | 0 | 1
14 | 0 | 0 | 0
15 | 0 | 0 | −1
16 | 0 | −1 | 1
17 | 0 | −1 | 0
18 | 0 | −1 | −1
19 | −1 | 1 | 1
20 | −1 | 1 | 0
21 | −1 | 1 | −1
22 | −1 | 0 | 1
23 | −1 | 0 | 0
24 | −1 | 0 | −1
25 | −1 | −1 | 1
26 | −1 | −1 | 0
27 | −1 | −1 | −1
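The rows of the table above can be generated programmatically. The short sketch below enumerates all 3^{3} = 27 sign combinations for the three candidate links, assuming that 1 denotes activation, 0 an absent link, and −1 deactivation, and that topologies are indexed from 1 in row order; the variable names are illustrative only.

```python
from itertools import product

# Each candidate link (a, b, c) is activating (1), absent (0), or deactivating (-1).
SIGNS = (1, 0, -1)

# Index the sign combinations from 1, in the same order as the table rows
# (the 1-based row-order indexing is an assumption).
topologies = {
    index: combo
    for index, combo in enumerate(product(SIGNS, repeat=3), start=1)
}

for index, (a, b, c) in topologies.items():
    print(f"{index:2d} | {a:2d} | {b:2d} | {c:2d}")
```

Under this indexing, the combination with no added links, (0, 0, 0), is the 14th entry.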

For each of the 27 topologies, we compute the

Each blue circle represents a topology responding to a change in the concentration of chemo-attractants in the environment. The index of each topology is given inside the circle. The blue circle highlighted in red represents two topologies, 14 and 15, whose positions overlap. The corresponding values of the fraction of TR networks (A), slopes of the linear regression of their

Each blue circle represents a topology responding to a change in the concentration of chemo-repellents in the environment. The index of each topology is given inside the circle. The blue circle highlighted in red represents three topologies, 14, 15, and 17, whose positions overlap. The corresponding values of the fraction of TR networks (A), slopes of the linear regression of their

In this work, we demonstrated that there is a general positive power-law correlation between environmental and genetic robustness in TR networks, and a statistically significant trend to a directly proportional linear relationship between the two in the limit of large networks. Conversely, monotonically responsive and non-responsive (NP) networks show a weaker relationship than TR ones. Furthermore, this distinction between the two classes becomes more prominent as the size of the networks increases. Therefore, this relationship associated with TR may be relevant to the evolution of biochemical networks. While other factors have played a role in the evolution of genetic robustness, our results show that, for TR networks, as the system evolves to withstand external environmental perturbations, it will, with high probability, concomitantly become robust to certain genetic perturbations.

We speculated that the inverse of the slope is proportional to _{,} p = 0.008 (1-tailed), p = 0.016 (2-tailed)). To confirm our results, we performed a Bayesian analysis for the model _{,} p = 0.07 (1-tailed), p = 0.13 (2-tailed)). The value of

A drawback of our method is that the random generation of large networks does not account for reducible topologies, which can introduce more variability and thus more error and a lower correlation between the two robustness measures. This makes a comparison between the correlation coefficients of topologies of different sizes somewhat problematic. However, the space of topologies grows so rapidly with the number of nodes that the likelihood of randomly selecting a reducible network decreases precipitously. Similarly, the averaging method does not distinguish between links that contribute to robustness to input or parameter perturbations and those that do not. A method that could pinpoint such links would be useful in this context.

Our results on the adaptability of 3-node motifs differ somewhat from Ref

Our results are consistent with biological networks described in the literature. For example, we show that the coarse-grained network topology of E. coli chemotaxis, as described in the literature _{1}_{1}_{1}_{1}

An example of IFF is the Ras model of MAPK cascades discussed in Ref

In Ref

Traditionally, network motifs represent subgraph topologies that appear in biological networks much more often than one would expect in a randomly constructed network

We use our approach to differentiate between plausible models of the role of the CheV-P protein in bacterial chemotaxis. We find that there are only a few possible ways that CheV-P can be linked to RC-P and M. We suggest that while there are at most 9 possible topologies, the most plausible one has M enhancing the phosphorylation of both CheV and the receptor complex.

Some specific network features have been associated with robustness to environmental variation in bacterial gene expression. Insulating gene expression by different modes of control, from activation to repression depending on the required high or low activity, has been suggested as a general control feature

Our approach to motif discovery can be extended to networks with backbones with more than 3 nodes. While exhaustive enumeration of small motifs with desired functions is fascinating

Following the same initial setup as in Ref

Assuming that the enzymes are non-cooperative and hence that they obey the Michaelis-Menten kinetics, the rate equations governing the dynamics of the network take the following compact form

In

^{4} and 10^{5}, while for ^{6} and 10^{7}. Typically, an iteration takes less than 10^{−3} seconds of CPU time for small networks (

We define a TR network as one whose output dynamics show a non-monotonic transient between two steady states in response to an input change (i.e., between the steady state value before the input perturbation and that after it). We find the transition time (i.e., the time at which the concentration is maximal/minimal before it starts decreasing/increasing again) and enzyme concentrations,

We use a Pearson test to determine if a given network is TR.

First, we define two functions

Note that the mean values are taken as the average over all the discretized time-steps; for example, ^{−10}). For example, in

When the Pearson test is performed using two different definitions of

We test the robustness of the Pearson test described above by comparing the results from the 3-node simulations to those employing instead the Spearman correlation using the same definition of

To quantify the degree of robustness to input and parameter perturbations of a particular network, we calculate the relative change in the steady state concentrations of the output node due to perturbing the input and parameter values, respectively. Let

Defining the degree of input and parameter robustness of a network

A robust topology is one that gives rise to robust networks with a higher probability when tested with a large number of parameter sets. Quantitatively, the degree of robustness to input perturbations of a given topology is taken to be the geometric average of

We choose the geometric rather than the arithmetic average as a conservative approach to detecting a possible correlation, since the arithmetic average gives too much weight to large outliers.
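The effect of this choice can be seen in a small numerical sketch. The sensitivity values below are hypothetical and serve only to show how a single large outlier dominates the arithmetic average while moving the geometric average far less; the helper functions are illustrative and are not taken from the code used in this study.

```python
import math


def arithmetic_mean(values):
    """Ordinary average: sum divided by count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Exponential of the mean logarithm; requires strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical sensitivities for one topology across five parameter sets,
# with one large outlier.
sensitivities = [0.10, 0.20, 0.15, 0.12, 50.0]

print("arithmetic:", arithmetic_mean(sensitivities))
print("geometric: ", geometric_mean(sensitivities))
```

Because the geometric average works on the logarithms of the values, an outlier two orders of magnitude larger than the rest shifts it by a bounded factor rather than setting its scale.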

A trial is rejected if it takes too long to reach equilibrium, or its corresponding Jacobian with respect to the node concentrations is singular (i.e.,

We sample over 50,000 different topologies for each

We automatically reject trials wherein

As discussed above, the degree of input and parameter robustness is seen as inversely proportional to the average of the sensitivity of the steady state concentration of the output to each input and each parameter, respectively. Then,

In this section we analyze the parameter robustness of different types of parameters. Thus, the

Thus, the measure of robustness of a topology

Next, to obtain an idea about how robustness to input and parameter perturbations correlate within the networks of each individual topology, we calculate the value


(TIF)

(TIF)

_{I}, E_{p} in the original and coarse-grained E. coli topologies. _{1} is the topology shown in _{2} is its coarse-grained equivalent shown in _{1} is the topology shown in _{2} is its coarse-grained equivalent shown in ^{−14}) respectively. Here we see more variation in the slope than in (A) as the fraction of TR networks is too low for accurate results.

(TIF)

_{I}, E_{p} for topologies number 1–3, 5–7, 10–12, 16, 19 when the input is a chemo-attractant.

(TIF)

_{I}, E_{p} for topologies number 4, 8–9, 13–15, 17–18, 20–27 when the input is a chemo-attractant.

(TIF)

_{I}, E_{p} for topologies number 4, 8–9, 13–15, 17–18, 20–27 when the input is a chemo-repellent.

(TIF)

(TIF)

(TIF)

_{ad}_{nad}_{test} = 0.18 and p = 0.86. Similarly, Pearson2 and Spearman2 (B) resulted in a significant linear correlation (r = 0.60 for both) and no significant difference between the two slopes (0.51 and 0.47, respectively): t_{test} = 1.71 and p = 0.09. Comparing the slopes of Pearson1 and Pearson2 (C), we obtain: t_{test} = 0.86 and p = 0.36. Comparing those of Spearman1 and Spearman2 (D), we obtain: t_{test} = 0.75 and p = 0.46.

(TIF)

_{test} = 0.57, p = 0.57).

(TIF)

(TIF)

(TIF)

^{2} within the networks of each 3-node topology divided into 7 categories.

(TIF)

(TIF)

S1 Pearson Correlations within TR and NP networks. S2 Steady state analysis.

(DOCX)

This study utilized the high-performance computational capabilities of the Biowulf Linux Cluster at the NIH, Bethesda, MD (