Systematic Reverse Engineering of Network Topologies: A Case Study of Resettable Bistable Cellular Responses

A central theme in systems biology is to uncover the design principles of biological networks, that is, how specific network structures yield specific systems properties. For this purpose, we have previously developed a reverse engineering procedure to identify network topologies with a high likelihood of generating desired systems properties. Our method searches the continuous parameter space of an assembly of network topologies, without enumerating individual network topologies separately as is traditionally done in other reverse engineering procedures. Here we tested this CPSS (continuous parameter space search) method on a previously studied problem: the resettable bistability of an Rb-E2F gene network in regulating the quiescence-to-proliferation transition of mammalian cells. From a simplified Rb-E2F gene network, we identified network topologies responsible for generating resettable bistability. The CPSS-identified topologies are consistent with those reported in the previous study based on individual topology search (ITS), demonstrating the effectiveness of the CPSS approach. Since the CPSS and ITS searches are based on different mathematical formulations and different algorithms, the consistency of the results also helps cross-validate both approaches. A unique advantage of the CPSS approach lies in its applicability to biological networks with large numbers of nodes. To aid the application of the CPSS approach to the study of other biological systems, we have developed a computer package that is available in Information S1.

T1. Determination of optimal number of clusters for Case I (normal) study
Fig. S1 presents clustering results of the 'good' sample data points for Case I (normal) for cluster numbers 2, 3, 4, and 5 using the K-means clustering algorithm [1,2]. Visualizing the top two principal components, we observe that the clusters adjoin one another closely, suggesting that they might be sub-clusters of a single cluster. To explore this possibility, and hence to determine the optimal cluster number more precisely, we computed the Mean Silhouette Coefficient (MSC) value [3,4] for each cluster, for each cluster number.
The Silhouette coefficient is a numerical value assigned to each point in each cluster that indicates how well the point was clustered. This value ranges between -1 and 1, with a negative value indicating that the point was poorly assigned to its cluster, and a positive value indicating that the point was properly assigned. The MSC value decreases as the cluster number increases and is highest for a cluster number of 2. Since the minimum cluster number from the K-means clustering algorithm is 2, we next explore the possibility that the two clusters actually belong to one big cluster. To assess this, we check by manual inspection whether the parameter sets are indeed distributed in two distinct clusters or form a single cluster. First, we separate the good parameter sets for each cluster. Then, we examine the pair distribution of each combination of the 6 parameters. For most links (group 1: links 3, 7, 8, 9) the distributions of the two clusters merge completely with each other. An example is shown in Fig. S3a. For links 4 and 6 (group 2), the K-means clustering algorithm clearly divides a continuous distribution artificially into two clusters (Fig. S3b). This can also be seen from the distribution between a link from group 1 and a link from group 2 (Fig. S3c). Therefore these two clusters should be merged into a single cluster. We followed a similar treatment for the Case I (constrained) situation (results not shown here) and found that all 'good' parameters are distributed in a single cluster.
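The Silhouette calculation described above can be illustrated with a minimal sketch. It uses the standard definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function names, the Euclidean distance, and the toy points are assumptions made for illustration; they are not taken from the computer package in Information S1.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_values(points, labels):
    """Silhouette coefficient of every point (needs at least two clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def mean_silhouette(points, labels):
    """Mean Silhouette Coefficient (MSC) over all points."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)

# Toy data (hypothetical): two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # labels that follow the two groups
bad_labels = [0, 1, 0, 1, 0, 1]   # labels that mix the two groups

msc_good = mean_silhouette(points, good_labels)
msc_bad = mean_silhouette(points, bad_labels)
```

In this toy example the labelling that follows the two groups yields a much higher MSC than the mixed labelling. Note that a high MSC alone does not rule out an artificially split continuous distribution, which is why the manual inspection of pair distributions is still needed.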

T2. Parameter Search result for Case II (normal)
We performed another case study in which we considered all nine possible links (including links 1, 2 and 5 in Fig. 2 (b)) among the three nodes. We used a similar two-stage Metropolis algorithm to generate 'good' sample data points. We took 42 'good' points distributed in the parameter space from the stage I search and performed 10^7 random iterations of the stage II search starting from each 'good' point. Finally, we obtained ~2.43 x 10^5 good sample data points distributed in the parameter space. Further analysis shows that these sample points form a single cluster.
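A two-stage search of this kind can be sketched as follows. This is only an illustration under stated assumptions: the cost function, the 'good' region (a unit ball around the origin), the Gaussian proposal, the temperature, and all step counts are hypothetical placeholders, not the scoring of resettable bistability or the settings used in this study. Stage I runs a Metropolis walk that lowers the cost until a 'good' point is reached; stage II starts from each such point and takes random steps that are accepted only if the proposed point is still 'good'.

```python
import math
import random

def cost(params):
    """Hypothetical cost: zero inside the unit ball, growing with distance."""
    norm = math.sqrt(sum(p * p for p in params))
    return max(0.0, norm - 1.0)

def stage_one(rng, dim, n_steps, step, temperature):
    """Metropolis walk on the cost; returns a 'good' point or None."""
    x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    c = cost(x)
    for _ in range(n_steps):
        if c == 0.0:
            return x
        y = [xi + rng.gauss(0.0, step) for xi in x]
        cy = cost(y)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(cy - c) / temperature).
        if cy <= c or rng.random() < math.exp(-(cy - c) / temperature):
            x, c = y, cy
    return x if c == 0.0 else None

def stage_two(rng, start, n_steps, step):
    """Random walk restricted to the 'good' region, recording every step."""
    x = list(start)
    samples = []
    for _ in range(n_steps):
        y = [xi + rng.gauss(0.0, step) for xi in x]
        if cost(y) == 0.0:  # accept only proposals that remain 'good'
            x = y
        samples.append(tuple(x))
    return samples

rng = random.Random(0)
candidates = (stage_one(rng, 3, 5000, 0.5, 0.1) for _ in range(5))
starts = [p for p in candidates if p is not None]

good_samples = []
for start in starts:
    good_samples.extend(stage_two(rng, start, 2000, 0.3))
```

Recording the current point after every proposal, including rejected ones, keeps the stage II samples consistent with a uniform target over the 'good' region. Any thinning or removal of duplicate points would be an additional design choice.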
(a) Mean network topology for the Case II (normal) situation: The mean value matrix (Fig. S4 (a)) and the topology matrix (Fig. S4 (b)) suggest that the self-interaction strengths of the MD and RB nodes, i.e., link 1 and link 5 respectively, are very weak, and thus those links are not present in the mean network topology (Fig. S4 (c)). This implies that these two links are not needed to obtain resettable bistability. On the other hand, our search shows that an inhibition from the RB node to the MD node (link 2) appears in the mean network topology. Thus link 2 and link 4 form a double negative feedback loop. This feature from the full model search was absent in the work of Yao et al. [5], where links 1, 2, and 5 were not considered.
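The step from a mean value matrix to a topology matrix can be illustrated with a small sketch. It assumes that each 'good' sample is a 3 x 3 matrix of signed link strengths (positive for activation, negative for inhibition) and that a link is dropped from the mean topology when the magnitude of its mean strength falls below a threshold. The sign convention, the threshold of 0.2, the function names, and the two toy samples are all hypothetical and are not taken from Fig. S4.

```python
def mean_matrix(samples):
    """Element-wise mean of a list of equally sized square matrices."""
    n = len(samples)
    size = len(samples[0])
    return [
        [sum(s[i][j] for s in samples) / n for j in range(size)]
        for i in range(size)
    ]

def topology_matrix(mean, threshold):
    """Map each mean strength to +1, -1, or 0 (link treated as absent)."""
    return [
        [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in row]
        for row in mean
    ]

# Toy samples (hypothetical numbers, for illustration only).
sample_1 = [[0.1, -0.8, 0.6], [-0.7, 0.05, -0.9], [0.8, -0.5, 0.9]]
sample_2 = [[-0.1, -0.6, 0.4], [-0.9, -0.05, -0.7], [0.6, -0.7, 0.7]]

mean = mean_matrix([sample_1, sample_2])
topology = topology_matrix(mean, threshold=0.2)
```

In this toy example the two diagonal entries whose strengths change sign between samples average to zero and are therefore treated as absent, which mirrors how weak mean self-interactions drop out of a mean topology.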

T3. Parameter Search result for Case II (Constrained) situation:
Similar to our Case I study in the main text, we performed the parameter search with an additional constraint (R_0 > 3) on the resettable bistability. We refer to this study as the Case II (constrained) situation. The mean network topology in Fig. S5 shows that the additional constraint does not change the mean motif obtained in the 'normal' situation.

T4. Backbone motif from CV matrix for Case II situation:
We constructed the CV matrix for both the normal (Fig. S6 (a)) and constrained (Fig. S6 (b)) situations and set cut-offs of 0.8 and 0.5, respectively, to determine the backbone motif. In both situations we found that links 7 and 9 (Fig. S7) form the backbone motif that generates resettable bistability. Table S1 lists the sample ratio obtained for each case when determining the backbone motif from the CV matrix. Table S2 gives a few top minimum motifs with 2, 3, or 4 links.

7-----9   Case II (normal)   91.5   Yes
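The cut-off step on the CV matrix can be sketched as follows. The sketch assumes that CV denotes the coefficient of variation of each link strength over the 'good' samples (standard deviation divided by the absolute mean) and that links whose CV lies below the cut-off are taken as backbone links. Both points are assumptions made for illustration, as are the function names, the generic link labels, and the toy values. Only the cut-off of 0.8 comes from the text.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return math.inf  # undefined ratio; treated here as maximally variable
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance) / abs(mean)

def backbone_links(samples, cutoff):
    """Return the CV of each link and the links with CV below the cut-off.

    samples: list of dicts mapping a link label to its strength.
    """
    links = sorted(samples[0])
    cv = {k: coefficient_of_variation([s[k] for s in samples]) for k in links}
    return cv, [k for k in links if cv[k] < cutoff]

# Toy samples with generic link labels (hypothetical numbers).
samples = [
    {"A": 0.8, "B": -0.9, "C": 0.9},
    {"A": 0.9, "B": -0.8, "C": -0.7},
    {"A": 1.0, "B": -1.0, "C": 0.1},
]

cv, backbone = backbone_links(samples, cutoff=0.8)
```

Here links A and B keep a consistent sign and magnitude across samples, so their CV is small and they are selected, while link C varies widely and is excluded.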