A novel structure-based control method for analyzing nonlinear dynamics in biological networks

Exploring complex biological systems requires adequate knowledge of the system’s underlying wiring diagram but not its specific functional forms. Thus, exploration actually requires the concepts and approaches delivered by structure-based network control, which investigates the controllability of complex networks through a minimum set of input nodes. Traditional structure-based control methods focus on the structure of complex systems with linear dynamics and may not match the meaning of control well in some biological systems. Here we took into consideration the nonlinear dynamics of some biological networks and formalized the nonlinear control problem of undirected dynamical networks (NCU). Then, we designed and implemented a novel and general graphic-theoretic algorithm (NCUA) from the perspective of the feedback vertex set to discover the possible minimum sets of the input nodes in controlling the network state. We applied our NCUA to both synthetic networks and real-world networks to investigate how the network parameters, such as the scaling exponent and the degree heterogeneity, affect the control characteristics of networks with nonlinear dynamics. The NCUA was applied to analyze the patient-specific molecular networks corresponding to patients across multiple datasets from The Cancer Genome Atlas (TCGA), which demonstrates the advantages of the nonlinear control method to characterize and quantify the patient-state change over the other state-of-the-art linear control methods. Thus, our model opens a new way to control the undesired transition of cancer states and provides a powerful tool for theoretical research on network control, especially in biological fields. Author summary Complex biological systems usually have nonlinear dynamics, such as the biological gene (protein) interaction network and gene co-expression networks. However, most of the structure-based network control methods focus on the structure of complex systems with linear dynamics. Thus, the ultimate purpose to control biological networks is still too complicated to be directly solved by such network control methods. We currently lack a framework to control the biological networks with nonlinear and undirected dynamics theoretically and computationally. Here, we discuss the concept of the nonlinear control problem of undirected dynamical networks (NCU) and present the novel graphic-theoretic algorithm from the perspective of a feedback vertex set for identifying the possible sets with minimum input nodes in controlling the networks. The NCUA searches the minimum set of input nodes to drive the network from the undesired attractor to the desired attractor, which is different from conventional linear network control, such as that found in the Maximum Matching Sets (MMS) and Minimum Dominating Sets (MDS) algorithms. In this work, we evaluated the NCUA on multiple synthetic scale-free networks and real complex networks with nonlinear dynamics and found the novel control characteristics of the undirected scale-free networks. We used the NCUA to thoroughly investigate the sample-specific networks and their nonlinear controllability corresponding to cancer samples from TCGA which are enriched with known driver genes and known drug target as controls of pathologic phenotype transitions. We found that our NCUA control method has a better predicted performance for indicating and quantifying the patient biological system changes than that of the state-of-the-art linear control methods. Our approach provides a powerful tool for theoretical research on network control, especially in a range of biological fields.


Introduction
Numerous biological systems can be represented as networks, and several approaches have been developed to construct reliable biological networks [1,2]. Since the control process is dominated by the intrinsic structure and dynamic propagation within the system, the concepts and approaches of structure-based network control are emergently required to investigate the controllability of complex networks through a minimum set of input nodes [3][4][5][6][7][8][9][10][11][12][13]. The analysis of biological systems from the structure-based control viewpoint provides a deeper understanding of the dynamics of complex large-scale biological systems [14][15][16]. So far, the studies exploiting the structure-based control of complex networks can be mainly divided into two categories according to the styles of the networks, that is, the approaches focusing on directed networks [3][4][5][6][10][11][12][13]17] and the methods focusing on undirected networks [7][8][9] For directed networks, many researchers have developed linear structural control tools to identify the minimum number of input nodes that need to be controlled by external signals for the system to achieve the desired control objectives [5,6,13]. Although those linear control tools have many applications to biomolecular systems, such as in the detection of driver metabolites in the human liver metabolic network [14] and driver gene discovery in pan-cancer datasets [15], those tools may only give an incomplete view of the network control properties of a system with nonlinear dynamics [17]. Recently, an analytical tool called a feedback vertex set control (FC) has been shown to study the control of large directed networks in a reliable and nonlinear manner, where the network structure is prior-known and the functional form of the governing equations is not specified, but must satisfy some properties [12,18]. This formalism identifies the set of feedback vertex nodes (FVS) in networks, uniquely determining the long-term dynamics of the entire network. With such a scheme, the source nodes can converge to a unique state (or trajectory) without independent control [12,18]. Zañudo et al. showed that both the state of the source nodes and FVS can change the dynamic attractors available to the network; they identified the source nodes and FVS as the input nodes to control the direct networks with nonlinear dynamics [17]. The above approaches only focus on the linear or nonlinear dynamics of directed networks. There are few approaches to investigate the linear dynamics on undirected networks. For example, an exact controllability framework [7], an analytical framework, offers a tool to treat the structural controllability of undirected networks; the Minimum Dominating Sets (MDS) [8] is an alternative way to investigate the controllability of undirected linear networks, since it works with a strong assumption that the controllers can control its outgoing links independently. Therefore, there is still a need for efficient tools to analyze the structural controllability of the undirected networks with nonlinear dynamics.
In this paper, we first formalize the nonlinear control problem of undirected networks (NCU), that is, how to choose the proper input nodes to drive the network from one attractor to a desired attractor in the networks with nonlinear and undirected dynamics. We developed a novel graphic-theoretic algorithm (NCUA) to measure the controllability of undirected networks based on the feedback vertex sets. Specifically, (i) we assume that each edge in a network is bidirectional; (ii) we construct a bipartite graph from the original undirected network, in which the nodes of the top side are the nodes of the original graph and the nodes of the bottom side are the edges of the original graph (Figure 1 (b)); (iii) we adopt an equivalent optimization procedure for determining the MDS of the top side nodes to cover the bottom side nodes in the bipartite graph that can control the whole network using mathematical terms; and (iv) we apply random Markov chain sampling to obtain the distribution of the input nodes set and uncover the possible sets of the input nodes to control the undirected network.
Since most real world networks have a statistically significant power-law distribution, we generally have defined the control characteristics as the fraction of identified minimum input nodes and applied NCUA for multiple synthetic scale-free (SF) networks and real-world networks, and obtained several counterintuitive findings: i) the fraction of input nodes in the network increases when the degree exponent value increases for fixed average degree, indicating that control characteristics is affected by degree heterogeneity; ii) new degree heterogeneity is defined and the fraction of input nodes decreases monotonically when degree heterogeneity becomes larger for fixed average degree. Furthermore, the degree heterogeneity and the average degree determine the minimum number of control input nodes; iii) the set of input nodes tends to be highly target-connected nodes, whereas the previous linear control study suggested that driver nodes tend to avoid high-degree nodes [9][10][11][12].
We also investigated the network transition between the disease state and normal state identifiable with the stable network states (dynamical attractors) in personalized patient networks. For each sample of each cancer patient from 10 kinds of cancer sites in TCGA, we constructed a personalized differential network between the normal state and disease state, and applied the NCUA for finding their key control genes on pathologic phenotype transitions. We found that (i) although most of the cancer samples have a similar nonlinear controllability, the determining control genes still differ for different cancer samples; (ii) we identified the controllability of the reconstructed individual networks for single samples across 10 cancer datasets, and we found the high confidence cancer-specific key genes have significant enrichments in the cancer genes census (CGC) set and the FDA-approved drug target genes (DTG) set. Compared with the traditional control model of linear networks (Exact control and Liu's linear control) [7,8], our results imply that a single-patient system in cancer may be more controllable than predicted on linear dynamical networks due to the ubiquity of the nonlinear features in biological networks. In contrast to another model on the network control of undirected networks called MDS [8], our NCUA also showed a higher performance in identifying the key genes in the CGC and DTG, which were underestimated by the MDS. In conclusion, our model provides a new powerful tool for theoretical and empirical study of network controllability, especially in biological and biomedical fields.
. CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint

Formulation of the NCU
Network dynamics are commonly nonlinear, especially at the level of nodes or small groups of nodes in the network [19]. In past decades, the focus of network control research has shifted from linear dynamics to nonlinear dynamics [12,[20][21][22][23][24].
One of these methods, namely the feedback vertex set control (FC) [12,18], can be reliably applied to large complex networks in which the structure is well known and the functional form of the governing equations is not specified but must satisfy some properties. Although Zañudo et al. applied the FC to study dynamic models of direct networks to predict nodes for the control of various technical, social, and biological networks [17], we still lack a framework to solve the nonlinear control problem on undirected networks [8]. Here, we focus on the nonlinear control problem of undirected networks. Given an undirected network G (V, E), we generally consider the following broader class of the model [23] to be the following: . Then, we formalize the concept of the nonlinear control of the undirected networks, which is how we chose the set of input nodes that are injected by input signal u with the minimum cost to control the above equation (1) from an initial attractor to a desired attractor. In Figure 1, we give a diagram illustration of our NCU with a simple example.

Algorithm for the Nonlinear Control of an Undirected Network (NCUA)
In many complex biological systems, there is adequate knowledge of the underlying wiring diagram, but not of the specific functional forms [17]. Analyzing such complicated systems requires concepts and approaches of structure-based control, which investigates the controllability of complex networks through a minimum set of input nodes. The traditional structure-based control methods, such as the Structural by the cycle structure and the source nodes of the network [17]. However, they focused on the structure control of direct networks with nonlinear dynamics. We still lack an analytical framework for the feedback control of undirected networks.
. CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint Therefore, to solve the above proposed NCU, we developed a novel algorithm, the NCUA, which is based on the assumption that the edges of the undirected networks are modeled as the bi-directed edges. with nonlinear dynamic, we chose the set of input nodes with the minimum cost to control the system which is represented by an undirected network from the initial attractor to the desired attractor. By controlling the three minimum feedback vertex nodes 1 4 9 { , , } v v v and ensuring that the removal of the three nodes leaves the graph without cycles, the system is guaranteed to be controllable from initial attractor to desired attractor. By using Liu's linear control and exact control method, we identified random Markov chain sampling to obtain different input node sets. In Figure 1 give a diagram to illustrate the process of our NCUA for discovering the possible input nodes. The details of the NCUA are introduced below.

I. Constructing a bipartite graph from the original undirected network
For a given undirected network G (V, E), we assume that each edge is bi-direct

II. Obtaining the cover set with minimum cost by using Integer Linear Programming (ILP)
problem for determining the nodes to control the whole network, that is, how to select a proper node set S, in which for each node This problem can be solved by solving the following ILP model, where it will take the value x i =1 when node i belongs to the cover set; the object is to obtain the minimum number of nodes to cover set V ⊥ . Although it is an NP-hard problem [8], the optional solution is obtained efficiently for moderate sizes of graphs with up to a few tens of thousands of nodes by utilizing an algorithm that uses the LP-based classic branch and bound method [26,27] to determine the optimal solution.

III. Obtaining different input nodes by using random Markov chain sampling.
Here, we define the minimum dominating nodes in the "bipartite graph"as a Iteration: For t=1, 2,…, obtain M t+1 from M t as follows: . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint Choose a node w uniformly at random in M t . Then, delete node w and add a new which can cover the edges connected by node w in the bipartite has been obtained.
Accept the new Markov Chain M t+1 randomly.
We terminate the procedure of the MC sampling when the absolute percentage error,

Controllability of the SF network revealed by the NCU in synthetic networks
In order to evaluate the control characteristics of the NCU, we applied our NCUA to the synthetic SF networks generated by the static model [13,29]  Then, we applied our NCUA to estimate the minimum number of input nodes to control the networks with nonlinear dynamics. For a given γ and average degree <k>, 100 networks of 10,000 nodes were constructed. The results of the NCUA were averaged over all realizations. We list the numerical results of our NCUA for the synthetic networks in Figure 2.
In fact, we plotted the NCU size as a function of the degree exponents and the average degrees and list the results in Figure 2 (a-c). In Figure 2 Figure 2(a, c). These results are complemented by Figure 2 (b-c), where it . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a  [13]; however, these results are in agreement with the results of the MDS control scheme [8]. Note that the diagram of EC control, Liu's linear control, MDS control, and our NCUA are shown in Figure S2.

Counterintuitive findings of the controllability from the NCU on real-world networks
We collected 17 networks with 11 categories, which were chosen for their diversity in applications and scopes (Additional File 2). By calculating the P-value of the Kolmogorov-Smirnov goodness-of-fit statistic [31], whose results are listed in Table   S3, we found that the above networks are significantly subject to the power-law distribution; the detailed results are shown in Supplementary Note 5 of Additional  Figure 3 (a), we show that the number of input nodes has a tendency to increase as the exponent and the average degree increase.
Furthermore, in Figure 3(a), we can evaluate the value of scaling exponent approximately by fitting its control characteristic to that on the synthetic networks.
We observed that the degree heterogeneity becomes larger as the number of We list the results of the number of input nodes in the function of the average degree and the new converted degree heterogeneity measure in Figure 3(b). As shown in Figure 3 (b), we find that networks with a lower average degree and higher degree heterogeneity are easier to control than those with a large average degree. The control characteristics of networks can be fully discriminated by the new converted degree heterogeneity and the average degree. We also find that the set of input nodes tends to highly target connected nodes, whereas the previous linear control study suggested that driver nodes tend to avoid high degree nodes, as shown in Figure 3(c) [7].  We observe that most types of biological networks (e.g., gene regulatory, PPI, and genetic networks) require the control of a smaller fraction of nodes than social networks (trust and social communication networks); the fraction of input nodes is between 10% and 30% in biological networks vs. more than 40% in social networks.
These predictions match well with recent experimental results in cellular reprogramming and large scale social network experiments [32,33]. Note that this prediction stands in contrast with those of linear control [7] on the same type of networks, and to some extent, can address the initial arguments on network controllability [34,35].
To ensure that our NCU is physically significant, we then focused on the required control energy and the control time to achieve control for networks with nonlinear dynamics. We applied a 3-dimensional stable nonlinear Lorenz oscillator system [36,37] on the real-world network to control the networked system to the  ). Note that in Figure 4, the energy cost and time cost of a given network are the average energy and time cost of different input nodes, respectively.
Finally, we evaluated the differences between closed-loop controller and linear feedback controller on the nonlinear network control. Here, we adopt the local feedback controllers [36,37] and closed-loop controllers [23] on the real-world networks. Figure 4 shows that closed-loop controllers demand a greater number of determining nodes, but require less control time and control energy than the traditional linear feedback controllers. . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint expressed genes and where the edge exists in the both gene-gene interaction network and the differential expression network for each patient ( Figure S1 in Additional File 1).
We first used KS statistics to find that these networks have scale free properties.
We then computed the linear regression coefficient between the frequency of the network degree d (log10(f(d))) and the log 10 transformed degree d (log10(d)). We found that the scale free exponents of the single sample networks in different types of cancer are less than 2 (Additional File 3). We also found that the NCU controllability for the single sample network in different types of cancer, that is the ability to alter the normal state and tumor state, will be much easier to control than the controllability of the network linear dynamics, including the EC control scheme and Liu's control scheme (Figures 5 (a) and S5 of Additional File 1). This result reveals that for the cancer patient, we only need a small fraction of genes to change the network state between the stable states, which is not applicable for controlling the biological network from initial states to any states in linear dynamics. This observation is in agreement with previous biological conclusions [34,35].
The key control genes in the patient-manner network were further investigated using the NCUA method. In fact, the NCUA method provides a ranking of the nodes as the input control nodes according to the value of the frequencies of the nodes, in which the input control nodes are ordered by decreasing the sampling frequency in the random Markov chain sampling. We first defined the personalized key control genes as the genes that appear as the key control nodes with a high frequency (f>0.6) in the patient-manner network. Then, we calculated the frequency of the personalized key control genes for different cancer datasets. We defined the high confidence cancer-specific determining genes (f>0.6), middle confidence cancer-specific key control genes (0.3<f<=0.6), and low confidence cancer-specific key control genes (f<=0.3). The computational results of 10 cancer datasets are listed in Figure 5 (b1). Finally, we computed the p-value of the high-confidence key control genes enriching the cancer genes census set [42] or FDA-approved drug targeted genes set [43] by using the hyper-geometric test [44]. If the calculated p-value was less than 0.05, then we regarded that this cancer gene set is significantly enriched in the Cancer Genes Census set and FDA-approved DTG set. Figure 5 (b2) shows that the high-confidence determining control genes for different cancer datasets have a good enrichment in the cancer genes census set and the FDA-approved DTG set.
Furthermore, we find that the set of input nodes tends to target highly connected nodes, as shown in Figure S6 of Additional File 1. These results are in agreement with previous biological observations [45,46].  not in the NCU nodes in the Cancer Genes Census set and FDA-approved drug targeted genes set. Note: *scores of the ESG are larger than 5, but less than 10; **scores of the ESG are larger than 10, but less than 15; ***scores of the ESG are larger than 15.

Discussions
. CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint Nacher and Akutsu introduced the MDS to study the controllability of undirected networks by assuming that each node in the MDS can control all of its outgoing edges separately [8]. However, the MDS-based model assumes that more powerful control is possible (because each driver node can control its outgoing links independently), which has the possible drawback of requiring higher costs and may not be possible in many kinds of networks. Even if such powerful controllers exist, the MDS-based model still suffers from the underestimated nonlinear control of complex systems (networks). Despite its success and widespread application in searching for the important genes in the protein interaction network [16,[47][48][49][50], the MDS-based model may give an incomplete view of the undirected network control properties. In the case of a network with nonlinear dynamics, the definition of control (full control; from any initial to any final state) for the MDS-based model does not always match the meaning of control in biological, technological, and social systems, where control tends to involve only naturally occurring system states.
In this work, our control model NCU drives the whole-networked system from the initial state toward its desired dynamical attractors (e.g., the steady states and limit state cycles) by steering the input nodes to the desired dynamic attractors. Our NCU algorithm (NCUA) predicts the input nodes whose override (by an external controller or drive signals) can steer a network's dynamics toward its desired long-term dynamic behaviors (its desired dynamical attractors). Furthermore, we used the NCU control model on biological, technological, and social networks, and we identified the topological characteristics underlying the predicted node overrides. We also identified that the networks with a low average degree are easier to control than those with a large average degree, which is opposite to the previous observation from the MDS theory, as shown in Figure S7. We summarize the difference between the MDS-based method and the NCU method in Table S2 of Additional File 1.
The NCU and MDS methods are very different methods, so one should be careful about extending their predictions beyond their realm of applicability. In fact, in the case of network with MDS's assumption, the key nodes identified using our . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint NCU control model can provide sufficient conditions to control the system from any initial state to any desired final steady state. For example, in Figure 1, the key nodes 1 4 9 { , , } v v v identified using our NCU control model dominate the nodes of the whole network for the MDS model, but the key nodes 1 4 { , } v v identified using the MDS model cannot cover the edges of the whole network for our NCU control model. To further emphasize the advantage of the NCU method over the MDS method, we provide the enrichment results from the CGC set and DTG set of the input nodes, which are nodes of the NCU, but not of the MDS for individual (paired) samples in the 10 cancer datasets. Figure 5(c) shows that the NCUA can identify the key genes in the CGC set and the FDA-approved DTG set, which are missed using the MDS method. The NCU model provides us with a more complete insight into the control of network-based systems.

Conclusions
Generally, complex biological networks whose data are limited can be diagrammed less accurately than networks, such as power grid networks. Recently, several control principles have been developed to control complex networks, but controlling complex biological networks is still hindered by network data [24]. In biological networks, we usually utilize undirected networks to model the protein interactions. Controlling the network dynamics by regulating some key nodes in the undirected networks to achieve optimal performance is still a big challenge. Two conventional control frameworks for undirected network dynamics, that is, the exact controllability and MDS-based model, focus on the linear dynamics in undirected networks. A theoretical control framework is urgently required to solve the nonlinear control problem in the undirected networks. Instead of focusing on how to obtain the state transitions of the undirected network with linear dynamics, a new concept, the nonlinear control of undirected networks (NCU), is introduced to understand how we can estimate the ability of the proper set of input nodes to achieve the control from the initial attractor to the desired attractor in undirected networks. To solve this problem, an NCUA based . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted December 21, 2018. . https://doi.org/10.1101/503565 doi: bioRxiv preprint on feedback vertex sets was designed and implemented. The NCUA has been evaluated on multiple synthetic SF networks and real complex networks, and it has exhibited the novel control characteristics of the undirected SF networks with nonlinear dynamics. The NCUA has also been applied to investigate the networks and their nonlinear controllability of cancer samples from TCGA by screening known driver genes and known drug targets as controls of their phenotype transitions, as well as to provide meaningful predictions with biological significance. Interestingly, we find that the control performance of our nonlinear control method in the single-patient system in cancer is much better than that of the traditional linear control methods, which are limited to a canonical linear time-invariant approximation. The key control genes for the individual cancer samples have significant enrichments both in the CGC set and the FDA-approved DTG set. Furthermore, it is worth exploring how to solve the NCU model with more constrained conditions (such as the target control and constrained target control [5,6,13]) and how to extend our method to the edge dynamics [10] to create new avenues to tackle complex systems. Note that although the NCUA is applied to the analysis of undirected networks, we believe that, in the future, it can be extended to the analysis of directed or semi-directed networks after the implementation of a module processing technique on directed or semi-directed networks with a network community detection algorithm from the microcosmic perspective [47,51].