A novel structure-based control method for controlling complex large scale nonlinear dynamical networks

Exploring large-scale complex systems requires the concepts and approaches of structure-based network control, which investigates the control of complex networks through a minimum set of input nodes. However, most existing network control methods focus on controllability and ignore the existence of multiple input-node configurations for the actual control of complex large-scale nonlinear dynamic networks. Considering that the selection of input nodes may depend on algorithms and network representations, we design and implement an algorithm for the nonlinear control of undirected networks (NCUA) that finds multiple input-node configurations in large-scale nonlinear networks with symmetric edges. The main idea of NCUA is to evaluate the controllability and actual control for driving the undirected network from an undesired attractor to a desired attractor, which differs from traditional linear network control approaches. We validate our NCUA algorithm in two main respects. First, we apply NCUA to both synthetic and real-world networks to investigate how network parameters, such as the scaling exponent and the degree heterogeneity, affect the controllability of networks with nonlinear dynamics. Second, we apply NCUA to analyze the actual control of patient-specific molecular networks corresponding to patients across multiple datasets from The Cancer Genome Atlas (TCGA). The experimental results demonstrate the advantages of the nonlinear control method over other state-of-the-art control methods in characterizing and quantifying the patient-state change. Thus, our model opens a new way to control the undesired transition of cancer states and also provides a powerful tool for theoretical research on structure-based network control.
Keywords: Structure based control, Nonlinear dynamics, Undirected networks, FVS, Single patient system

Introduction
From a network perspective, most physical, social, biological, and computer systems can be represented as networks [1,2,3,4,5]. Analyzing networks from the controllability viewpoint leads to a deeper understanding of the dynamics of complex systems, especially large-scale complex systems [6,7,8,9]. Since the control process is dominated by the intrinsic structure and dynamic propagation within the system, the concepts and approaches of structure-based network control are urgently required to investigate the controllability of complex networks through a minimum set of input nodes [9,10,11,12,13,14].
The analysis of complex systems from the structure-based control viewpoint provides a deeper understanding of the dynamics of complex large-scale biological systems [15,16,17,18]. So far, studies exploiting the structure-based control of complex networks can be divided into two main categories according to the type of network dynamics: approaches focusing on linear dynamic networks and methods focusing on nonlinear dynamic networks. For linear dynamic networks, many researchers have developed structural control tools, including Liu's Maximum Matching Sets (MMS) based control methods [9,10,11,18], the exact control (EC) method [12], and the Minimum Dominating Sets (MDS) based control methods [13,19], to identify the minimum number of input nodes that need to be controlled by external signals for the system to achieve the desired control objectives. Diagrams of the EC control scheme, Liu's control scheme, and the MDS control scheme are shown in Figure S1. For nonlinear dynamic networks, an analytical tool called feedback vertex set based control (FC) has been used to study the control of large networks in a reliable and nonlinear manner, where the network structure is known a priori and the functional form of the governing equations is not specified but must satisfy some properties [20,21]. This formalism identifies the nodes and feedback vertex sets (FVS) in networks, uniquely determining the long-term dynamics of the entire network. With such a scheme, Zanudo et al. identified the source FVS as the input nodes to control directed networks with nonlinear dynamics [14]. More recently, Bao et al. presented an algorithm to compute and evaluate the critical, intermittent, and redundant vertices for controlling directed networks under the FC framework [22]. However, these methods still cannot find multiple input-node configurations, and the selection of input-node configurations in FC remains a major challenge.
In fact, the selection of input-node configurations in FC may depend on algorithms and input network representations (i.e., directed or undirected networks). Finding multiple sets of input nodes in directed networks with nonlinear dynamics is much more difficult and is still an unsolved problem. However, we can solve the multiple-solution discovery problem by considering the special symmetric edges in undirected networks. In this paper, we first formalize the nonlinear control problem of undirected networks (NCU), that is, how to choose the proper input nodes to drive the network from one attractor to a desired attractor in networks with nonlinear dynamics and undirected edges. The NCU focuses on the control and controllability of large-scale nonlinear dynamic networks with symmetric edge information. To solve this problem, we developed a novel graph-theoretic algorithm (NCUA) to measure the controllability and the actual control of undirected networks based on feedback vertex sets.
Specifically, (i) we assume that each edge in a network is bidirectional and thus forms a feedback loop; (ii) we construct a bipartite graph from the original undirected network, in which the nodes of the top side are the nodes of the original graph and the nodes of the bottom side are the edges of the original graph (Figure 1 (b)); (iii) we adopt an equivalent optimization procedure for determining the MDS of top-side nodes that covers the bottom-side nodes in the bipartite graph and can therefore control the whole network; and (iv) we apply random Markov chain sampling to obtain the distribution of the input node sets and uncover the possible sets of input nodes to control the undirected network.
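Steps (i) to (iii) can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation: the function names (`build_bipartite`, `covers_all_edges`, `all_minimum_covers`) are hypothetical, and an exhaustive search stands in for the optimization procedure, so it is only practical for very small graphs. It shows how treating every undirected edge as a feedback loop turns the search for input nodes into covering all bottom-side (edge) nodes with as few top-side (original) nodes as possible, and that several minimum solutions can coexist.

```python
from itertools import combinations


def build_bipartite(edges):
    """Build the bipartite view of an undirected graph.

    top    : the original nodes
    bottom : the original edges (each stored as a sorted pair)
    links  : (node, edge) pairs joining every node to its incident edges
    """
    bottom = sorted({tuple(sorted(e)) for e in edges})
    top = sorted({v for e in bottom for v in e})
    links = {(v, e) for e in bottom for v in e}
    return top, bottom, links


def covers_all_edges(nodes, bottom):
    """True if every bottom-side (edge) node has at least one endpoint in `nodes`."""
    chosen = set(nodes)
    return all(u in chosen or v in chosen for u, v in bottom)


def all_minimum_covers(edges):
    """Enumerate every smallest top-side node set that covers all edges.

    Exhaustive search used only for illustration (assumption: tiny graphs).
    """
    top, bottom, _ = build_bipartite(edges)
    for size in range(len(top) + 1):
        found = [set(c) for c in combinations(top, size)
                 if covers_all_edges(c, bottom)]
        if found:
            return found
    return []
```

For the path graph 1-2-3-4, `all_minimum_covers([(1, 2), (2, 3), (3, 4)])` yields three different minimum input-node sets of size two, which is the kind of multiplicity the sampling step (iv) is meant to explore.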
Since most real-world networks have a statistically significant power-law degree distribution, we defined the control characteristic as the fraction of identified minimum input nodes, applied NCUA to multiple synthetic scale-free (SF) networks and real-world networks, and obtained several counterintuitive findings: (i) the fraction of input nodes in the network increases when the degree exponent increases for a fixed average degree, indicating that the control characteristic is affected by degree heterogeneity; (ii) with a newly defined degree heterogeneity, the fraction of input nodes decreases monotonically as the degree heterogeneity becomes larger for a fixed average degree, and the degree heterogeneity and the average degree together determine the minimum number of input nodes; (iii) the set of input nodes tends to target highly connected nodes, whereas previous linear control studies suggested that driver nodes tend to avoid high-degree nodes [9,10,11,12].
We also investigated the network transition between the disease state and the normal state, which are identified with the stable network states (dynamical attractors), in personalized patient networks. For each sample of each cancer patient from 10 kinds of cancer sites in TCGA, we constructed a personalized differential network between the normal state and the disease state, and applied the NCUA to find the key control genes for pathologic phenotype transitions. We found that (i) although most of the cancer samples have a similar nonlinear controllability, the determining control genes still differ between cancer samples; and (ii) across the reconstructed individual networks for single samples in the 10 cancer datasets, the high-confidence cancer-specific key genes are significantly enriched in the cancer genes census (CGC) set and the FDA-approved drug target genes (DTG) set. Compared with the traditional control models of linear networks, including the EC linear control scheme [12] and Liu's linear control scheme [9], our results imply that a single-patient system in cancer may be more controllable than predicted on linear dynamical networks, owing to the ubiquity of nonlinear features in individual patient systems. In contrast to another model for the control of undirected networks, the MDS [13], our NCUA also showed a higher performance in identifying cancer-specific key control genes in the CGC and DTG sets, which were underestimated by the MDS. In conclusion, our model provides a powerful new tool for the theoretical and empirical study of structure-based network control.

Network dynamics are commonly nonlinear, especially at the level of nodes or small groups of nodes in the network [23]. Here, we focus on the nonlinear control problem of undirected networks. Given an undirected network $G(V, E)$, we consider the following broad class of models [26], where $x_i$ denotes the state variable of the $i$-th node at time $t$, $I_i$ is the set of neighboring nodes of node $i$, and $B_i \in \mathbb{R}^{N \times N_C}$ characterizes the driving of the network by the $N_C$ controllers. Because it accounts for nonlinear dynamics, our NCU is a more practical problem than the existing structure-based control studies of undirected networks [12,13]. Note that the NCU control problem for directed networks with nonlinear dynamics is still unsolved and needs to be studied further in the future, because current structure-based control methods cannot find multiple sets of input nodes for controlling directed networks with nonlinear dynamics. In Figure 1 (a), we illustrate our NCU with a simple example (see Supplementary Note 1 of Additional File 1 for more details).

Algorithm for the Nonlinear Control of an Undirected Network (NCUA)
In many complex systems, there is adequate knowledge of the underlying wiring diagram, but not of the specific functional forms [23]. Analyzing such complicated systems requires concepts and approaches of structure-based control, which investigates the controllability of complex networks through a mini-  [23,24,25,26,27].

Only one of these methods, namely feedback vertex set control (FC) [20,21], can be reliably applied to large-scale complex networks in which the structure is well known and the functional form of the governing equations is not specified but must satisfy the following conditions: (i) continuous differentiability, that is, $F_i(x_i, x_{I_i}) \in C^1$; (ii) dissipativity, that is, for any initial condition $x(0)$ and any finite time $t$, the dynamical state $x(t)$ is bounded by a positive constant $C$: $x_n(t) \le C$; and (iii) the decay condition [14]. Note that a minimum FVS in the undirected network $G$ must exist but may have multiple solutions under our assumption that each edge in an undirected network is a feedback loop. Therefore, our NCU aims to find multiple solutions of minimum input nodes. However, the above FVS-based control algorithms cannot search for multiple possible proper sets of input nodes for controlling the networks. That is, we still lack an analytical framework for the NCU problem. Therefore, to solve the NCU problem proposed above, we developed a novel algorithm, NCUA, for discovering the possible minimum sets of input nodes.

Constructing a bipartite graph from the original undirected network.
For a given undirected network $G(V, E)$, we assume that each edge is bidirectional

that the removal of the three nodes leaves the graph without cycles or edges, the system is guaranteed to be controllable from the initial attractor to the desired attractor. By using EC linear control, they identified one random set

Obtaining the cover set with minimum cost by using Integer Linear Programming (ILP). After we obtain the bipartite graph $G(V_T, V_\perp, E_1)$, we adopt a modified version of the dominating set, in which the dominating set $S$ must be selected from $V_T$ and must also be sufficient to dominate all of the nodes in $V_\perp$. We use a minimum dominating set cover problem to determine the nodes that control the whole network, that is, how to select a proper node set $S$. This problem can be solved with an ILP model in which $x_i = 1$ when node $i$ belongs to the cover set, and the objective is to obtain the minimum number of nodes that cover the set $V_\perp$. Although this is an NP-hard problem [29], the optimal solution is obtained efficiently for moderately sized graphs with up to a few tens of thousands of nodes by using the classic LP-based branch and bound method [12,13] (Figure 1 (b)).

The MC method is described as follows:

Initialization: By using ILP, obtain the initial Markov chain $M_0$.

Iteration: For $t = 1, 2, \ldots$, obtain $M_{t+1}$ from $M_t$ as follows:
• Choose a node $w$ uniformly at random in $M_t$. Then, delete node $w$ and add a new node that can cover the edges connected to node $w$ in the bipartite graph.
• Accept the new Markov chain $M_{t+1}$ randomly.
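The sampling loop above can be sketched as follows. This is an illustrative reading of the description rather than the authors' code: the names `is_cover` and `sample_covers` are hypothetical, the initial cover is passed in by the caller instead of being computed by ILP, a swap is proposed only when the resulting set still covers every edge, and "accept randomly" is interpreted as acceptance with a fixed probability `accept_prob` (an assumption, since the text does not give the rule). The termination criterion is replaced by a fixed number of steps.

```python
import random


def is_cover(nodes, edges):
    """True if every edge has at least one endpoint in `nodes`."""
    return all(u in nodes or v in nodes for u, v in edges)


def sample_covers(edges, initial_cover, steps=1000, accept_prob=0.5, seed=0):
    """Explore equally sized covers by single-node swaps.

    Assumptions (not specified in the text): a swap is proposed only if the
    new set still covers all edges, and it is accepted with probability
    `accept_prob`. Returns the set of distinct covers visited.
    """
    rng = random.Random(seed)
    all_nodes = sorted({v for e in edges for v in e})
    current = set(initial_cover)
    if not is_cover(current, edges):
        raise ValueError("initial_cover must cover every edge")
    visited = {frozenset(current)}
    for _ in range(steps):
        if not current:
            break  # nothing to swap in an empty cover
        w = rng.choice(sorted(current))          # node to delete
        reduced = current - {w}
        # Nodes outside the current cover that restore full coverage.
        candidates = [v for v in all_nodes
                      if v not in current and is_cover(reduced | {v}, edges)]
        if candidates and rng.random() < accept_prob:
            current = reduced | {rng.choice(candidates)}
            visited.add(frozenset(current))
    return visited
```

Counting how often each node appears across the visited covers gives an empirical distribution over input nodes, which is the quantity the sampling step is described as estimating.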
We terminate the procedure of the MC sampling when the absolute percent-

Controllability of the SF network revealed by the NCU in synthetic networks

In order to evaluate the control characteristics of the NCU, we applied our NCUA to synthetic SF networks generated by the static model [9,32] (more details are listed in Supplementary Note 1 of Additional File 1). We assumed that the degree distribution of the undirected network $G(V, E)$ follows $P(k) \propto k^{-\gamma}$.

We first defined the fraction of input nodes $n_d = |S| / |V|$, where $S$ denotes the set of input nodes to control the whole network and $|V|$ denotes the number of connected nodes in the network. Then, we applied our NCUA to estimate the minimum number of input nodes to control the networks with nonlinear dynamics. For a given γ and average degree ⟨k⟩, 100 networks of 10,000 nodes were constructed. The results of the NCUA were averaged over all realizations. We list the numerical results of our NCUA for the synthetic networks in Figure 2.
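As a rough illustration of this setup, the sketch below generates a scale-free graph with a static-model style procedure and computes the input-node fraction $n_d$. It is written under stated assumptions and is not the generator used for the reported results: the function names are hypothetical, node weights are taken as $i^{-\alpha}$ with $\alpha = 1/(\gamma - 1)$, and the sketch is restricted to γ > 2 so that α < 1. The exact settings used in the experiments are those of Supplementary Note 1.

```python
import random


def static_model_graph(n, avg_degree, gamma, seed=0):
    """Generate an undirected scale-free graph with a static-model style procedure.

    Assumptions of this sketch: node i (1-based) has weight i**(-alpha) with
    alpha = 1 / (gamma - 1); endpoints are drawn in proportion to the weights;
    self-loops and duplicate edges are discarded; gamma > 2 is required.
    """
    if gamma <= 2:
        raise ValueError("this sketch assumes gamma > 2")
    target = int(n * avg_degree / 2)
    if target > n * (n - 1) // 2:
        raise ValueError("requested more edges than a simple graph allows")
    rng = random.Random(seed)
    alpha = 1.0 / (gamma - 1.0)
    weights = [(i + 1) ** (-alpha) for i in range(n)]
    edges = set()
    while len(edges) < target:
        u, v = rng.choices(range(n), weights=weights, k=2)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return edges


def input_fraction(input_nodes, edges):
    """n_d = |S| / |V|, with V restricted to nodes that have at least one edge."""
    connected = {v for e in edges for v in e}
    return len(set(input_nodes)) / len(connected)
```

Averaging `input_fraction` over many realizations for each (γ, ⟨k⟩) pair mirrors the averaging over 100 networks described above, with the input node set supplied by whichever cover-finding routine is used.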
In fact, we plotted the NCU size as a function of the degree exponent and the average degree and list the results in Figure 2 (a-c). In Figure 2(a), we show that for γ < 2, the number of input nodes increases as γ increases, while the number of input nodes does not depend on the average degree ⟨k⟩. However, if the value of γ is above 2, the number of input nodes is governed by both γ and ⟨k⟩. Furthermore, SF networks with a value of γ above 2 or a large value of ⟨k⟩ are hard to control, as shown in Figure 2(a, c). These results are complemented by Figure 2 (b-c), which shows that, compared with Erdos-Renyi (ER) random networks, only a few nodes are needed to control the entire network if the power-law degree exponent γ is smaller than or around 2, whereas the network is more difficult to control when γ is above 2. This result gives insight into which SF networks are easier to control with the minimum number of input nodes. To visualize more clearly the impact of the network structure on the number of input nodes, we plot the NCU size as a function of the network degree heterogeneity for fixed ⟨k⟩ in Figure 2(d).
We observe that for a fixed average degree ⟨k⟩, the NCU size decreases as the degree heterogeneity defined in Methods increases. These results illustrate that heterogeneous networks are not difficult to control, which is opposite to the conclusions of the EC linear control scheme [12] and Liu's linear control scheme [9]; however, these results are in agreement with the results of the MDS control scheme [13]. In Figures 2(e) and (f), S3, and S4, we show the number of input nodes as a function of the network size for fixed degree exponents γ = 1.4, 1.6, 1.8 and γ = 2.4, 2.6, 3.4, and 3.6. We find that the number of input nodes decreases with increasing network size for γ < 2, while for γ > 2, the number of input nodes is not affected by the network size.
Counterintuitive findings of the controllability from the NCU on real-world networks

We collected 17 networks in 11 categories, which were chosen for their diversity in applications and scopes (Additional File 2). By calculating the p-value of the Kolmogorov-Smirnov goodness-of-fit statistic [33], whose results are listed in Table S3 of Additional File 1, we found that the above networks significantly follow a power-law distribution. In Figure S4, we show that the number of input nodes tends to increase as the exponent and the average degree increase. Furthermore, in Figure S4, we can approximately evaluate the value of the scaling exponent by fitting its control characteristic to that of the synthetic networks.
We observed that the degree heterogeneity (defined in Methods) becomes larger as the number of network nodes increases (Figure S3 and Figure S4). To clearly visualize the impact of the degree heterogeneity and the average degree on the number of input nodes, we first eliminated the effect of the network size on the degree heterogeneity to obtain a converted degree heterogeneity. In Figure 3 (a), we find that networks with a lower average degree and higher degree heterogeneity are easier to control than those with a large average degree. The control characteristics of networks can be fully discriminated by the new converted degree heterogeneity and the average degree. We also find that the set of input nodes tends to target highly connected nodes, whereas the previous linear control study suggested that driver nodes tend to avoid high-degree nodes, as shown in Figure 3(b) [12].
We observe that most types of biological networks (e.g., gene regulatory, PPI, and genetic networks) require the control of a smaller fraction of nodes than social networks (trust and social communication networks); the fraction of input nodes is between 10% and 30% in biological networks vs. more than 40% in social networks. These predictions match well with recent experimental results in cellular reprogramming and large-scale social network experiments [34,35].
Note that this prediction stands in contrast with those of linear control [12] on the same type of networks, and to some extent, can address the initial arguments on network controllability [34,36].
To ensure that our NCU is physically significant, we then focused on the control energy and the control time required to achieve control for networks with nonlinear dynamics. We applied a 3-dimensional stable nonlinear Lorenz (1),

Advanced discovery of individual phenotype-transition genes in cancer samples using the NCUA

Cancer is a complex disease that generally results from a dysfunction of the relevant system or network with nonlinear dynamics, which changes dynamically with time and conditions [40,41]. We also investigated the network transition between the disease state and the normal state in personalized patient networks (Table S1 in Additional File 1). We constructed the patient-manner interaction network by integrating the tumor and normal expression data for each patient with gene interaction network data [41,42]. First, we chose all the normal data as the reference data and constructed the tumor network and the normal network, respectively, based on the reference data with the SSN method [41]. Next, we constructed the patient-manner differential expression network, where the edge between gene i and gene j exists if the p-value of the edge is less than (greater than) 0.05 in the tumor network but greater than (less than) 0.05 in the normal network. We calculated the patient-manner differentially expressed genes with a log2 fold change beyond ±1 and defined the patient-manner interaction network as the sub-network that consists of the differentially expressed genes and whose edges exist in both the gene-gene interaction network and the patient-manner differential expression network for each patient (Figure 5).

Figure 5: i) Construct the patient-manner differential expression network, where the edge between gene i and gene j exists if the p-value of the edge is less than (greater than) 0.05 in the tumor network but greater than (less than) 0.05 in the normal network; ii) calculate the differentially expressed genes with a log2 fold change beyond ±1 and obtain the sub-network that consists of the differentially expressed genes and whose edges exist in both the PPI network and the patient-manner differential expression network. Note that the method for constructing the network for each patient needs at least three control (normal) samples.
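The two filtering rules can be sketched on plain Python dictionaries. This is a hedged illustration only: the function names and the input layout (edge p-values keyed by `frozenset` gene pairs, log2 fold changes keyed by gene) are assumptions made for this example, an edge missing from one network is treated as having p-value 1.0, and the fold-change cutoff is applied as an absolute value of at least 1. Computing the edge p-values themselves (the SSN step) is outside the scope of the sketch.

```python
def differential_edges(tumor_pvals, normal_pvals, alpha=0.05):
    """Edges significant in exactly one of the tumor and normal networks.

    Both inputs map frozenset({gene_i, gene_j}) -> p-value. A missing edge is
    treated as p = 1.0 (an assumption of this sketch).
    """
    edges = set()
    for edge in set(tumor_pvals) | set(normal_pvals):
        tumor_sig = tumor_pvals.get(edge, 1.0) < alpha
        normal_sig = normal_pvals.get(edge, 1.0) < alpha
        if tumor_sig != normal_sig:
            edges.add(edge)
    return edges


def patient_network(tumor_pvals, normal_pvals, log2_fc, ppi_edges,
                    alpha=0.05, fc_cutoff=1.0):
    """Keep differential edges that are also PPI edges between two DEGs.

    log2_fc maps gene -> log2 fold change; a gene counts as differentially
    expressed when |log2 fold change| >= fc_cutoff (assumed threshold form).
    """
    degs = {g for g, fc in log2_fc.items() if abs(fc) >= fc_cutoff}
    diff = differential_edges(tumor_pvals, normal_pvals, alpha)
    ppi = {frozenset(e) for e in ppi_edges}
    return {e for e in diff if e in ppi and e <= degs}
```

The resulting edge set is the undirected network that would then be handed to the input-node search for that patient.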

For each sample of each cancer patient from 10 kinds of cancer sites in TCGA, we constructed a personalized differential network between the normal state and the disease state, and applied the NCUA to find the key control genes for pathologic phenotype transitions. We found that, although most of the cancer samples have a similar nonlinear controllability, the key control genes still differ between cancer samples.

Other cancer datasets used here are shown in Figure S6. Finally, we computed the p-value for the enrichment of the high-confidence cancer-specific key control genes in the cancer genes census set or the FDA-approved drug target genes set [43,44] by using the hypergeometric test [45]. If the calculated p-value was less than 0.05, we regarded the key control gene set as significantly enriched in the cancer genes census set or the FDA-approved DTG set. Figure 7 (b) shows that the high-confidence key control genes for different cancer datasets are well enriched in the cancer genes census set and the FDA-approved DTG set. Furthermore, we find that the set of input nodes tends to target highly connected nodes, as shown in Figure S8 of Additional File 1.
These results are in agreement with previous biological observations [34,36].
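The enrichment p-value described above is the upper tail of a hypergeometric distribution. The sketch below computes it with the standard library only; the function name and argument names are hypothetical, and the choice of background population (for example, all genes in the interaction network) is an assumption that the text does not specify.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population : number of background genes (assumed choice of background)
    annotated  : background genes that belong to the reference set (e.g. CGC)
    selected   : number of key control genes
    overlap    : key control genes that belong to the reference set
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

A key control gene set would be called significantly enriched when this value falls below 0.05, following the threshold stated in the text.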

Comparison of NCUA and MDS in undirected networks
Nacher and Akutsu introduced the MDS to study the controllability of undirected networks by assuming that each node in the MDS can control all of its outgoing edges separately [13]. However, the MDS-based model assumes that more powerful control is possible (because each driver node can control its outgoing edges separately).

In this work, our control model NCU drives the whole networked system from the initial state toward its desired dynamical attractors (e.g., steady states and limit cycles) by steering the input nodes to the desired dynamic attractors. Our NCU algorithm (NCUA) predicts the input nodes whose override (by an external controller or drive signals) can steer a network's dynamics toward its desired long-term dynamic behaviors (its desired dynamical attractors). Furthermore, we used the NCU control model on biological, technological, and social networks, and we identified the topological characteristics underlying the predicted node overrides. We also found that networks with a low average degree are easier to control than those with a large average degree, which is opposite to the previous observation from the MDS theory, as shown in Figure S8. We summarize the difference between the MDS-based method and the NCU method in

The unique NCU key cancer-specific genes (Figure 8) are defined as NCU key cancer-specific genes that are not MDS key cancer-specific genes, while the unique MDS key cancer-specific genes (red in Figure 8) are defined as MDS key cancer-specific genes that are not NCU key cancer-specific genes. Finally, we provide the enrichment results in the CGC set (Figure 8 (a)) and the DTG set (Figure 8 (b)) for the unique NCU key cancer-specific genes and the unique MDS key cancer-specific genes in the 10 cancer datasets. Figure 8 shows that the NCUA can identify key genes in the CGC set and the FDA-approved DTG set that are missed by the MDS method.
That is, our NCU captures the nonlinear dynamics of undirected networks, while the MDS focuses on the linear dynamics of undirected networks. Furthermore, the output of the NCU is all possible input nodes, while the output of the MDS is one random set of input nodes. Therefore, the NCU model provides a more complete insight into the control of undirected network-based systems.

Conclusions
Controllability and actual control are two key issues associated with controlling large-scale nonlinear dynamic networks. Over the past decades, existing control methods can be divided into two main categories: one comprises the linear-dynamics-focused control approaches, which focus on large-scale linear networks and ignore the nonlinear dynamics of complex networks [9,49,11,18]; the other comprises the nonlinear control methods, which evaluate the actual control and con-