Efficient target control of complex networks based on preferential matching

Controlling a complex network towards a desired state is of great importance in many applications. Existing work presents an approximate algorithm for finding the input nodes needed to control a subset of the nodes of a network. However, the input nodes obtained by this algorithm depend on the node matching order, so the result is not guaranteed to be optimal. Here we present a novel algorithm, based on preferential matching, for finding the input nodes for target control. The algorithm arranges the matching order of the nodes so as to reduce the size of the input node set. Results on both synthetic and real networks indicate that the proposed algorithm outperforms the previous algorithm.


Introduction
The control of complex networked systems plays an important role in many natural and technological applications. According to control theory [1][2][3], a system is controllable if it can be driven from any initial state to any desired state in finite time. External control signals are fed into the system through suitably selected nodes. The nodes that receive independent external signals are called input nodes [4], controls [5] or driver nodes [6]. An input node is the first node of a control path, which transmits the control signals.
The input nodes needed to fully control a network can be obtained from a maximum matching of the network [7]: the unmatched nodes form a minimum set of input nodes (MIS). Based on this framework, researchers have analyzed the structural properties of the MIS [8][9][10], the roles of nodes in control [5], and the robustness of controllability [11]. The size of the MIS is tied to the degree distribution [6] and is mainly determined by the number of source and sink nodes [5]. Furthermore, the possible input nodes, which participate in at least one MIS, are connected by input adjacency [4], and they exhibit a surprising bifurcation phenomenon in dense networks [12], which is rooted in the emergence of a giant control component [4].
In many real control scenarios, only a small fraction of nodes needs to be controlled. This is called target control [13]. To control a target community of a network, Piao et al. [14] presented a method that uses immune nodes to facilitate the control of target communities. To find the input nodes that control any specific set of target nodes, a recent work [13] presented an analysis framework for the target control of complex networks and proposed an approximate greedy algorithm (GA), based on multiple maximum matchings, to obtain the input nodes used to control the target nodes.
However, the GA can only find the approximate minimum set of input nodes. If there exists more than one maximum matching in the network, the results of the GA strongly depend on which maximum matching is selected. For example, the number of input nodes may vary over a large range [13] (Fig 1). Therefore, finding the minimum number of input nodes for target control is still an unsolved problem.
Here, we present a novel algorithm for finding input nodes to control the target nodes of a network. In contrast to the previous approach, we elaborately arranged the matching order of the nodes and tried to reduce the total number of input nodes. The results on both synthetic and real networks showed that we obtained fewer input nodes than the previous approach.

Method
Consider a linear time-invariant system whose dynamics are described by
$$\dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t),$$
where $x(t) = (x_1(t), \ldots, x_N(t))^T$ represents the system's state, $u(t) = (u_1(t), \ldots, u_M(t))^T$ represents the input vector, and $y(t)$ represents the output vector; A is the transpose of the adjacency matrix, B is the input matrix, and C is the output matrix, which defines the target nodes we want to control. Let the network representation of the above system be G(V, E), where V is the node set and E is the edge set. For a target node set T, we say the system is target controllable if the states of the nodes in T can be driven from any initial state to a desired final state [13].
In previous work [13], a k-walk theory was proposed, which proved that in a tree-like network, if a node has paths of different lengths to each target node, the node can control these target nodes. However, in many networks a single node cannot control all target nodes. Therefore, for more general networks with loops, the previous work [13] proposed an approximate algorithm based on multiple maximum matchings to obtain the input nodes. The algorithm constructs a series of bipartite graphs $B = \{B_1(T_1, F_1, E_1), \ldots, B_i(T_i, F_i, E_i)\}$ by the following procedure:
1. Let the set of target nodes be $T_1$, find the set of in-neighbors of $T_1$ and denote it $F_1$, and construct the bipartite graph $B_1(T_1, F_1, E_1)$, where $E_1$ is the set of edges between $T_1$ and $F_1$.
2. Let $F_1$ be the new target set $T_2$ and repeat step 1 to obtain the bipartite graph $B_2(T_2, F_2, E_2)$.
3. Repeat the above steps until the set of in-neighbors of the current target set $T_i$ is empty.
After constructing the bipartite graphs, the algorithm finds a maximum matching of each bipartite graph, and the union of the unmatched nodes of all bipartite graphs is the set of input nodes used to control the target nodes.
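The layered procedure described above can be illustrated with a minimal Python sketch. It is not the implementation of [13]. The function names (`build_layers`, `maximum_matching`, `greedy_inputs`) are hypothetical. We assume that the last layer, whose in-neighbor set is empty, is kept so that its nodes count as unmatched, and that the number of layers is capped at the number of nodes so that the loop terminates on networks with cycles. The matching routine is a plain augmenting-path search, so it returns one arbitrary maximum matching per layer.

```python
from collections import defaultdict

def build_layers(edges, targets, max_layers=None):
    """Construct the bipartite graphs B_i(T_i, F_i, E_i), starting from the targets."""
    nodes = set(targets) | {n for e in edges for n in e}
    # Assumption: cap the number of layers at N so that cycles cannot loop forever.
    limit = max_layers if max_layers is not None else len(nodes)
    layers, current = [], set(targets)
    while current and len(layers) < limit:
        E = {(u, v) for (u, v) in edges if v in current}
        F = {u for (u, _) in E}          # in-neighbors of the current target set
        layers.append((current, F, E))
        current = F                      # F_i becomes the next target set T_{i+1}
    return layers

def maximum_matching(T, F, E):
    """Augmenting-path maximum matching; returns {target: matched in-neighbor}."""
    adj = defaultdict(list)
    for u, v in sorted(E):               # assumes sortable node labels
        adj[v].append(u)
    match_of_f = {}                      # in-neighbor -> target it is matched to

    def try_augment(v, seen):
        for u in adj[v]:
            if u in seen:
                continue
            seen.add(u)
            if u not in match_of_f or try_augment(match_of_f[u], seen):
                match_of_f[u] = v
                return True
        return False

    for v in sorted(T):
        try_augment(v, set())
    return {v: u for u, v in match_of_f.items()}

def greedy_inputs(edges, targets):
    """Union of the unmatched target-side nodes over all layers."""
    inputs = set()
    for T, F, E in build_layers(edges, targets):
        inputs |= set(T) - set(maximum_matching(T, F, E))
    return inputs

print(greedy_inputs([(1, 2), (2, 3)], {3}))      # chain 1 -> 2 -> 3, target 3
```

For the chain in the usage line, the only unmatched node is node 1 in the last layer, so node 1 is returned as the single input node.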
The key idea of this algorithm is to find a maximum matching for each sub-bipartite graph. However, in most networks the maximum matching is not unique, so even for a simple network the algorithm produces different results with different maximum matchings. For example, for the network shown in Fig 1A, the algorithm obtains three different input node sets: $D_1 = \{1, 2, 5\}$, $D_2 = \{1, 5\}$ and $D_3 = \{1\}$. The reason for the multiple results is that the maximum matchings used in the algorithm are different. For example, if we match edge $e(1 \to 4)$ rather than $e(2 \to 4)$ in sub-bipartite graph 3, node 2 does not act as an input node, resulting in the input node set $D_2 = \{1, 5\}$. If we match edge $e(6 \to 7)$ rather than $e(5 \to 7)$ in sub-bipartite graph 2, we obtain only one input node, $D_3 = \{1\}$, to control the entire target node set.
Therefore, to reduce the total number of input nodes, we need to select the appropriate maximum matching for each sub-bipartite graph. However, the number of unmatched nodes of each sub-bipartite graph is fixed because the maximum matchings of each bipartite graph have the same size. The only way to decrease the number of input nodes is to allow the input nodes of different sub-bipartite graphs to overlap with one another. For example, in Fig 1D, the unmatched node of all four sub-bipartite graphs is node 1, which decreases the total number of input nodes from three to one.
To obtain the desired input nodes of each bipartite graph, we use preferential matching [15] to find a maximum matching of each bipartite graph. The preferential matching method arranges the matching order of the nodes according to a predefined queue and ensures that the nodes at the rear of the queue have a high probability of being input nodes. The method first constructs a series of sub-graphs based on the node queue, and then finds a maximum matching of each sub-graph until a maximum matching of the whole network is obtained. This iterative matching process ensures that the nodes at the front of the queue have a high probability of being matched. Therefore, the resulting input nodes are most likely the nodes at the rear of the queue.
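One possible reading of this queue-based idea is sketched below in Python. It is not the method of [15], whose details are not given here. The function name `preferential_matching` and its arguments are hypothetical. The sketch assumes that the target-side nodes of a bipartite graph are processed in queue order by an augmenting-path search. An augmenting path can reassign a matched node to another partner but never unmatches it, so nodes near the front of the queue are matched first and the nodes left unmatched tend to sit at the rear.

```python
def preferential_matching(T, E, queue):
    """Match target-side nodes in queue order.

    T     : set of target-side nodes of one bipartite graph
    E     : set of (in_neighbor, target) edges
    queue : list of nodes; nodes at the front are matched preferentially
    Returns (matching, unmatched), where matching maps target -> in-neighbor
    and unmatched lists the targets left without a partner, in queue order.
    """
    rank = {node: i for i, node in enumerate(queue)}
    # Nodes missing from the queue are processed last (an assumption).
    ordered = sorted(T, key=lambda v: rank.get(v, len(rank)))

    adj = {v: [] for v in T}
    for u, v in E:
        if v in adj:
            adj[v].append(u)

    match_of_f = {}                      # in-neighbor -> target it is matched to

    def try_augment(v, seen):
        for u in adj[v]:
            if u in seen:
                continue
            seen.add(u)
            if u not in match_of_f or try_augment(match_of_f[u], seen):
                match_of_f[u] = v
                return True
        return False

    unmatched = [v for v in ordered if not try_augment(v, set())]
    matching = {v: u for u, v in match_of_f.items()}
    return matching, unmatched

# Node 1 points to both targets, so only one of them can be matched;
# the queue decides which one is left over as an input node.
edges = {(1, 2), (1, 3)}
print(preferential_matching({2, 3}, edges, queue=[3, 2])[1])
print(preferential_matching({2, 3}, edges, queue=[2, 3])[1])
```

In the usage lines, reversing the queue changes which of the two targets stays unmatched, while the size of the matching is the same in both cases.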
The problem is therefore to select the appropriate input nodes of each sub-bipartite graph so as to reduce the total number of input nodes. Here we present the following strategies:
1. The input nodes of the current sub-bipartite graph should overlap with the input nodes of the previous sub-bipartite graph. This decreases the total number of input nodes.
2. The nodes that frequently appear in the matching graph (over all sub-bipartite graphs) should be input nodes with high priority, which gives the nodes in subsequent sub-bipartite graphs a high probability of overlapping with existing input nodes.
For the network in Fig 2A, we construct a matching graph (MG) that starts from the target nodes and iteratively adds the parent nodes of the current nodes to the graph, until no more nodes are added. We count the frequency with which each node appears in the matching graph and arrange the nodes in ascending order of frequency. For example, Fig 2B shows the matching graph of Fig 2A, and the counts of the nodes are $n_1 = 4$, $n_2 = 3$, $n_3 = 1$, $n_4 = 3$, $n_5 = 2$, $n_6 = 3$, $n_7 = 2$ and $n_8 = 1$, respectively. Therefore, the matching sequence of the nodes, in ascending order of their counts, is $\{n_8, n_3, n_7, n_5, n_4, n_6, n_2, n_1\}$. For each sub-bipartite graph of the MG, we use this matching sequence to find the input nodes.
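The counting step above can be sketched in Python as follows. The names `layer_counts` and `matching_sequence` are hypothetical. We assume that a node's count is the number of layers of the matching graph in which it appears, and that the number of layers is capped at the number of nodes so that the expansion terminates on networks with cycles. Ties between equal counts are broken by node label here, which is our own choice; the text does not specify a tie-breaking rule.

```python
from collections import Counter

def layer_counts(edges, targets, max_layers=None):
    """Count in how many layers of the matching graph each node appears."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    nodes = set(targets) | {n for e in edges for n in e}
    # Assumption: at most N layers, so that cycles cannot loop forever.
    limit = max_layers if max_layers is not None else len(nodes)

    counts = Counter()
    layer, depth = set(targets), 0
    while layer and depth < limit:
        counts.update(layer)             # every node of this layer appears once more
        layer = set().union(*(parents.get(v, set()) for v in layer))
        depth += 1
    return counts

def matching_sequence(counts):
    """Nodes in ascending order of count; ties broken by node label (assumption)."""
    return sorted(counts, key=lambda n: (counts[n], n))

# Counts reported for Fig 2B, keyed by node label.
fig2_counts = {1: 4, 2: 3, 3: 1, 4: 3, 5: 2, 6: 3, 7: 2, 8: 1}
print(matching_sequence(fig2_counts))    # [3, 8, 5, 7, 2, 4, 6, 1]
```

Applied to the counts of Fig 2B, the sketch yields the same ascending grouping as the sequence given in the text; only the order within groups of equal count differs, because of the tie-breaking rule chosen here.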
Overall, for a network G and target node set T, the algorithm based on preferential matching (PM) for finding input nodes consists of the following steps: 1. For the target node set T, construct the bipartite graph $B_1(F, T)$, where F is the set of nodes pointing to the target node set T.

Result
To quantify the efficiency of the algorithm, we evaluated the fraction of input nodes $n_D = N_D / N$ obtained by the PM algorithm and by the GA [13]. We used the following two schemes for target node selection:
1. Random selection scheme: Select nodes from the network uniformly at random as targets until the expected target fraction f is reached.
2. Local selection scheme: Randomly select a seed node and expand from it along a breadth-first search (BFS) tree until the expected target fraction f is reached.
Next, we analyzed $n_D$ for different average degrees $\langle k \rangle$. Fig 4 shows the results for both the scale-free networks and the ER random networks under the local and random target selection schemes. The PM algorithm obtains a lower $n_D$ than the GA in all networks. Note that the variations of $n_D$ under the local selection scheme are much larger than those under the random selection scheme, suggesting that many different input node sets can control target nodes that are locally connected.
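The two target selection schemes described above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the network is given as a dictionary mapping each node to a list of neighbors, that the BFS expansion follows these neighbor lists (the text does not state which edge direction is used), and that the target count is the rounded value of f times N. If the BFS exhausts the component of the seed before reaching that count, the sketch returns fewer targets.

```python
import random
from collections import deque

def random_targets(nodes, f, rng):
    """Random scheme: choose round(f * N) nodes uniformly at random."""
    pool = sorted(nodes)
    return set(rng.sample(pool, round(f * len(pool))))

def local_targets(adj, f, rng):
    """Local scheme: grow a BFS tree from a random seed until round(f * N) nodes are chosen."""
    pool = sorted(adj)
    k = round(f * len(pool))
    if k == 0:
        return set()
    seed = rng.choice(pool)
    seen, frontier, chosen = {seed}, deque([seed]), []
    while frontier and len(chosen) < k:
        v = frontier.popleft()
        chosen.append(v)                 # nodes are taken in BFS visiting order
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return set(chosen)

# Toy path network 0 - 1 - 2 - 3 - 4 - 5 with target fraction f = 0.5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rng = random.Random(0)
print(random_targets(adj, 0.5, rng))
print(local_targets(adj, 0.5, rng))
```

On the toy path network, the local scheme always returns three consecutive nodes, whereas the random scheme may return nodes scattered along the path.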
We also evaluated the performance of the PM algorithm in real networks. The networks are selected based on diversity of topological structure and include food web, transcription, citation, and Internet networks. The results are shown in Table 1 and Fig 5. For all networks and fractions of target nodes in both random and local schemes, the PM algorithm outperforms the GA.

Discussion
The controllability of complex networks is of great importance in many applications. Controlling a small fraction of target nodes is a common task in many real control scenarios. Here we proposed a novel algorithm based on preferential matching to reduce the number of input nodes. Our algorithm has the same main steps as the previous algorithm [13], which is based on multiple maximum matchings of the induced bipartite graphs. However, we arranged the matching order of the nodes carefully, which can significantly reduce the number of resulting input nodes. Our algorithm still cannot guarantee the optimum result. Future work should focus on finding an efficient and precise method to reduce the number of input nodes.
Table 1. For each network, we show its type, name, number of nodes (N) and edges (L), average degree $\langle k \rangle$, density of input nodes ($n_{pd}$) based on preferential matching, and density of input nodes ($n_{rd}$) based on random matching. The fraction of target nodes is f = 0.5.
https://doi.org/10.1371/journal.pone.0175375.t001