Structural Controllability of Complex Networks Based on Preferential Matching

Minimum driver node sets (MDSs) play an important role in studying the structural controllability of complex networks. Recent research has shown that MDSs tend to avoid high-degree nodes. However, this observation is based on the analysis of a small number of MDSs, because enumerating all of the MDSs of a network is in the class of #P problems. Therefore, past research has not been sufficient to arrive at a convincing conclusion. In this paper, we first propose a preferential matching algorithm to find MDSs that have a specific degree property. We then show that the MDSs obtained by preferential matching can be composed of high- and medium-degree nodes. Moreover, the experimental results show that the average degree of the MDSs of some networks tends to be greater than that of the overall network, even when the MDSs are obtained using the random-matching method of previous research. Further analysis shows that whether the driver nodes tend to be high-degree nodes is closely related to the edge direction of the network.


Introduction
Controlling complex systems is a critical topic in many applications. A system is called controllable if it can be driven from any initial state to any desired state in finite time. Previous research has usually adopted a complex network as the fundamental model to analyze the topological structure [1][2][3], the evolving model [4][5][6], and the dynamic behavior [7][8][9] of complex systems.
However, we still lack a thorough understanding of how to control complex networks. According to control theory, the states of a linear time-invariant system are determined by the following equation:
$$\dot{x}(t) = A x(t) + B u(t)$$
where the vector $x(t) = (x_1(t), \ldots, x_N(t))^T$ denotes the state of the N nodes in the network at time t, A is the transpose of the adjacency matrix of the network, B is the input matrix that defines how control signals are inputted to the network, and $u(t) = (u_1(t), \ldots, u_H(t))^T$ represents the H input signals at time t. A node to which a control signal is directly inputted is called a driver node. The minimum sets of driver nodes needed to control a network are called the minimum driver node sets (MDSs). Lin [10] presented a network representation of linear time-invariant systems and stated that the system is structurally controllable if and only if the network can be spanned by cacti structures. Commault [11] proved that the minimum number of signals needed to control a network can be obtained from a maximum matching [12] of the network. Based on the above works, Liu [13] developed an analysis tool to study the controllability of an arbitrary complex directed network, and found that MDSs tend to be composed of low-degree nodes in both real and model networks.
However, the maximum matching of a network is usually not unique [14], and thus neither are the MDSs. Previous studies [15][16][17][18][19] have only randomly sampled and analyzed a small number of the MDSs of a network, because enumerating all possible maximum matchings is in the class of #P problems [20]. Therefore, past research has not been sufficient to arrive at a convincing conclusion about whether MDSs tend to avoid high-degree nodes.
In this paper, we propose a preferential matching algorithm to find MDSs with desired degree properties. To find these MDSs, the algorithm arranges the matching order of the nodes according to their degree rank. Because low-ranking nodes have higher probabilities of being driver nodes, the obtained MDSs tend to be composed of the high- or medium-degree nodes of the network. The algorithm can also be applied to obtain MDSs with other topological properties.
By using the preferential matching algorithm, we found that there were some MDSs composed mainly of high- and medium-degree nodes in some networks. Moreover, in some networks, the average degree of the MDSs tended to be greater than that of the overall network, even if the MDSs were obtained using the previous random-matching method.
We conclude that there are networks that favor low-degree MDSs and other networks that favor high-degree MDSs. To find the underlying reason for this phenomenon, we designed a directed BA model for model networks and a reversal strategy for the edge direction for real networks. The experimental results showed that the MDSs of the network tended to be composed of high-degree nodes if the majority of the edges of a network were pointing from high-degree nodes to low-degree nodes; otherwise, the MDSs of the network tended to be composed of low-degree nodes. Therefore, whether the driver nodes tended to be high degree or not was closely related to the edge direction of the network.

Preferential Matching Algorithm
First, we briefly introduce the basic concepts of maximum matching. The minimum input theorem [38] proves that if there is a perfect matching in a network, the number of driver nodes is one; otherwise, the number of driver nodes is equal to the number of unmatched nodes with respect to any maximum matching, and the driver nodes are the unmatched nodes. The size of the maximum matching $M^*$ is denoted $|M^*|$. The minimum number of driver nodes is thus
$$n_D = \max\{N - |M^*|, 1\}$$
Based on this theorem, the MDSs can be obtained by finding the maximum matchings of a network. Therefore, it is critical to find all of the maximum matchings. Previous maximum matching algorithms, such as Hopcroft-Karp [12] and the Hungarian algorithm [21], are based on the theorem proposed by Berge [22]. That theorem proves that $M^*$ is a maximum matching if and only if there is no augmenting path in G relative to $M^*$. Therefore, the basic idea of the maximum matching algorithm is as follows: first, find an augmenting path that starts from an unmatched node relative to the current matching M (initially $M = \emptyset$); second, use that path to obtain an expanded matching M'. Repeat the first and second steps until no augmenting path exists. The final matching is a maximum matching. In this process, once a node $v_i$ becomes a matched node, it remains matched in the final maximum matching and will not be a driver node.
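The augmenting-path procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Hopcroft-Karp or Hungarian implementation cited in the text: it uses a plain depth-first augmenting-path search, it treats a node as matched when a matched edge points to it, and the function names `maximum_matching` and `driver_nodes` are hypothetical.

```python
from collections import defaultdict


def maximum_matching(nodes, edges):
    """Return {v: u} for every matched edge u -> v of a directed network.

    A node v counts as matched when some matched edge points to it, so the
    keys of the returned dict are exactly the matched nodes.
    """
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    match_in = {}  # matched node v -> node u whose edge u -> v is in M

    def augment(u, seen):
        # Depth-first search for an augmenting path that starts at u.
        for v in successors.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            # Either v is still unmatched, or its current partner can be
            # re-routed to another node; in both cases M grows by one edge.
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    return match_in


def driver_nodes(nodes, edges):
    """Unmatched nodes of one maximum matching, i.e. one MDS."""
    match_in = maximum_matching(nodes, edges)
    unmatched = [v for v in nodes if v not in match_in]
    # With a perfect matching, a single driver node is still required.
    return unmatched if unmatched else [nodes[0]]


chain = [(0, 1), (1, 2), (2, 3)]
print(driver_nodes([0, 1, 2, 3], chain))  # [0]
```

Note that `augment` only ever adds keys to `match_in` or reassigns their partner; it never removes one. This mirrors the property stated above: once a node is matched, it stays matched and cannot become a driver node.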
Therefore, if we deliberately arrange the matching order of nodes according to the order of degree, we could find MDSs with a desired degree property, such as high-degree MDSs, particularly when a network has many maximum matchings. However, the matching order of nodes is determined by the time at which a node first appears in an augmenting path, and this time is hard to determine in advance. It is possible that a node with a high degree appears very early in an augmenting path, even if it is arranged to be the last start node of the augmenting paths. For example, we can sort the nodes as $\{v_0, v_1, v_2, v_3, v_4, v_5, v_6\}$ in ascending order by degree and treat this order as the input sequence for selecting the unmatched start node when finding an augmenting path. But we may find an augmenting path P: $v_0 \to v_4 \to v_5 \to v_6$ at the very first step. Although the path starts from $v_0$, which has the lowest degree, it contains the highest-degree nodes $v_4$, $v_5$ and $v_6$, and these nodes cannot be driver nodes of the final MDSs. Thus, the matching order of the nodes would be quite different from the degree order of the nodes, and MDSs with a desired degree property could not be easily found.
To overcome this problem, we designed an iterative preferential matching method. We sort the nodes as $\{v_0, v_1, \ldots, v_n\}$ in ascending order by degree and denote by m the number of preferential matching nodes. The method starts from the subgraph $H_0$ containing the lowest-degree node, which is ranked first; at each iterative step i, the subgraph $H_i$ is extended by adding the node with the i-th rank, and the maximum matching of $H_i$ is calculated based on the previously obtained maximum matching of $H_{i-1}$. We repeat this procedure until the subgraph $H_i$ is equal to the whole network or until m preferential nodes have been added. Details of the preferential matching method are as follows: An example of the proposed method is shown in Figure 1. We obtain a maximum matching of G in step 4. As with current algorithms [12,21], once $v_i$ is matched in the process, it must be matched by the resulting maximum matching. The proposed method ensures that we find the maximum number of matched nodes of $H_i$ among the first i ranked nodes and that a high-degree node will not be matched in early steps, because the node is not included in the early subgraphs. Therefore, we can make the matching order of the nodes as similar as possible to the predefined order of degrees. Thus, high-degree nodes will have a higher probability of being driver nodes. However, the order of arrangement has no influence on some particular nodes; for instance, the nodes with zero in-degree must be driver nodes no matter what the input order is.
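The iterative idea described above (grow the subgraph one ranked node at a time and extend the previous matching) can be illustrated with the following Python sketch. It is not the authors' step-by-step listing. It makes several assumptions: degree means in-degree plus out-degree, each $H_i$ is the subgraph induced by the nodes added so far, the matching is extended by a simple depth-first augmenting-path search, and once m preferential nodes have been added the remaining nodes are added all at once. The names `rank_by_degree` and `preferential_matching` are hypothetical.

```python
from collections import Counter, defaultdict


def rank_by_degree(nodes, edges):
    """Sort nodes in ascending order of total degree (in + out)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(nodes, key=lambda v: degree[v])


def preferential_matching(order, edges, m=None):
    """Return the unmatched (driver) nodes for a given matching priority.

    order: nodes ranked so that earlier nodes are matched with higher
           priority and later nodes are more likely to stay unmatched.
    m:     number of preferential nodes added one by one (default: all).
    """
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    active = set()  # nodes of the current subgraph H_i
    match_in = {}   # matched node v -> u, for matched edge u -> v
    match_out = {}  # u -> v, for matched edge u -> v

    def augment(u, seen):
        # Augmenting-path search restricted to the current subgraph.
        for v in successors.get(u, ()):
            if v not in active or v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                match_out[u] = v
                return True
        return False

    def extend():
        # Augment until no augmenting path is left in the subgraph.
        progress = True
        while progress:
            progress = False
            for u in order:
                if u in active and u not in match_out and augment(u, set()):
                    progress = True

    limit = len(order) if m is None else m
    for i, w in enumerate(order):
        if i < limit:
            active.add(w)             # H_i = H_{i-1} plus the i-th ranked node
        else:
            active.update(order[i:])  # assumption: add the rest at once
        extend()
        if i >= limit:
            break
    unmatched = [v for v in order if v not in match_in]
    # With a perfect matching, one driver is still needed; pick the last-ranked.
    return unmatched or order[-1:]


edges = [("u", "h"), ("u", "l"), ("h", "x"), ("h", "y"), ("h", "z")]
ascending = rank_by_degree(["h", "u", "l", "x", "y", "z"], edges)
print(preferential_matching(ascending, edges))
print(preferential_matching(ascending[::-1], edges))
```

In this toy network the nodes `h` (degree 4) and `l` (degree 1) compete for the single outgoing edge of `u`. With the ascending order, `l` is matched first and `h` ends up among the driver nodes; with the reversed order, `h` is matched and `l` becomes a driver node instead. Both results have the same size, as any MDS must.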

Experimental Results and Analysis
To analyze the degree property of MDSs, we selected 21 real networks that belong to 12 categories, including trust networks, food networks, electric networks, neuronal networks, citation networks, the World Wide Web, the internet, social communication networks and social organization networks. Table 1 shows the average degree of a network $\langle k \rangle$, the size of the network's MDSs $n_D$, and the fraction of driver nodes $l_D = n_D/N$. First, we find MDSs with the desired high-degree property based on the preferential matching algorithm. Let $\langle k_D \rangle$ be the average degree of the MDSs obtained under a different number m of preferential nodes, and let $\langle k_D^{\max} \rangle$ and $\langle k_D^{\min} \rangle$ be the maximum and the minimum $\langle k_D \rangle$ of all of the obtained MDSs, respectively. Figure 2 shows the variation in $\langle k_D \rangle$ versus m in the real and model networks. The preferential matching method can clearly find MDSs with the preferred high-degree property. Conversely, if the nodes are sorted in descending order according to degree, $\langle k_D \rangle$ decreases with m to the lower bound $\langle k_D^{\min} \rangle$. From Table 1 and Figure 2, a basic observation is that the MDSs are structurally diverse: the $\langle k_D \rangle$ of many networks varies widely. Thus, the different MDSs of the same network can have quite different degree properties. Moreover, $\langle k_D^{\max} \rangle$ is greater than $\langle k \rangle$ in many networks, such as the Grassland, Seagrass, Ythan, and Florida networks. Therefore, we were able to find MDSs whose $\langle k_D \rangle$ was greater than the average degree of the network.
To further verify the above observation, we analyzed the degree distribution of the driver nodes of the MDSs with high $\langle k_D \rangle$. We computed the MDS with the highest average degree $\langle k_D^{\max} \rangle$ by using the preferential matching method. Figure 3 shows the results for some real and model networks. In Figure 3, each point corresponds to the set of nodes with a specific degree k. A black point means that no node with degree k appears in the resulting MDS, and a red point means that some nodes with degree k appear in the resulting MDS. The inset graph shows the degree distribution of all driver nodes of the MDS with $\langle k_D^{\max} \rangle$. Therefore, if all red points have high degree, the MDS tends to be composed of high-degree nodes. It can be seen from Figure 3 that, in some networks, there exist MDSs mainly composed of high- or medium-degree nodes. Taking the world-trade 38 network as an example, 66.2% of its nodes have $k \le 20$, but none of these low-degree nodes appeared in the resulting MDS; meanwhile, 88.9% of the remaining high-degree nodes with $k > 20$ appeared in the MDS. Similar results can be observed in the BA and ER networks. However, not all networks had an MDS mainly composed of high-degree nodes. In some networks, such as the seagrass [26], florida [27] and C. elegans [29] networks, the MDS with $\langle k_D^{\max} \rangle$ was composed of nodes with degrees ranging from the lowest to the highest, while in other networks, such as the P2P-1 [33] network, the MDS with $\langle k_D^{\max} \rangle$ was mainly composed of low-degree nodes.
Second, we sought to show that the average degree of the MDSs of some networks tended to be greater than that of the overall network, even if the MDSs were obtained using the previous random matching method. In the experiment, we randomly sampled 10,000 different MDSs of each network. Because the average degree of the different MDSs varied, Table 1 shows the average value $\overline{\langle k_D \rangle}$ of the average degree over all of the sampled MDSs. We found that the $\overline{\langle k_D \rangle}$ of some networks, such as the Zewail, world trade and literature networks, was greater than or equal to $\langle k \rangle$ even when using the previous sampling method [13].
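The sampling experiment above can be mimicked on a toy network with the following Python sketch. It is only an illustration under assumptions: the random matching method of [13] is not specified in this text, so the sketch simply shuffles the start order and the neighbor order of a depth-first augmenting-path search, which yields varied MDSs but is not guaranteed to sample them uniformly. Degree is taken as in-degree plus out-degree, and the names `random_mds` and `mean_driver_degree` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_mds(nodes, edges, rng):
    """Return the driver nodes of one randomly ordered maximum matching."""
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
    for targets in successors.values():
        rng.shuffle(targets)          # randomize the neighbor order
    starts = list(nodes)
    rng.shuffle(starts)               # randomize the start-node order
    match_in = {}                     # matched node v -> u, for edge u -> v

    def augment(u, seen):
        for v in successors.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    for u in starts:
        augment(u, set())
    drivers = [v for v in nodes if v not in match_in]
    return drivers or [starts[0]]     # perfect matching: one driver node


def mean_driver_degree(nodes, edges, samples=1000, seed=0):
    """Average, over sampled MDSs, of the mean degree of the driver nodes."""
    rng = random.Random(seed)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = 0.0
    for _ in range(samples):
        drivers = random_mds(nodes, edges, rng)
        total += sum(degree[v] for v in drivers) / len(drivers)
    return total / samples


nodes = [0, 1, 2, 3]
out_star = [(0, 1), (0, 2), (0, 3)]   # hub points to the leaves
in_star = [(1, 0), (2, 0), (3, 0)]    # leaves point to the hub
print(mean_driver_degree(nodes, out_star), mean_driver_degree(nodes, in_star))
```

Both stars have an average degree of 1.5. In the first star the hub is always a driver node, so the sampled value lies above 1.5; in the second star only the leaves are driver nodes, so it lies below.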
Finally, these experimental results prompted us to explain why the driver nodes of some networks tended to be low degree while those of others did not. According to the minimum input theorem, a driver node is not pointed to by any matched edge. Therefore, if the majority of the edges of a network point from high-degree nodes to low-degree nodes, the MDSs tend to be composed of high-degree nodes. Otherwise, the MDSs tend to be composed of low-degree nodes. Figure 4 gives an example in which two networks have the same topology except that the directions of their edges are opposite. The edges of the network in Figure 4(a) point to the low-degree nodes, while the edges in Figure 4(b) point to the high-degree nodes.
Therefore, we believe that the node composition of the MDSs is closely related to the direction of the edges in a network. To verify this hypothesis, we designed a revised BA model to generate directed networks. The model is the same as the classical BA model [39] except that the direction of a newly added edge is determined by the following rule: the new edge points from an existing old node $v_{old}$ to a new node $v_{new}$ with probability p, and in the opposite direction with probability $1-p$. Therefore, if p is large enough, the edges of a high-degree node $v_{old}$ will have a high probability of pointing to other nodes. As a result, the edges of a generated network tend to point from high-degree nodes to low-degree nodes, so the high-degree nodes are more likely to be source nodes [40], which must receive the control signal from outside. We calculated the fraction $f_{\text{hi-lo}}$ of edges that pointed from high-degree nodes to low-degree nodes in a directed BA network. Figure 5(a) shows the linear relation between $f_{\text{hi-lo}}$ and p.
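A small Python sketch of such a directed BA generator is given below. It is an illustration under assumptions rather than the exact model used in the experiments: the seed graph consists of m isolated nodes, attachment targets are drawn from a pool in which each node appears once per unit of degree, and $f_{\text{hi-lo}}$ counts only edges whose source has strictly higher total degree than their target. The names `directed_ba` and `f_hi_lo` are hypothetical.

```python
import random
from collections import Counter


def directed_ba(n, m, p, seed=None):
    """Generate the edge list of a directed BA-style network.

    Each new node attaches to m distinct old nodes chosen with probability
    proportional to degree. Every new edge points old -> new with
    probability p and new -> old with probability 1 - p.
    Requires m >= 1 and n > m.
    """
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # assumption: m initial isolated nodes
    pool = []                 # each node appears once per unit of degree
    for new in range(m, n):
        for old in targets:
            edges.append((old, new) if rng.random() < p else (new, old))
        pool.extend(targets)
        pool.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional choice, no repeats
            chosen.add(rng.choice(pool))
        targets = sorted(chosen)
    return edges


def f_hi_lo(edges):
    """Fraction of edges pointing from a higher- to a lower-degree node."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(1 for u, v in edges if degree[u] > degree[v]) / len(edges)


for p in (0.0, 0.5, 1.0):
    print(p, f_hi_lo(directed_ba(500, 2, p, seed=42)))
```

Because ties in degree are counted as neither direction, the printed fractions need not sum to one across p and 1 - p; a different tie rule would shift the values slightly.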
Then, we randomly sampled 10,000 MDSs of several directed BA networks using the Hopcroft-Karp algorithm. Figure 5(b) shows that the average degree of the MDSs, $\overline{\langle k_D \rangle}$, increases with p. When p = 0.5, which means that the direction of each edge is randomly decided, $\overline{\langle k_D \rangle}$ is much less than $\langle k \rangle$; as p increases to close to 1, $\overline{\langle k_D \rangle}$ gradually becomes greater than $\langle k \rangle$; and in Figure 5(c), when p = 1, the $\overline{\langle k_D \rangle}$ of all of the directed BA networks is always greater than $\langle k \rangle$.
We also verified this hypothesis in the real networks. Due to the complexity of degree correlation in real directed networks [41], there may be no obvious relationship between $\overline{\langle k_D \rangle}$ and $f_{\text{hi-lo}}$ across different real networks. Therefore, we designed the following edge-reversal strategy to verify this hypothesis: for an edge $v_i \to v_j$, if $k_i < k_j$, then reverse the edge direction to $v_j \to v_i$ with probability R. As in the directed BA model, if R is large enough, the edges of a high-degree node will have a high probability of pointing to a low-degree node. Figure 5(d) shows the sampled average MDS degree $\overline{\langle k_D \rangle}$ versus R. We can see that if the original $\overline{\langle k_D \rangle}$ of a network is less than $\langle k \rangle$, $\overline{\langle k_D \rangle}$ increases gradually as R increases and becomes greater than or equal to the $\langle k \rangle$ of the network. However, for a few networks such as TRN-Yeast-1, the average degree of the MDSs decreases with R. This finding suggests that other topological factors also influence the degree properties of MDSs, although the direction of the edges may be a major factor.
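The edge-reversal strategy can be written down directly. The following Python sketch assumes that the degree $k_i$ is the total degree (in-degree plus out-degree), which a reversal does not change, and it does not handle the case in which a reversed edge duplicates an edge that already exists; the function name `reverse_edges` is hypothetical.

```python
import random
from collections import Counter


def reverse_edges(edges, R, seed=None):
    """Reverse each low-to-high-degree edge with probability R.

    An edge v_i -> v_j with k_i < k_j becomes v_j -> v_i with probability R;
    all other edges are kept as they are.
    """
    rng = random.Random(seed)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    result = []
    for u, v in edges:
        if degree[u] < degree[v] and rng.random() < R:
            result.append((v, u))  # now points from high to low degree
        else:
            result.append((u, v))
    return result


in_star = [(1, 0), (2, 0), (3, 0)]
print(reverse_edges(in_star, R=1.0))  # [(0, 1), (0, 2), (0, 3)]
```

With R = 1 every edge that pointed from a lower-degree node to a higher-degree node is reversed, so the star whose leaves point to the hub becomes a star whose hub points to the leaves.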

Discussion
The minimum driver node set can be obtained by finding a maximum matching of the network. However, the MDSs of a network are not unique, and MDSs with very different topological features exist. Thus, one important research direction in the controllability of complex networks is analyzing the topological features of all of the possible MDSs.
However, enumerating all of the MDSs is in the class of #P problems, so we tried to find MDSs with specific topological features. Our contribution in this paper is twofold. First, we proposed an MDS-discovery method based on preferential matching. This method can effectively find an MDS with a high average degree by arranging the matching sequence of nodes based on the order of their degree. Furthermore, we can sort nodes by any desired property and find an MDS satisfying that property. The algorithm also shows promise for finding MDSs that satisfy application-specific constraints. For instance, if some nodes cannot be driver nodes in practice, we let these nodes be matched with high priority in the preferential matching process; thus, an MDS without these nodes can be obtained if such an MDS exists.
Second, we found that whether driver nodes tended to be low degree was closely related to the direction of edges. If the majority of edges pointed to low-degree nodes, control signals were required to transfer from high-degree nodes to low-degree nodes; thus, the MDSs tended to be composed of high-degree nodes.
Future research will investigate all of the possible MDSs and analyze the degree distribution of the driver nodes of networks. In this manner, we may discover an optimal strategy for finding MDSs that satisfy specific constraints.