Fast Fragmentation of Networks Using Module-Based Attacks

In the multidisciplinary field of network science, the optimization of procedures for efficiently breaking complex networks is attracting much attention from a practical point of view. In this contribution, we present a module-based method to efficiently fragment complex networks. The procedure first identifies the topological communities through which the network can be represented, using a well-established heuristic community-finding algorithm. Then only the nodes that participate in inter-community links are removed, in descending order of their betweenness centrality. We illustrate the method by applying it to a variety of examples in the social, infrastructure, and biological fields. It is shown that the module-based approach always outperforms targeted attacks on vertices based on node degree or betweenness centrality rankings, with gains in efficiency strongly related to the modularity of the network. Remarkably, in the US power grid case, by deleting 3% of the nodes, the proposed method breaks the original network into fragments that are twenty times smaller than those left by a betweenness-based attack.


I. INTRODUCTION
Network theory and its applications pervade many different scientific fields, from physics to sociology, engineering, epidemiology, mathematics and economics, to cite a few. In the context of network science, three important concepts have received much attention recently: interdependent graphs [1,2], communities (or modules) [3], and the robustness of networks facing targeted attacks [4]. In the present work we address and bring together the last two concepts.
The robustness of networks against failures and targeted attacks on individual components, and the impact of such events on the performance of the system, has become an important issue for practical reasons in the last few years. The robustness of a network is often related to the structural functionality of the system as a whole, for example its ability to let information propagate over the network. For instance, the failure of routers in the Internet [5], the vaccination of individuals to prevent the spread of a disease [6], and the fight against organized crime or terrorist groups [7] can all be described by a formal model in which a certain number of vertices (edges) in the network are removed [8]. Therefore, the robustness of a complex network is directly related to the fraction of nodes (edges) that must be removed for the network to lose its functionality. Conversely, the fewer nodes an attack method needs to remove to break down a network, the more efficient it is. To this end, many centrality indexes have been proposed to measure the structural importance of nodes (edges) [8]. The concept of bridging nodes in the topology of complex networks has also been brought into the discussion [10]. Hwang et al. [11] define a bridging centrality in order to characterize the location of nodes among high-degree nodes. The method succeeds in identifying functional modules but does not show significantly better results than a simple betweenness attack when it comes to atomizing different complex networks. Nevertheless, recent developments in efficient algorithms for extracting communities from complex graphs [12,13] show a promising pathway for devising better attack strategies. Communities, or modules, are topological partitions of a graph that are densely connected internally but only weakly connected to one another. In Fig. 1 we depict the community structure of the Western United States Power Grid, illustrating this weak connection among internally dense clusters. Hence, a natural question arises: how structurally important are those weak interactions bridging distinct communities, and how are they related to the robustness problem? Answering this question is precisely the main objective of this study.
The work is organized as follows: Section II discusses the generalities of attacks on networks and the concept of robustness; Section III describes our method to perform the attacks; Section IV presents the results of applying the procedure to ten examples of real networks; and Section V summarizes the conclusions.

II. ATTACKS
In order to quantify the effect of the attacks on the networks [14], we define $G$ as an initial network of size $N$, and $G_\rho$ as the network that results after the removal of a fraction $\rho$ of vertices (edges). We denote by $L_\rho$ the largest connected component of $G_\rho$, whose size we denote by $N_L$. We then define the order parameter $\sigma(\rho) = N_L / N$, which allows us to quantify the response of a network as a function of the fraction of nodes (edges) deleted.
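As a minimal sketch of how the order parameter can be evaluated, the snippet below computes σ(ρ) on a toy graph with a breadth-first search. The adjacency-dictionary representation, the function names `largest_component_size` and `sigma`, and the seven-node example are assumptions made for illustration only; they are not taken from the networks studied here.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size N_L of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)          # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:             # breadth-first search over one component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def sigma(adj, removed):
    """Order parameter sigma = N_L / N, with N the size of the ORIGINAL network."""
    return largest_component_size(adj, removed) / len(adj)

# Hypothetical toy graph: two triangles joined through node 3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

print(sigma(adj, set()))  # intact network: 1.0
print(sigma(adj, {3}))    # after deleting the bridge node: 3/7
```

Note that the denominator is the size of the original network, so σ decreases both because nodes are deleted and because the survivors split into smaller components.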
A hypothetical way of obtaining the ordered list of nodes to be removed would be brute force: try all possible lists until finding the one that reduces the network to a desired size with the minimum number of removals. However, this is useless in practice because it means checking $N!$ possible lists, which is computationally prohibitive for any network bigger than $N \approx 12$. On the other hand, the simplest but inefficient strategy is the random deletion of nodes, i.e., making a random list of the nodes and removing them in that order. This generally gives rise to a linear degradation of the network, in which the fraction of the network lost is not much larger than the fraction of nodes removed. A more efficient and feasible way of attacking a graph consists of deleting vertices (edges) in order of their importance to the structural functioning of the network. Traditional attacks accordingly sort nodes (edges) in decreasing order of some centrality index, the so-called Centrality-Based Attack (CBA), which performs much better than random attacks. Betweenness centrality, for instance, takes a time of only $O(N \times E)$, depending on the algorithm used, where $E$ stands for the number of edges in the network. If we choose some method as a null reference, the gain in efficiency of another method can be computed by a normalized ratio, Γ(ρ), which increases as the attack method becomes more efficient than the reference one. Even though most attack methods focus on centrality rankings, real networks tend to group into sparsely connected clusters, so the removal of a few bridging structures should be able to detach large chunks of densely connected nodes, leading to large values of Γ(ρ), as we shall see in the next section.
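The explicit expression for the efficiency gain is not reproduced in the text above. Purely as an illustration, the sketch below assumes one normalization that matches the verbal description, namely Γ(ρ) = 1 − σ(ρ)/σ_null(ρ), which is zero when the method ties with the reference and approaches one as the method leaves a much smaller largest component. Both this form and the function name `efficiency_gain` are assumptions, not the definition used by the authors.

```python
def efficiency_gain(sigma_method, sigma_null):
    """Assumed form of the gain: 1 - sigma_method / sigma_null.

    sigma_method: order parameter left by the attack being evaluated
    sigma_null:   order parameter left by the reference (null) attack,
                  measured at the same fraction of removed nodes
    """
    if sigma_null <= 0:
        raise ValueError("the reference attack left no component to compare against")
    return 1.0 - sigma_method / sigma_null

# Made-up values: the evaluated attack leaves a largest component of 20 %
# of the network where the reference attack still leaves 80 %.
gain = efficiency_gain(0.2, 0.8)
print(round(gain, 3))
```

Under this assumed form, a negative value would mean that the evaluated attack performs worse than the reference at that value of ρ.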

III. MODULE-BASED ATTACK
The structural importance of a node (edge) depends on both local and non-local measures. Hence, within the scope of the method proposed in this paper, centrality and community detection are the tools we use to characterize and sort nodes (edges) in order to develop the attack on networks. As pointed out by Iyer et al. [8] and Holme et al. [15], betweenness and degree are usually strongly correlated, so attacks based on either have similar efficiency. Moreover, previous work shows that for real networks the betweenness-based method is in general the most efficient [8]. Thus, from now on we take the betweenness centrality attack as our reference, or null, method.
Likewise, vertices connecting different communities generally have high betweenness centrality, since many shortest paths pass through them. However, as few connections are expected among communities, these nodes are not necessarily the ones with the highest degree. Therefore, in order to detach communities very efficiently, we propose a Module-Based Attack (MBA), which consists of sorting all nodes (edges) by betweenness centrality and then choosing only those nodes (edges) that link different communities. Note that in a vertex attack, since we aim to detach previously detected communities, once one node of a bridging edge is deleted there is no need to delete its counterpart unless it also participates in other inter-community connections. Moreover, at each step of the procedure the attack focuses on the remaining largest connected component of the network, in order to speed up the fragmentation. This process loosely resembles the original idea of weak ties proposed by Granovetter [16] for social networks and later developed in the framework of topological communities by De Meo, Ferrara et al. [17]. Nonetheless, this procedure is slower than traditional methods, since fast community extraction takes about $O(N^2)$, depending on the algorithm used, which sets a lower limit on the computation time that depends on $N$, the total number of nodes, and $E$, the total number of edges.
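The vertex version of the procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the community assignment is supplied by hand instead of being detected, betweenness is computed once with Brandes' algorithm for unweighted graphs, and the refinement of restricting each step to the largest connected component is omitted. The names `betweenness`, `module_based_attack` and the toy graph are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted undirected graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        n_paths = dict.fromkeys(adj, 0)
        n_paths[s] = 1
        queue = deque([s])
        while queue:                      # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}

def module_based_attack(adj, community):
    """Ordered list of nodes to delete: bridging nodes by decreasing betweenness."""
    bc = betweenness(adj)                 # measured once, before the attack
    bridging = [u for u in adj
                if any(community[u] != community[v] for v in adj[u])]
    bridging.sort(key=lambda u: bc[u], reverse=True)
    removed, gone = [], set()
    for u in bridging:
        # skip u if every inter-community edge it had is already cut
        if any(community[u] != community[v] and v not in gone for v in adj[u]):
            removed.append(u)
            gone.add(u)
    return removed

# Hypothetical toy graph: two triangles joined through node 3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
community = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

print(module_based_attack(adj, community))  # [3]
```

In this toy case node 2 has the same betweenness as node 4 but never enters the target list because it has no inter-community edge, and node 4 is skipped because its only bridging edge disappears when node 3 is deleted.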
As a preliminary check of the proposed method, we show in Fig. 2 the relation between the fraction of nodes at the interface of communities, i.e., the fraction of nodes that connect the different modules extracted from the networks, and the value of the modularity for each of the real networks presented in the next section. As expected, we observe an approximately linear (negative) correlation between the fraction of edges bridging communities and the modularity Q, which is precisely the feature that makes the method potentially well posed. The infrastructure networks are the ones that best fit the linear behavior, while the social networks are the worst cases.

IV. RESULTS
We now apply the method to real networks with different topological structures. We investigate the behavior of such systems when the topological characteristics are measured only once, before the attack procedure begins (the so-called simultaneous attack). All graphs were treated as undirected. For the networks studied here, the different community extraction methods, i.e., multilevel [13], fast greedy [18], walktrap [19], infomap [20], and leading eigenvector [12], all have a community membership coincidence higher than 90%. Hence, in these simulations we have used the method proposed by Blondel et al. [13] because it is the fastest in computing time.
We have chosen three distinct groups of networks: infrastructure (US Power Grid, Euro Road, Open Flights and US Airports) [9, 21-27], biological (Yeast Protein, C Elegans and H Pylori) [28-31] and social (Facebook, Google+ and Twitter) [32-35]. In the Euro Road network, nodes represent European cities and edges represent roads. Power Grid stands for the electrical power grid of the Western States of the United States of America. An edge represents a power supply line and a node is either a generator, a transformer, or a substation. The Yeast Protein interaction network is the same as in [28].
In the metabolic network of the roundworm Caenorhabditis elegans, nodes are metabolites (e.g., proteins) and edges are interactions between them. The Helicobacter pylori network is the same protein-protein interaction map as in [29]. In the Facebook user-user friendship network (NIPS), nodes represent users and edges represent friendship. Similarly, in the Google+ network an edge means that one user (node) has the other user (node) in her/his circles, while in the Twitter network an edge indicates that both users (nodes) follow each other.
Simulations show that vertex MBA always outperforms the traditional betweenness attack. Initially both methods behave similarly but, as bridging nodes are deleted, whole communities start to detach from the core of the graph, resulting in a strong atomization of the network and hence in an abrupt increase of Γ. In edge MBA the situation changes. As we erase solely edges bridging modules, the initial attacks result in a plateau in σ until whole modules are effectively detached. After this critical point is reached, σ decreases abruptly, relatively large communities are detached extremely fast, and the whole network falls apart. Attacks usually stop before σ → 0, depending on the particular modular structure of each network, at a point $P_c = (\sigma_c, \rho_c)$. This happens precisely when all original communities are detached and no targeted node (edge) is left in the remaining clusters, so the network stops functioning as a whole; for instance, information would be stuck within the communities, and these structures would not be able to communicate with each other. From this point on, one may continue to strike the network (or what is left of it, namely the remaining largest connected component, which is generally of about the same size as the biggest original community) with some classical attack based on centrality measures such as degree or betweenness.
Looking at the attack curves, we can say that vertex MBA shows a behavior typical of a second-order phase transition (Fig. 3), while edge MBA exhibits a typical first-order phase transition behavior (Fig. 4). In either case, with σ(ρ) as the order parameter, the critical points shown in Figs. 3 and 4 mark what we call modular percolation, i.e., the point at which the network is modularly disconnected. In general, attacking nodes is more efficient than attacking edges, since the removal of a vertex always results in the deletion of all edges attached to it. However, depending on the real system studied, a vertex or an edge attack may not make sense. For instance, in the case of Euro Road one may envisage blocking the traffic between two cities, while removing a node would mean erasing an entire city. On the other hand, in biological systems, for example, node deletion makes sense, since individual metabolites can be removed from the network.
With these results, we now plot in Fig. 5 the efficiency Γ of the MBA compared to the CBA as a function of ρ for each network. Most networks reach a gain in efficiency of more than 50% with less than 7% of nodes removed, the most outstanding case being the US Power Grid, with a gain of more than 95% for approximately 3% of nodes removed. Even in the worst cases (Yeast Protein, H pylori and US Airports) we obtain a gain of more than 60% with less than 16% of vertices deleted.
The overall efficiency of MBA compared to CBA may be measured by how fast MBA reaches the modular critical point relative to CBA. In other words, we can define a relative overall efficiency η in terms of two quantities: $\Gamma_c$, which is Γ calculated at the critical point ($\rho_c$), and a factor defined as $1 - \rho/\rho_{\mathrm{null}}$, where ρ stands for the fraction of nodes removed at the critical σ in the MBA approach and $\rho_{\mathrm{null}}$ is evaluated at the same point on the y-axis but on the CBA curve. As before, the reference, or null, method of attack is betweenness-based. The results for the present method of attack on the ten networks, expressed in terms of η and shown in Fig. 7, tell us that its efficiency strongly depends on the modularity. This is expected, since networks with high modularity tend to have a density of edges connecting communities much smaller than the density of internal edges.
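The factor 1 − ρ/ρ_null can be read off two attack curves, as the following sketch suggests. It assumes each curve is stored as a list of (ρ, σ) pairs sorted by increasing ρ and takes ρ_null as the first sampled point at which the reference curve falls to the critical σ or below. The function names and all numerical values are made up for illustration, and the way this factor combines with Γ_c into η is not reconstructed here.

```python
def first_rho_at_or_below(curve, sigma_target):
    """First sampled rho at which the curve's sigma is <= sigma_target, else None."""
    for rho, sigma in curve:
        if sigma <= sigma_target:
            return rho
    return None

def fraction_saving(rho_c, sigma_c, null_curve):
    """Factor 1 - rho / rho_null evaluated at the MBA critical point (rho_c, sigma_c)."""
    rho_null = first_rho_at_or_below(null_curve, sigma_c)
    if rho_null is None:
        raise ValueError("the reference curve never reaches the critical sigma")
    return 1.0 - rho_c / rho_null

# Made-up reference (CBA) curve and made-up MBA critical point.
cba_curve = [(0.00, 1.00), (0.03, 0.78), (0.10, 0.30), (0.15, 0.04)]
rho_c, sigma_c = 0.03, 0.04

print(round(fraction_saving(rho_c, sigma_c, cba_curve), 3))
```

With coarsely sampled curves, interpolating between neighbouring points would give a smoother estimate of ρ_null than taking the first sample below the threshold.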
However, in the node removal approach we have a slightly different picture, since the attacks also break down the internal structure of the modules, so the results depart from a linear fit. Another important effect should be noticed: some networks have high modularity, but the inner structure of their modules is very weak. In these cases the MBA approach does not introduce major gains in efficiency, even though it is still more efficient than the traditional CBA, because a simpler attack can be as efficient as the one presented here. Conversely, there may be networks with smaller modularity for which the impact of the MBA approach is higher than for networks with higher modularity. That is precisely the special case of the Facebook network extract studied here. As we can see from Fig. 6, the community structure of this network is trivial, with most of the bridging nodes corresponding to the ones with the highest degree. Moreover, the internal structures of the modules are extremely weak, with all nodes connected to only a few vertices or even to a single central node.
Before arriving at the conclusions, we illustrate the attack procedure with a case where its performance is remarkably better than that of previous, well-accepted attack prescriptions: the power grid of the Western USA. Fig. 8 summarizes the result of our method of attack compared to the betweenness centrality attack, the degree centrality attack, and the longest pathway attack [36], along with snapshots of the network when 1%, 2% and 3% of the nodes are removed by betweenness centrality ranking (CBA) and by the module-based method (MBA). Remarkably, the present method breaks the original network of 4941 nodes into many fragments smaller than 197 nodes (4% of the original size) by removing a mere 164 nodes (≈ 3%) identified by the procedure. By comparison, any degree- or centrality-based procedure that deletes the same number of nodes removes only 22% of the original network, i.e., more than 3800 nodes remain connected. Such extreme atomization of the network is represented graphically by the set of figures on the far right of Fig. 8. It is also readily seen that the community structure of this network is far from trivial.
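The figures quoted for the power grid can be cross-checked with a few lines of arithmetic. The worked ratios below use only the node counts stated above (4941, 197, 164 and 3800); the rounding is ours.

```latex
\begin{align*}
\rho &= \frac{164}{4941} \approx 0.033
  && \text{(fraction of nodes removed, about 3\%)} \\
\sigma_{\mathrm{MBA}} &< \frac{197}{4941} \approx 0.040
  && \text{(largest fragment after MBA, about 4\%)} \\
\sigma_{\mathrm{CBA}} &> \frac{3800}{4941} \approx 0.769
  && \text{(largest fragment after CBA)} \\
\frac{\sigma_{\mathrm{CBA}}}{\sigma_{\mathrm{MBA}}} &> \frac{3800}{197} \approx 19.3
  && \text{(roughly a twentyfold difference in fragment size)}
\end{align*}
```

The last ratio is consistent with the statement that the fragments left by the module-based attack are about twenty times smaller than those left by the betweenness-based attack.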

V. CONCLUSIONS
We have presented a module-based attack method that consists of erasing only those structures that bridge distinct communities, ordered by betweenness centrality. Computational simulations on many real networks show that the MBA method is more efficient in atomizing networks than traditional procedures based only on centrality criteria. Hence, one may say that, in general, the most connected or most central nodes are not necessarily the most important for the survival of the network. Conceptually speaking, nodes (edges) linking distinct communities are structurally more important and more crucial for the modular cohesion of the network than nodes (edges) with high degree or centrality indexes. Networks with high modularity tend in general to be more fragile against module-based attacks, while networks with weak intra-module structure can show smaller differences between attack methods.
It should be noted that, regarding the aim of the present work, the communities that emerge from the networks through the module identification algorithms have in principle no relation to real communities. However, the coarse-grained organization of a network into agglomerates may disclose important information about the structural functionality of complex networks.
In discussions of community detection algorithms, the resolution limit is a topic of debate; in connection with the attack method proposed here, however, it is not highly relevant. What is desirable is a compromise between the average module size and the network size. Large modules mean a network decomposed into few of them, which is good because many nodes are disconnected once a module is detached from the others. The drawback is that the last remaining module could still be a large part of the original network. On the other hand, a decomposition into many small communities has the advantage of ending with a highly fragmented network, but at the expense of being slower than the other scenario. Therefore the optimum lies somewhere in the middle, and that is the reason why the resolution limit of communities is not of high concern here.
As a final remark, we emphasize that eventual applications of module-based attacks to classes of real systems such as terror-, crime- or disease-related networks might lead to groundbreaking procedures to fight these threats.
We acknowledge CNPq and the Brazilian Federal Police for financial support.

FIG. 1. Module-based attack scheme: (A) represents the original network, (B) is one possible module representation of A, and (C) shows the internal structure of nodes and edges inside two selected modules and the edges connecting nodes between them; those edges (nodes) are the ones to be selected for deletion. The graph depicted in A corresponds to the US power grid [9].

FIG. 2. Fraction of edges bridging communities vs. modularity Q for the ten real networks studied in this work. The dashed line is the resulting linear fit: y = 0.7 − 0.8x, with a correlation coefficient R = −0.95.

FIG. 4. Edge MBA applied to real networks. The figure portrays the size of the biggest connected component relative to the original network's size, σ, as a function of the fraction of removed links, ρ. Network data are explained in Table I.

FIG. 7. Overall efficiency η of MBA, compared to CBA, as a function of modularity for (a) node and (b) edge removal.

TABLE I. Topological data for several networks, consisting of average modularity, mean degree, second moment, size of the original network, total number of edges, mean number of modules, relative size of the largest community, fraction of edges linking distinct communities, and relative overall efficiency of the MBA method compared to the betweenness-based method for edge and node attacks.