Community detection in dynamic networks via adaptive label propagation

An adaptive label propagation algorithm (ALPA) is proposed to detect and monitor communities in dynamic networks. Unlike traditional methods, which recompute the whole community decomposition after each modification of the network, ALPA takes into account the information of historical communities and updates its solution according to the network modifications via a local label propagation process, which generally affects only a small portion of the network. This allows it to respond to network changes at low computational cost. The effectiveness of ALPA has been tested on both synthetic and real-world networks, and the results show that it can successfully identify and track dynamic communities. Moreover, ALPA detects communities with high quality and accuracy compared to other methods. Therefore, being low-complexity and parameter-free, ALPA is a scalable and promising solution for real-world applications of community detection in dynamic networks.

Community structure is a prominent feature of networks and has received much attention in recent years. It deepens our understanding of the underlying structure of many real-world networks [5][6][7][8][9], and promises a variety of practical applications ranging from the determination of functional modules within neural networks to the analysis of communities on the Internet. A network is deemed to have community structure if it can be easily divided into groups of nodes with denser connections internally and sparser connections between groups [12][13][14]. Detecting community structure is a challenging task, and many algorithms have been developed in the last decade, such as modularity optimization [15,16], dynamic label propagation [17][18][19], statistical inference [20][21][22], spectral clustering [23], information-theoretic methods [24,25], and topology-based methods [26,27].
However, most of these methods treat the network as a static one derived from aggregating data over a long period of time. In this way, the evolutionary information of the network and its communities is lost, because real-world networks are always evolving through the addition or removal of nodes or edges over time. By neglecting the intrinsic evolution of the network, these static methods cannot tell us how communities evolve over time. Moreover, if one would like to monitor the communities of a network in real time, the static methods are commonly time-consuming, as they have to compute the whole community decomposition even if only a very small modification of the network occurs, especially when the network evolves rapidly. One way to analyze communities in a dynamic or evolving network is to slice the network into many snapshots, each of which is a static network. Algorithms along this line first analyze snapshots of the dynamic network at different time steps more or less independently, and then compare communities of different snapshots with each other so that one can monitor the evolution of each community [28][29][30][31][32][33]. For example, one such algorithm, FacetNet [33], detects dynamic communities by optimizing a quality function which considers both the quality and the stability of communities. Another one, DSBM [34], fits the evolving network to a dynamic version of the stochastic block model, and determines the community assignment by estimating the parameters of the model. A main disadvantage of these algorithms is that they are commonly time-consuming when the network evolves rapidly and the time slices are extremely small, i.e., the network has a lot of snapshots to be computed. Moreover, it is difficult to find the appropriate time window for dividing the dynamic network into static snapshots.
Another way is to adaptively update the current community structure based on previous ones according to modifications of the network. These algorithms quickly adapt their results when the network undergoes a slight modification rather than compute the whole community decomposition from scratch [35][36][37][38]. For example, Nguyen et al. [37] proposed a modularity-based algorithm named quick community adaptive (QCA), which greedily changes memberships of nodes by optimizing a local modularity function whenever a small modification occurs in the network. A similar algorithm, LabelRankT [39], adjusts its detection results according to the network modifications through a stabilized label propagation process that takes advantage of what was already obtained in previous snapshots. Another algorithm, iLCD [36], first determines whether or not a new node joins existing communities according to two adaptive threshold conditions, then decides whether a new edge is able to form a minimal community or not, and finally merges all communities that are very close to each other (i.e., they have more than a certain ratio of common nodes). If updates are computed efficiently, these adaptive methods are commonly more efficient than computing communities on each snapshot separately when used to monitor large dynamic networks in real time.
Although these methods have been developed to analyze communities in dynamic networks, most of them are not applicable to real-world networks because they either need prior information about the communities (e.g., their number), which is usually unknown in advance, or require user-defined parameters that are difficult to set in practice.
The present paper proposes an adaptive label propagation algorithm (ALPA) for analyzing dynamic communities with no need for prior information about the communities or user-defined parameters. It detects communities by taking into account their evolution, and updates the current community structure through a local label propagation process, which only affects a small portion of the network. Therefore, ALPA can efficiently respond to network modifications at low computational cost. Moreover, ALPA is an incremental algorithm and naturally works in a streaming manner. We evaluated the proposed method on both synthetic and real-world networks. Experimental results show that our method detects communities with high quality and successfully tracks their evolution over time. ALPA has been implemented in a freely available Julia package released under the MIT License (https://github.com/afternone/ALPA.jl), and we believe it will be a helpful tool in the analysis of dynamic networks.
The rest of this paper is organized as follows. The next section describes the details of the proposed method. Then, the method is tested on both synthetic and real-world networks. Finally, we summarize our findings in the last section.

Local label propagation (LLP)
The LLP process uses the label propagation technique of LPA [17] to propagate labels through only part of the network. It maintains an active node list that contains all currently active nodes and finishes execution when the list is empty. An active node is one whose label is not the majority label among its neighbors, so it would potentially change its label if it were to attempt an update. The LLP process asynchronously updates the label of each node in the active node list in a random order according to the generalized update rule proposed by Xie and Szymanski [40], in which the positive neighborhood strength is taken into account when a node considers a new label. During the process, if a node changes its label after an update (i.e., the node is still active), all its neighbors are inserted into the active node list. If the node turns inactive (i.e., it does not change its label after an update), it is removed from the active node list. As this process goes on, the active node list eventually becomes empty.
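The bookkeeping of the active node list can be illustrated with a minimal Python sketch. It is not the authors' Julia implementation: the names `dominant_labels` and `local_label_propagation` are hypothetical, the graph is assumed to be a dict mapping each node to a set of neighbors, and the plain majority rule of LPA stands in for the generalized update rule of Xie and Szymanski [40].

```python
import random
from collections import Counter

def dominant_labels(graph, labels, node):
    """Return the set of most frequent labels among the neighbors of `node`."""
    counts = Counter(labels[nbr] for nbr in graph[node])
    if not counts:                       # isolated node keeps its own label
        return {labels[node]}
    top = max(counts.values())
    return {lab for lab, c in counts.items() if c == top}

def local_label_propagation(graph, labels, active, rng=None):
    """Update labels in place until the active node list is empty.

    Simplification (assumption): plain majority voting with random
    tie-breaking replaces the generalized update rule of [40].
    """
    rng = rng or random.Random(0)
    active = set(active)
    while active:
        order = sorted(active)
        rng.shuffle(order)               # asynchronous updates in random order
        for node in order:
            best = dominant_labels(graph, labels, node)
            if labels[node] in best:
                active.discard(node)     # label unchanged: node turns inactive
            else:
                labels[node] = rng.choice(sorted(best))
                active.update(graph[node])   # label changed: wake up neighbors
    return labels

# Usage: a triangle in which only node 2 disagrees with its neighbors.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(local_label_propagation(triangle, {0: 0, 1: 0, 2: 1}, active={2}))
```

Under this simplified rule, a node only switches to a label that is held by strictly more of its neighbors than its current label, so every change increases the number of edges whose endpoints agree, and the loop must terminate.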
To analyze the convergence behavior of the LLP process, we perform it on some snapshots of AS-Internet and AS-Oregon datasets [41]. For simplicity, we assume that all nodes are active at the beginning. During the LLP process, we record the number of active nodes per step and show the convergence history in Fig 1. As one can see, the LLP process converges quickly. The number of active nodes decreases dramatically in the first few steps.

Updating existing communities according to network modifications
We now describe the procedures for the different modifications of the network. When a new edge connecting two existing nodes is added, there are two cases: the edge is either intra-community or inter-community. The addition of an intra-community edge tightens the community and should not change the current partition (see Fig 2 (c→d)). However, the addition of an inter-community edge could potentially move one of the two endpoints from its current community into the other, or merge the two communities into a new, larger one (see Fig 2 (e→f) for an example). To handle this, we first relabel all nodes in the two communities connected by the new edge. For convenience, we simply relabel them with their IDs. Then we insert all nodes of the two target communities into the active node list. Finally, we apply the LLP process to update the community structure. To avoid unnecessary updates, we do not carry out the LLP process if, after the new edge is added, each of its two endpoints still has more connections within its own community than with nodes outside its community.
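The decision logic for a new edge, including the shortcut that skips unnecessary updates, might look as follows. This is a hedged sketch only: `needs_update` and `targets_after_edge_addition` are hypothetical names, the graph is assumed to be a dict of neighbor sets, and community labels are assumed never to coincide with the IDs of nodes outside the relabeled communities.

```python
def needs_update(graph, labels, u, v):
    """Return True if the (already inserted) edge (u, v) should trigger LLP."""
    if labels[u] == labels[v]:
        return False                     # intra-community edge: partition kept
    for node in (u, v):
        inside = sum(1 for nbr in graph[node] if labels[nbr] == labels[node])
        outside = len(graph[node]) - inside
        if inside <= outside:            # endpoint no longer anchored in its community
            return True
    return False                         # both endpoints still have more internal links

def targets_after_edge_addition(graph, labels, u, v):
    """Insert edge (u, v); return the nodes to place in the active node list."""
    graph[u].add(v)
    graph[v].add(u)
    if not needs_update(graph, labels, u, v):
        return set()
    pair = {labels[u], labels[v]}        # the two target communities
    targets = {n for n in graph if labels[n] in pair}
    for n in targets:
        labels[n] = n                    # relabel every target node with its ID
    return targets

# Usage: a two-node community "A" gains a link to triangle "B".
graph = {0: {1}, 1: {0}, 2: {3, 4}, 3: {2, 4}, 4: {2, 3}}
labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B"}
print(targets_after_edge_addition(graph, labels, 1, 2))
```

The returned set would then be handed to the label propagation step; an empty set means the partition is left untouched.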
However, these operations alone are still not enough to guarantee good performance in practice. This is because when nodes on the border of the two communities update their labels at the initial stage, they tend to adopt more frequent labels from other adjacent communities. Consequently, communities tend to merge during the evolution of the network. To deal with this issue, prior to the LLP process, we perform a "warm-up" step in which the target communities are treated as a subgraph for labels to propagate, i.e., when we update the label of a node during the label propagation process, we only consider the labels of its neighbors that are in the subgraph. Note that nodes in the target community (or communities) need to reinitialize their labels before the warm-up step, and the LLP process is then based on the labels obtained in the warm-up step. This strategy allows us to preserve detected communities. Obviously, the number of nodes affected by the warm-up step should not be larger than that affected by the LLP process. There is generally a certain relationship among the warm-up step, the LLP process, and the whole network, as shown in Fig 3.
When an existing edge is deleted, there are also two cases: the edge is either inter-community or intra-community. In the former case, the deletion makes the current community structure clearer (see Fig 2 (b→a) for an example), so we leave the partition intact. In the latter case, the deletion may break the community into small pieces (see Fig 2 (f→e) for an example), and these pieces could join other communities. To deal with this, we first insert all nodes of the target community into the active node list, and then perform the warm-up and LLP processes to update the community structure.
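The warm-up step can be sketched as a label propagation that is blind to everything outside the target communities. The function name `warm_up` is hypothetical, node IDs are assumed to be sortable and usable as labels, and simple majority voting is assumed in place of the generalized update rule of [40].

```python
import random
from collections import Counter

def warm_up(graph, labels, targets, rng=None):
    """Propagate labels inside the subgraph induced by `targets` only."""
    rng = rng or random.Random(0)
    targets = set(targets)
    for node in targets:
        labels[node] = node              # reinitialize labels of target nodes
    active = set(targets)
    while active:
        order = sorted(active)
        rng.shuffle(order)
        for node in order:
            # Only neighbors inside the subgraph are allowed to vote.
            votes = Counter(labels[n] for n in graph[node] if n in targets)
            if not votes:
                active.discard(node)
                continue
            top = max(votes.values())
            best = sorted(lab for lab, c in votes.items() if c == top)
            if labels[node] in best:
                active.discard(node)
            else:
                labels[node] = rng.choice(best)
                active.update(n for n in graph[node] if n in targets)
    return labels

# Usage: node 2 has three outside neighbors (label 9) but only two inside ones.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4, 5}, 3: {2}, 4: {2}, 5: {2}}
labels = {0: 7, 1: 7, 2: 7, 3: 9, 4: 9, 5: 9}
print(warm_up(graph, labels, targets={0, 1, 2}))
```

In this toy case an unrestricted update of node 2 would adopt the outside label 9, whereas the warm-up keeps the triangle together, which is the merging effect the step is meant to prevent.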
If an isolated node is inserted, we simply create a new community for it (see Fig 2 (h→i)). If instead the node comes with some adjacent edges connecting to one or more existing communities (see Fig 2 (h→g)), we split the process into two steps, i.e., first add an "isolated" node and then add its adjacent edges one after another. If an isolated node is removed, the current community structure is unchanged (see Fig 2 (i→h)). However, when a node with degree larger than or equal to two is removed (see Fig 2 (g→h)), all its adjacent edges are destroyed, and the community containing the node could remain unchanged, or break into a number of small pieces which could join other communities. To deal with this case efficiently, we first remove all the node's adjacent edges one by one, and then remove the node itself.
Finally, combining all these cases, our adaptive label propagation algorithm is summarized as follows.
1. Initialize an empty graph and an empty partition.

2. For each modification:
(a) If a new edge is added, we update the current partition according to the procedure of adding a new edge.
(b) If an existing edge is removed from the network, we update the current partition according to the procedure of removing an existing edge.
(c) If an isolated node is added, we simply create a new community for it.
(d) If a node with some associated edges is added, we first create a new community for it (i.e., step 2c), and then add all its associated edges one by one according to step 2a.
(e) If an isolated node is removed, we just delete it from the current partition, and leave other communities intact.
(f) If a node with degree larger than one is removed, we first remove all its adjacent edges one by one according to step 2b, and then remove the node itself according to step 2e.
3. Output the graph and its partition at each time step.
Note that ALPA can also start with any given snapshot network instead of an empty one. The initial community structure can be obtained with any of the available static methods, or with ALPA itself (i.e., by starting with an empty graph and treating the network as a collection of new nodes or edges).
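The dispatch over modification types in steps 1 to 3 can be written as a small event loop. The sketch below is an assumption-laden illustration: the event tuples, `apply_modification`, `run_stream`, and the placeholder hooks `link` and `unlink` are all hypothetical, and the hooks only edit the topology where a full implementation would also run the edge-addition and edge-removal procedures described above.

```python
def link(graph, labels, u, v):
    """Placeholder for step 2a: only inserts the edge."""
    graph[u].add(v)
    graph[v].add(u)

def unlink(graph, labels, u, v):
    """Placeholder for step 2b: only deletes the edge."""
    graph[u].discard(v)
    graph[v].discard(u)

def apply_modification(graph, labels, event, on_add=link, on_remove=unlink):
    kind, *args = event
    if kind == "add_edge":
        on_add(graph, labels, *args)                 # step 2a
    elif kind == "remove_edge":
        on_remove(graph, labels, *args)              # step 2b
    elif kind == "add_node":
        node, neighbors = args
        graph[node] = set()
        labels[node] = node                          # step 2c: new community
        for nbr in neighbors:                        # step 2d: edges one by one
            on_add(graph, labels, node, nbr)
    elif kind == "remove_node":
        (node,) = args
        for nbr in list(graph[node]):                # step 2f: edges one by one
            on_remove(graph, labels, node, nbr)
        del graph[node]                              # step 2e: drop the node
        del labels[node]
    else:
        raise ValueError(f"unknown modification: {kind}")

def run_stream(events, on_add=link, on_remove=unlink):
    graph, labels, snapshots = {}, {}, []            # step 1: empty graph, partition
    for event in events:
        apply_modification(graph, labels, event, on_add, on_remove)
        snapshots.append(dict(labels))               # step 3: partition per time step
    return graph, labels, snapshots

events = [("add_node", 0, []), ("add_node", 1, [0]), ("add_node", 2, [0, 1]),
          ("remove_edge", 0, 1), ("remove_node", 2)]
graph, labels, snapshots = run_stream(events)
print(graph, labels, len(snapshots))
```

Passing the update procedures in as hooks keeps the decomposition of node events into edge events separate from the label propagation itself.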

Results
In this section, we first evaluate our method on different synthetic networks with known community structures, and then show the results on two popular real-world datasets: AS-Internet and AS-Oregon. To verify the performance of our method, we compare it with two publicly available methods, FacetNet [33] and iLCD [36]. Moreover, a widely used static method, Infomap [25], is also included in the comparisons.

Synthetic networks
We start with the first synthetic network, which is a static network generated by the well-known Lancichinetti-Fortunato-Radicchi (LFR) benchmark [42], to show that our algorithm can handle incremental inputs. The network contains 1000 nodes which are naturally grouped into nine communities. There are around 10000 edges in the network. Starting with an empty network, we add these edges one by one and use our algorithm to update the community structure after each edge's addition. Fig 4(a) shows the evolution of communities in the network. ALPA identifies the nine communities correctly. In addition, we are interested in how many nodes are involved in ALPA after each edge's addition, which can be used to estimate the time complexity of our algorithm. We record the number of involved nodes (i.e., those that are activated at least once) in each time step and plot them in Fig 4(b). As one can see, most of the modifications only affect a few nodes. The average number of involved nodes in each time step is 23.7, which is tiny compared to the network size, so our algorithm can efficiently respond to changes in network topology.
The second synthetic network is also an LFR network, but with community events embedded in it. The network is constructed according to the following steps. Starting with a static LFR network, at each time step we make a slight modification to the network, such as the addition or removal of nodes or edges. In this way we produce a dynamic network which contains some community events. For the experiment conducted here, community events are sequentially embedded as follows: birth, expansion, shrinkage, death, separation and combination. As shown in Fig 5(a), our algorithm recognizes all these major changes of communities and successfully tracks the evolution of each community. To show the evolution of each community more clearly, we select eight snapshots of the network and visualize them using the Netgram tool developed by Mall et al. [43]. As shown in Fig 5(b), each circle represents a community, and its size is proportional to the number of nodes inside the community at that time step. The dashed line represents the evolution of communities between two consecutive time steps. We can see that a new community NewC8 is born at snapshot T3, C6 expands at T4, C2 disappears at T5, C1 shrinks at T4, C5 is divided into two small communities at T6, C3 and C4 are merged at T8, while community C7 remains intact throughout the whole evolution.
To compare ALPA with other methods, we employ the dynamic benchmark model proposed by Granell et al. [44] to generate three standard benchmarks: grow-shrink, merge-split and mixed. The first one contains communities that grow and shrink periodically in size, while the second one considers communities that merge and split periodically. The third one is a mixed version of the previous two and consists of a combination of all four operations. Each benchmark network consists of 100 time steps and is divided into 4 communities, where each community has 32 nodes initially (therefore the total size of the network is 128). Nodes of the same community are connected with a probability p_in = 0.5, whereas nodes of different communities are connected with a probability p_out = 0.05, so that connections are denser inside communities than between them. For the grow-shrink process, the maximum fraction of nodes moving from one community to another is 0.5, i.e., there are at most 16 nodes switching between communities. Fig 6 shows the planted partitions and the results from different algorithms. It can be seen that the results of ALPA are mostly correct, except for some extreme time steps, whereas the partitions detected by FacetNet and iLCD are very different from the planted ones. Moreover, the partitions detected by ALPA have higher consistency through time than those detected by the other two algorithms.
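To make the benchmark setting concrete, the sketch below generates a single static snapshot with planted communities, in which each pair of nodes is linked independently with an intra-community or inter-community probability. It does not reproduce the grow-shrink or merge-split dynamics of the Granell et al. model [44]; the function name `planted_partition` and the probabilities used in the example (chosen so that internal links are denser) are assumptions for illustration.

```python
import random
from itertools import combinations

def planted_partition(n_comms, comm_size, p_in, p_out, seed=0):
    """Return (membership, edges) for one snapshot with planted communities."""
    rng = random.Random(seed)
    membership = {node: node // comm_size
                  for node in range(n_comms * comm_size)}
    edges = set()
    for u, v in combinations(membership, 2):
        same = membership[u] == membership[v]
        if rng.random() < (p_in if same else p_out):
            edges.add((u, v))
    return membership, edges

# 4 communities of 32 nodes each, i.e., 128 nodes in total.
membership, edges = planted_partition(4, 32, p_in=0.5, p_out=0.05)
intra = sum(1 for u, v in edges if membership[u] == membership[v])
print(len(membership), "nodes,", intra, "intra-community edges,",
      len(edges) - intra, "inter-community edges")
```

Feeding the edges of such a snapshot to an incremental algorithm one at a time is one simple way to exercise it before moving to the full dynamic benchmarks.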
To quantitatively evaluate the results, we calculate the normalized mutual information (NMI) [13], the normalized variation of information (NVI) and the Jaccard index between the planted partitions and the detected ones. As shown in Fig 7, in most snapshots, the values of NMI and Jaccard index of ALPA are higher than those of the other two algorithms, while the values of NVI of ALPA are lower than those of the other two algorithms. These results indicate that ALPA outperforms FacetNet and iLCD on these dynamic benchmark networks.
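As an illustration of two of these measures, the following sketch computes NMI and a Jaccard index for a pair of partitions given as label lists. The exact variants used in the experiments are not specified here, so two common definitions are assumed: NMI normalized by the sum of the entropies, 2I(A;B)/(H(A)+H(B)), and the pair-counting Jaccard index. The function names `nmi` and `pair_jaccard` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log

def nmi(a, b):
    """Normalized mutual information, 2*I(A;B) / (H(A) + H(B))."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual = sum(c / n * log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in joint.items())
    if h_a + h_b == 0:                   # both partitions are a single community
        return 1.0
    return 2 * mutual / (h_a + h_b)

def pair_jaccard(a, b):
    """Pairs grouped together in both partitions / pairs grouped in either."""
    both = either = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0

planted = [0, 0, 0, 1, 1, 1]
detected = [5, 5, 5, 8, 8, 8]            # same grouping, different label names
print(nmi(planted, detected), pair_jaccard(planted, detected))
```

Both measures depend only on how nodes are grouped, not on the label names, which is why they suit comparisons between planted and detected partitions.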
To compare the performance of different algorithms for community detection in general settings, we test them on LFR networks in four scenarios: two different network sizes (1000 and 5000 nodes) and two different ranges of community sizes ([10,50] and [20,100]). The following parameters are the same for all the LFR networks used here: the average and maximum degrees are 20 and 50 respectively, the power-law exponents of the degree distribution and the community size distribution are -2 and -1 respectively, and the mixing parameter μ increases from 0 to 1 in steps of 0.05. For iLCD, if a node belongs to multiple communities, we assign it to the largest one in order to output a disjoint partition. Since FacetNet requires the number of communities as an input parameter, we set it to the number of planted communities. We use the NMI to measure the consistency between the planted partition and the detected partition. It can be seen (see Fig 8) that ALPA detects communities correctly and outperforms the other two algorithms up to μ ≈ 0.6 in all cases.

Real-world networks
In this section, we tested the performance of ALPA on two real-world networks from the Stanford network analysis project datasets [45]. AS-Internet [41] and AS-Oregon [41] were chosen from the available datasets, since they have a varying number of snapshots. The description of the two datasets is as follows.
The AS-Internet dataset is a who-talks-to-whom communication network derived from the border gateway protocol logs of routers in the Internet. The dataset contains 733 daily instances of the autonomous systems (AS) graph from November 8, 1997 to January 2, 2000. The largest graph (from January 2, 2000) has 6474 nodes and 13859 edges. Nodes and edges are added or removed over time. Fig 9(a) shows the number of edges added and deleted, as well as the number of nodes involved in these changes. The network topology can change dramatically at some snapshots.
Since the real community structures of both datasets are unavailable, it is impossible to use NMI to evaluate the performance of different algorithms. Hence, we use modularity [46] to evaluate the algorithms on these datasets. In particular, we show the modularity values and processing times of ALPA in comparison with other methods. For each dataset, dynamic algorithms like ALPA (and also iLCD) run on the network modifications, whereas the static method Infomap and the snapshot method FacetNet have to be performed on the whole network snapshot at each time step. Fig 10(a) shows the modularity values of ALPA and the three other algorithms on the AS-Internet dataset. ALPA and FacetNet have similar performance, and both achieve higher modularity values than Infomap for most of the snapshots. In contrast, iLCD fails to find strong community structure at all. In particular, the modularity values obtained by ALPA are more stable over time, since our method preserves the community structure of the previous snapshots and only considers current network changes. Retaining the historical information is a great advantage of ALPA because it avoids the expense of recomputing from scratch and makes the algorithm run faster. As shown in Fig 10(b), the computational cost is significantly reduced in ALPA. The running times of iLCD and Infomap are close, while FacetNet requires far more time. In fact, ALPA is three times faster than iLCD, two times faster than Infomap, and 250 times faster than FacetNet on the AS-Internet dataset. These results indicate that on the AS-Internet dataset, both ALPA and FacetNet are able to identify high-quality community structure with high modularity. However, only our method significantly reduces the processing time.
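For reference, modularity can be computed directly from an edge list and a partition. The sketch below assumes the standard Newman-Girvan definition for undirected, unweighted graphs, Q = Σ_c [l_c/m − (d_c/(2m))²], where l_c is the number of edges inside community c, d_c the total degree of its nodes, and m the total number of edges. The function name `modularity` is hypothetical, and the source does not state which implementation was used in the experiments.

```python
from collections import Counter

def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    m = len(edges)
    inside = Counter()                   # l_c: edges with both ends in community c
    degree = Counter()                   # d_c: summed degree of nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

For the split partition each community has l_c = 3 and d_c = 7 with m = 7, giving Q = 2(3/7 − 1/4) = 5/14, while placing all nodes in one community gives Q = 0.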
Compared with the AS-Internet dataset, AS-Oregon has fewer snapshots. However, the number of nodes and edges is large enough for an extensive analysis. In Fig 11(a), we compare the modularity values obtained by ALPA at each network snapshot with those of iLCD and Infomap. FacetNet failed to complete the task because of memory overflow, and is thus excluded from the plots. The modularity values obtained by ALPA are close to those obtained by Infomap and are far higher than those obtained by iLCD. Fig 11(b) shows that, in terms of running time, ALPA outperforms both Infomap and iLCD. In conclusion, the high modularity values and low computational cost on this dataset confirm the effectiveness of our method.
Fig 8. Four network scenarios are shown, which correspond to two different network sizes (N = 1000, 5000) and, for a given size, to two different ranges for the community sizes (C = [10,50], [20,100]). Each point on the curves corresponds to the average NMI value over 100 network realizations. https://doi.org/10.1371/journal.pone.0188655.g008

Discussion
In this work, we proposed an adaptive algorithm ALPA to detect communities in dynamic networks. It processes a sequence of modifications on the network and tries to maintain a fairly good community structure by updating a few existing communities through a local label propagation process, rather than computing the whole community decomposition from scratch. The advantages of our approach are as follows. Firstly, it requires neither any user-defined parameters nor the prior information of communities. Secondly, it is fast, scalable and incremental, i.e., it can work in a streaming fashion: whenever there is a modification of the network, ALPA adapts its result according to the modification by taking advantage of the historical information. Thirdly, it can monitor the evolution of each community at low computational cost. We have tested our method on synthetic networks and have shown that it identifies the planted communities with a high degree of success. We have also tested it on two real-world networks and have shown that it detects community structures with relatively high modularity scores and low computational costs.
It is difficult to determine the time complexity of ALPA accurately due to its randomness, but it can be roughly analyzed as follows. The LLP or warm-up process is a local version of the original LPA. Its time complexity is roughly O(⟨m_c⟩), where ⟨m_c⟩ is the mean number of edges in each community. Generally, ⟨m_c⟩ is tiny compared with the number of edges of the whole network. For each network modification, ALPA updates the current community structure with time complexity roughly O(I⟨m_c⟩), where I is the number of iterations (usually a small constant). The total time complexity of ALPA for a dynamic network with T time steps is therefore O(TI⟨m_c⟩). Consequently, our method is applicable to analyzing communities of large dynamic networks which evolve rapidly, especially when the sizes of communities are small.
ALPA is not deterministic because of the random update order of nodes and the many tie-breaks. However, our experiments show that ALPA generally obtains the same partition in most runs. Only when the community structure is not clear enough may ALPA produce many similar partitions in multiple runs. We could obtain more robust and stable results by adopting a more deterministic update rule, e.g., by considering similarities between adjacent node pairs when updating the labels of nodes.
In future work, we plan to improve the current method by employing more deterministic update rules and to extend it to overlapping community detection. We also plan to apply our method to practical applications, for example, constructing efficient distributed social-based message routing policies for wireless ad hoc networks.